
Scope for MVP:

Features:

  • Local model storage

    • Unified storage location
    • Downloading models (asynchronously; see the download sketch after this list)
    • Updating models
    • Deleting models (asynchronously)
    • Reporting available storage space
  • Deploy vLLM Docker container

    • Deploy instance (asynchronously)
    • Stop instance (asynchronously)
    • Status of instance (Docker health check)
    • Attach GPUs to instance (see the deployment sketch after this list)
  • vLLM container version control

    • Dropdown for selecting vLLM container version
    • Checkbox for most recent version
    • Nightly job to pull available versions and populate the dropdown
  • Deploy OpenWebUI

    • Initial config
    • Stop instance (asynchronously)
    • Delete instance
    • Status of instance (Docker health check)
  • GPU state (see the NVML sketch after this list)

    • VRAM load
    • Utilization
    • Power consumption
  • Async operations for IO-bound tasks

    • Container management
    • Image operations
    • Model downloads
    • Graceful thread pool shutdown
  • Use health checks for containers

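A minimal sketch of the asynchronous model-storage operations, assuming models come from the Hugging Face Hub via huggingface_hub and live under one configurable directory; the MODEL_ROOT path and function names are illustrative, not a committed API.

```python
import asyncio
import shutil
from pathlib import Path

from huggingface_hub import snapshot_download

MODEL_ROOT = Path("/models")  # unified local model location (assumed path)

async def download_model(repo_id: str) -> Path:
    """Download a model repo without blocking the event loop."""
    target = MODEL_ROOT / repo_id.replace("/", "--")
    # snapshot_download is blocking IO, so run it in a worker thread.
    await asyncio.to_thread(snapshot_download, repo_id=repo_id, local_dir=target)
    return target

async def delete_model(repo_id: str) -> None:
    """Remove a downloaded model from the unified location."""
    target = MODEL_ROOT / repo_id.replace("/", "--")
    await asyncio.to_thread(shutil.rmtree, target, ignore_errors=True)

def available_storage_gb() -> float:
    """Report free space on the volume holding the model directory."""
    return shutil.disk_usage(MODEL_ROOT).free / 1e9
```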
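A hedged sketch of deploying a vLLM container with selected GPUs attached through the docker Python SDK against the host Docker daemon; the image tag, port mapping, and volume layout are assumptions for illustration.

```python
import asyncio

import docker
from docker.types import DeviceRequest

client = docker.from_env()

async def deploy_vllm(model_path: str, gpu_ids: list[str],
                      image: str = "vllm/vllm-openai:latest") -> str:
    """Start a vLLM container on the selected GPUs without blocking the loop."""
    def _run():
        return client.containers.run(
            image,
            command=["--model", model_path],
            detach=True,
            device_requests=[DeviceRequest(device_ids=gpu_ids,
                                           capabilities=[["gpu"]])],
            ports={"8000/tcp": 8000},
            volumes={"/models": {"bind": "/models", "mode": "ro"}},
        )
    container = await asyncio.to_thread(_run)
    return container.id

async def instance_status(container_id: str) -> str:
    """Return Docker's health status ('starting', 'healthy', 'unhealthy', ...)."""
    def _status():
        c = client.containers.get(container_id)
        return c.attrs["State"].get("Health", {}).get("Status", c.status)
    return await asyncio.to_thread(_status)
```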
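A small sketch of collecting GPU state (VRAM load, utilization, power) through NVML using the pynvml bindings, which presumes NVIDIA drivers are installed per the initial assumptions.

```python
import pynvml

def gpu_state() -> list[dict]:
    """Report VRAM load, utilization, and power draw for each GPU."""
    pynvml.nvmlInit()
    try:
        stats = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            stats.append({
                "gpu": i,
                "vram_used_mb": mem.used // 2**20,
                "vram_total_mb": mem.total // 2**20,
                "utilization_pct": util.gpu,
                "power_w": pynvml.nvmlDeviceGetPowerUsage(handle) / 1000,  # mW -> W
            })
        return stats
    finally:
        pynvml.nvmlShutdown()
```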

Config

LLM Specific

  • Ability to set all vLLM launch arguments when starting an instance (represented with pydantic; sketched below)
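A hedged sketch of representing vLLM launch arguments with pydantic; only a handful of vLLM's flags are shown, and the class and method names are illustrative.

```python
from pydantic import BaseModel, Field

class VllmLaunchConfig(BaseModel):
    model: str                          # HF repo id or local path
    tensor_parallel_size: int = 1
    gpu_memory_utilization: float = Field(0.9, gt=0, le=1)
    max_model_len: int | None = None
    dtype: str = "auto"

    def to_cli_args(self) -> list[str]:
        """Render the config as vLLM CLI arguments for the container command."""
        args = ["--model", self.model,
                "--tensor-parallel-size", str(self.tensor_parallel_size),
                "--gpu-memory-utilization", str(self.gpu_memory_utilization),
                "--dtype", self.dtype]
        if self.max_model_len is not None:
            args += ["--max-model-len", str(self.max_model_len)]
        return args
```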

Host Specific

  • Unified model storage location
  • Port/interface that deployed services are exposed on (default 0.0.0.0)
  • NVIDIA Container Toolkit installed
  • Refresh interval for GPU state
  • Refresh interval for container state
  • HF token
  • Thread pool sizes for IO and CPU operations (see the config sketch after this list)
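A minimal sketch of loading and validating the host-specific config from YAML with pydantic; field names, defaults, and the config path are assumptions.

```python
from pathlib import Path

import yaml
from pydantic import BaseModel

class HostConfig(BaseModel):
    models_dir: Path = Path("/models")      # unified model storage location
    host: str = "0.0.0.0"                   # interface services are exposed on
    gpu_refresh_interval_s: int = 5
    container_refresh_interval_s: int = 10
    hf_token: str | None = None
    io_thread_pool_size: int = 8
    cpu_thread_pool_size: int = 2

def load_host_config(path: str = "config.yml") -> HostConfig:
    """Parse the YAML config file into a validated HostConfig."""
    with open(path) as f:
        return HostConfig(**(yaml.safe_load(f) or {}))
```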

Operating Data

  • Which assets are deployed via Docker
  • Thread pool utilization metrics

Architecture

  • Async operations with specialized thread pools
  • FastAPI with asyncio for a non-blocking event loop
  • Proper exception handling and propagation
  • Graceful application shutdown (see the lifespan sketch after this list)
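A minimal sketch of this architecture: FastAPI with asyncio, dedicated IO and CPU thread pools, and a lifespan handler that shuts the pools down gracefully; pool sizes and helper names are illustrative.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from contextlib import asynccontextmanager

from fastapi import FastAPI

io_pool = ThreadPoolExecutor(max_workers=8, thread_name_prefix="io")
cpu_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="cpu")

@asynccontextmanager
async def lifespan(app: FastAPI):
    yield  # application runs here
    # Graceful shutdown: let in-flight work finish, then release the pools.
    io_pool.shutdown(wait=True)
    cpu_pool.shutdown(wait=True)

app = FastAPI(lifespan=lifespan)

async def run_io(func, *args):
    """Run a blocking IO-bound call (Docker, downloads) off the event loop."""
    return await asyncio.get_running_loop().run_in_executor(io_pool, func, *args)

@app.get("/health")
async def health():
    return {"status": "ok"}
```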

Stack

  • Deploy InferAdmin in Docker
  • InferAdmin interacts with the host Docker daemon to deploy containers for inference/interface
  • FastAPI for backend (async mode)
  • Frontend: Vue.js + Shadcn
  • YAML for config and data, pydantic for representation

Ideas

  • Proxying inference requests in front of vLLM to route to the correct model (see the proxy sketch after this list)
    • Placeholder /llms endpoint for this functionality
  • Engines other than vLLM
  • Support for multiple storage locations
  • Analytics for vLLM instances collected from vLLM's Prometheus metrics
  • Enhanced logging system to replace print statements
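An idea-stage sketch of the proxy: a placeholder /llms route that forwards OpenAI-style requests to the vLLM instance serving the requested model; the routing table and exact path are hypothetical.

```python
import httpx
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

# Hypothetical mapping of model name -> backend vLLM base URL.
MODEL_ROUTES: dict[str, str] = {}

@app.post("/llms/v1/chat/completions")
async def proxy_chat(request: Request):
    """Forward a chat completion request to the vLLM instance for its model."""
    payload = await request.json()
    backend = MODEL_ROUTES.get(payload.get("model", ""))
    if backend is None:
        raise HTTPException(status_code=404, detail="unknown model")
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{backend}/v1/chat/completions",
                                 json=payload, timeout=120)
    return resp.json()
```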

Initial assumptions

  • NVIDIA GPUs
  • All GPUs are the same type
  • Interfaces are exposed on all network interfaces via 0.0.0.0