SAGE LLM Control Plane Enhancement Tasks
This document outlines the tasks for enhancing the sageLLM Control Plane to support dynamic engine lifecycle management and GPU resource scheduling (Issue #1284).
Task 1: GPU Resource Manager (Phase 1)
Assignee: Copilot A
Goal: Implement `GPUResourceManager` to monitor and manage GPU resources.
File: `packages/sage-llm-core/src/sage/llm/control_plane/gpu_manager.py`
Instructions:

- Create the file `gpu_manager.py`.
- Implement the `GPUResourceManager` class.
- Dependencies: Use `pynvml` for GPU monitoring.
  - Check `packages/sage-common/pyproject.toml`. If `nvidia-ml-py` is not listed, add it to the dependencies.
  - Handle `ImportError` gracefully (mock if not available or if running on CPU).
- Key Methods (see the sketch after this list):
  - `__init__(self)`: Initialize NVML.
  - `get_system_status(self) -> List[Dict]`: Return status of all GPUs (index, name, memory_total, memory_used, memory_free, utilization).
  - `check_resource_availability(self, required_memory_gb: float, count: int = 1) -> List[int]`: Return a list of GPU indices that satisfy the requirement.
  - `allocate_resources(self, required_memory_gb: float, count: int = 1) -> List[int]`: Reserve resources (internal accounting).
  - `release_resources(self, gpu_ids: List[int], memory_gb: float)`: Release resources.
  - `estimate_model_memory(self, model_name: str, tensor_parallel_size: int = 1) -> float`: Implement a heuristic to estimate memory usage (e.g., 2 GB per 1B params plus overhead). You can use a simple lookup or formula for now.
- Error Handling: Ensure robust error handling around NVML calls.
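Below is a minimal sketch of what `gpu_manager.py` could look like. The class and method signatures follow the list above; everything else is an assumption to be replaced during implementation, in particular the `_reserved` bookkeeping dict, the decision to return at most `count` indices (or an empty list) from `check_resource_availability`, and the name-based parameter guess inside `estimate_model_memory`.

```python
# Sketch only -- signatures from the task list; internal details are assumptions.
import re
from typing import Dict, List

try:
    import pynvml  # provided by the nvidia-ml-py package
except ImportError:  # CPU-only machine or dependency not installed
    pynvml = None


class GPUResourceManager:
    """Monitors GPUs via NVML and tracks per-GPU memory reservations."""

    def __init__(self) -> None:
        self._reserved: Dict[int, float] = {}  # gpu index -> reserved GB (internal accounting)
        self._nvml_ok = False
        if pynvml is not None:
            try:
                pynvml.nvmlInit()
                self._nvml_ok = True
            except pynvml.NVMLError:
                pass  # degrade to "no GPUs visible" mode

    def get_system_status(self) -> List[Dict]:
        if not self._nvml_ok:
            return []
        status = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            name = pynvml.nvmlDeviceGetName(handle)
            status.append({
                "index": i,
                "name": name.decode() if isinstance(name, bytes) else name,
                "memory_total": mem.total,
                "memory_used": mem.used,
                "memory_free": mem.free,
                "utilization": util.gpu,
            })
        return status

    def check_resource_availability(self, required_memory_gb: float, count: int = 1) -> List[int]:
        candidates = []
        for gpu in self.get_system_status():
            free_gb = gpu["memory_free"] / 1024**3 - self._reserved.get(gpu["index"], 0.0)
            if free_gb >= required_memory_gb:
                candidates.append(gpu["index"])
        return candidates[:count] if len(candidates) >= count else []

    def allocate_resources(self, required_memory_gb: float, count: int = 1) -> List[int]:
        gpu_ids = self.check_resource_availability(required_memory_gb, count)
        for gpu_id in gpu_ids:
            self._reserved[gpu_id] = self._reserved.get(gpu_id, 0.0) + required_memory_gb
        return gpu_ids

    def release_resources(self, gpu_ids: List[int], memory_gb: float) -> None:
        for gpu_id in gpu_ids:
            self._reserved[gpu_id] = max(0.0, self._reserved.get(gpu_id, 0.0) - memory_gb)

    def estimate_model_memory(self, model_name: str, tensor_parallel_size: int = 1) -> float:
        # Heuristic from the task description: ~2 GB per 1B parameters plus overhead.
        # The parameter count is guessed from the model name (e.g. "llama-3-8b" -> 8B).
        match = re.search(r"(\d+(?:\.\d+)?)\s*[bB]\b", model_name)
        params_b = float(match.group(1)) if match else 7.0  # fallback guess: 7B
        total_gb = params_b * 2.0 + 2.0  # weights + activation/KV-cache overhead
        return total_gb / max(tensor_parallel_size, 1)
```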
Task 2: Engine Lifecycle Manager (Phase 2)
Assignee: Copilot B
Goal: Implement `EngineLifecycleManager` to spawn and stop vLLM processes.
File: `packages/sage-llm-core/src/sage/llm/control_plane/engine_lifecycle.py`
Instructions:

- Create the file `engine_lifecycle.py`.
- Implement the `EngineLifecycleManager` class.
- Dependencies: `subprocess`, `psutil` (for process management), `sage.common.config.ports.SagePorts`.
- Key Methods (see the sketch after this list):
  - `spawn_engine(self, model_id: str, gpu_ids: List[int], port: int, extra_args: List[str] = None) -> str`:
    - Launch `vllm.entrypoints.openai.api_server` as a subprocess.
    - Set the `CUDA_VISIBLE_DEVICES` env var based on `gpu_ids`.
    - Return a unique `engine_id`.
  - `stop_engine(self, engine_id: str) -> bool`: Send SIGTERM, wait, then SIGKILL if needed.
  - `get_engine_status(self, engine_id: str) -> Dict`: Return status (RUNNING, STOPPED, FAILED), PID, port, model.
  - `list_engines(self) -> List[Dict]`: List all managed engines.
- Port Management: You might need a simple helper to find available ports if `port` is not provided, or rely on the caller.
- Logging: Log all spawn/stop events.
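A minimal sketch of `engine_lifecycle.py` follows. It uses only `subprocess` for brevity (`psutil` could be layered on to clean up child processes); the `_EngineRecord` dataclass, the 30-second SIGTERM grace period, and the exact vLLM launch arguments shown are assumptions rather than fixed requirements.

```python
# Sketch only -- record keeping and launch arguments are assumptions.
import logging
import os
import subprocess
import sys
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class _EngineRecord:
    process: subprocess.Popen
    model_id: str
    port: int
    gpu_ids: List[int] = field(default_factory=list)


class EngineLifecycleManager:
    """Spawns and stops vLLM OpenAI API server processes."""

    def __init__(self) -> None:
        self._engines: Dict[str, _EngineRecord] = {}

    def spawn_engine(self, model_id: str, gpu_ids: List[int], port: int,
                     extra_args: Optional[List[str]] = None) -> str:
        env = os.environ.copy()
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
        cmd = [sys.executable, "-m", "vllm.entrypoints.openai.api_server",
               "--model", model_id, "--port", str(port)] + (extra_args or [])
        process = subprocess.Popen(cmd, env=env)
        engine_id = f"engine-{uuid.uuid4().hex[:8]}"
        self._engines[engine_id] = _EngineRecord(process, model_id, port, gpu_ids)
        logger.info("Spawned %s (pid=%s, model=%s, port=%s, gpus=%s)",
                    engine_id, process.pid, model_id, port, gpu_ids)
        return engine_id

    def stop_engine(self, engine_id: str) -> bool:
        record = self._engines.get(engine_id)
        if record is None:
            return False
        record.process.terminate()  # SIGTERM first
        try:
            record.process.wait(timeout=30)
        except subprocess.TimeoutExpired:
            record.process.kill()  # escalate to SIGKILL
            record.process.wait()
        logger.info("Stopped %s (pid=%s)", engine_id, record.process.pid)
        return True

    def get_engine_status(self, engine_id: str) -> Dict:
        record = self._engines[engine_id]
        returncode = record.process.poll()
        if returncode is None:
            state = "RUNNING"
        elif returncode == 0:
            state = "STOPPED"
        else:
            state = "FAILED"
        return {"engine_id": engine_id, "status": state, "pid": record.process.pid,
                "port": record.port, "model": record.model_id}

    def list_engines(self) -> List[Dict]:
        return [self.get_engine_status(engine_id) for engine_id in self._engines]
```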
Task 3: Control Plane Integration & API (Phase 2/3)
Assignee: Copilot C
Goal: Integrate the managers into `ControlPlaneManager` and expose a management API.
Files:

- `packages/sage-llm-core/src/sage/llm/control_plane/manager.py`
- `packages/sage-llm-core/src/sage/llm/unified_api_server.py`
Instructions:

- Update `ControlPlaneManager` (see the first sketch after this list):
  - Import `GPUResourceManager` and `EngineLifecycleManager` (assume they exist from Tasks 1 & 2).
  - Initialize them in `__init__`.
  - Add method `request_engine_startup(self, model_id: str, ...)`:
    - Calculate the required memory.
    - Call `gpu_manager.allocate_resources`.
    - If successful, call `lifecycle_manager.spawn_engine`.
    - Register the new instance using `self.register_instance`.
  - Add method `request_engine_shutdown(self, engine_id: str)`:
    - Call `lifecycle_manager.stop_engine`.
    - Call `gpu_manager.release_resources`.
    - Unregister the instance.
  - Add method `get_cluster_status(self)`: Return GPU status and the engine list.
- Update `UnifiedAPIServer` (see the second sketch after this list):
  - Add new FastAPI routes for management:
    - `POST /v1/management/engines`: Trigger startup.
    - `DELETE /v1/management/engines/{engine_id}`: Trigger shutdown.
    - `GET /v1/management/status`: Get cluster status.
  - Connect these routes to the `ControlPlaneManager` methods.
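First, a fragment sketch of the new `ControlPlaneManager` pieces, assuming the class already exists and already provides `register_instance`. The call signature used for `register_instance`, the `unregister_instance` counterpart, and the `_engine_memory` bookkeeping dict are assumptions that must be aligned with the existing code.

```python
# Fragment sketch: only the additions to ControlPlaneManager are shown.
from typing import Dict, List, Tuple

from sage.llm.control_plane.engine_lifecycle import EngineLifecycleManager
from sage.llm.control_plane.gpu_manager import GPUResourceManager


class ControlPlaneManager:  # additions only; the rest of the class is unchanged
    def __init__(self) -> None:
        # ... existing initialization ...
        self.gpu_manager = GPUResourceManager()
        self.lifecycle_manager = EngineLifecycleManager()
        self._engine_memory: Dict[str, Tuple[List[int], float]] = {}  # engine_id -> (gpu_ids, GB)

    def request_engine_startup(self, model_id: str, port: int,
                               tensor_parallel_size: int = 1) -> str:
        memory_gb = self.gpu_manager.estimate_model_memory(model_id, tensor_parallel_size)
        gpu_ids = self.gpu_manager.allocate_resources(memory_gb, count=tensor_parallel_size)
        if not gpu_ids:
            raise RuntimeError(f"Not enough free GPU memory for {model_id}")
        engine_id = self.lifecycle_manager.spawn_engine(model_id, gpu_ids, port)
        self._engine_memory[engine_id] = (gpu_ids, memory_gb)
        self.register_instance(engine_id, model_id, port)  # existing hook; exact signature assumed
        return engine_id

    def request_engine_shutdown(self, engine_id: str) -> bool:
        stopped = self.lifecycle_manager.stop_engine(engine_id)
        gpu_ids, memory_gb = self._engine_memory.pop(engine_id, ([], 0.0))
        self.gpu_manager.release_resources(gpu_ids, memory_gb)
        self.unregister_instance(engine_id)  # assumed counterpart to register_instance
        return stopped

    def get_cluster_status(self) -> Dict:
        return {"gpus": self.gpu_manager.get_system_status(),
                "engines": self.lifecycle_manager.list_engines()}
```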
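Second, a sketch of the management routes for `UnifiedAPIServer`, written as a router factory so it can be bound to whatever `ControlPlaneManager` instance the server holds. The `build_management_router` helper name, the request-body fields, and the HTTP status codes are illustrative choices.

```python
# Sketch only -- helper name, request schema, and status codes are assumptions.
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel


class EngineStartRequest(BaseModel):
    model_id: str
    port: int
    tensor_parallel_size: int = 1


def build_management_router(control_plane) -> APIRouter:
    """Create the /v1/management routes bound to a ControlPlaneManager instance."""
    router = APIRouter(prefix="/v1/management")

    @router.post("/engines")
    def start_engine(req: EngineStartRequest):
        try:
            engine_id = control_plane.request_engine_startup(
                req.model_id, req.port, req.tensor_parallel_size)
        except RuntimeError as exc:  # e.g. not enough free GPU memory
            raise HTTPException(status_code=409, detail=str(exc))
        return {"engine_id": engine_id}

    @router.delete("/engines/{engine_id}")
    def stop_engine(engine_id: str):
        if not control_plane.request_engine_shutdown(engine_id):
            raise HTTPException(status_code=404, detail="Unknown engine_id")
        return {"stopped": engine_id}

    @router.get("/status")
    def cluster_status():
        return control_plane.get_cluster_status()

    return router
```

Inside `UnifiedAPIServer`, the router would then be mounted with something like `app.include_router(build_management_router(self.control_plane))`, assuming the server keeps its FastAPI app in `app` and its manager in `self.control_plane`.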
Task 4: CLI Commands (Phase 3)
Assignee: Copilot D
Goal: Add CLI commands to manage engines.
File: `packages/sage-cli/src/sage/cli/commands/apps/llm.py` (or a new file `llm_engine.py` imported there)
Instructions:

- Extend the `sage llm` command group.
- New Subcommands (see the sketch after this list):
  - `sage llm engine list`: Call `GET /v1/management/status` and display a table.
  - `sage llm engine start <model_id>`: Call `POST /v1/management/engines`.
  - `sage llm engine stop <engine_id>`: Call `DELETE /v1/management/engines/{engine_id}`.
  - `sage llm gpu`: Call `GET /v1/management/status` and display GPU info.
- Implementation Details:
  - Use `httpx` or `requests` to talk to the local `UnifiedAPIServer` (default port 8000, or from config).
  - Use `rich` for pretty-printing tables (engine status, GPU usage).
  - Handle connection errors (e.g., if `sage llm serve` is not running).
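A sketch of the new subcommands, assuming the existing `sage llm` group is built on Typer (adjust to Click if that is what the CLI actually uses); `BASE_URL`, the table columns, and the way the `engine` sub-app is attached to the group are illustrative. The `sage llm gpu` command would follow the same pattern as `list`, reading the `gpus` field of the status response.

```python
# Sketch only -- assumes a Typer-based CLI; base URL and table layout are placeholders.
import httpx
import typer
from rich.console import Console
from rich.table import Table

engine_app = typer.Typer(help="Manage vLLM engine processes")
console = Console()
BASE_URL = "http://localhost:8000"  # default UnifiedAPIServer address; read from config in practice


def _get_status() -> dict:
    try:
        resp = httpx.get(f"{BASE_URL}/v1/management/status", timeout=5.0)
        resp.raise_for_status()
        return resp.json()
    except httpx.ConnectError:
        console.print("[red]Cannot reach the API server. Is `sage llm serve` running?[/red]")
        raise typer.Exit(code=1)


@engine_app.command("list")
def list_engines() -> None:
    table = Table("Engine ID", "Model", "Status", "Port", title="Engines")
    for engine in _get_status().get("engines", []):
        table.add_row(engine["engine_id"], engine["model"], engine["status"], str(engine["port"]))
    console.print(table)


@engine_app.command("start")
def start_engine(model_id: str, port: int = 8001) -> None:
    resp = httpx.post(f"{BASE_URL}/v1/management/engines",
                      json={"model_id": model_id, "port": port}, timeout=60.0)
    resp.raise_for_status()
    console.print(f"Started engine: {resp.json()['engine_id']}")


@engine_app.command("stop")
def stop_engine(engine_id: str) -> None:
    httpx.delete(f"{BASE_URL}/v1/management/engines/{engine_id}", timeout=30.0).raise_for_status()
    console.print(f"Stopped engine: {engine_id}")


# The sub-app would then be attached to the existing group, e.g.:
# llm_app.add_typer(engine_app, name="engine")
```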