llama.cpp: Dynamic Model Management via Router Mode
AI Impact Summary
The llama.cpp server now supports dynamic model management via router mode, enabling users to load, unload, and switch between models without restarting the server. The feature introduces a multi-process architecture for added resilience and makes resource use more efficient, which is particularly useful for A/B testing or multi-tenant deployments. Models are auto-discovered as GGUF files from a specified cache or directory, streamlining model switching.
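From the summary, a client would select a model per request and the router would load or swap the backing model process on demand. The sketch below illustrates that flow from the client side under stated assumptions: the server address, the model filenames, and the expectation that router mode keys model selection off the OpenAI-compatible `model` field are illustrative assumptions, not confirmed flags or behavior.

```python
# Client-side sketch of router-mode model switching, assuming the router
# maps the OpenAI-style "model" field to a GGUF file it has discovered.
# Host, port, and model names below are placeholders for illustration.
import json
import urllib.request

SERVER = "http://127.0.0.1:8080"  # assumed llama-server address


def chat(model: str, prompt: str) -> str:
    """Send one chat completion request, selecting `model` per request."""
    payload = {
        "model": model,  # per-request model selection is an assumption here
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        f"{SERVER}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # A/B comparison across two models without restarting the server;
    # the router is expected to load/unload the backing models as needed.
    for model in ("model-a.gguf", "model-b.gguf"):
        print(model, "->", chat(model, "Summarize router mode in one sentence."))
```

The `/v1/chat/completions` endpoint is the server's existing OpenAI-compatible API; only the routing of the `model` field to dynamically loaded GGUF files is assumed from the summary above.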
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info