LocalAI 3.12.0 Release: Multi-modal Realtime & Voxtral Backend
Action Required
Plan an upgrade to LocalAI 3.12.0 if you rely on real-time interactions or need broader model support; the release extends both.
AI Impact Summary
LocalAI 3.12.0 adds multi-modal real-time conversations that combine text, images, and audio, a new high-quality text-to-speech backend (Voxtral), and improved GPU support for Diffusers. The release also includes optimizations for legacy CPUs and UI/UX enhancements. Together, these changes broaden the modalities LocalAI supports and improve performance across different hardware configurations. Users should plan to migrate to this version to take advantage of these improvements.
Affected Systems
- Date: 20 Feb 2026
- Change type: capability
- Severity: high