LocalAI
self_hosted
17 signals tracked
LocalAI releases version 4.1.3 with bug fixes and dependency updates
What's Changed

Bug fixes
- fix(token): login via legacy api keys by @mudler in #9249
- fix(anthropic): do not emit empty tokens and fix SSE tool calls by @mudler in #9258
- fix(gpu): better detection for MacOS and Thor by @mudler in #9263

Dependencies
- chore(deps): bump google.golang.org/grpc from 1.79.3 to 1.80.0 by @dependabot[bot] in #9253
- chore(deps): bump github.com/jaypipes/ghw from 0.23.0 to 0.24.0 by @dependabot[bot] in #9250
- chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.32.12 to 1.32.14 by @dependabot[bot] in #9256
- chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.64.0 to 0.65.0 by @dependabot[bot] in #9254

Other Changes
- chore: Update ggml-org/llama.cpp to d0a6dfeb28a09831d904fc4d910ddb740da82834 by @localai-bot in #9259
- docs: update docs version mudler/LocalAI by @localai-bot in #9260
- chore: Update ace-step/acestep.cpp to e0c8d75a672fca5684c88c68dbf6d12f58754258 by @localai-bot in #9261
- chore: Update leejet/stable-diffusion.cpp to 8afbeb6ba9702c15d41a38296f2ab1fe5c829fa0 by @localai-bot in #9262

Full Changelog: v4.1.2...v4.1.3
7 Apr 2026
Medium · Capability · LocalAI releases version 4.1.2: new Qwen3.5 model and llama.cpp updates
What's Changed

Bug fixes
- fix(autoparser): correctly pass by logprobs by @mudler in #9239
- fix(chat): do not retry if we had chatdeltas or tooldeltas from backend by @mudler in #9244

Exciting New Features
- feat(llama.cpp): wire speculative decoding settings by @mudler in #9238

Other Changes
- Update index.yaml and add Qwen3.5 model files by @ER-EPR in #9237
- chore: Update ggml-org/llama.cpp to 761797ffdf2ce3f118e82c663b1ad7d935fbd656 by @localai-bot in #9243
- chore: Update leejet/stable-diffusion.cpp to 7397ddaa86f4e8837d5261724678cde0f36d4d89 by @localai-bot in #9242
- docs: update docs version mudler/LocalAI by @localai-bot in #9241

Full Changelog: v4.1.1...v4.1.2
6 Apr 2026
Medium · Capability · LocalAI releases version 4.1.1: Gemma 4 tokenization and API improvements
This is a patch release that addresses a few regressions from the last release and prepares for the upcoming Gemma 4. Most importantly, it:
- fixes Gemma 4 tokenization with llama.cpp
- shows the login screen in API-key-only mode
- includes small fixes to improve Anthropic API compatibility

What's Changed

Other Changes
- docs: Update Home Assistant integrations list by @loryanstrant in #9206
- chore: Update ggml-org/llama.cpp to a1cfb645307edc61a89e41557f290f441043d3c2 by @localai-bot in #9203
- chore(model gallery): add 1 new models via gallery agent by @localai-bot in #9210
- chore: bump inference defaults from unsloth by @github-actions[bot] in #9219
- docs: update docs version mudler/LocalAI by @localai-bot in #9214
- chore: Update ggml-org/llama.cpp to d006858316d4650bb4da0c6923294ccd741caefd by @localai-bot in #9215
- fix(ui): pass by staticApiKeyRequired to show login when only api key is configured by @mudler in #9220
- feat(gemma4): add thinking support by @mudler in #9221
- fix(nats): improve error handling by @mudler in #9222
- feat(autoparser): prefer chat deltas from backends when emitted by @mudler in #9224
- fix(anthropic): show null index when not present, default to 0 by @mudler in #9225
- feat(api): Allow coding agents to interactively discover how to control and configure LocalAI by @richiejp in #9084
- chore(refactor): use interface by @mudler in #9226
- fix(reasoning): accumulate and strip reasoning tags from autoparser results by @mudler in #9227
- chore(model-gallery): update checksum by @localai-bot in #9233
- chore: Update ggml-org/llama.cpp to b8635075ffe27b135c49afb9a8b5c434bd42c502 by @localai-bot in #9231

New Contributors
- @github-actions[bot] made their first contribution in #9219

Full Changelog: v4.1.0...v4.1.1
5 Apr 2026
High · Capability · LocalAI 4.1.0 Release: Production-Grade AI Platform with Distributed Clusters
LocalAI 4.1.0 is out! Just weeks after the landmark 4.0, we're back with another massive drop. This release turns LocalAI into a production-grade AI platform: spin up a distributed cluster with smart routing and autoscaling, lock it down with built-in auth and per-user quotas, fine-tune models without leaving the UI, and much more. If 4.0 was the foundation, 4.1 is the control tower.

Feature Summary
- Distributed Mode: run LocalAI as a cluster with smart routing, node groups, drain/resume, and min/max autoscaling.
- Users & Auth: built-in user management with OIDC, invite mode, API keys, and admin impersonation.
- Quota System: per-user usage quotas with predictive analytics and breakdown dashboards.
- Fine-Tuning (experimental): fine-tune models with TRL, auto-export to GGUF, and import back, all from the UI.
- Quantization (experimental): new backend for on-the-fly model quantization.
- Pipeline Editor: visual model pipeline editor in the React UI.
- Standalone Agents: run agents from the CLI with local-ai agent run.
- Smart Inferencing: auto inference defaults from Unsloth, tool parsing fallback, and min_p support.
- Media History: browse past generated images and media in Studio pages.

New (long version)
Full setup walkthrough: https://www.youtube.com/watch?v=cMVNnlqwfw4

Key Features

Distributed Mode: scaling LocalAI horizontally
Run LocalAI as a distributed cluster and let it figure out where to send your requests. No more single-node bottlenecks.
- Smart Routing: requests are routed to nodes ordered by available VRAM; the beefiest free GPU gets the job.
- Node Groups: pin models to specific node groups for workload isolation (e.g., "gpu-heavy" vs "cpu-light").
- Autoscaling: built-in min/max autoscaler with a node reconciler that manages the lifecycle automatically.
- Drain & Resume: gracefully drain nodes for maintenance and bring them back with a single API call.
- Cluster Dashboard: see your entire cluster status at a glance from the home page.
2 Apr 2026
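Distributed mode is transparent to clients: requests still go to LocalAI's standard OpenAI-compatible API, and the cluster's smart routing decides which node serves them. As a minimal sketch (the endpoint host and model name below are illustrative placeholders, not taken from the release notes):

```python
import json

# Placeholder endpoint: LocalAI serves the OpenAI-compatible API,
# commonly on port 8080. Smart routing happens server-side, so the
# client payload is the same whether it hits one node or a cluster.
LOCALAI_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("qwen3.5", "Which node am I running on?")
body = json.dumps(payload)
# POST `body` to LOCALAI_URL with Content-Type: application/json,
# e.g. via urllib.request or any OpenAI client pointed at the cluster.
print(body)
```

Because the wire format is unchanged, existing OpenAI-client code only needs its base URL repointed at the cluster's entry node.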
High · Capability · LocalAI 4.0.0 Release: Agentic Orchestration & New UI
LocalAI 4.0.0 is out! This major release transforms LocalAI into a complete AI orchestration platform. We've embedded agentic and hybrid search capabilities directly into the core, completely overhauled the user interface with React for a modern experience, and are thrilled to introduce Agenthub (link), a brand new community hub to easily share and import agents. Alongside these massive updates, we've introduced powerful new features like Canvas mode for code artifacts, MCP apps, and full MCP client-side support.

Feature Summary
- Agentic Orchestration & Agenthub: native agent management with memory, skills, and the new Agenthub for community sharing.
- Revamped React UI: complete frontend rewrite for lightning-fast performance and modern UX.
- Canvas Mode: preview code blocks and artifacts side-by-side in the chat interface.
- MCP Client-Side: full Model Context Protocol support, MCP Apps, and tool streaming in chat.
- WebRTC Realtime: WebRTC support for low-latency realtime audio conversations.
- New Backends: added experimental MLX Distributed, fish-speech, ace-step.cpp, and faster-qwen3-tts.
- Infrastructure: Podman documentation, shell completion, and persistent data path separation.

Key Features

Native Agentic Orchestration & Agenthub
LocalAI now includes agentic capabilities embedded directly in the core. You can manage, import, start, and stop agents via the new UI.
- Agenthub: we are launching Agenthub, a centralized community space to share common agents and import them effortlessly into your LocalAI instance.
- Agent Management: full lifecycle management via the React UI. Create agents, connect them to Slack, configure MCP servers and skills.
- Skills Management: centralized skill database for AI agents.
- Memory: agents can utilize memory with hybrid search (PostgreSQL) or embedded in-memory storage (Chromem).
- Observability: new "Events" column in the Agents list to track observables and status.

Documentation: Dive into the new capabiliti
14 Mar 2026
High · Capability · LocalAI v3.12.1 released: fixes Qwen 3 coder incompatibility
This is a patch release to tag the new llama.cpp version, which fixes incompatibilities with Qwen 3 coder.

What's Changed

Other Changes
- docs: update docs version mudler/LocalAI by @localai-bot in #8611
- feat(traces): Add backend traces by @richiejp in #8609
- chore: Update ggml-org/llama.cpp to b908baf1825b1a89afef87b09e22c32af2ca6548 by @localai-bot in #8612
- chore: drop bark.cpp leftovers from pipelines by @mudler in #8614
- fix: merge openresponses messages by @mudler in #8615
- chore: Update ggml-org/llama.cpp to ba3b9c8844aca35ecb40d31886686326f22d2214 by @localai-bot in #8613

Full Changelog: v3.12.0...v3.12.1
21 Feb 2026
Medium · Capability · LocalAI 3.12.0 Release: Multi-modal Realtime & Voxtral Backend
LocalAI 3.12.0 is out!

Feature Summary
- Multi-modal Realtime: send text, images, and audio in real-time conversations for richer interactions.
- Voxtral Backend: new high-quality text-to-speech backend added.
- Multi-GPU Support: improved Diffusers performance with multiple GPUs.
- Legacy CPU Optimization: enhanced compatibility for older processors.
- UI Theme & Layout: improved UI theme (dark/light variants) and navigation.
- Realtime Stability: multiple fixes for audio, image, and model handling.
- Logging Improvements: reduced excessive logs and optimized processing.

Local Stack Family
Liking LocalAI? LocalAI is part of an integrated suite of AI infrastructure tools; you might also like:
- LocalAGI: AI agent orchestration platform with OpenAI Responses API compatibility and advanced agentic capabilities
- LocalRecall: MCP/REST API knowledge base system providing persistent memory and storage for AI agents
- Cogito: Go library for building intelligent, co-operative agentic software and LLM-powered workflows, focusing on improving results for small, open source language models while scaling to any LLM. Powers LocalAGI and LocalAI MCP/agentic capabilities
- Wiz: terminal-based AI agent accessible via a Ctrl+Space keybinding. Portable, local-LLM friendly shell assistant with TUI/CLI modes, tool execution with approval, MCP protocol support, and multi-shell compatibility (zsh, bash, fish)
- SkillServer: simple, centralized skills database for AI agents via MCP. Manages skills as Markdown files with MCP server integration, web UI for editing, Git synchronization, and full-text search capabilities

Thank You
LocalAI is a true FOSS movement, built by contributors and powered by community. If you believe in privacy-first AI: star the repo; contribute code, docs, or feedback; and share with others. Your support keeps this stack alive.

Full Changelog (excerpt)
Bug fixes
- security: validate URLs to prevent SSRF in content fetching
20 Feb 2026
High · Capability · LocalAI 3.11.0 Release: Realtime Audio, Music Generation, and Expanded ASR
LocalAI 3.11.0 is a massive update for audio and multimodal capabilities. We are introducing Realtime Audio Conversations, a dedicated Music Generation UI, and a massive expansion of ASR (speech-to-text) and TTS backends. Whether you want to talk to your AI, clone voices, transcribe with speaker identification, or generate songs, this release has you covered. Check out the highlights below!

TL;DR Feature Summary
- Realtime Audio: native support for audio conversations, enabling fluid voice interactions similar to OpenAI's Realtime API. Documentation available.
- Music Generation UI: new UI for MusicGen (Ace-Step), allowing you to generate music from text prompts directly in the browser.
- New ASR Backends: added WhisperX (with speaker diarization), VibeVoice, Qwen-ASR, and Nvidia NeMo.
- TTS Streaming: text-to-speech now supports streaming mode for lower-latency responses (VoxCPM only for now).
- vLLM Omni: added support for vLLM Omni, expanding our high-performance inference capabilities.
- Speaker Diarization: native support for identifying different speakers in transcriptions via WhisperX.
- Hardware Expansion: expanded build support for CUDA 12/13, L4T (Jetson), and SBSA, plus better Metal (Apple Silicon) integration with MLX backends.
- Breaking Changes: ExLlama (deprecated) and Bark (unmaintained) backends have been removed.

New Features & Major Enhancements

Realtime Audio Conversations
LocalAI 3.11.0 introduces native support for Realtime Audio Conversations:
- enables fluid, low-latency voice interaction with agents
- logic handled directly within the LocalAI pipeline for seamless audio-in/audio-out workflows
- support for STT/TTS and voice-to-voice models (experimental)
- support for tool calls

Talk to your LocalAI: this brings us one step closer to a fully local, voice-native assistant experience compatible with standard client implementations. See the documentation for details.

Music Generation UI & Ace-Step
We have added a dedicated int
7 Feb 2026
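To give a rough sense of how a client might drive one of the new TTS backends, here is a sketch that builds an OpenAI-style speech request. The /v1/audio/speech path, voice name, and model name are assumptions based on LocalAI's general OpenAI compatibility, not details from these notes; consult the LocalAI docs for the exact endpoint your backend exposes.

```python
import json

# Assumed OpenAI-compatible speech endpoint; host, path, model, and
# voice are placeholders for illustration only.
TTS_URL = "http://localhost:8080/v1/audio/speech"

def build_tts_request(model: str, text: str, voice: str = "alloy") -> dict:
    """Build an OpenAI-style text-to-speech payload."""
    return {"model": model, "input": text, "voice": voice}

req = build_tts_request("voxcpm", "Hello from LocalAI 3.11.0")
# POST json.dumps(req) to TTS_URL; with a streaming-capable backend
# (VoxCPM, per the release notes) audio chunks can arrive as they
# are synthesized instead of after the full clip is rendered.
print(json.dumps(req))
```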
High · Capability · LocalAI releases v3.10.1 with Qwen-TTS and Qwen3-TTS support
This is a small patch release intended to provide bug fixes and minor polish; alongside, we also added support for Qwen-TTS, which was just released yesterday. Highlights:
- fix reasoning detection on reasoning and instruct models
- support reasoning blocks with the openresponses API
- fixes to correctly run LTX-2
- support for Qwen3-TTS!

What's Changed

Bug fixes
- fix(reasoning): support models with reasoning without starting thinking tag by @mudler in #8132
- fix(tracing): Create trace buffer on first request to enable tracing at runtime by @richiejp in #8148
- fix(videogen): drop incomplete endpoint, add GGUF support for LTX-2 by @mudler in #8160

Exciting New Features
- feat(openresponses): Support reasoning blocks by @mudler in #8133
- feat: detect thinking support from backend automatically if not explicitly set by @mudler in #8167
- feat(qwen-tts): add Qwen-tts backend by @mudler in #8163

Models
- chore(model gallery): add 1 new models via gallery agent by @localai-bot in #8128
- chore(model gallery): add flux 2 and flux 2 klein by @mudler in #8141
- chore(model-gallery): update checksum by @localai-bot in #8153
- chore(model gallery): add 1 new models via gallery agent by @localai-bot in #8157
- chore(model gallery): add 1 new models via gallery agent by @localai-bot in #8170

Dependencies
- chore(deps): bump github.com/mudler/cogito from 0.7.2 to 0.8.1 by @dependabot[bot] in #8124

Other Changes
- feat(swagger): update swagger by @localai-bot in #8098
- chore: Update ggml-org/llama.cpp to 287a33017b32600bfc0e81feeb0ad6e81e0dd484 by @localai-bot in #8100
- chore: Update leejet/stable-diffusion.cpp to 2efd19978dd4164e387bf226025c9666b6ef35e2 by @localai-bot in #8099
- docs: update docs version mudler/LocalAI by @localai-bot in #8120
- chore: Update leejet/stable-diffusion.cpp to a48b4a3ade9972faf0adcad47e51c6fc03f0e46d by @localai-bot in #8121
- chore: Update ggml-org/llama.cpp to 959ecf7f234dc0bc0cd6829b25cb0ee1481aa78a by @localai-bot in #8122
- chore(deps): Bump llama.cpp to '1c7cf94b22a9dc6b1d3
23 Jan 2026
High · Capability · LocalAI 3.10.0 Release: Agent Capabilities & Multi-Modal Support
LocalAI 3.10.0 is big on agent capabilities, multi-modal support, and cross-platform reliability. We've added native Anthropic API support, launched a new Video Generation UI, introduced Open Responses API compatibility, and enhanced performance with a unified GPU backend system. For a full tour, see below!

TL;DR Feature Summary
- Anthropic API Support: fully compatible /v1/messages endpoint for seamless drop-in replacement of Claude.
- Open Responses API: native support for stateful agents with tool calling, streaming, background mode, and multi-turn conversations, passing all official acceptance tests.
- Video & Image Generation Suite: new video gen UI plus LTX-2 support for text-to-video and image-to-video.
- Unified GPU Backends: GPU libraries (CUDA, ROCm, Vulkan) packaged inside backend containers; works out of the box on Nvidia, AMD, and ARM64 (experimental).
- Tool Streaming & XML Parsing: full support for streaming tool calls and XML-formatted tool outputs.
- System-Aware Backend Gallery: only see backends your system can run (e.g., hide MLX on Linux).
- Crash Fixes: prevents crashes on AVX-only CPUs (Intel Sandy/Ivy Bridge) and fixes VRAM reporting on AMD GPUs.
- Request Tracing: debug agents and fine-tuning with memory-based request/response logging.
- Moonshine Backend: ultra-fast transcription engine for low-end devices.
- Pocket-TTS: lightweight, high-fidelity text-to-speech with voice cloning.
- Vulkan arm64 builds: we now build backends and images for Vulkan on arm64 as well.

New Features & Major Enhancements

Open Responses API: Build Smarter, Autonomous Agents
LocalAI now supports the OpenAI Responses API, enabling powerful agentic workflows locally:
- stateful conversations via response_id; resume and manage long-running agent sessions
- background mode: run agents asynchronously and fetch results later
- streaming support for tools, images, and audio
- built-in tools: web search, file search, and computer use (via MCP integrations)

Multi-turn int
18 Jan 2026
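Since this release advertises a drop-in /v1/messages endpoint, an Anthropic-style request against LocalAI might look like the sketch below. The host, model name, and max_tokens value are illustrative assumptions; only the endpoint path comes from the release notes.

```python
import json

# Anthropic-compatible endpoint per the 3.10.0 release notes;
# the host and model name here are placeholders.
MESSAGES_URL = "http://localhost:8080/v1/messages"

def build_anthropic_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an Anthropic Messages API style payload."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_anthropic_request("my-local-model", "Summarize this changelog.")
# POST json.dumps(req) to MESSAGES_URL; alternatively, point an
# existing Anthropic client's base URL at the LocalAI instance.
print(json.dumps(req))
```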