Open ASR Leaderboard: Multilingual & Long-Form Tracks Highlight Conformer+LLM Leaders and Throughput Tradeoffs
AI Impact Summary
The Open ASR Leaderboard now emphasizes multilingual and long-form transcription, covering 60+ open and closed models from 18 organizations across 11 datasets. Current trends show Conformer encoders paired with large-language-model decoders achieving the best English WER, at the cost of slower inference, while CTC/TDT decoders deliver 10–100x higher throughput, making them suitable for real-time or high-volume batch transcription. Multilingual results reveal a tradeoff: expanding language coverage can reduce single-language accuracy. Long-form performance remains dominated by closed-source systems, underscoring licensing, optimization, and deployment considerations. Expect continued evolution as open models such as Parakeet, Voxtral, and Whisper variants push efficiency and new language datasets broaden the benchmarks.
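For context on the metric behind these rankings: WER (word error rate) is the proportion of substituted, deleted, and inserted words relative to the reference transcript, computed via word-level edit distance. A minimal illustrative sketch (not the leaderboard's actual evaluation code, which normalizes text before scoring):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One dropped word out of a 6-word reference: WER = 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Leaderboard WER figures are aggregated across datasets, so a single-utterance score like this is only the building block.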
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info