Open ASR Leaderboard: Multilingual & Long-Form Tracks Highlight Conformer+LLM Leaders and Throughput Tradeoffs
AI Impact Summary
The Open ASR Leaderboard now emphasizes multilingual and long-form transcription, covering 60+ open and closed models from 18 organizations across 11 datasets. Current trends show Conformer encoders paired with large-language-model decoders achieving the best English WER, at the cost of slower inference, while CTC/TDT decoders deliver 10–100x higher throughput, making them suitable for real-time or high-volume batch transcription. Multilingual results reveal a tradeoff: expanding language coverage can reduce single-language accuracy. Long-form performance remains dominated by closed-source systems, underscoring licensing, optimization, and deployment considerations. Expect continued evolution as open models such as Parakeet, Voxtral, and Whisper variants push efficiency and new language datasets broaden the benchmarks.
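For context on the metric behind these rankings: WER (word error rate) is the proportion of substituted, deleted, and inserted words relative to the reference transcript, computed via word-level edit distance. A minimal illustrative sketch (not the leaderboard's actual evaluation code, which normalizes text before scoring):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One dropped word out of a 6-word reference: WER = 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Leaderboard WER figures are aggregated across datasets, so a single-utterance score like this is only the building block.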
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info