mmBERT: ModernBERT goes Multilingual — 1800+ languages
AI Impact Summary
Researchers at Johns Hopkins University are releasing mmBERT, a massively multilingual encoder model trained on a diverse dataset spanning over 1800 languages, a substantial advance over earlier encoders such as XLM-R. Its training recipe, which progressively adds languages and anneals the language sampling schedule, lets mmBERT reach competitive performance across a wide range of languages while folding low-resource languages in only during the final training phase. The result shows that multilingual encoders can pick up new languages efficiently from limited data late in training.
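The annealed language learning schedule can be pictured as temperature-scaled sampling over per-language corpus sizes: as training progresses, the temperature is lowered and the sampling distribution flattens, giving low-resource languages a larger share of the training mix. The sketch below illustrates the idea; the language set, corpus sizes, and temperature values are illustrative assumptions, not the released training configuration.

```python
# Minimal sketch of annealed temperature sampling over language corpora.
# Corpus sizes and the tau schedule below are illustrative assumptions,
# not the actual mmBERT training setup.
from typing import Dict


def language_sampling_probs(corpus_sizes: Dict[str, int], tau: float) -> Dict[str, float]:
    """Temperature-scaled sampling: lower tau flattens the distribution,
    up-weighting low-resource languages relative to their raw corpus share."""
    scaled = {lang: size ** tau for lang, size in corpus_sizes.items()}
    total = sum(scaled.values())
    return {lang: w / total for lang, w in scaled.items()}


# Illustrative phases: start with a few high-resource languages at a higher tau,
# then add more languages and lower tau so sampling moves toward uniform.
phases = [
    ({"en": 1_000_000, "zh": 400_000, "de": 300_000}, 0.7),
    ({"en": 1_000_000, "zh": 400_000, "de": 300_000, "sw": 20_000}, 0.5),
    ({"en": 1_000_000, "zh": 400_000, "de": 300_000, "sw": 20_000, "gd": 2_000}, 0.3),
]

for sizes, tau in phases:
    probs = language_sampling_probs(sizes, tau)
    print(f"tau={tau}: " + ", ".join(f"{lang}={p:.3f}" for lang, p in probs.items()))
```

Running the sketch shows how lowering tau while adding languages shifts probability mass toward the newly added low-resource languages, which is the intuition behind introducing them during the final, flattened phase.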
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info