Alyah Emirati Dialect Benchmark Evaluates Arabic LLMs' Emirati Dialect Capabilities
AI Impact Summary
Alyah introduces a dedicated Emirati dialect benchmark to fill a gap in Arabic LLM evaluation, which has centered on Modern Standard Arabic (MSA). The dataset comprises 1,173 manually curated, four-option multiple-choice questions spanning categories such as greetings, poetry, and cultural references. It is used to compare 54 models (23 base and 31 instruction-tuned). Early results show that instruction-tuned models generally outperform base models in most categories, but substantial gaps remain on dialect-specific phenomena and culturally grounded reasoning. This signals a risk for deployments that rely on generic Arabic models in Emirati contexts. The benchmark provides a concrete, reproducible metric for measuring and driving improvement in dialectal understanding over time.
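The summary does not say how Alyah scores models. As an illustration only, four-option benchmarks of this kind are commonly reported as accuracy per category and overall. The sketch below assumes each model's answer has already been reduced to a single option letter. The `Item` layout, the `score` function, and the category names are hypothetical and are not the benchmark's actual schema or evaluation harness.

```python
from collections import defaultdict
from dataclasses import dataclass

# Expected accuracy of uniform random guessing on four-option questions.
CHANCE = 1 / 4


@dataclass
class Item:
    # Hypothetical record layout; the real Alyah schema is not described here.
    category: str                        # e.g. "greetings", "poetry"
    options: tuple[str, str, str, str]   # exactly four answer choices
    answer: str                          # gold option letter: "A".."D"

    def __post_init__(self) -> None:
        if len(self.options) != 4:
            raise ValueError("each item must have exactly four options")
        if self.answer not in ("A", "B", "C", "D"):
            raise ValueError("answer must be one of A, B, C, D")


def score(items: list[Item], predictions: list[str]) -> dict[str, float]:
    """Return accuracy per category plus an 'overall' entry."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item, pred in zip(items, predictions):
        # Normalize the predicted letter before comparing with the gold answer.
        hit = pred.strip().upper() == item.answer
        for key in (item.category, "overall"):
            total[key] += 1
            correct[key] += int(hit)
    return {key: correct[key] / total[key] for key in total}


if __name__ == "__main__":
    opts = ("opt1", "opt2", "opt3", "opt4")  # placeholder option text
    items = [
        Item("greetings", opts, "A"),
        Item("greetings", opts, "C"),
        Item("poetry", opts, "B"),
        Item("poetry", opts, "D"),
    ]
    scores = score(items, ["A", "c", "B", "A"])
    print(scores)  # {'greetings': 1.0, 'overall': 0.75, 'poetry': 0.5}
    # Margin over random guessing makes categories easier to compare.
    print({k: v - CHANCE for k, v in scores.items()})
```

Reporting the margin over the 25% chance level makes it easier to see which categories a model handles only slightly better than guessing. Comparing base and instruction-tuned checkpoints would then mean running `score` once per model on the same items.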
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info