Alyah Emirati Dialect Benchmark evaluates Arabic LLMs’ Emirati language and culture understanding
AI Impact Summary
Alyah is an Emirati-centric benchmark designed to evaluate how well Arabic LLMs capture linguistic, cultural, and pragmatic aspects of the Emirati dialect. It comprises 1,173 manually curated multiple-choice samples spanning greetings, poetry, heritage, and dialect-specific nuances. The benchmark was run against 54 models, both base and instruction-tuned, to assess dialectal proficiency beyond Modern Standard Arabic. Early results indicate that instruction-tuned models outperform their base variants on dialect tasks, which underscores the need for dialect-focused fine-tuning and evaluation before production deployment. The results offer practical guidance for selecting and tuning Arabic LLMs for Emirati-language use cases, and they highlight where models may systematically struggle with culturally grounded expressions.
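As an illustration of how a multiple-choice benchmark of this kind can be scored, the following is a minimal sketch that computes accuracy per category. It is not Alyah's actual evaluation code: the sample fields (`question`, `choices`, `answer`, `category`), the `predict` callable, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def accuracy_by_category(samples, predict):
    """Return {category: accuracy} for multiple-choice samples.

    samples: iterable of dicts with the hypothetical keys "question",
             "choices", "answer" (index of the correct choice), "category".
    predict: callable (question, choices) -> index of the chosen option.
             In practice this would wrap a model; here it is a stand-in.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample in samples:
        category = sample["category"]
        total[category] += 1
        if predict(sample["question"], sample["choices"]) == sample["answer"]:
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


# Toy placeholder data, not drawn from Alyah.
toy_samples = [
    {"question": "q1", "choices": ["a", "b", "c", "d"], "answer": 0, "category": "greetings"},
    {"question": "q2", "choices": ["a", "b", "c", "d"], "answer": 2, "category": "greetings"},
    {"question": "q3", "choices": ["a", "b", "c", "d"], "answer": 1, "category": "poetry"},
]


def always_first(question, choices):
    # Trivial baseline that always picks the first option.
    return 0


scores = accuracy_by_category(toy_samples, always_first)
print(scores)
```

Reporting accuracy per category rather than as a single number makes it easier to see whether a model is weak specifically on, say, poetry or heritage items, and running the same function over a base model and its instruction-tuned counterpart gives a like-for-like comparison.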
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info