Arabic-Leaderboards update: AraGen-03-25, Arabic Instruction Following, and MBZUAI-backed Arabic-Leaderboards Space
AI Impact Summary
The MBZUAI-backed Arabic-Leaderboards Space now consolidates Arabic evaluations, hosting AraGen-03-25 and Arabic Instruction Following, with plans to add more modalities. The AraGen-03-25 release expands the dataset to 340 pairs spanning QA, reasoning, safety, and orthographic tasks, and refines the judge system prompt. Dynamic evaluation and ranking analyses show that the top models remain stable overall, but rankings shift notably among Claude variants and gpt-4o when prompts and datasets change. The expanded, more challenging benchmark will influence how teams evaluate Arabic models and compare deployment candidates. The Space also invites external contributions of new leaderboards.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info