FilBench evaluates LLMs on Filipino languages: SEA-LION and SeaLLM are efficient, GPT-4o remains strongest
AI Impact Summary
FilBench is a structured benchmark for Tagalog, Filipino, and Cebuano that evaluates models across four categories: Cultural Knowledge, Classical NLP, Reading Comprehension, and Generation. The results show that open-weight models built for Southeast Asian languages (SEA-LION, SeaLLM) are an efficient, competitive option for Filipino tasks, though GPT-4o generally remains stronger on generation and translation. Translation and generation remain the hardest areas: open models sometimes produce overly verbose outputs or respond in the wrong language. For cost-sensitive deployments, open-weight models can be viable, provided they are validated on FilBench before deployment.
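As a rough illustration of what validating a model before deployment could look like, the sketch below gates a candidate on per-category scores. It is not FilBench's own tooling or scoring method: the function name `validate_for_deployment`, the unweighted mean, the 0-100 score scale, and both thresholds are assumptions made for this example. Only the four category names come from the summary above.

```python
from statistics import mean

# The four FilBench categories named in the summary above.
CATEGORIES = (
    "Cultural Knowledge",
    "Classical NLP",
    "Reading Comprehension",
    "Generation",
)


def validate_for_deployment(scores, min_per_category=50.0, min_overall=60.0):
    """Hypothetical deployment gate over per-category scores (0-100).

    Assumptions (not taken from FilBench): the overall score is an
    unweighted mean of the four categories, and both thresholds are
    arbitrary defaults that a team would tune for its own use case.
    """
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing category scores: {missing}")

    overall = mean(scores[c] for c in CATEGORIES)
    # Flag every category that falls below the per-category floor, so a
    # weak Generation score is not hidden by strong scores elsewhere.
    failing = [c for c in CATEGORIES if scores[c] < min_per_category]

    return {
        "overall": overall,
        "failing": failing,
        "passed": not failing and overall >= min_overall,
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real benchmark results.
    candidate = {
        "Cultural Knowledge": 70,
        "Classical NLP": 80,
        "Reading Comprehension": 75,
        "Generation": 40,
    }
    print(validate_for_deployment(candidate))
```

Checking each category separately, rather than only the average, reflects the summary's point that generation and translation are where open models tend to fall short.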
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info