Info | Capability

BenCzechMark: Comprehensive Czech-language LLM evaluation suite and leaderboard

AI Impact Summary

BenCzechMark is a Czech-language evaluation suite of 50 tasks in 9 categories, including reading comprehension, named entity recognition (NER), factual knowledge, sentiment analysis, and math reasoning. Tasks are scored with several metrics: accuracy (Acc), exact match (EM), AUROC, and perplexity (Ppl). A model-duel framework, the Duel Win Score (DWS), combines these per-task results into a single cross-model ranking. The leaderboard covers open-source models such as Llama-405B and Aya-23-35B and gives concrete data on their Czech-language capabilities and transfer gaps. Technical teams can use it to see where Czech understanding and domain knowledge are strong or weak when selecting and calibrating models for Czech-language products.
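To make the duel idea concrete, here is a minimal Python sketch of a pairwise win-rate ranking. It is an illustration under simplifying assumptions, not BenCzechMark's actual scoring procedure: a duel on a task is won by the model with the strictly higher score, every metric is assumed to be oriented so that higher is better (perplexity would have to be negated first), and the model names, task names, and scores are hypothetical.

```python
from itertools import permutations


def duel_win_scores(scores):
    """Return {model: fraction of duels won}.

    `scores` maps model -> {task: value}, with every value oriented so
    that higher is better (assumption: negate perplexity beforehand).
    A duel is one (model, opponent, task) comparison; ties count as a
    loss for both sides in this simplified version.
    """
    models = list(scores)
    wins = {m: 0 for m in models}
    duels = {m: 0 for m in models}
    for a, b in permutations(models, 2):
        # Only compare on tasks that both models were evaluated on.
        for task in scores[a].keys() & scores[b].keys():
            duels[a] += 1
            if scores[a][task] > scores[b][task]:
                wins[a] += 1
    return {m: (wins[m] / duels[m] if duels[m] else 0.0) for m in models}


def rank_models(scores):
    """Order models from highest to lowest duel win rate."""
    dws = duel_win_scores(scores)
    return sorted(dws, key=dws.get, reverse=True)


# Hypothetical scores for three placeholder models on three tasks.
example = {
    "model_a": {"reading": 0.8, "ner": 0.7, "math": 0.5},
    "model_b": {"reading": 0.6, "ner": 0.9, "math": 0.4},
    "model_c": {"reading": 0.5, "ner": 0.6, "math": 0.3},
}

if __name__ == "__main__":
    print(duel_win_scores(example))
    print(rank_models(example))
```

A pairwise scheme like this sidesteps the problem of averaging metrics that live on different scales, since each task only contributes win or loss outcomes.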

Affected Systems

BenCzechMark, Llama-405B

Date: not specified
Change type: capability
Severity: info

