Vectara launches HHEM leaderboard using Hugging Face leaderboard template for hallucination evaluation
AI Impact Summary
The article walks through building an open-source leaderboard for Vectara's Hughes Hallucination Evaluation Model (HHEM) on top of the Hugging Face leaderboard template, including custom backend models and datasets that track hallucination metrics. It covers dynamic updates, model submissions, and deployment as a Hugging Face Space, and shows how one leaderboard can evaluate both open-source and commercial models (e.g., Llama 2, Mistral 7B, GPT-4, Gemini, Claude). It also points to the code locations and workflow steps a technical team would need to implement and maintain, which makes the setup a reusable blueprint for governance-focused model evaluation. The result is a repeatable benchmark of how prone each model is to hallucination, which supports procurement decisions and makes model performance more transparent.
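To make the "track hallucination metrics" step concrete, here is a minimal sketch of how per-summary consistency scores could be aggregated into leaderboard rows. It is not Vectara's code: the names `LeaderboardRow`, `summarize_scores`, and `rank` are hypothetical, and the 0.5 cutoff for counting a summary as factually consistent is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LeaderboardRow:
    """One row of a hypothetical hallucination leaderboard."""
    model: str
    hallucination_rate: float        # percent of summaries judged inconsistent
    factual_consistency_rate: float  # percent of summaries judged consistent
    n_scored: int                    # number of summaries that were scored


def summarize_scores(model: str, scores: List[float],
                     threshold: float = 0.5) -> LeaderboardRow:
    """Aggregate per-summary consistency scores in [0, 1] into one row.

    A score at or above `threshold` counts as factually consistent.
    The threshold value is an assumption, not taken from the article.
    """
    if not scores:
        raise ValueError("at least one score is required")
    consistent = sum(1 for s in scores if s >= threshold)
    consistency_rate = 100.0 * consistent / len(scores)
    return LeaderboardRow(
        model=model,
        hallucination_rate=round(100.0 - consistency_rate, 1),
        factual_consistency_rate=round(consistency_rate, 1),
        n_scored=len(scores),
    )


def rank(rows: List[LeaderboardRow]) -> List[LeaderboardRow]:
    """Order rows so the model with the lowest hallucination rate comes first."""
    return sorted(rows, key=lambda row: row.hallucination_rate)


if __name__ == "__main__":
    # Made-up scores for two placeholder models, purely for demonstration.
    rows = [
        summarize_scores("model-a", [0.9, 0.8, 0.2, 0.6]),
        summarize_scores("model-b", [0.95, 0.7, 0.85, 0.55]),
    ]
    for row in rank(rows):
        print(row)
```

In a real deployment the scores would come from the evaluation model run in the leaderboard backend, and the resulting rows would be written to a results dataset that the Space reads when it refreshes.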
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info