Hallucinations Leaderboard: Open LLM Evaluation Benchmark
AI Impact Summary
The Hallucinations Leaderboard is an open-source effort to quantify and track hallucination rates across a diverse set of large language models. Built on the EleutherAI Language Model Evaluation Harness and run on high-performance computing infrastructure, the project establishes a standardized benchmark for evaluating LLM reliability. The leaderboard is valuable to developers and researchers seeking to mitigate the risks of inaccurate or misleading model output.
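Evaluations with the Language Model Evaluation Harness are typically driven from its CLI. A minimal sketch, assuming the `lm-eval` package is installed and using an illustrative model and task (`truthfulqa_mc2` is a harness task often used for hallucination-related evaluation; the leaderboard's exact task list and settings may differ):

```shell
# Install the EleutherAI evaluation harness (assumption: recent 0.4.x release).
pip install lm-eval

# Evaluate a Hugging Face model on a hallucination-sensitive task.
# Model name, task choice, and batch size are illustrative, not the
# leaderboard's exact configuration.
lm_eval --model hf \
  --model_args pretrained=mistralai/Mistral-7B-v0.1 \
  --tasks truthfulqa_mc2 \
  --batch_size 8 \
  --output_path results/
```

The harness writes per-task metrics as JSON under `--output_path`, which is the kind of raw output a leaderboard like this would aggregate across models.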
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info