Introducing LiveCodeBench Leaderboard - Contamination-Free Code LLM Evaluation
AI Impact Summary
The LiveCodeBench leaderboard introduces a contamination-free benchmark for evaluating code LLMs, addressing a critical weakness of existing benchmarks such as HumanEval: models may have seen the test problems during training. LiveCodeBench continuously collects new coding problems from LeetCode, AtCoder, and CodeForces, annotating each with its release date so that a model can be evaluated only on problems published after its training cutoff. This mitigates contamination and overfitting, yielding a more realistic assessment of model capabilities. The leaderboard also evaluates models holistically across four scenarios: code generation, self-repair, test output prediction, and code execution.
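The release-date annotation is the core of the contamination defense. As a minimal sketch of how such date-based filtering might work, the snippet below keeps only problems published after a model's training cutoff; `Problem` and `filter_uncontaminated` are illustrative names for this example, not LiveCodeBench's actual API.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Problem:
    """A coding problem annotated with its public release date.

    Illustrative structure; LiveCodeBench's real schema may differ.
    """
    problem_id: str
    source: str          # e.g. "LeetCode", "AtCoder", "CodeForces"
    release_date: date
    statement: str

def filter_uncontaminated(problems: list[Problem], training_cutoff: date) -> list[Problem]:
    """Keep only problems released after the model's training cutoff,
    so the model cannot have seen them (or their solutions) in training data."""
    return [p for p in problems if p.release_date > training_cutoff]

# Example: a model whose training data ends 2023-09-01 is evaluated
# only on problems published after that date.
problems = [
    Problem("lc-2873", "LeetCode", date(2023, 10, 1), "..."),
    Problem("abc-321-d", "AtCoder", date(2023, 6, 15), "..."),
]
eval_set = filter_uncontaminated(problems, training_cutoff=date(2023, 9, 1))
print([p.problem_id for p in eval_set])  # ['lc-2873']
```

Because the benchmark keeps accumulating fresh problems, this filtering step can be re-applied per model, so every model is scored on a window of problems it could not have memorized.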
Affected Systems
- Date: Not specified
- Change type: capability
- Severity: info