LiveCodeBench Leaderboard: Contamination-free, time-based evaluation of Code LLMs across four scenarios
AI Impact Summary
LiveCodeBench introduces a holistic, contamination-free benchmark for code LLMs. It tags each problem with its release date on LeetCode, AtCoder, and CodeForces, which enables evaluation over rolling time windows and makes data leakage from training sets detectable. The benchmark assesses four coding scenarios (Code Generation, Self Repair, Code Execution, and Test Output Prediction) using an execution-based correctness metric, Pass@1, giving a more robust view of real-world coding capability than standard benchmarks. For technical teams, it offers a consistent, time-aware scoring framework and comparable results across models (e.g., GPT-4-Turbo, Claude-3-Opus, Mistral-Large), along with evaluation tooling (the LiveCodeBench repository and lcb_runner) to guide model selection and benchmarking strategy.
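To make the time-windowed, execution-based scoring concrete, below is a minimal sketch in Python. It is not the lcb_runner API; the `ProblemResult` fields, problem IDs, dates, and the training-cutoff window are illustrative assumptions. It computes the standard unbiased Pass@1 estimate restricted to problems released inside a chosen date window, which is the contamination-control idea the summary describes: problems released after a model's training cutoff cannot have been memorized.

```python
from dataclasses import dataclass
from datetime import date
from math import comb
from typing import List

@dataclass
class ProblemResult:
    """Per-problem outcome: n generated samples, c of which pass all tests."""
    problem_id: str
    release_date: date   # hypothetical field; LiveCodeBench tags problems with contest release dates
    n_samples: int
    n_correct: int

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k). For k=1 this reduces to c/n."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def windowed_pass_at_1(results: List[ProblemResult], start: date, end: date) -> float:
    """Average Pass@1 over problems released inside [start, end].

    Restricting the window to problems released after a model's training cutoff
    is the contamination control: those problems cannot appear in training data.
    """
    window = [r for r in results if start <= r.release_date <= end]
    if not window:
        return float("nan")
    return sum(pass_at_k(r.n_samples, r.n_correct, 1) for r in window) / len(window)

if __name__ == "__main__":
    # Illustrative results only; problem IDs and dates are made up for the example.
    results = [
        ProblemResult("leetcode-3105", date(2024, 4, 14), n_samples=10, n_correct=7),
        ProblemResult("atcoder-abc350-d", date(2024, 4, 20), n_samples=10, n_correct=2),
        ProblemResult("codeforces-1956c", date(2024, 4, 27), n_samples=10, n_correct=0),
    ]
    # Score only problems released after a (hypothetical) training cutoff.
    score = windowed_pass_at_1(results, start=date(2024, 4, 1), end=date(2024, 6, 30))
    print(f"Pass@1 on the post-cutoff window: {score:.3f}")
```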
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info