Open LLM Leaderboard: CO₂ emissions insights across community vs official model fine-tunes
AI Impact Summary
Open LLM Leaderboard now reports CO₂ emissions per inference using a fixed hardware and software setup (8 GPUs per node, Transformers with Accelerate), enabling apples-to-apples comparisons across 2,742 models, including the Gemma/Gemma2, Llama, Mistral, Mixtral, Phi/Phi3, and Qwen families. The data confirms that larger base models incur higher emissions, but emissions do not scale strictly with leaderboard rank; notably, official fine-tunes often consume more energy than their base models, while community fine-tunes achieve similar scores with substantially lower CO₂ in several cases. This suggests that production teams can improve sustainability by favoring community-tuned or smaller models when their performance is acceptable, and by benchmarking energy per task on their own hardware, since CO₂ estimates are hardware-specific.
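The point that CO₂ estimates are hardware-specific can be illustrated with a minimal back-of-the-envelope sketch: measured power draw and runtime are converted to energy, then multiplied by a grid carbon-intensity factor. The function name, the power figures, and the 0.4 kg/kWh intensity below are illustrative assumptions, not leaderboard values.

```python
# Illustrative sketch: estimating kg CO2 per inference from measured energy.
# Assumption: average per-GPU power draw is obtained externally (e.g. from
# nvidia-smi or a power meter); the grid intensity value is a placeholder.

CARBON_INTENSITY_KG_PER_KWH = 0.4  # assumed grid average; varies by region


def co2_per_inference(avg_power_watts: float,
                      seconds_per_inference: float,
                      n_gpus: int = 8) -> float:
    """Return the estimated kg of CO2 emitted by a single inference."""
    # watts * seconds = joules; 3,600,000 J per kWh
    energy_kwh = avg_power_watts * n_gpus * seconds_per_inference / 3_600_000
    return energy_kwh * CARBON_INTENSITY_KG_PER_KWH


# Example: 8 GPUs drawing ~300 W each during a 2-second generation
est = co2_per_inference(300.0, 2.0, n_gpus=8)
print(f"{est:.6f} kg CO2 per inference")
```

Because both the power draw and the grid intensity vary by deployment, the same model can have very different per-inference footprints on different hardware and in different regions, which is why on-site benchmarking matters.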
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info