NuminaMath 7B TIR wins 1st AIMO Progress Prize with tool-integrated reasoning
AI Impact Summary
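NuminaMath 7B TIR is a reasoning agent that solves math problems with tool-integrated reasoning (TIR). The model interleaves natural-language reasoning with Python code, executes that code, and feeds the execution output back into its context. At inference time it uses self-consistency with TIR (SC-TIR), which samples several solution trajectories and takes a majority vote over their final answers. The model was built with a two-stage fine-tuning recipe on DeepSeekMath-Base 7B. The first stage uses chain-of-thought (CoT) style math solutions, and the second uses tool-integrated solutions that include code execution. The work relied on open-source libraries including TRL, PyTorch, vLLM, and DeepSpeed. Evaluated on the competition's private leaderboard, it solved 29 of the 50 test problems. This result suggests that open-weight models can reach meaningful math reasoning performance when augmented with tools and a carefully designed decoding strategy. The collaboration with Hugging Face points to a reusable blueprint for open AI4Maths efforts and to potential commercial opportunities for math-capable LLMs.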
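The SC-TIR decoding loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the NuminaMath implementation. The `generate` callable stands in for a sampled LLM completion, and `toy_generate` is a hypothetical deterministic stub. Answers are assumed to be integers written as `\boxed{...}`. The parameter names `n_samples` (candidate trajectories to vote over) and `max_depth` (maximum code-execution rounds per trajectory) are illustrative choices.

```python
import re
import subprocess
import sys
from collections import Counter
from typing import Callable, Optional

# Hypothetical model interface: given the transcript so far, return the next chunk
# of model text, ending after either a complete Python code block or a final answer.
Generator = Callable[[str], str]

FENCE = "`" * 3  # triple backtick, built here so it cannot clash with doc fences
CODE_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
ANSWER_RE = re.compile(r"\\boxed\{(-?\d+)\}")  # assumes integer answers


def run_python(code: str, timeout: float = 5.0) -> str:
    """Run one code block in a fresh interpreter; return stdout, or stderr on failure."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code], capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return "TimeoutError"
    return proc.stdout if proc.returncode == 0 else proc.stderr


def solve_once(problem: str, generate: Generator, max_depth: int) -> Optional[int]:
    """One TIR trajectory: alternate model text with Python execution feedback."""
    transcript = problem
    for _ in range(max_depth):
        chunk = generate(transcript)
        transcript += chunk
        answer = ANSWER_RE.search(chunk)
        if answer:
            return int(answer.group(1))
        code = CODE_RE.search(chunk)
        if code is None:
            return None  # no code and no answer: discard this candidate
        # Append the execution output (or traceback) so the model can react to it.
        transcript += f"{FENCE}output\n{run_python(code.group(1))}{FENCE}\n"
    return None  # depth budget exhausted without an answer


def sc_tir(problem: str, generate: Generator, n_samples: int, max_depth: int) -> Optional[int]:
    """Self-consistency: majority vote over the answers of n_samples trajectories."""
    answers = [solve_once(problem, generate, max_depth) for _ in range(n_samples)]
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None


# Usage with a deterministic stub in place of a sampled LLM (hypothetical).
def toy_generate(transcript: str) -> str:
    marker = FENCE + "output\n"
    if marker not in transcript:
        return f"Sum the integers.\n{FENCE}python\nprint(sum(range(1, 11)))\n{FENCE}\n"
    last_output = transcript.rsplit(marker, 1)[1].split(FENCE)[0].strip()
    return f"The answer is \\boxed{{{last_output}}}."


print(sc_tir("What is 1 + 2 + ... + 10?", toy_generate, n_samples=3, max_depth=4))  # 55
```

In real use, `generate` would sample with nonzero temperature so that trajectories differ, since the majority vote only helps when candidates disagree. Running each block in a separate interpreter with a timeout is a simple choice for the sketch, not a security sandbox.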
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info