NuminaMath 7B TIR wins 1st AIMO Progress Prize using tool-integrated reasoning and Python execution
AI Impact Summary
Numina and Hugging Face demonstrated that open-weight LLMs can achieve competitive math problem-solving performance when paired with tool-integrated reasoning and Python execution. The winning approach centers on NuminaMath 7B TIR, a model fine-tuned to act as a reasoning agent. It combines a novel tool-integrated decoding loop, in which output from a Python REPL is fed back to the model, with a two-stage training recipe inspired by MuMath-Code and ToRA. The team used full fine-tuning (not LoRA) on multi-GPU infrastructure (8x H100s) with a software stack of TRL, vLLM, and DeepSpeed ZeRO-3, the last of which was needed to fit the model in memory during training. The model scored 29/50 on the competition's private test set, which supports a repeatable, open-model path for complex mathematical reasoning tasks.
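The tool-integrated decoding loop described above can be illustrated with a minimal sketch. This is not the team's implementation: the `generate` callable, the `toy_generate` stand-in for the model, the `output` fence format, and the `max_rounds` limit are all assumptions made for illustration. The idea shown is only the general pattern: generate text, extract any Python block, execute it, append the result to the transcript, and generate again.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_python(code: str) -> str:
    """Execute code and return captured stdout, or the error message on failure."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # illustration only: no sandboxing or timeout
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def tir_decode(generate, problem: str, max_rounds: int = 4) -> str:
    """Alternate between model generation and code execution (assumed loop shape)."""
    transcript = problem
    for _ in range(max_rounds):
        completion = generate(transcript)
        transcript += completion
        match = CODE_BLOCK.search(completion)
        if match is None:
            break  # no code to run: treat the completion as the final answer
        output = run_python(match.group(1))
        # Feed the execution result back so the next generation can use it.
        transcript += f"\n{FENCE}output\n{output}\n{FENCE}\n"
    return transcript


def toy_generate(transcript: str) -> str:
    """Hypothetical stand-in for an LLM call; a real system would query a model."""
    if FENCE + "output" not in transcript:
        return f"\nLet's compute it.\n{FENCE}python\nprint(sum(range(1, 11)))\n{FENCE}"
    return "The answer is 55."


if __name__ == "__main__":
    print(tir_decode(toy_generate, "What is 1 + 2 + ... + 10?"))
```

In a real setting the `exec` call would be replaced by an isolated, time-limited interpreter, and `generate` would call an inference engine; both are left out here to keep the sketch self-contained.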
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info