
NuminaMath 7B TIR wins 1st AIMO Progress Prize using tool-integrated reasoning and Python execution

AI Impact Summary

NuminaMath and Hugging Face demonstrated that open-weight LLMs can achieve competitive math problem-solving performance when paired with tool-integrated reasoning and Python execution. The winning approach centers on NuminaMath 7B TIR, a model fine-tuned to act as a reasoning agent: a tool-integrated decoding loop lets it write Python, run the code in a REPL, and continue reasoning from the output. Training followed a two-stage recipe inspired by MuMath-Code and ToRA. The team used full fine-tuning rather than LoRA, running on 8x H100 GPUs with TRL, vLLM, and DeepSpeed ZeRO-3 to fit and train the model. The result, 29/50 on the private test set, supports a repeatable, open-model path for complex mathematical reasoning tasks.
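
The decoding loop described above can be illustrated with a minimal sketch. It is not the team's implementation: the function names (`solve`, `run_python`, `fake_generate`), the fenced `python` and `output` block markers, and the `\boxed{}` answer convention are all assumptions made for illustration, and the model call is replaced by a stub so the example runs without any weights.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built at runtime so this listing stays one clean fenced block
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")  # assumed final-answer convention


def run_python(code):
    """Run a code string and return what it printed, or the error text.

    exec() is NOT a sandbox; a real system would isolate this step.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def solve(problem, generate, max_rounds=4):
    """Alternate between generation and code execution until an answer appears.

    `generate` is a hypothetical callable: transcript in, completion text out.
    """
    transcript = problem
    for _ in range(max_rounds):
        completion = generate(transcript)
        transcript += completion
        answer = BOXED.search(completion)
        if answer:
            return answer.group(1)
        blocks = CODE_BLOCK.findall(completion)
        if not blocks:
            return None  # no code to run and no final answer
        # Feed the execution result back so the next round can use it.
        output = run_python(blocks[-1])
        transcript += "\n" + FENCE + "output\n" + output + "\n" + FENCE + "\n"
    return None


def fake_generate(transcript):
    """Stub standing in for the model: writes code first, then reads its output."""
    marker = FENCE + "output\n"
    if marker in transcript:
        result = transcript.rsplit(marker, 1)[1].split("\n", 1)[0]
        return "The sum is \\boxed{" + result + "}."
    code = "print(sum(range(1, 101)))\n"
    return "Let me compute.\n" + FENCE + "python\n" + code + FENCE + "\n"


print(solve("What is 1 + 2 + ... + 100?\n", fake_generate))
```

In this sketch the stub only produces its final answer after reading the executed output from the transcript, which is the feedback step that separates tool-integrated reasoning from plain text generation. Errors are returned as text rather than raised, so a model could in principle read a traceback and correct its code in the next round.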

Affected Systems

NuminaMath 7B TIR, DeepSeekMath-Base 7B

Date: not specified
Change type: capability
Severity: info
