DeepMath: Lightweight math reasoning agent using Qwen3-4B Thinking and GRPO
AI Impact Summary
DeepMath pairs a small model (Qwen3-4B Thinking) with a constrained Python executor. The model writes short Python snippets for computation, runs them, and reasons over the results, which reduces arithmetic errors and shortens its reasoning traces. The agent is built on the smolagents framework, uses vLLM for inference, and is trained with GRPO (Group Relative Policy Optimization) using TRL. Evaluated on MATH500, AIME, HMMT, and HLE, it produces outputs up to 66% shorter while improving accuracy. For math-heavy applications this can mean lower inference latency and cost, and executing code in a sandbox improves interpretability and safety.
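The constrained-executor idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the smolagents executor or DeepMath's actual sandbox. The allowlist `SAFE_BUILTINS`, the preloaded `math` module, and the helpers `check_snippet` and `run_snippet` are all hypothetical. The sketch parses a model-written snippet, rejects imports and underscore-prefixed names or attributes, executes it with a reduced set of builtins, and returns whatever it printed.

```python
import ast
import contextlib
import io
import math

# Hypothetical allowlist: the only builtins visible to model-written code.
SAFE_BUILTINS = {
    "abs": abs, "divmod": divmod, "enumerate": enumerate, "float": float,
    "int": int, "len": len, "list": list, "max": max, "min": min,
    "pow": pow, "print": print, "range": range, "round": round,
    "sorted": sorted, "sum": sum,
}


def check_snippet(source: str) -> ast.Module:
    """Parse the snippet and reject constructs outside the allowed subset."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed; use the preloaded math module")
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise ValueError(f"private attribute {node.attr!r} is not allowed")
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise ValueError(f"dunder name {node.id!r} is not allowed")
    return tree


def run_snippet(source: str) -> str:
    """Validate, execute with restricted globals, and return captured stdout."""
    code = compile(check_snippet(source), "<snippet>", "exec")
    env = {"__builtins__": SAFE_BUILTINS, "math": math}
    buffer = io.StringIO()
    # print() writes to sys.stdout at call time, so redirecting captures it.
    with contextlib.redirect_stdout(buffer):
        exec(code, env)
    return buffer.getvalue()


if __name__ == "__main__":
    print(run_snippet("print(math.comb(10, 3))"), end="")  # prints 120
```

An AST allowlist like this is not a security boundary on its own; a real deployment would also need process isolation, time and memory limits, and output truncation before feeding results back to the model.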
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info