Info | Capability

DeepMath: Lightweight math reasoning agent using Qwen3-4B Thinking and GRPO

AI Impact Summary

DeepMath pairs a small model (Qwen3-4B Thinking) with a constrained Python executor: the model emits short code snippets that are run in a sandbox, which reduces arithmetic errors and trace verbosity. The agent is built on the smolagents framework, uses vLLM for inference, and is trained with GRPO using the TRL stack. Evaluated on MATH500, AIME, HMMT, and HLE, it shows up to 66% shorter outputs and improved accuracy. For math-heavy applications this can translate to lower inference latency and cost, while sandboxed code execution improves interpretability and safety.
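To make the idea of a constrained Python executor concrete, here is a minimal sketch of one way such a component could work: parse the snippet, reject any syntax outside a small allow-list, and run it with a restricted set of names. The summary does not describe DeepMath's actual executor, so everything here (`run_snippet`, `ALLOWED_NODES`, `SAFE_GLOBALS`, the `result` variable convention, the length limit) is hypothetical and chosen for illustration only. It is not the smolagents API and should not be treated as a real security boundary.

```python
import ast
import math

# Hypothetical allow-list of syntax; the real DeepMath rules are not given in the summary.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.UnaryOp, ast.Call, ast.Attribute,
    ast.List, ast.Tuple,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod, ast.Pow,
    ast.USub, ast.UAdd,
)

# Only these names are visible to the snippet; builtins are otherwise removed.
SAFE_GLOBALS = {
    "__builtins__": {},
    "math": math,
    "abs": abs, "round": round, "min": min, "max": max, "sum": sum,
}


def run_snippet(source, max_chars=500):
    """Run a short arithmetic snippet and return the value it assigns to `result`.

    Illustrative only: there is no timeout or memory limit here, so a
    production sandbox would need process-level isolation on top of this.
    """
    if len(source) > max_chars:
        raise ValueError("snippet too long")

    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        # Reject imports, loops, function definitions, and anything else not listed.
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        # Block dunder/private attribute access such as (1).__class__.
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise ValueError("private attribute access is not allowed")
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise ValueError("private names are not allowed")

    env = dict(SAFE_GLOBALS)
    exec(compile(tree, "<snippet>", "exec"), env)
    return env.get("result")


if __name__ == "__main__":
    # The model would emit something like this instead of doing arithmetic in prose.
    print(run_snippet("x = 17 * 23\nresult = x + math.floor(math.sqrt(144))"))  # 403
```

In this sketch the model's output stays short because the arithmetic is delegated to the interpreter, and the allow-list keeps the snippet limited to plain calculations.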

Affected Systems

Qwen3-4B Thinking, smolagents

Date: not specified
Change type: capability
Severity: info
