Hugging Face: DeepMath: Lightweight math reasoning agent using Qwen3-4B Thinking and GRPO | SignalBreak