Hugging Face: Recreate DeepSeek R1 "aha moment" with GRPO on Qwen2.5-3B | SignalBreak