Hugging Face: TRL enables co-located vLLM in GRPO training for unified GPU usage | SignalBreak