Co-located vLLM in TRL enables shared-GPU training and inference for GRPO
AI Impact Summary
TRL now supports running vLLM inside the same distributed process as the trainer, so GPUs can be shared between training and generation. This removes the HTTP boundary of the earlier server mode, eliminating idle GPU time and the need for separate GPUs dedicated to inference. The optimization is especially impactful for GRPO online-learning workloads, where generation happens continuously, boosting throughput while reducing hardware costs. Adoption requires setting vllm_mode="colocate" in GRPOConfig and tuning vLLM's GPU memory utilization; the integration uses vLLM's external_launcher executor backend and remains compatible with torchrun, tensor/data parallelism, and SPMD execution.
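A minimal sketch of what adoption could look like, assuming TRL's GRPOConfig exposes use_vllm, vllm_mode, vllm_gpu_memory_utilization, and vllm_tensor_parallel_size as described above; the dataset and reward function are illustrative placeholders, not part of the change itself.

```python
# Sketch: enable co-located vLLM generation for GRPO training.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 50 characters.
    return [-abs(50 - len(c)) for c in completions]

training_args = GRPOConfig(
    output_dir="Qwen2-0.5B-GRPO-colocate",
    use_vllm=True,
    vllm_mode="colocate",              # run vLLM in the trainer process instead of server mode
    vllm_gpu_memory_utilization=0.3,   # leave the rest of each GPU's memory for training
    vllm_tensor_parallel_size=1,       # shard generation across GPUs if > 1
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

Because generation shares the training processes, the script would be launched the usual way (e.g. with torchrun or accelerate) rather than alongside a separate vLLM server.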
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info