16 Open-Source RL Libraries: Lessons in Asynchronous Training
AI Impact Summary
Open-source RL libraries are increasingly adopting an asynchronous training architecture to overcome the bottlenecks of synchronous RL training, particularly the long rollouts produced by large language models. The approach separates the inference and training GPU pools, connecting them through a rollout buffer and asynchronous weight synchronization, so rollout generation and gradient updates run concurrently and GPU idle time drops sharply. The survey highlights Ray as a dominant orchestration primitive and NCCL broadcast for weight synchronization, while staleness management and LoRA support remain key considerations for future development.
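The decoupled pattern is easier to see in code. Below is a minimal, hypothetical sketch using Ray actors with a `ray.util.queue.Queue` standing in for the rollout buffer; the class names (`RolloutWorker`, `Trainer`), the `max_staleness` threshold, and the plain-object weight handoff (which a real system would replace with an NCCL broadcast between GPU pools) are all illustrative assumptions, not the API of any surveyed library.

```python
# Sketch of the async-RL architecture described above: inference actors
# fill a shared rollout buffer while a trainer consumes it and
# periodically publishes fresh weights. Names are illustrative only.
import ray
from ray.util.queue import Queue


@ray.remote(max_concurrency=2)  # threaded actor: weight updates can
class RolloutWorker:            # interleave with rollout generation
    """Generates rollouts with a (possibly stale) copy of the policy."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.weights_version = 0

    def set_weights(self, version, weights):
        # Real systems broadcast tensors over NCCL from the trainer's
        # GPUs; here it is a plain object transfer (assumption).
        self.weights_version = version

    def run(self, num_rollouts):
        for _ in range(num_rollouts):
            rollout = {"tokens": [...],  # placeholder for generated tokens
                       "version": self.weights_version}
            self.buffer.put(rollout)  # blocks when the buffer is full


@ray.remote
class Trainer:
    def __init__(self, buffer, max_staleness=2):
        self.buffer = buffer
        self.version = 0
        self.max_staleness = max_staleness

    def step(self):
        batch = self.buffer.get()
        # Staleness management: drop rollouts produced by weights that
        # lag too many versions behind the current policy.
        if self.version - batch["version"] > self.max_staleness:
            return None
        # ... compute the policy loss and update weights (elided) ...
        self.version += 1
        return self.version


ray.init()
buffer = Queue(maxsize=64)  # the rollout buffer
workers = [RolloutWorker.remote(buffer) for _ in range(4)]
trainer = Trainer.remote(buffer)

for w in workers:
    w.run.remote(1000)  # rollouts stream in concurrently

for _ in range(100):
    version = ray.get(trainer.step.remote())
    if version is not None:
        for w in workers:
            w.set_weights.remote(version, None)  # async weight sync
```

Because the buffer is bounded, the inference pool applies backpressure rather than idling the trainer, and the staleness check mirrors the survey's note that managing off-policy drift is the main open question in this design.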
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info