Liger GRPO integration with TRL encounters shape mismatch during Qwen2.5-0.5B-Instruct training
AI Impact Summary
An attempt to run Liger GRPO with TRL against Qwen2.5-0.5B-Instruct under DeepSpeed ZeRO-3 with bf16 triggers a runtime shape mismatch in the fused PPO loss path. The traceback runs from grpo_loss.forward through LigerFusedLinearGRPOFunction into accumulate_chunk in fused_linear_ppo.py, indicating that the tensor shapes the kernel expects do not match the actual inputs, batch layout, or model dimensions. This points to a compatibility gap between Liger GRPO's fused kernels and the Qwen2.5-0.5B-Instruct setup under ZeRO-3, and can stall training until a fix is applied. Engineers should verify that hidden sizes, sequence lengths, and chunk sizes are consistent across Qwen, TRL, and the Liger GRPO fused implementation.
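The shape checks recommended above can be run as a quick pre-flight before invoking the fused kernel. The sketch below is illustrative and does not reflect Liger's actual API; the function name, argument layout, and the example dimensions (Qwen2.5-0.5B-Instruct's hidden size and vocabulary size are assumed here) are all hypothetical, and the chunk-divisibility check is only one plausible mismatch source, not a documented Liger requirement.

```python
def check_fused_grpo_shapes(hidden_shape, lm_head_shape, mask_shape, chunk_size):
    """Hypothetical pre-flight check for a fused linear GRPO/PPO loss call.

    hidden_shape:  (batch, seq_len, hidden) activations fed to the lm_head
    lm_head_shape: (vocab_size, hidden) weight of the output projection
    mask_shape:    (batch, seq_len) attention/completion mask
    chunk_size:    rows processed per chunk by the fused kernel

    Returns a list of human-readable problems; empty means shapes look consistent.
    """
    problems = []
    bsz, seq_len, hidden = hidden_shape
    vocab_size, w_in = lm_head_shape

    # The lm_head input dimension must equal the model's hidden size.
    if hidden != w_in:
        problems.append(f"hidden size {hidden} != lm_head input dim {w_in}")

    # The mask must cover every (batch, position) pair of the activations.
    if tuple(mask_shape) != (bsz, seq_len):
        problems.append(f"mask shape {tuple(mask_shape)} != {(bsz, seq_len)}")

    # Fused chunked kernels iterate over the flattened batch*seq dimension;
    # flag a ragged final chunk as a possible (assumed) mismatch source.
    if (bsz * seq_len) % chunk_size != 0:
        problems.append(
            f"flattened length {bsz * seq_len} not divisible by chunk_size {chunk_size}"
        )
    return problems


if __name__ == "__main__":
    # Assumed Qwen2.5-0.5B-style dimensions: hidden 896, vocab 151936.
    print(check_fused_grpo_shapes((4, 512, 896), (151936, 896), (4, 512), 256))
```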
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info