Liger GRPO integration with TRL hits shape mismatch during Qwen2.5-0.5B-Instruct training (DeepSpeed ZeRO-3)
AI Impact Summary
A capability push to enable the Liger GRPO loss within TRL workflows is hitting a runtime shape mismatch when training Qwen2.5-0.5B-Instruct under DeepSpeed ZeRO-3 with bf16. The stack trace implicates the fused linear GRPO path (grpo_loss.py, via LigerFusedLinearGRPOFunction) and Torch Dynamo, indicating that the expected shape alignment between input activations, projection weights, and the loss computation is not met in this configuration. This blocks validation of the Liger GRPO integration with TRL on instruction-tuned models until the kernel/fusion code is updated or a compatible training path is provided. Until then, teams cannot rely on this integration for experimentation or adoption, delaying feature rollout and the expected performance gains.
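As a hedged illustration of the alignment requirement described above (not the actual Liger kernel code), a fused linear-plus-loss path generally needs the flattened hidden states and the lm_head projection weight to agree on the hidden dimension before logits and the loss can be computed; a mismatch of the kind reported surfaces as a shape error at this boundary. The function name and the example dimensions below are hypothetical, chosen only to resemble a 0.5B-class model:

```python
def check_fused_linear_shapes(hidden_shape, weight_shape):
    """Hypothetical shape check for a fused linear + loss path.

    hidden_shape: (batch * seq_len, hidden_dim) -- flattened activations
    weight_shape: (vocab_size, hidden_dim)      -- lm_head projection weight

    The hidden dimensions must match so that logits = hidden @ weight.T
    is well-defined; the fused loss then consumes logits of shape
    (batch * seq_len, vocab_size).
    """
    tokens, hidden_dim = hidden_shape
    vocab_size, weight_hidden = weight_shape
    if hidden_dim != weight_hidden:
        raise ValueError(
            f"shape mismatch: hidden dim {hidden_dim} "
            f"vs weight dim {weight_hidden}"
        )
    return (tokens, vocab_size)  # resulting logits shape


# Illustrative numbers only: batch 4, seq_len 512, hidden 896, vocab 151936
print(check_fused_linear_shapes((4 * 512, 896), (151936, 896)))
```

A fix in the kernel/fusion code would amount to guaranteeing this invariant holds for every tensor handed to the fused function under ZeRO-3 partitioning, where gathered parameter shapes can differ from what the kernel expects.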
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info