Hugging Face: TRL expands Vision Language Model alignment with MPO, GRPO, GSPO and Online DPO | SignalBreak