Hugging Face: Vision Language Model Alignment in TRL — MPO, GRPO, GSPO support for VLMs | SignalBreak