Vision Language Model Alignment in TRL — MPO, GRPO, GSPO support for VLMs
AI Impact Summary
TRL adds MPO, GRPO, and GSPO support for Vision Language Model alignment, enabling richer preference signals and better scaling with large VLMs. MPO extends DPO-based workflows with a combined loss (sigmoid, BCO, and SFT terms), while GRPO and GSPO introduce group-based online training modes, alongside RLOO and Online DPO support and accompanying training notebooks. This reduces reliance on simple pairwise preferences and supports models such as IDEFICS2 and Qwen2.5VL-3B in TRL pipelines, potentially delivering higher-quality multimodal outputs. Teams should anticipate updated APIs (DPOConfig, DPOTrainer, GRPOConfig, GRPOTrainer) and plan to revalidate prompts and data pipelines to accommodate the new loss components and group-based updates.
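The combined-loss setup described above can be sketched as a DPOConfig that stacks several loss terms. This is a minimal illustration, not taken from the summary itself: the exact loss-type names (e.g. "bco_pair") and weights are assumptions that should be checked against the TRL documentation for your installed version.

```python
# Hypothetical sketch: MPO-style training in TRL via a combined DPO loss.
# Loss-type names and weights below are assumptions; verify against the
# DPOConfig docs for your TRL release.
from trl import DPOConfig, DPOTrainer

config = DPOConfig(
    output_dir="vlm-mpo",
    # MPO combines several preference losses into one objective:
    # a sigmoid (standard DPO) term, a BCO term, and an SFT term.
    loss_type=["sigmoid", "bco_pair", "sft"],
    loss_weights=[0.8, 0.2, 1.0],  # relative weight of each loss term
    beta=0.1,                      # temperature shared by the DPO-style terms
)

# For a VLM, the trainer takes the model's processor in place of a tokenizer:
# trainer = DPOTrainer(model=model, args=config, train_dataset=dataset,
#                      processing_class=processor)
# trainer.train()
```

Group-based modes follow the same pattern with GRPOConfig and GRPOTrainer in place of the DPO classes.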
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info