TRL IPO fix: average log-likelihood loss aligns IPO with DPO in alignment experiments
AI Impact Summary
TRL's IPO implementation contained an error: the per-token log-likelihoods of each completion were summed rather than averaged, which biased the loss toward longer completions. With the corrected averaging, experiments on OpenHermes-2.5-Mistral-7B and Zephyr-7b-beta-sft using the orca_dpo_pairs and ultrafeedback-binarized datasets show that IPO now matches DPO and outperforms KTO on paired preferences, consistent with the published results. This clarifies the relative strengths of DPO and IPO in non-RL alignment and underscores the importance of using the corrected TRL loss formulation when benchmarking. Past comparisons that used the buggy loss may have misrepresented IPO's effectiveness.
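A minimal sketch of why the aggregation choice matters. This is not TRL's actual code; the function and the per-token log-probabilities are hypothetical, and the point is only that summing log-likelihoods makes a sequence's score scale with its length, while averaging yields a length-normalized score as the IPO formulation expects:

```python
def seq_logp(token_logps, average):
    """Aggregate per-token log-probabilities for one completion.

    average=False mimics the buggy summed loss: longer completions get
    larger-magnitude scores purely because they have more tokens.
    average=True is the corrected, length-normalized aggregation.
    """
    total = sum(token_logps)
    return total / len(token_logps) if average else total

# Hypothetical per-token log-probs for a short and a long completion
# with similar per-token quality.
short = [-0.5, -0.4]
long = [-0.5, -0.4, -0.6, -0.5, -0.5, -0.4]

# Summed: the longer completion looks far worse (-2.9 vs -0.9),
# an artifact of length, not quality.
print(seq_logp(short, average=False), seq_logp(long, average=False))

# Averaged: comparable scale regardless of length (-0.45 vs ~-0.483).
print(seq_logp(short, average=True), seq_logp(long, average=True))
```

Under the summed variant, the IPO loss term built from these scores is dominated by length differences between chosen and rejected completions, which is one plausible reason the buggy loss underperformed in benchmarks.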
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info