TRL IPO fix: average log-likelihood loss aligns IPO with DPO in alignment experiments
AI Impact Summary
TRL's IPO implementation contained an error: the per-token log-likelihoods of each completion were summed rather than averaged, which biased the loss toward longer completions. With the corrected averaging, experiments on OpenHermes-2.5-Mistral-7B and Zephyr-7b-beta-sft using the orca_dpo_pairs and ultrafeedback-binarized datasets show that IPO now matches DPO and outperforms KTO on paired preferences, consistent with the published results. This clarifies the relative strengths of DPO and IPO in non-RL alignment and underscores the importance of using the corrected TRL loss formulation when benchmarking. Past comparisons that used the buggy loss may have misrepresented IPO's effectiveness.
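A minimal sketch of why the aggregation choice matters. This is not TRL's actual code; the function and the per-token log-probabilities are hypothetical, and the point is only that summing log-likelihoods makes a sequence's score scale with its length, while averaging yields a length-normalized score as the IPO formulation expects:

```python
def seq_logp(token_logps, average):
    """Aggregate per-token log-probabilities for one completion.

    average=False mimics the buggy summed loss: longer completions get
    larger-magnitude scores purely because they have more tokens.
    average=True is the corrected, length-normalized aggregation.
    """
    total = sum(token_logps)
    return total / len(token_logps) if average else total

# Hypothetical per-token log-probs for a short and a long completion
# with similar per-token quality.
short = [-0.5, -0.4]
long = [-0.5, -0.4, -0.6, -0.5, -0.5, -0.4]

# Summed: the longer completion looks far worse (-2.9 vs -0.9),
# an artifact of length, not quality.
print(seq_logp(short, average=False), seq_logp(long, average=False))

# Averaged: comparable scale regardless of length (-0.45 vs ~-0.483).
print(seq_logp(short, average=True), seq_logp(long, average=True))
```

Under the summed variant, the IPO loss term built from these scores is dominated by length differences between chosen and rejected completions, which is one plausible reason the buggy loss underperformed in benchmarks.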
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info