Hugging Face: TRL adds Direct Preference Optimization (DPO) for Vision-Language Models with LoRA and quantization | SignalBreak