TRL Preference Optimization for Idefics2-8b VLMs — Quantization & LoRA
AI Impact Summary
This document details how to train a Vision Language Model (VLM) with preference optimization using the TRL library and the Idefics2-8b model. The core technique is to build a dataset of image-question-answer triplets, formatted as chat conversations, and to train the model to prefer the chosen answer over the rejected one. The document highlights quantization and LoRA as ways to mitigate memory constraints, specifically the high VRAM requirements of Idefics2-8b, and provides a detailed calculation of the memory needed for training.
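The memory pressure mentioned above can be illustrated with a rough calculation. This is a sketch only: the 8B parameter count, float32 training, and AdamW's two optimizer states are standard assumptions, not figures taken from the document.

```python
# Back-of-the-envelope VRAM estimate for FULL fine-tuning of an
# 8B-parameter model in float32 with AdamW (illustrative assumptions).
params = 8e9
bytes_fp32 = 4

weights = params * bytes_fp32        # model weights
gradients = params * bytes_fp32      # one gradient per weight
optimizer = params * bytes_fp32 * 2  # AdamW keeps two states per weight

full_gb = (weights + gradients + optimizer) / 1024**3
print(f"full fine-tuning: ~{full_gb:.0f} GB")  # far beyond a single GPU

# With 4-bit quantized weights (0.5 byte/param) and LoRA, only the small
# adapter matrices are trained, so gradients/optimizer states shrink to
# the adapter size and the frozen base model dominates:
quantized_gb = (params * 0.5) / 1024**3
print(f"4-bit base weights: ~{quantized_gb:.1f} GB (+ LoRA adapters)")
```

This is why the document pairs quantization (to shrink the frozen base weights) with LoRA (to avoid storing gradients and optimizer states for all 8B parameters).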
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info