TRL Preference Optimization for Idefics2-8b VLMs — Quantization & LoRA
AI Impact Summary
This document details how to train a Vision Language Model (VLM) with preference optimization using the TRL library and the Idefics2-8b model. The core technique is to build a dataset of image-question-answer triplets, formatted as chat conversations, and to train the model to prefer the chosen answer over the rejected one. The document highlights quantization and LoRA as ways to mitigate memory constraints, specifically the high VRAM requirements of Idefics2-8b, and provides a detailed calculation of the memory needed for training.
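The memory pressure mentioned above can be illustrated with a rough calculation. This is a sketch only: the 8B parameter count, float32 training, and AdamW's two optimizer states are standard assumptions, not figures taken from the document.

```python
# Back-of-the-envelope VRAM estimate for FULL fine-tuning of an
# 8B-parameter model in float32 with AdamW (illustrative assumptions).
params = 8e9
bytes_fp32 = 4

weights = params * bytes_fp32        # model weights
gradients = params * bytes_fp32      # one gradient per weight
optimizer = params * bytes_fp32 * 2  # AdamW keeps two states per weight

full_gb = (weights + gradients + optimizer) / 1024**3
print(f"full fine-tuning: ~{full_gb:.0f} GB")  # far beyond a single GPU

# With 4-bit quantized weights (0.5 byte/param) and LoRA, only the small
# adapter matrices are trained, so gradients/optimizer states shrink to
# the adapter size and the frozen base model dominates:
quantized_gb = (params * 0.5) / 1024**3
print(f"4-bit base weights: ~{quantized_gb:.1f} GB (+ LoRA adapters)")
```

This is why the document pairs quantization (to shrink the frozen base weights) with LoRA (to avoid storing gradients and optimizer states for all 8B parameters).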
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info