Multilingual Visual Document Retrieval with vdr-2b-multi-v1 and vdr-multilingual-train
AI Impact Summary
LlamaIndex and its collaborators introduce vdr-2b-multi-v1, a multilingual embedding model for visual document retrieval (VDR). The model encodes screenshots of document pages into dense vectors, so pages can be searched across languages without OCR. It is trained on vdr-multilingual-train, a 500k-sample multilingual dataset, and supports cross-lingual retrieval (e.g., German queries against Italian documents) with faster inference and lower VRAM usage than the English-only baseline. The model is open-sourced on Hugging Face and integrated with SentenceTransformers and LlamaIndex. Teams that want full multilingual coverage need to update their embedding pipelines to adopt the new model and dataset.
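To make the retrieval step concrete, here is a minimal sketch of how dense vectors can be ranked against a query. It does not load vdr-2b-multi-v1. The three-dimensional vectors, the page ids, and the helper names `cosine` and `rank_pages` are invented for illustration, and the use of cosine similarity is an assumption, not something the summary above confirms.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_pages(query_vec, page_vecs):
    """Return page ids sorted by descending similarity to the query."""
    scores = {pid: cosine(query_vec, vec) for pid, vec in page_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors standing in for page-screenshot embeddings (hypothetical values).
page_vecs = {
    "it_invoice_p1": [0.9, 0.1, 0.0],
    "it_manual_p7": [0.1, 0.8, 0.3],
    "it_report_p2": [0.0, 0.2, 0.9],
}

# Toy vector standing in for an embedded German text query (hypothetical values).
query_vec = [0.2, 0.9, 0.2]

ranking = rank_pages(query_vec, page_vecs)
print(ranking)
```

In a real pipeline the toy vectors would be replaced by the model's embeddings of the page images and of the query text. Because both live in the same vector space, the query language does not have to match the document language.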
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info