Sentence Transformers v5.4: Multimodal Embedding & Reranking with Qwen3-VL
AI Impact Summary
Sentence Transformers v5.4 introduces multimodal embedding and reranking built on Qwen3-VL-2B and Qwen3-VL-Reranker-2B. These models encode and compare text, images, audio, and video, which enables applications such as retrieval-augmented generation (RAG) and cross-modal search. The key technical change is that multimodal inputs go through the same API as text, which simplifies workflows and may reduce the complexity of building RAG pipelines.
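As a rough illustration of how embedding and reranking fit together in cross-modal search, here is a minimal retrieve-then-rerank sketch. It is not the library's implementation: `toy_embed`, `toy_rerank_score`, `TOY_VECTORS`, and the file names are hypothetical stand-ins so the example runs without downloading a model. In a real pipeline, stage 1 would presumably call `SentenceTransformer.encode` and stage 2 `CrossEncoder.rank`; the exact input format those calls accept for images, audio, and video in v5.4 is not stated in this summary, so it is left out here.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    embed: Callable[[str], Sequence[float]],
    rerank_score: Callable[[str, str], float],
    top_k: int = 2,
) -> List[str]:
    """Two-stage search: cheap embedding retrieval, then pairwise reranking."""
    query_vec = embed(query)
    # Stage 1: rank the whole corpus by embedding similarity, keep top_k.
    by_similarity = sorted(
        corpus, key=lambda item: cosine(query_vec, embed(item)), reverse=True
    )
    candidates = by_similarity[:top_k]
    # Stage 2: re-order only the candidates with the (costlier) reranker.
    return sorted(
        candidates, key=lambda item: rerank_score(query, item), reverse=True
    )


# Hypothetical stand-ins (assumptions, not real model output): hand-written
# vectors placed in one shared space, as a multimodal embedder would do.
TOY_VECTORS = {
    "a photo of a cat": [1.0, 0.1, 0.0],
    "cat.jpg": [0.9, 0.2, 0.0],
    "dog.mp4": [0.2, 0.9, 0.0],
    "invoice.txt": [0.0, 0.1, 1.0],
}


def toy_embed(item: str) -> Sequence[float]:
    """Look up a precomputed toy vector for a query or corpus item."""
    return TOY_VECTORS[item]


def toy_rerank_score(query: str, item: str) -> float:
    """Score 1.0 when the file stem appears as a word in the query."""
    stem = item.split(".")[0]
    return 1.0 if stem in query.split() else 0.0


results = retrieve_then_rerank(
    "a photo of a cat",
    ["cat.jpg", "dog.mp4", "invoice.txt"],
    toy_embed,
    toy_rerank_score,
)
print(results)
```

The point of the split is cost: the embedding stage compares the query against every item cheaply, while the reranker scores only the few surviving candidates as (query, item) pairs.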
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info