Finetuning Sparse Embedding Models with Sentence Transformers
AI Impact Summary
This document details the finetuning of sparse embedding models with Sentence Transformers to improve semantic search and retrieval. The core concept is query/document expansion, which lets models like SPLADE match semantically similar texts even when their vocabularies differ. Finetuning is crucial for curbing the model's tendency to over-expand and for focusing its knowledge on a specific domain or language, as illustrated by the 'cephalalgia' example. The process combines the model, datasets, loss functions, and trainers to produce efficient, interpretable sparse embedding models.
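The expansion idea can be sketched with a toy example. This is not a real SPLADE model: the vocabulary weights below are hand-picked for illustration, and the point is only that sparse embeddings live in vocabulary space, so an expanded query for "headache" can match a document that only says "cephalalgia":

```python
def dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Sparse dot product over the shared vocabulary dimensions."""
    return sum(w * b[t] for t, w in a.items() if t in b)

# Without expansion: the query activates only its own surface tokens.
query_plain = {"headache": 1.0, "treatment": 0.8}

# With SPLADE-style expansion: the model also activates related terms
# it learned during pretraining or finetuning (weights are illustrative).
query_expanded = {"headache": 1.0, "treatment": 0.8,
                  "cephalalgia": 0.6, "migraine": 0.5, "pain": 0.3}

doc = {"cephalalgia": 0.9, "therapy": 0.7, "pain": 0.4}

print(dot(query_plain, doc))     # 0.0  -- no shared terms, no match
print(dot(query_expanded, doc))  # 0.66 -- expansion bridges the vocabulary gap
```

Because each dimension is a vocabulary token, the resulting match is also interpretable: one can see exactly which expanded terms ("cephalalgia", "pain") contributed to the score.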
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info