Finetuning Sparse Embedding Models with Sentence Transformers
AI Impact Summary
This document details the finetuning of sparse embedding models with Sentence Transformers to improve semantic search and retrieval. The core concept is query/document expansion, which lets models like SPLADE match semantically similar texts even when their vocabularies differ. Finetuning is crucial for curbing the model's tendency to over-expand and for focusing its knowledge on a specific domain or language, as illustrated by the 'cephalalgia' example. The process combines the model, datasets, loss functions, and trainers to produce efficient, interpretable sparse embedding models.
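The expansion idea can be sketched with a toy example. This is not a real SPLADE model: the vocabulary weights below are hand-picked for illustration, and the point is only that sparse embeddings live in vocabulary space, so an expanded query for "headache" can match a document that only says "cephalalgia":

```python
def dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Sparse dot product over the shared vocabulary dimensions."""
    return sum(w * b[t] for t, w in a.items() if t in b)

# Without expansion: the query activates only its own surface tokens.
query_plain = {"headache": 1.0, "treatment": 0.8}

# With SPLADE-style expansion: the model also activates related terms
# it learned during pretraining or finetuning (weights are illustrative).
query_expanded = {"headache": 1.0, "treatment": 0.8,
                  "cephalalgia": 0.6, "migraine": 0.5, "pain": 0.3}

doc = {"cephalalgia": 0.9, "therapy": 0.7, "pain": 0.4}

print(dot(query_plain, doc))     # 0.0  -- no shared terms, no match
print(dot(query_expanded, doc))  # 0.66 -- expansion bridges the vocabulary gap
```

Because each dimension is a vocabulary token, the resulting match is also interpretable: one can see exactly which expanded terms ("cephalalgia", "pain") contributed to the score.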
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info