Enable finetuning of sparse embedding models with Sentence Transformers (SPLADE) using naver/splade-v3
AI Impact Summary
This change describes a workflow for fine-tuning sparse embedding models (SPLADE-style) with Sentence Transformers, adapting encoders to domain data for retrieval and hybrid search. It demonstrates using SparseEncoder with the naver/splade-v3 model, sourcing pretrained encoders from the Hugging Face Hub, and applying the standard training components (datasets, losses, evaluators, trainer). The approach preserves interpretability: decoding the top contributing tokens shows how neural sparse expansion influences matching. Domain teams can gain improved retrieval accuracy and clearer token-level explanations, but will need training data, compute, and governance to manage vocabulary expansion and model drift.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info