Wav2Vec2 with LM decoding in Transformers using KenLM and pyctcdecode (patrickvonplaten/wav2vec2-base-100h-with-lm)
AI Impact Summary
Hugging Face Transformers now supports decoding Wav2Vec2 outputs with a language model via Wav2Vec2ProcessorWithLM and an external 4-gram LM. The setup relies on pyctcdecode and its KenLM integration; it changes the decoding path from greedy argmax to LM-weighted beam search, which corrects common mis-transcriptions (e.g., misspellings) in LibriSpeech-style data. The capability is demonstrated with the patrickvonplaten/wav2vec2-base-100h-with-lm checkpoint, which ships a prebuilt LM. Teams can improve ASR accuracy in low-data scenarios, but must install the kenlm and pyctcdecode packages and adapt their inference to feed logits to the LM-enabled decoder.
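The decoding change described above, from per-frame argmax to beam search rescored by a language model, can be illustrated with a self-contained toy sketch. The vocabulary, log-probabilities, and "language model" below are invented for illustration and are not the transformers or pyctcdecode API; they only show why an LM-weighted beam can pick a different (better) transcript than greedy decoding.

```python
import math

# Toy CTC output: per-frame log-probabilities over a tiny vocabulary.
# "-" is the CTC blank token. All values are made up for illustration.
LOG_PROBS = [
    {"-": math.log(0.10), "a": math.log(0.60), "b": math.log(0.30)},
    {"-": math.log(0.15), "a": math.log(0.45), "b": math.log(0.40)},
    {"-": math.log(0.70), "a": math.log(0.10), "b": math.log(0.20)},
]

def ctc_collapse(tokens):
    """Standard CTC collapse: merge repeated tokens, then drop blanks."""
    out, prev = [], None
    for t in tokens:
        if t != prev and t != "-":
            out.append(t)
        prev = t
    return "".join(out)

def argmax_decode(frames):
    """Greedy decoding: best token per frame, then CTC collapse."""
    return ctc_collapse([max(f, key=f.get) for f in frames])

def beam_decode(frames, lm_logp, beam_width=4, alpha=1.0):
    """Beam search scored by acoustic log-prob + alpha * LM log-prob."""
    beams = {(): 0.0}  # token sequence -> acoustic log-prob
    for f in frames:
        nxt = {}
        for seq, lp in beams.items():
            for tok, tlp in f.items():
                nxt[seq + (tok,)] = lp + tlp
        # Keep the top hypotheses, ranked with the LM score included.
        ranked = sorted(
            nxt.items(),
            key=lambda kv: kv[1] + alpha * lm_logp(ctc_collapse(kv[0])),
            reverse=True,
        )
        beams = dict(ranked[:beam_width])
    best = max(beams, key=lambda s: beams[s] + alpha * lm_logp(ctc_collapse(s)))
    return ctc_collapse(best)

def lm_logp(text):
    """Toy 'language model': strongly prefers the word 'ab'."""
    return 0.0 if text == "ab" else -5.0

print(argmax_decode(LOG_PROBS))         # "a"  (acoustically greedy)
print(beam_decode(LOG_PROBS, lm_logp))  # "ab" (LM-corrected)
```

Greedy decoding collapses the repeated "a" into "a", while the LM-weighted beam recovers "ab" because the language model outweighs the small acoustic gap in frame two; this is the same mechanism by which a real 4-gram LM fixes misspellings in LM-enabled beam search.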
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info