Wav2Vec2 with LM decoding in Transformers using KenLM and pyctcdecode (patrickvonplaten/wav2vec2-base-100h-with-lm)
AI Impact Summary
Hugging Face Transformers now supports decoding Wav2Vec2 outputs with a language model via Wav2Vec2ProcessorWithLM and an external 4-gram LM. The setup relies on pyctcdecode and its KenLM integration; it changes the decoding path from greedy argmax to LM-weighted beam search, which corrects common mis-transcriptions (e.g., misspellings) in LibriSpeech-style data. The capability is demonstrated with the patrickvonplaten/wav2vec2-base-100h-with-lm checkpoint, which ships a prebuilt LM. Teams can improve ASR accuracy in low-data scenarios, but must install the kenlm and pyctcdecode packages and adapt their inference to feed logits to the LM-enabled decoder.
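The decoding change described above, from per-frame argmax to beam search rescored by a language model, can be illustrated with a self-contained toy sketch. The vocabulary, log-probabilities, and "language model" below are invented for illustration and are not the transformers or pyctcdecode API; they only show why an LM-weighted beam can pick a different (better) transcript than greedy decoding.

```python
import math

# Toy CTC output: per-frame log-probabilities over a tiny vocabulary.
# "-" is the CTC blank token. All values are made up for illustration.
LOG_PROBS = [
    {"-": math.log(0.10), "a": math.log(0.60), "b": math.log(0.30)},
    {"-": math.log(0.15), "a": math.log(0.45), "b": math.log(0.40)},
    {"-": math.log(0.70), "a": math.log(0.10), "b": math.log(0.20)},
]

def ctc_collapse(tokens):
    """Standard CTC collapse: merge repeated tokens, then drop blanks."""
    out, prev = [], None
    for t in tokens:
        if t != prev and t != "-":
            out.append(t)
        prev = t
    return "".join(out)

def argmax_decode(frames):
    """Greedy decoding: best token per frame, then CTC collapse."""
    return ctc_collapse([max(f, key=f.get) for f in frames])

def beam_decode(frames, lm_logp, beam_width=4, alpha=1.0):
    """Beam search scored by acoustic log-prob + alpha * LM log-prob."""
    beams = {(): 0.0}  # token sequence -> acoustic log-prob
    for f in frames:
        nxt = {}
        for seq, lp in beams.items():
            for tok, tlp in f.items():
                nxt[seq + (tok,)] = lp + tlp
        # Keep the top hypotheses, ranked with the LM score included.
        ranked = sorted(
            nxt.items(),
            key=lambda kv: kv[1] + alpha * lm_logp(ctc_collapse(kv[0])),
            reverse=True,
        )
        beams = dict(ranked[:beam_width])
    best = max(beams, key=lambda s: beams[s] + alpha * lm_logp(ctc_collapse(s)))
    return ctc_collapse(best)

def lm_logp(text):
    """Toy 'language model': strongly prefers the word 'ab'."""
    return 0.0 if text == "ab" else -5.0

print(argmax_decode(LOG_PROBS))         # "a"  (acoustically greedy)
print(beam_decode(LOG_PROBS, lm_logp))  # "ab" (LM-corrected)
```

Greedy decoding collapses the repeated "a" into "a", while the LM-weighted beam recovers "ab" because the language model outweighs the small acoustic gap in frame two; this is the same mechanism by which a real 4-gram LM fixes misspellings in LM-enabled beam search.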
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info