Hugging Face Transformers enables Wav2Vec2 with n-gram LM via KenLM and pyctcdecode
AI Impact Summary
The content describes integrating an n-gram language model into Wav2Vec2 decoding using Wav2Vec2ProcessorWithLM, KenLM, and pyctcdecode. Beam-search decoding of the CTC logits with an n-gram LM improves transcription accuracy, most notably when fine-tuning data is scarce (e.g., adding a 4-gram LM to a fine-tuned checkpoint). For production, this adds the kenlm and pyctcdecode dependencies and LM-augmented model repos such as patrickvonplaten/wav2vec2-base-100h-with-lm, with potential impacts on decoding latency and on maintaining the LM data alongside the ASR model.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info