Whisper-based ASR with diarization and speculative decoding via Hugging Face Inference Endpoints

AI Impact Summary

This change introduces a custom inference handler that combines Whisper ASR, a Pyannote diarization model, and optional speculative decoding in a single Hugging Face Inference Endpoint. The solution is split into modular files (handler.py, diarization_utils.py, config.py) and uses ModelSettings and InferenceConfig to select models and parameters at runtime via environment variables (HF_MODEL_DIR, DIARIZATION_MODEL, HF_TOKEN, ASR_MODEL, ASSISTANT_MODEL). It relies on PyTorch 2.2 with Flash Attention 2 and notes that speedups depend on audio length and batch size: speculative decoding offers gains for short clips but may be neutral or negative for longer inputs. Operationally, deployment requires managing an access token for the diarization model and provisioning the endpoint programmatically so that secrets can be supplied. This increases integration complexity but yields a unified, diarized transcription endpoint.
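To make the environment-variable wiring concrete, here is a minimal sketch of how a ModelSettings object might be populated. It is not the handler's actual config.py: the field names, the load_settings helper, the fallback from ASR_MODEL to HF_MODEL_DIR, and the rule that a diarization model needs a token are all assumptions made for illustration. Only the variable names and the ModelSettings name come from the summary above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelSettings:
    """Model choices read from the environment (hypothetical field layout)."""
    asr_model: str
    assistant_model: Optional[str] = None
    diarization_model: Optional[str] = None
    hf_token: Optional[str] = None


def load_settings(env: Optional[Mapping[str, str]] = None) -> ModelSettings:
    """Build ModelSettings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env

    # Assumption: ASR_MODEL takes precedence and HF_MODEL_DIR is the fallback.
    asr_model = env.get("ASR_MODEL") or env.get("HF_MODEL_DIR")
    if not asr_model:
        raise ValueError("set ASR_MODEL or HF_MODEL_DIR")

    # Empty strings are treated as unset.
    diarization_model = env.get("DIARIZATION_MODEL") or None
    hf_token = env.get("HF_TOKEN") or None

    # Assumption: the diarization model cannot be loaded without a token,
    # so fail early instead of at the first request.
    if diarization_model and not hf_token:
        raise ValueError("DIARIZATION_MODEL requires HF_TOKEN")

    return ModelSettings(
        asr_model=asr_model,
        assistant_model=env.get("ASSISTANT_MODEL") or None,
        diarization_model=diarization_model,
        hf_token=hf_token,
    )
```

Calling load_settings({"ASR_MODEL": "openai/whisper-large-v3"}) returns settings with speculative decoding and diarization both disabled, which matches the "optional" framing above. Passing a mapping instead of reading os.environ directly keeps the helper easy to test.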

Affected Systems

openai/whisper-large-v3

Date: not specified
Change type: capability
Severity: info
