Hugging Face and IISc/ARTPARK expand Vaani dataset access for India's diverse languages
AI Impact Summary
Hugging Face has partnered with IISc and ARTPARK to broaden access to the Vaani dataset, an open-source, multimodal, multilingual resource that represents India's linguistic diversity. The collaboration aims to make the dataset easier to use on Hugging Face, so that researchers and developers can train speech-to-text, language identification, and multimodal models across 54 languages and 773 districts; Phase 2 expands coverage to all states. This data foundation supports benchmarking and customization of localized AI solutions in education, healthcare, governance, and consumer applications.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info