Nemotron-Personas-India: Open synthetic Indic personas via NVIDIA NeMo Data Designer
AI Impact Summary
Nemotron-Personas-India is a large-scale, privacy-preserving synthetic dataset aligned to India's demographic distributions, intended to fill data gaps for multilingual, multi-script AI in India. Built with NeMo Data Designer, it contains 21 million personas in English and in Hindi (written in both Devanagari and Latin script). It covers 36 states and union territories and 640 districts, with 27 fields per record. The release is licensed under CC BY 4.0 and is designed to integrate with Nemotron models and other open-source LLMs. Because it relies on no data about real individuals, it can support region-specific fine-tuning and evaluation pipelines.
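The summary says the dataset can feed region-specific fine-tuning and evaluation pipelines. The Python sketch below shows one way to pull such a slice from persona records that have already been loaded as dictionaries, and then check how the slice spreads across districts. It rests on several assumptions. The field names `state`, `district`, and `language`, the language labels, the helpers `region_subset` and `district_coverage`, and the sample records are all hypothetical. Consult the dataset card for the real 27-field schema.

```python
from collections import Counter
from typing import Iterable

# Hypothetical record shape: the keys "state", "district", and "language"
# are illustrative and may not match the dataset's actual field names.
Persona = dict[str, str]


def region_subset(personas: Iterable[Persona], state: str, language: str) -> list[Persona]:
    """Keep only personas from one state in one (hypothetical) language label."""
    return [p for p in personas if p.get("state") == state and p.get("language") == language]


def district_coverage(personas: Iterable[Persona]) -> Counter:
    """Count personas per district, e.g. to spot a slice dominated by one district."""
    return Counter(p["district"] for p in personas if "district" in p)


# Toy records for illustration only; they are not taken from the dataset.
sample = [
    {"state": "Kerala", "district": "Ernakulam", "language": "en_IN", "persona": "toy text"},
    {"state": "Kerala", "district": "Thrissur", "language": "en_IN", "persona": "toy text"},
    {"state": "Kerala", "district": "Ernakulam", "language": "hi_Latn", "persona": "toy text"},
    {"state": "Bihar", "district": "Patna", "language": "hi_Deva", "persona": "toy text"},
]

if __name__ == "__main__":
    kerala_en = region_subset(sample, "Kerala", "en_IN")
    print(len(kerala_en), district_coverage(kerala_en))
```

Filtering by both region and language label keeps each slice in a single script. This matters when the same region appears in English, Devanagari Hindi, and Latin-script Hindi variants.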
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info