Nemotron-Personas-India: Open synthetic Indic personas for Sovereign AI development
AI Impact Summary
NVIDIA has released Nemotron-Personas-India, a synthetic dataset of 21 million Indic personas aligned to India's demographic distributions and built with NeMo Data Designer. The dataset contains 7.7B tokens across English and Hindi (Devanagari and Latin scripts), with 27 fields per record and synthetic names, so teams can train multilingual, culturally grounded copilots without using real personal data. It is licensed under CC BY 4.0, designed for privacy, and integrates with Nemotron models and open-source LLMs (e.g., GPT-OSS-120B) to support region-aware fine-tuning. Teams should still assess bias and regulatory alignment as part of their governance process.
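As a rough illustration of how persona records might feed region-aware fine-tuning, the sketch below filters records by language and script and turns one into a system prompt. It is only a sketch under stated assumptions: the field names (`name`, `state`, `language`, `script`, `persona`), the sample values, and the helper functions `select_personas` and `to_system_prompt` are all hypothetical and do not reflect the dataset's actual 27-field schema.

```python
# Hypothetical sample records. Field names and values are illustrative
# assumptions, not the real Nemotron-Personas-India schema.
SAMPLE_PERSONAS = [
    {"name": "Asha Verma", "state": "Uttar Pradesh", "language": "Hindi",
     "script": "Devanagari", "persona": "A school teacher who enjoys folk music."},
    {"name": "Rohan Mehta", "state": "Maharashtra", "language": "Hindi",
     "script": "Latin", "persona": "A delivery driver who follows cricket closely."},
    {"name": "Priya Nair", "state": "Kerala", "language": "English",
     "script": "Latin", "persona": "A nurse who volunteers at a local library."},
]


def select_personas(records, language, script=None):
    """Return records matching a language and, optionally, a script."""
    return [
        r for r in records
        if r["language"] == language and (script is None or r["script"] == script)
    ]


def to_system_prompt(record):
    """Build a system prompt string from one (hypothetical) persona record."""
    return (
        f"You are {record['name']}, living in {record['state']}. "
        f"{record['persona']} "
        f"Reply in {record['language']} using the {record['script']} script."
    )


if __name__ == "__main__":
    # Print one prompt per Hindi persona, regardless of script.
    for rec in select_personas(SAMPLE_PERSONAS, "Hindi"):
        print(to_system_prompt(rec))
```

In practice the records would come from the published dataset rather than an in-memory list, and the prompt template would be adapted to whichever fields the dataset actually exposes.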
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info