Introducing the Synthetic Data Generator: no-code dataset creation with LLMs
AI Impact Summary
The Synthetic Data Generator is a no-code tool that creates text classification and chat datasets from prompts. It is powered by distilabel and the Hugging Face text-generation APIs, and it exports outputs to Argilla and the Hugging Face Hub. Supported models include meta-llama/Llama-3.1-8B-Instruct and gpt-4o, with optional OpenAI integration and optional model training through AutoTrain. The tool speeds up data preparation and model iteration for NLP tasks, but it also increases reliance on external ML services and pipeline components.
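The tool itself is no-code, so the following is not its implementation. It is a minimal Python sketch, under stated assumptions, of the general prompt-to-dataset idea for text classification. It makes one generation call per (label, example) pair, collects the results into records, and serializes them as JSON Lines. The `fake_generate` stub stands in for a hosted model such as meta-llama/Llama-3.1-8B-Instruct. The function names and the `text`/`label` column names are hypothetical.

```python
import json
from typing import Callable


# Hypothetical stand-in for a hosted text-generation model
# (e.g. meta-llama/Llama-3.1-8B-Instruct); it only returns canned text.
def fake_generate(prompt: str) -> str:
    return f"Sample text written for: {prompt}"


def build_classification_dataset(
    description: str,
    labels: list[str],
    n_per_label: int,
    generate: Callable[[str], str],
) -> list[dict]:
    """Turn a dataset description into labelled text records (hypothetical helper)."""
    records = []
    for label in labels:
        for i in range(n_per_label):
            # One prompt per example: the task description plus the target label.
            prompt = f"{description} Write example {i + 1} with label '{label}'."
            records.append({"text": generate(prompt), "label": label})
    return records


def to_jsonl(records: list[dict]) -> str:
    # JSON Lines: one JSON object per line, a common format for dataset files.
    return "\n".join(json.dumps(r) for r in records)


if __name__ == "__main__":
    rows = build_classification_dataset(
        "Customer reviews of a coffee shop.",
        ["positive", "negative"],
        n_per_label=2,
        generate=fake_generate,
    )
    print(to_jsonl(rows))
```

Passing the generator in as a function keeps the sketch independent of any particular provider; a real pipeline would swap the stub for an actual model call and add deduplication and quality checks.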
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info