Hugging Face Data is Better Together releases Open Preference Dataset for Text-to-Image Generation (Apache 2.0)
AI Impact Summary
Data is Better Together is releasing the Open Preference Dataset for Text-to-Image Generation as an open dataset under the Apache 2.0 license. The dataset combines prompts with toxicity filtering and category diversification to support text-to-image (T2I) benchmarking. The workflow includes multi-model filtering with four classifiers (two text-based and two image-based) plus manual review, synthetic prompt enhancement, and cross-model evaluation using stabilityai/stable-diffusion-3.5-large and black-forest-labs/FLUX.1-dev. Artifacts are published on the Hugging Face Hub, and the processing code is on GitHub. The release enables reproducible benchmarking and fine-tuning workflows (e.g., LoRA adapters) across multiple model families in open-source pipelines.
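The summary does not say how the four classifier outputs are combined or when an item goes to manual review. As a minimal sketch only, the Python below assumes each classifier yields a toxicity score in [0, 1], that unanimous results are decided automatically, and that disagreements go to a human. The classifier names, the 0.5 threshold, and the `triage` function are hypothetical and are not taken from the released processing code.

```python
from typing import Dict

# Assumed cutoff; the actual thresholds used by the project are not stated.
DEFAULT_THRESHOLD = 0.5


def triage(scores: Dict[str, float], threshold: float = DEFAULT_THRESHOLD) -> str:
    """Combine per-classifier toxicity scores into one decision.

    `scores` maps a classifier name to a score in [0, 1].
    Returns "keep" if no classifier flags the item, "drop" if all do,
    and "review" when the classifiers disagree (manual review).
    """
    if not scores:
        raise ValueError("at least one classifier score is required")

    # Count how many classifiers consider the item toxic.
    flagged = sum(1 for score in scores.values() if score >= threshold)

    if flagged == 0:
        return "keep"
    if flagged == len(scores):
        return "drop"
    return "review"


if __name__ == "__main__":
    # Hypothetical scores from two text-based and two image-based classifiers.
    example = {"text_a": 0.05, "text_b": 0.10, "image_a": 0.70, "image_b": 0.20}
    print(triage(example))
```

Routing only the disagreements to a reviewer keeps the manual workload small while still putting a human on the ambiguous cases. A stricter pipeline could instead send any flagged item to review.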
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info