Training Design for Text-to-Image Models: Ablation Experiments with Representation Alignment
AI Impact Summary
This document is an experimental logbook for training text-to-image models, focused on improving training efficiency and convergence. The team systematically evaluates techniques against a stable baseline (PRX-1.2B), using metrics such as FID and CMMD to quantify improvements. A key insight is that representation learning may be a bottleneck: diffusion and flow models lag behind modern vision encoders in the quality of their learned representations. This motivates aligning the model's representations with those of a pre-trained encoder to accelerate learning and reduce compute requirements.
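To make the alignment idea concrete, here is a minimal sketch of one common way such an objective can be written: a negative cosine similarity between the generative model's intermediate patch features and the features of a frozen pre-trained encoder, added to the usual denoising loss. This is an illustration only, not the team's implementation. The function names, the `align_weight` value of 0.5, and the assumption that the model features have already been projected to the encoder's dimension are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon avoids division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-8)


def alignment_loss(model_feats, encoder_feats):
    """Mean negative cosine similarity over patch tokens.

    model_feats:   list of per-patch feature vectors from the generative
                   model (assumed already projected to the encoder's dimension).
    encoder_feats: list of per-patch feature vectors from a frozen
                   pre-trained vision encoder (hypothetical inputs).
    """
    if len(model_feats) != len(encoder_feats):
        raise ValueError("both inputs must have the same number of patches")
    sims = [cosine_similarity(h, z) for h, z in zip(model_feats, encoder_feats)]
    return -sum(sims) / len(sims)


def total_loss(denoising_loss, model_feats, encoder_feats, align_weight=0.5):
    """Denoising loss plus a weighted alignment term.

    align_weight is an illustrative value, not one taken from the source.
    """
    return denoising_loss + align_weight * alignment_loss(model_feats, encoder_feats)
```

The loss approaches -1 when every patch feature points in the same direction as its encoder counterpart and is 0 when they are orthogonal, so minimizing it pulls the model's internal representations toward the encoder's.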
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info