Deep double descent observed in CNNs, ResNets, and transformers—implications for training and regularization
AI Impact Summary
A new observation confirms that double descent occurs across CNNs, ResNets, and transformers: test performance can worsen temporarily as model capacity, dataset size, or training time grows, before improving again. This non-monotonic behavior complicates benchmarking and resource planning, since naive scaling can produce transient validation drops that mislead progress assessments. Teams should account for the phenomenon in experimental design by tracking performance across multiple scales and ensuring adequate regularization and training duration to reach the second descent, as in the sketch below.
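To make the multi-scale tracking advice concrete, here is a minimal sketch. It uses random-feature regression as a lightweight stand-in for the CNNs, ResNets, and transformers named above (an assumption, not part of the original observation): it sweeps model capacity and records train and validation error, so the interpolation peak and the second descent become visible. All names (`make_data`, `random_features`) and the specific feature counts are illustrative.

```python
# Minimal double-descent sweep (assumption: random-feature regression
# as a stand-in for the deep models discussed in the summary).
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, d=20, noise=0.1):
    # Synthetic linear target with observation noise (hypothetical setup).
    X = rng.standard_normal((n, d))
    w = rng.standard_normal(d)
    y = X @ w + noise * rng.standard_normal(n)
    return X, y

X_train, y_train = make_data(100)   # 100 training points
X_val, y_val = make_data(1000)

def random_features(X, W):
    # Fixed random ReLU features; capacity = number of columns of W.
    return np.maximum(X @ W, 0.0)

# Sweep capacity through the interpolation threshold (~100 features here).
for n_features in [10, 50, 90, 100, 110, 200, 500, 1000]:
    W = rng.standard_normal((X_train.shape[1], n_features))
    Phi_train = random_features(X_train, W)
    Phi_val = random_features(X_val, W)
    # lstsq returns the minimum-norm solution in the overparameterized
    # regime, which is where the second descent appears.
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    train_mse = np.mean((Phi_train @ coef - y_train) ** 2)
    val_mse = np.mean((Phi_val @ coef - y_val) ** 2)
    print(f"features={n_features:5d}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")
```

The exact curve depends on the noise level and data, but validation error typically peaks near the interpolation threshold (features ≈ training points) and falls again beyond it. For deep networks the same sweep applies, with capacity replaced by width, dataset size, or training epochs.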
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium