Deep double descent observed in CNNs, ResNets, and transformers—implications for training and regularization
AI Impact Summary
A new observation confirms that double descent occurs across CNNs, ResNets, and transformers: test performance can worsen temporarily as model capacity, dataset size, or training time grows, before improving again. This non-monotonic behavior complicates benchmarking and resource planning, since naive scaling can produce transient validation drops that mislead progress assessments. Teams should account for the phenomenon in experimental design by tracking performance across multiple scales and ensuring adequate regularization and training duration to reach the second descent, as in the sketch below.
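To make the multi-scale tracking advice concrete, here is a minimal sketch. It uses random-feature regression as a lightweight stand-in for the CNNs, ResNets, and transformers named above (an assumption, not part of the original observation): it sweeps model capacity and records train and validation error, so the interpolation peak and the second descent become visible. All names (`make_data`, `random_features`) and the specific feature counts are illustrative.

```python
# Minimal double-descent sweep (assumption: random-feature regression
# as a stand-in for the deep models discussed in the summary).
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, d=20, noise=0.1):
    # Synthetic linear target with observation noise (hypothetical setup).
    X = rng.standard_normal((n, d))
    w = rng.standard_normal(d)
    y = X @ w + noise * rng.standard_normal(n)
    return X, y

X_train, y_train = make_data(100)   # 100 training points
X_val, y_val = make_data(1000)

def random_features(X, W):
    # Fixed random ReLU features; capacity = number of columns of W.
    return np.maximum(X @ W, 0.0)

# Sweep capacity through the interpolation threshold (~100 features here).
for n_features in [10, 50, 90, 100, 110, 200, 500, 1000]:
    W = rng.standard_normal((X_train.shape[1], n_features))
    Phi_train = random_features(X_train, W)
    Phi_val = random_features(X_val, W)
    # lstsq returns the minimum-norm solution in the overparameterized
    # regime, which is where the second descent appears.
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    train_mse = np.mean((Phi_train @ coef - y_train) ** 2)
    val_mse = np.mean((Phi_val @ coef - y_val) ** 2)
    print(f"features={n_features:5d}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")
```

The exact curve depends on the noise level and data, but validation error typically peaks near the interpolation threshold (features ≈ training points) and falls again beyond it. For deep networks the same sweep applies, with capacity replaced by width, dataset size, or training epochs.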
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium