Ettin Suite: SoTA Paired Encoder and Decoder Models Trained with Identical Data and Recipe
AI Impact Summary
Ettin Suite introduces state-of-the-art paired encoder-only and decoder-only models trained with identical data, model sizes, and training recipes, enabling apples-to-apples comparisons across architectures. The project applies the ModernBERT recipe at six sizes (17M–1B parameters) with a three-phase training schedule (1.7T tokens of pre-training, 250B of context extension, 100B of decay) on open, reproducible data sources, producing encoder models that outperform ModernBERT and decoder models that outperform Llama 3.2 and SmolLM2. The controlled setup also surfaces architecture-specific advantages and shows that cross-objective pretraining is not universally beneficial. Public artifacts such as jhu-clsp/ettin-encoder-150m are available for quick adoption.
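As a minimal sketch of adopting a released checkpoint, the snippet below loads the jhu-clsp/ettin-encoder-150m encoder with Hugging Face `transformers` and mean-pools its hidden states into a sentence embedding. The input text and the mean-pooling choice are illustrative assumptions, not prescribed by the release.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the released encoder checkpoint (ModernBERT-style architecture).
tokenizer = AutoTokenizer.from_pretrained("jhu-clsp/ettin-encoder-150m")
model = AutoModel.from_pretrained("jhu-clsp/ettin-encoder-150m")
model.eval()

# Illustrative input; any short text works the same way.
texts = ["Paired encoder/decoder training enables fair comparisons."]
batch = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_dim)

# Mean-pool over non-padding tokens to get one embedding per input.
mask = batch["attention_mask"].unsqueeze(-1).float()
emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(emb.shape)  # one embedding row per input text
```

A recent `transformers` version is assumed, since the ModernBERT architecture is only supported in newer releases.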
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info