Chat Templates: Prevent Silent Performance Degradation
AI Impact Summary
Incorrect formatting of chat model inputs can lead to severe, silent performance degradation due to distribution shift. Hugging Face tokenizers now provide a `chat_template` attribute to ensure consistent formatting, using Jinja templates to convert conversation histories into a tokenizable string. This addresses the issue of models being trained with varying formats, a common and previously undocumented problem, and provides a flexible solution for preprocessing across diverse model types.
Affected Systems
- Date
- Date not specified
- Change type
- capability
- Severity
- info