Transformers Trainer: Gradient accumulation loss fix and custom loss API
AI Impact Summary
The gradient accumulation bug arose because the default loss function in model classes did not correctly account for non-padding token counts across accumulation steps, so training losses diverged from full-batch training results. The fix introduces an explicit loss computation pathway (including a ForCausalLMLoss variant for token-level tasks), adds a loss_function property to PreTrainedModel, and exposes a LOSS_MAPPING mechanism for custom losses, yielding correct loss computation and reporting during gradient accumulation. The changes are split across two pull requests, the first propagating the corrected loss handling to most models and the second enabling user-defined losses, with a plan to ship them in the next release; users who need the fix immediately can install from main.
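As a rough illustration of the mechanism described above, here is a minimal sketch of a custom causal-LM loss that normalizes by the total number of non-padding tokens in the whole accumulated batch rather than averaging per micro-batch, and of the two plug-in points the summary mentions (the loss_function property and LOSS_MAPPING). The function name my_causal_lm_loss is hypothetical, the (logits, labels, vocab_size, num_items_in_batch, ignore_index) calling convention is an assumption modeled on the described ForCausalLMLoss, and the import path for LOSS_MAPPING as well as the writability of loss_function may differ in your installed version.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Hypothetical custom token-level loss. The calling convention below is an
# assumption based on the summary's description of ForCausalLMLoss, not a
# confirmed library signature.
def my_causal_lm_loss(logits, labels, vocab_size,
                      num_items_in_batch=None, ignore_index=-100, **kwargs):
    # Shift so that tokens < n predict token n.
    shift_logits = logits[..., :-1, :].contiguous().float()
    shift_labels = labels[..., 1:].contiguous()

    loss = nn.functional.cross_entropy(
        shift_logits.view(-1, vocab_size),
        shift_labels.view(-1),
        ignore_index=ignore_index,
        # Sum here and divide explicitly below, so the normalizer can cover
        # all gradient-accumulation micro-batches instead of just this one.
        reduction="sum",
    )
    if num_items_in_batch is not None:
        # Total non-padding tokens across the accumulated batch, as computed
        # by the Trainer. Dividing by this keeps accumulated training
        # consistent with full-batch training.
        return loss / num_items_in_batch
    # Fallback: average over non-ignored tokens in this micro-batch only.
    return loss / (shift_labels != ignore_index).sum().clamp(min=1)

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Option 1: override the per-model loss pathway directly, assuming the
# loss_function property is assignable in your version.
model.loss_function = my_causal_lm_loss

# Option 2: register the loss globally; the import path and mapping key are
# assumptions and may vary between releases.
# from transformers.loss.loss_utils import LOSS_MAPPING
# LOSS_MAPPING["ForCausalLM"] = my_causal_lm_loss
```

Either hook leaves the Trainer loop untouched: the model's forward pass calls the registered loss, and the accumulated token count is supplied through the loss function's keyword arguments.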
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info