Hugging Face Transformers: Fix gradient accumulation loss calculation in Trainer
AI Impact Summary
With gradient accumulation enabled, the Transformers Trainer can report and optimize a loss that diverges from full-batch training, because the default loss computation normalizes each micro-batch independently rather than over the whole accumulation step. The fix enforces proper aggregation: the loss is normalized over the total number of non-padding tokens in an accumulation step, with user-supplied losses supported via PreTrainedModel.loss_function and the LOSS_MAPPING registry, so custom loss logic can be injected while the optimization signal stays consistent. Once these changes ship on main and in upstream releases, teams can upgrade to obtain correct loss reporting without reworking their training loops.
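As a rough illustration of the aggregation the fix targets, the sketch below contrasts averaging each micro-batch's mean loss with normalizing once over all non-padding tokens in the accumulation step. It is not the Trainer's code: the helper names, the assumption that `model` returns raw logits, the `(input_ids, labels)` micro-batch pairs, and the use of `-100` as the ignored label are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def accumulated_loss_naive(micro_batches, model):
    # Averages each micro-batch's mean loss. When micro-batches contain
    # different numbers of non-padding tokens, this weights tokens unevenly
    # and diverges from what a single full batch would compute.
    losses = []
    for input_ids, labels in micro_batches:
        logits = model(input_ids)
        loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            labels.view(-1),
            ignore_index=-100,  # mean taken over this micro-batch only
        )
        losses.append(loss)
    return torch.stack(losses).mean()


def accumulated_loss_token_normalized(micro_batches, model):
    # Sums the loss over every non-padding token, then divides once by the
    # total token count for the whole accumulation step, matching the loss
    # a single full batch would produce.
    num_items_in_batch = sum(
        int((labels != -100).sum()) for _, labels in micro_batches
    )
    total_loss = torch.zeros(())
    for input_ids, labels in micro_batches:
        logits = model(input_ids)
        total_loss = total_loss + F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            labels.view(-1),
            ignore_index=-100,
            reduction="sum",  # defer normalization to the step level
        )
    return total_loss / num_items_in_batch
```

The key design point mirrored here is that normalization happens once per accumulation step rather than once per micro-batch, which is why the fix threads the step-level token count through to the loss function.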
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info