Transformers Trainer: Gradient accumulation loss fix and custom loss API
AI Impact Summary
The gradient accumulation bug arose because the default loss function in model classes did not correctly account for non-padding token counts across accumulation steps, so training losses diverged from full-batch training results. The fix introduces an explicit loss computation pathway (including a ForCausalLMLoss variant for token-level tasks), adds a loss_function property to PreTrainedModel, and exposes a LOSS_MAPPING mechanism for custom losses, yielding correct loss computation and reporting during gradient accumulation. The changes are split across two pull requests, the first propagating the corrected loss handling to most models and the second enabling user-defined losses, with a plan to ship them in the next release; users who need the fix immediately can install from main.
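As a rough illustration of the mechanism described above, here is a minimal sketch of a custom causal-LM loss that normalizes by the total number of non-padding tokens in the whole accumulated batch rather than averaging per micro-batch, and of the two plug-in points the summary mentions (the loss_function property and LOSS_MAPPING). The function name my_causal_lm_loss is hypothetical, the (logits, labels, vocab_size, num_items_in_batch, ignore_index) calling convention is an assumption modeled on the described ForCausalLMLoss, and the import path for LOSS_MAPPING as well as the writability of loss_function may differ in your installed version.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Hypothetical custom token-level loss. The calling convention below is an
# assumption based on the summary's description of ForCausalLMLoss, not a
# confirmed library signature.
def my_causal_lm_loss(logits, labels, vocab_size,
                      num_items_in_batch=None, ignore_index=-100, **kwargs):
    # Shift so that tokens < n predict token n.
    shift_logits = logits[..., :-1, :].contiguous().float()
    shift_labels = labels[..., 1:].contiguous()

    loss = nn.functional.cross_entropy(
        shift_logits.view(-1, vocab_size),
        shift_labels.view(-1),
        ignore_index=ignore_index,
        # Sum here and divide explicitly below, so the normalizer can cover
        # all gradient-accumulation micro-batches instead of just this one.
        reduction="sum",
    )
    if num_items_in_batch is not None:
        # Total non-padding tokens across the accumulated batch, as computed
        # by the Trainer. Dividing by this keeps accumulated training
        # consistent with full-batch training.
        return loss / num_items_in_batch
    # Fallback: average over non-ignored tokens in this micro-batch only.
    return loss / (shift_labels != ignore_index).sum().clamp(min=1)

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Option 1: override the per-model loss pathway directly, assuming the
# loss_function property is assignable in your version.
model.loss_function = my_causal_lm_loss

# Option 2: register the loss globally; the import path and mapping key are
# assumptions and may vary between releases.
# from transformers.loss.loss_utils import LOSS_MAPPING
# LOSS_MAPPING["ForCausalLM"] = my_causal_lm_loss
```

Either hook leaves the Trainer loop untouched: the model's forward pass calls the registered loss, and the accumulated token count is supplied through the loss function's keyword arguments.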
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info