Fixing Gradient Accumulation: Custom Loss Function API
AI Impact Summary
The Hugging Face Transformers library had a subtle issue with gradient accumulation in the default loss function used for causal language modeling. When gradient accumulation was enabled, each micro-batch's loss was averaged over that micro-batch's own token count rather than the total token count across all accumulated micro-batches, so training runs with accumulation did not match equivalent large-batch runs. The fix exposes an API that lets users supply their own loss function, so the loss can be normalized correctly during gradient accumulation on token-level tasks.
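A minimal sketch of the normalization mismatch, in pure Python with made-up per-token loss values (no Transformers APIs are used; function names here are illustrative, not from the library): averaging per-micro-batch means differs from normalizing by the total token count whenever micro-batches contain different numbers of tokens.

```python
def naive_accumulated_loss(micro_batches):
    # Average each micro-batch's mean loss -- what a default loss function
    # effectively does when it cannot see the total accumulated token count.
    means = [sum(losses) / len(losses) for losses in micro_batches]
    return sum(means) / len(means)


def correct_accumulated_loss(micro_batches):
    # Normalize by the total number of tokens across all accumulated
    # micro-batches, which matches computing the loss on one large batch.
    total = sum(sum(losses) for losses in micro_batches)
    n_tokens = sum(len(losses) for losses in micro_batches)
    return total / n_tokens


# Two micro-batches with unequal token counts (common with variable-length
# sequences); per-token losses are arbitrary illustration values.
batches = [[2.0, 2.0, 2.0, 2.0], [4.0]]

print(naive_accumulated_loss(batches))    # 3.0 -- over-weights the short batch
print(correct_accumulated_loss(batches))  # 2.4 -- same as one 5-token batch
```

The gap disappears only when every micro-batch has the same token count, which is why the bug is easy to miss on fixed-length data and why a user-supplied loss function (which can receive the total token count) resolves it.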
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info