Fixing Gradient Accumulation: Custom Loss Function API
AI Impact Summary
The Hugging Face Transformers library had a subtle issue with gradient accumulation in the default loss function used for causal language modeling. When gradient accumulation was enabled, each micro-batch's loss was averaged over that micro-batch's own token count rather than the total token count across all accumulated micro-batches, so training runs with accumulation did not match equivalent large-batch runs. The fix exposes an API that lets users supply their own loss function, so the loss can be normalized correctly during gradient accumulation on token-level tasks.
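A minimal sketch of the normalization mismatch, in pure Python with made-up per-token loss values (no Transformers APIs are used; function names here are illustrative, not from the library): averaging per-micro-batch means differs from normalizing by the total token count whenever micro-batches contain different numbers of tokens.

```python
def naive_accumulated_loss(micro_batches):
    # Average each micro-batch's mean loss -- what a default loss function
    # effectively does when it cannot see the total accumulated token count.
    means = [sum(losses) / len(losses) for losses in micro_batches]
    return sum(means) / len(means)


def correct_accumulated_loss(micro_batches):
    # Normalize by the total number of tokens across all accumulated
    # micro-batches, which matches computing the loss on one large batch.
    total = sum(sum(losses) for losses in micro_batches)
    n_tokens = sum(len(losses) for losses in micro_batches)
    return total / n_tokens


# Two micro-batches with unequal token counts (common with variable-length
# sequences); per-token losses are arbitrary illustration values.
batches = [[2.0, 2.0, 2.0, 2.0], [4.0]]

print(naive_accumulated_loss(batches))    # 3.0 -- over-weights the short batch
print(correct_accumulated_loss(batches))  # 2.4 -- same as one 5-token batch
```

The gap disappears only when every micro-batch has the same token count, which is why the bug is easy to miss on fixed-length data and why a user-supplied loss function (which can receive the total token count) resolves it.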
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info