Accelerate Large Model Training with DeepSpeed ZeRO Stage-2 via Hugging Face Accelerate
AI Impact Summary
The document describes using Hugging Face Accelerate to enable DeepSpeed ZeRO optimization with no changes to the training code, specifically demonstrating ZeRO Stage-2 to train a ~900M-parameter DeBERTa-v2-xlarge-mnli model on a single node with two 24GB GPUs. It highlights a dramatic memory and throughput advantage: the per-GPU batch size jumps from 8 (DDP) to 40, yielding roughly a 3.5x reduction in total training time while maintaining performance on MRPC, because ZeRO Stage-2 partitions optimizer states and gradients across GPUs and can optionally offload them. To operationalize this, teams must configure a DeepSpeed config file and run accelerate config, noting precision considerations (bf16) and the potential for NaN losses, with a path to scaling to other models and hardware using the same Accelerate + DeepSpeed workflow.
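The sketch below illustrates what this Accelerate + DeepSpeed ZeRO Stage-2 workflow can look like in code. It is a minimal, hedged example, not the original post's script: it uses Accelerate's DeepSpeedPlugin in place of a config generated by accelerate config or a deepspeed_config.json, and the model ID, batch size, learning rate, and placeholder MRPC-style data are illustrative assumptions.

```python
# Minimal sketch: fine-tuning with Accelerate + DeepSpeed ZeRO Stage-2.
# Launch with: accelerate launch this_script.py (DeepSpeed must be installed).
# Hyperparameters and the placeholder data below are assumptions for illustration.
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from accelerate import Accelerator, DeepSpeedPlugin

# ZeRO Stage-2: partition optimizer states and gradients across GPUs.
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,
    gradient_accumulation_steps=1,
    gradient_clipping=1.0,
    offload_optimizer_device="none",  # "cpu" would additionally offload optimizer states
)
accelerator = Accelerator(mixed_precision="bf16", deepspeed_plugin=deepspeed_plugin)

model_name = "microsoft/deberta-v2-xlarge-mnli"  # ~900M-parameter model referenced above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2, ignore_mismatched_sizes=True  # re-head for 2-class MRPC
)
optimizer = AdamW(model.parameters(), lr=2e-5)

# Placeholder sentence pairs; in practice this would be the tokenized MRPC train split.
pairs = [("He said yes.", "He agreed.")] * 64
enc = tokenizer([a for a, _ in pairs], [b for _, b in pairs],
                padding=True, truncation=True, return_tensors="pt")
labels = torch.ones(len(pairs), dtype=torch.long)
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], labels)
train_dataloader = DataLoader(dataset, batch_size=40, shuffle=True)  # 40/GPU per the summary

# The training loop is the standard Accelerate loop; backward/step are routed through DeepSpeed.
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)
model.train()
for input_ids, attention_mask, batch_labels in train_dataloader:
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=batch_labels)
    accelerator.backward(outputs.loss)
    optimizer.step()
    optimizer.zero_grad()
```

The same script runs unchanged under plain DDP or under DeepSpeed; only the launch-time configuration (accelerate config answers or the DeepSpeed config file) changes, which is the "no code changes" point the summary makes.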
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info