Accelerate v1.10.0 introduces N-D Parallelism with ParallelismConfig for DP/TP/CP across device meshes
AI Impact Summary
Hugging Face Accelerate v1.10.0 adds N-D parallelism, developed in collaboration with the Axolotl team, enabling simultaneous use of data parallelism (DP), tensor parallelism (TP), and context parallelism (CP) through a single ParallelismConfig in training scripts. Engineers specify dp_shard_size, dp_replicate_size, cp_size, and tp_size, pass the config to the Accelerator via parallelism_config, and prepare the model (from_pretrained with a device_mesh, then prepare), yielding device-mesh layouts for large AutoModelForCausalLM workloads. The release also includes FSDP improvements and BYODM ("bring your own device mesh") support, broadening scalability to PEFT and MoE scenarios (e.g., fine-tuning GPT-OSS). Adopting it requires code updates to the new parallelism API and checks that environment variables are read correctly. The Axolotl collaboration signals a move toward simpler, more reliable multi-GPU scaling, but teams should plan migration to the ParallelismConfig pattern and validate trainer-state handling across resets.
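As a rough illustration of the pattern the summary describes, the sketch below composes a 2 x 2 x 2 x 2 mesh (16 GPUs). It is a minimal sketch, not a verified excerpt from the release: the import path accelerate.parallelism_config, the torch_device_mesh attribute, and the checkpoint name are assumptions, and the size values are illustrative.

```python
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig  # assumed import path
from transformers import AutoModelForCausalLM

# The four sizes must multiply to the total number of devices:
# 2 (replicate) x 2 (shard) x 2 (context) x 2 (tensor) = 16 GPUs.
pc = ParallelismConfig(
    dp_replicate_size=2,
    dp_shard_size=2,
    cp_size=2,
    tp_size=2,
)

accelerator = Accelerator(parallelism_config=pc)

# Passing the accelerator's device mesh (torch_device_mesh is assumed here)
# lets weights land directly on the N-D layout at load time.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # placeholder checkpoint, not from the release notes
    device_mesh=accelerator.torch_device_mesh,
)
model = accelerator.prepare(model)
```

Run under the usual launcher (e.g., accelerate launch) on a node count matching the mesh; shrinking any size to 1 disables that parallelism axis, so the same script scales from single-axis DP up to full 4-D layouts.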
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info