DeepSpeed Ulysses/ALST sequence parallelism added in v1.12.0
AI Impact Summary
v1.12.0 adds DeepSpeed Ulysses/ALST integration, which enables sequence parallelism and attention-head parallelism for long-sequence training. To enable it, instantiate ParallelismConfig with sp_backend='deepspeed', an sp_size greater than 1 (the docs use sp_size=2), and an sp_handler such as DeepSpeedSequenceParallelConfig, as described in the docs. The change also introduces cross-rank loss aggregation: the per-rank losses (losses_per_rank) are collected with all_gather and combined using per-rank weights, so the final loss is correct for the full sequence rather than for a single shard. The feature will also be available in the HF Trainer, and it affects pipelines that use DeepSpeed Ulysses/ALST with Hugging Face Transformers.
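The per-rank weighting can be illustrated with a small, framework-free sketch. It assumes, as an illustration rather than the library's documented code, that each rank reports the mean loss over its own sequence shard and that each rank's weight is its count of non-ignored label tokens (labels equal to -100, PyTorch's default ignore_index, are skipped, e.g. SFT prompt tokens). The helper name aggregate_sp_loss is hypothetical, and plain Python lists stand in for the values an all_gather over the sequence-parallel group would return.

```python
from typing import Sequence

# Label value excluded from the loss (PyTorch's default ignore_index).
IGNORE_INDEX = -100


def aggregate_sp_loss(losses_per_rank: Sequence[float],
                      labels_per_rank: Sequence[Sequence[int]]) -> float:
    """Hypothetical helper: combine per-shard mean losses into one loss.

    Assumption: losses_per_rank[r] is the mean loss rank r computed over its
    own sequence shard (what an all_gather across the sequence-parallel group
    would return), and labels_per_rank[r] holds that shard's labels.
    Positions equal to IGNORE_INDEX did not contribute to the rank's mean.
    """
    if len(losses_per_rank) != len(labels_per_rank):
        raise ValueError("need one loss and one label shard per rank")
    # Number of tokens that actually contributed to each rank's mean loss.
    good_tokens_per_rank = [
        sum(1 for t in labels if t != IGNORE_INDEX) for labels in labels_per_rank
    ]
    # Undo each rank's averaging: mean * count = that rank's summed loss.
    total_loss = sum(loss * n for loss, n in zip(losses_per_rank, good_tokens_per_rank))
    total_good_tokens = sum(good_tokens_per_rank)
    # Guard against a batch in which every label is ignored.
    return total_loss / max(total_good_tokens, 1)
```

With two ranks where rank 0 has one scored token at mean loss 4.0 and rank 1 has four scored tokens at mean loss 1.0, aggregate_sp_loss([4.0, 1.0], [[-100, -100, -100, 7], [1, 2, 3, 4]]) returns 1.6, whereas a naive average of the two rank losses would give 2.5. In a real PyTorch training loop the gather must be autograd-aware (for example torch.distributed.nn.functional.all_gather) so that gradients flow back through every shard's loss.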
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info