ZeroGPU Spaces: AoT PyTorch compilation on H200 GPUs boosts inference speed
AI Impact Summary
ZeroGPU Spaces now supports Ahead-of-Time (AoT) compilation, which pre-compiles the transformer component of DiffusionPipeline-based demos on NVIDIA H200 GPUs. The workflow uses spaces.aoti_capture to collect example inputs, torch.export.export to produce an exported PyTorch program, spaces.aoti_compile to build a reusable AoT binary, and spaces.aoti_apply to swap the compiled transformer into the pipeline. This reduces cold-start latency and delivers reported speedups of 1.3x–1.8x on models such as FLUX.1-dev, Wan, and LTX, improving responsiveness for short-lived Spaces tasks. Teams should plan for storing and versioning AoT artifacts per model (e.g., black-forest-labs/FLUX.1-dev) and consider how FP8 quantization and dynamic shapes may affect compatibility.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info