ZeroGPU Spaces: AoT PyTorch compilation on H200 GPUs boosts inference speed
AI Impact Summary
ZeroGPU Spaces now supports Ahead-of-Time (AoT) compilation, which pre-compiles the transformer component of DiffusionPipeline-based demos on NVIDIA H200 GPUs. The workflow uses spaces.aoti_capture to collect example inputs, torch.export.export to produce an exported PyTorch program, spaces.aoti_compile to build a reusable AoT binary, and spaces.aoti_apply to swap the compiled transformer into the pipeline. This reduces cold-start latency and delivers reported speedups of 1.3x–1.8x on models such as FLUX.1-dev, Wan, and LTX, improving responsiveness for short-lived Spaces tasks. Teams should plan for storing and versioning AoT artifacts per model (e.g., black-forest-labs/FLUX.1-dev) and consider how FP8 quantization and dynamic shapes may affect compatibility.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info