ZeroGPU Spaces enables ahead-of-time PyTorch compilation for faster demo pipelines
AI Impact Summary
ZeroGPU Spaces can now use PyTorch ahead-of-time (AoT) compilation to prebuild the transformer portion of diffusion pipelines, avoiding repeated CUDA initialization for short-lived tasks. Example inputs are captured with spaces.aoti_capture, the model is exported via torch.export, and the exported graph is compiled with spaces.aoti_compile, so the platform can reuse a prebuilt graph across processes, cutting cold-start latency and boosting throughput on NVIDIA H200 GPUs. The approach runs on MIG slices (e.g., 3g.71gb and 7g.141gb) and can be combined with optional FP8 quantization, yielding 1.3×–1.8× speedups on models such as Flux, Wan, and LTX. The compiled transformer must then be integrated back into the pipeline with spaces.aoti_apply to avoid memory issues and preserve model attributes.
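The capture → export → compile → apply flow described above can be sketched as follows. This is a hedged illustration built around the spaces.aoti_* helpers named in the summary; the model ID, prompt, and GPU-task duration are illustrative assumptions, and the code requires a ZeroGPU Space with a CUDA device to actually run.

```python
# Sketch of the AoT compilation flow on a ZeroGPU Space.
# Assumptions: FLUX.1-dev as the example pipeline, a 1500 s GPU window.
import spaces
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed model; any pipeline with a .transformer works
    torch_dtype=torch.bfloat16,
).to("cuda")

@spaces.GPU(duration=1500)  # long window so export + compile fit in one GPU task
def compile_transformer():
    # 1. Capture real example inputs by intercepting one transformer call.
    with spaces.aoti_capture(pipe.transformer) as call:
        pipe("a photo of a cat")  # placeholder prompt

    # 2. Export the transformer with the captured args/kwargs.
    exported = torch.export.export(
        pipe.transformer,
        args=call.args,
        kwargs=call.kwargs,
    )

    # 3. Compile the exported graph ahead of time.
    return spaces.aoti_compile(exported)

compiled = compile_transformer()

# 4. Swap the compiled graph back into the pipeline, preserving the
#    original module's attributes (dtype, config, etc.).
spaces.aoti_apply(compiled, pipe.transformer)
```

Because the compiled artifact is produced once inside a single GPU task, subsequent short-lived tasks can reuse it instead of paying the compilation cost on every request.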
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info