OpenAI GPT-OSS integration with Transformers adds MXFP4 quantization and prebuilt kernels
AI Impact Summary
OpenAI has released the GPT-OSS models with MXFP4 quantization and a consolidated kernel ecosystem, and these capabilities are wired into the transformers workflow so that GPT-OSS 20B/120B can be loaded, run, and fine-tuned with prebuilt kernels from the Hub. The update introduces zero-build kernels, Flash Attention 3, and MoE-specific kernels (e.g., MegaBlocksMoeMLP and Liger RMSNorm), which can significantly reduce memory footprint and improve throughput on supported GPUs, provided you opt in via use_kernels. There are compatibility nuances: MXFP4 uses dedicated Triton kernels, while some kernel paths are incompatible with MXFP4 and may force inference to bf16. Teams should verify the model config (quant_method) and ensure the prerequisites (accelerate, kernels, and Triton >= 3.4) are in place before a production rollout.
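The verification step above can be sketched with two small helpers. This is a hedged sketch, not a confirmed API: the nested "quantization_config"/"quant_method" layout and the "mxfp4" string value are assumptions based on the usual shape of a checkpoint's config.json, and the 3.4 floor comes from the Triton prerequisite stated above.

```python
def uses_mxfp4(model_config: dict) -> bool:
    """Return True if a model's config declares MXFP4 quantization.

    Assumes the common transformers layout, where config.json nests a
    "quantization_config" dict carrying "quant_method" (assumed keys and
    value; check your actual checkpoint's config.json).
    """
    quant_config = model_config.get("quantization_config") or {}
    return quant_config.get("quant_method") == "mxfp4"


def triton_version_ok(version: str) -> bool:
    """Check the Triton >= 3.4 floor mentioned above (major.minor compare)."""
    major, minor = (int(p) for p in version.split(".")[:2])
    return (major, minor) >= (3, 4)


# Example: a checkpoint whose config.json declares MXFP4
print(uses_mxfp4({"quantization_config": {"quant_method": "mxfp4"}}))  # True
print(triton_version_ok("3.4.0"))  # True
print(triton_version_ok("3.3.1"))  # False
```

In a rollout script, a check like this would gate whether to request the MXFP4 path (e.g., opting in via use_kernels when loading) or to plan for the bf16 fallback noted above.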
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info