OpenAI GPT-OSS: MXFP4 quantization and kernel support in transformers for efficiently loading, running, and fine-tuning models
AI Impact Summary
OpenAI's GPT-OSS release introduces MXFP4 quantization and a hub-based kernel ecosystem integrated with transformers, enabling pre-built, device-mapped kernels and streamlined load/run/fine-tune workflows. This can significantly reduce memory footprint and increase throughput for large models such as GPT-OSS-20B and GPT-OSS-120B, potentially enabling single-GPU deployment with extended context windows. However, some kernels are not compatible with MXFP4, and inference may silently fall back to bf16 in those cases. Using these features requires installing accelerate, kernels, and Triton, and careful benchmarking is advised to avoid performance regressions.
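The load/run workflow described above can be sketched roughly as follows. This is a minimal, illustrative sketch, not an official recipe: it assumes a transformers version with GPT-OSS support, plus accelerate, kernels, and Triton installed on a machine with a compatible GPU; the model id is the public hub checkpoint, but the exact keyword arguments your environment needs may differ.

```python
# Sketch: loading a GPT-OSS checkpoint with MXFP4 weights via transformers.
# Assumes transformers with GPT-OSS support, plus accelerate, kernels, and
# Triton are installed; arguments shown are illustrative, not prescriptive.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's quantized dtype where supported
    device_map="auto",    # accelerate places shards across available devices
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note that if the MXFP4 kernels are unavailable or incompatible with your hardware, loading may dequantize to bf16, roughly quadrupling memory use, so checking the resulting parameter dtypes and memory footprint after load is worthwhile.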
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info