OpenAI releases GPT OSS open-source model family (120B/20B) with 4-bit MXFP4 MoE quantization
AI Impact Summary
OpenAI has released GPT OSS, a new open-source model family comprising gpt-oss-120b and gpt-oss-20b, both mixture-of-experts (MoE) models with 4-bit MXFP4 quantization designed for fast inference and low memory use. This enables private or on-device deployment through diverse tooling, including Hugging Face Inference Providers, the OpenAI-compatible Responses API, and local ecosystems (transformers, vLLM, llama.cpp, ollama). The Apache 2.0 license and minimal usage policy open up experimentation and redistribution but introduce compliance considerations for deployment and governance. Technical teams should plan for GPU provisioning (80 GB for the 120B model, 16 GB for the 20B model), integration with existing inference pipelines, and validation of 4-bit MXFP4 weight loading and the associated kernels across supported runtimes.
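As a rough sanity check on the stated GPU budgets, the weight footprint of a ~4-bit quantized model can be estimated from the parameter count. This sketch is not from the release notes: it assumes the MX-style layout of 4-bit elements with one shared 8-bit scale per 32-element block (about 4.25 effective bits per weight), and it treats all parameters as quantized, whereas in practice only some layers may be in MXFP4, so real footprints will differ.

```python
# Back-of-envelope estimate (illustrative assumption, not a spec value):
# MXFP4 stores 4-bit elements plus one 8-bit shared scale per 32-element
# block, giving roughly 4 + 8/32 = 4.25 effective bits per weight.

def mxfp4_weight_gib(n_params: int, bits_per_weight: float = 4.25) -> float:
    """Approximate weight memory in GiB for a ~4-bit quantized model."""
    return n_params * bits_per_weight / 8 / (1024 ** 3)

# ~60 GiB and ~10 GiB of weights, leaving headroom within the stated
# 80 GB / 16 GB provisioning for activations, KV cache, and runtime overhead.
print(f"120B: ~{mxfp4_weight_gib(120_000_000_000):.1f} GiB of weights")
print(f" 20B: ~{mxfp4_weight_gib(20_000_000_000):.1f} GiB of weights")
```

The gap between the weight estimate and the provisioning figures is expected: serving also needs memory for activations, the KV cache, and any non-quantized layers.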
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info