OpenAI unveils GPT OSS open-source model family (gpt-oss-120b/20b) with MXFP4 4-bit MoE under Apache 2.0
AI Impact Summary
OpenAI has released GPT OSS, a new open-source mixture-of-experts (MoE) model family comprising gpt-oss-120b and gpt-oss-20b, both using MXFP4 4-bit quantization. The 20B model runs on consumer GPUs with roughly 16 GB of memory, while the 120B variant requires roughly 80 GB, enabling on-device or private deployments via Hugging Face Inference Providers and the OpenAI-compatible Responses API. Realizing this in production will require a capable software stack (transformers v4.55+, vLLM, llama.cpp, ollama), optional acceleration kernels (Flash Attention 3, Triton, kernels-community), and careful consideration of licensing (Apache 2.0) and MoE routing performance.
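To see why MXFP4 makes these footprints plausible, here is a back-of-envelope sketch. It assumes MXFP4's microscaling layout of 4 bits per weight plus one shared 8-bit scale per block of 32 weights; the block size and the assumption that all parameters are quantized are simplifications for illustration (in practice some tensors, e.g. attention and embeddings, stay in higher precision, which is why the quoted footprints of ~16 GB and ~80 GB are somewhat larger).

```python
def mxfp4_weight_gb(n_params: float, block_size: int = 32) -> float:
    """Approximate weight storage in GB under an MXFP4-style scheme:
    4 bits per weight plus one 8-bit shared scale per block of weights.
    (Assumes every parameter is quantized -- a simplification.)"""
    bits = n_params * 4 + (n_params / block_size) * 8
    return bits / 8 / 1e9  # bits -> bytes -> GB

# Rough weight-only footprints for the two model sizes
print(round(mxfp4_weight_gb(20e9), 1))   # ~10.6 GB for gpt-oss-20b
print(round(mxfp4_weight_gb(120e9), 1))  # ~63.8 GB for gpt-oss-120b
```

The remaining headroom in the quoted figures goes to unquantized tensors, activations, and the KV cache at inference time.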
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info