MediumCapability

gpt-oss-safeguard technical report — open-weight reasoning models with policy training

AI Impact Summary

This report details the capabilities and safety evaluations of the newly developed gpt-oss-safeguard models, gpt-oss-safeguard-120b and gpt-oss-safeguard-20b. These models are fine-tuned versions of the gpt-oss models, specifically trained to reason and label content based on a provided policy. The report leverages the underlying gpt-oss models for baseline safety evaluations, highlighting the importance of policy adherence in these open-weight reasoning models.

Affected Systems

gpt-oss-safeguard-120bgpt-oss-safeguard-20b

Date: Date not specified
Change type: capability
Severity: medium

gpt-oss-safeguard technical report — open-weight reasoning models with policy training

More from OpenAI

Get alerts for OpenAI