gpt-oss-safeguard technical report — open-weight reasoning models with policy training
AI Impact Summary
This report details the capabilities and safety evaluations of the newly developed gpt-oss-safeguard models, gpt-oss-safeguard-120b and gpt-oss-safeguard-20b. These models are fine-tuned versions of the gpt-oss models, specifically trained to reason and label content based on a provided policy. The report leverages the underlying gpt-oss models for baseline safety evaluations, highlighting the importance of policy adherence in these open-weight reasoning models.
Affected Systems
- Date
- Date not specified
- Change type
- capability
- Severity
- medium