OpenAI safe-completions in GPT-5: output-centric safety training
AI Impact Summary
OpenAI is shifting its approach to AI safety with the introduction of safe-completions in GPT-5, moving away from hard refusals toward a more nuanced training methodology. This output-centric training focuses on shaping the model's responses to mitigate risks from dual-use prompts (requests with both legitimate and harmful uses), aiming to improve safety while preserving the helpfulness of the generated content. The change represents a significant investment in understanding and controlling model behavior at the output level, rather than simply refusing undesirable inputs outright.
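To make the distinction concrete, here is a minimal toy sketch of the two styles of handling a dual-use prompt. This is purely illustrative: every function name, the flagged-topic check, and the truncation heuristic are hypothetical stand-ins, not OpenAI's actual training method or API.

```python
def draft_answer(prompt: str) -> str:
    # Toy generator standing in for the model (hypothetical).
    return f"Here is an overview of {prompt}. Step-by-step specifics follow."

def refusal_policy(prompt: str, flagged_topics: set[str]) -> str:
    """Input-centric: hard-refuse whenever the prompt touches a flagged topic."""
    if any(topic in prompt.lower() for topic in flagged_topics):
        return "I can't help with that."   # blanket refusal, zero helpfulness
    return draft_answer(prompt)            # otherwise answer freely

def safe_completion_policy(prompt: str, flagged_topics: set[str]) -> str:
    """Output-centric: always draft, then constrain what the output may contain."""
    draft = draft_answer(prompt)
    if any(topic in prompt.lower() for topic in flagged_topics):
        # Keep the safe, high-level portion of the answer; omit risky detail.
        return draft.split(".")[0] + ". (Operational details omitted for safety.)"
    return draft

flagged = {"synthesis"}
print(refusal_policy("synthesis of compound X", flagged))
print(safe_completion_policy("synthesis of compound X", flagged))
```

The refusal policy returns nothing useful for the flagged prompt, while the safe-completion policy still delivers the benign overview, which is the trade-off the output-centric approach targets.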
Affected Systems
- GPT-5
Business Impact
The adoption of safe-completions in GPT-5 is intended to make AI applications more robust and reliable by reducing the risk of harmful or inappropriate outputs without sacrificing useful answers to benign requests, supporting greater user trust and confidence.
- Date: not specified
- Change type: capability
- Severity: medium