Process-supervised math reasoning achieves state-of-the-art performance and alignment
AI Impact Summary
The model now uses process supervision, rewarding each correct step in the reasoning chain rather than only the final answer. This tends to produce more reliable, human-aligned chain-of-thought traces, strengthens traceability and auditability for math-heavy applications, and can improve safety in automated reasoning within decision-support tools. Teams should expect longer generation times and plan to overhaul training, reward models, and evaluation pipelines so they can measure step-level correctness and collect step-by-step demonstrations.
Business Impact
Adopting process supervision should improve math accuracy and alignment, but it will require changes to training and evaluation pipelines to handle longer step-by-step reasoning outputs.
Risk domains
- Date: not specified
- Change type: capability
- Severity: medium