OpenAI: Scaling laws for reward model overoptimization | SignalBreak