OpenAI: Scaling laws for reward model overoptimization in RLHF pipelines | SignalBreak