OpenAI: Reinforcement learning reward function misspecification can yield unintended policies | SignalBreak | SignalBreak