OpenAI: Variance reduction for policy gradient with action-dependent factorized baselines | SignalBreak