Neural network policies vulnerable to adversarial attacks in reinforcement learning
AI Impact Summary
Neural network policies used in reinforcement learning systems are vulnerable to adversarial attacks similar to those affecting computer vision models. Attackers can craft small input perturbations, imperceptible to humans, that significantly degrade policy performance; the attacks apply in both white-box and black-box threat models and across different training algorithms. This vulnerability holds broadly for RL-based systems regardless of task or algorithm choice, creating a potential attack surface for any production system that relies on neural network decision-making.
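To make the attack mechanics concrete, here is a minimal sketch of an FGSM-style perturbation against a toy policy. The summary does not specify the attack method, so this is an illustrative assumption: a hand-rolled linear softmax policy in NumPy (`fgsm_perturb`, `W`, `eps` are all hypothetical names for this example), where the perturbation follows the sign of the loss gradient with respect to the observation and is bounded per component by `eps`.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fgsm_perturb(x, W, eps):
    """FGSM-style perturbation against a linear softmax policy.

    Pushes the observation in the direction that increases the
    cross-entropy loss of the action the clean input selects,
    nudging the policy away from its original decision.
    """
    logits = W @ x
    p = softmax(logits)
    a = int(np.argmax(p))            # action chosen on the clean input
    onehot = np.zeros_like(p)
    onehot[a] = 1.0
    # For a linear softmax policy, d(-log p_a)/dx = W^T (p - e_a).
    grad_x = W.T @ (p - onehot)
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))          # toy weights: 4 actions, 8-dim observation
x = rng.normal(size=8)               # clean observation
x_adv = fgsm_perturb(x, W, eps=0.1)
# The perturbation is small: each component moves by at most eps.
print(np.max(np.abs(x_adv - x)))
```

The key property, matching the summary, is that the perturbation's per-component magnitude is capped at `eps`, so a small budget can still shift the policy's action distribution.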
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium