Neural network policies vulnerable to adversarial attacks in reinforcement learning
AI Impact Summary
Neural network policies used in reinforcement learning systems are vulnerable to adversarial attacks similar to those affecting computer vision models. Attackers can craft small input perturbations, imperceptible to humans, that significantly degrade policy performance; the attacks apply in both white-box and black-box threat models and across different training algorithms. This vulnerability holds broadly for RL-based systems regardless of task or algorithm choice, creating a potential attack surface for any production system that relies on neural network decision-making.
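To make the attack mechanics concrete, here is a minimal sketch of an FGSM-style perturbation against a toy policy. The summary does not specify the attack method, so this is an illustrative assumption: a hand-rolled linear softmax policy in NumPy (`fgsm_perturb`, `W`, `eps` are all hypothetical names for this example), where the perturbation follows the sign of the loss gradient with respect to the observation and is bounded per component by `eps`.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fgsm_perturb(x, W, eps):
    """FGSM-style perturbation against a linear softmax policy.

    Pushes the observation in the direction that increases the
    cross-entropy loss of the action the clean input selects,
    nudging the policy away from its original decision.
    """
    logits = W @ x
    p = softmax(logits)
    a = int(np.argmax(p))            # action chosen on the clean input
    onehot = np.zeros_like(p)
    onehot[a] = 1.0
    # For a linear softmax policy, d(-log p_a)/dx = W^T (p - e_a).
    grad_x = W.T @ (p - onehot)
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))          # toy weights: 4 actions, 8-dim observation
x = rng.normal(size=8)               # clean observation
x_adv = fgsm_perturb(x, W, eps=0.1)
# The perturbation is small: each component moves by at most eps.
print(np.max(np.abs(x_adv - x)))
```

The key property, matching the summary, is that the perturbation's per-component magnitude is capped at `eps`, so a small budget can still shift the policy's action distribution.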
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium