Adversarial robustness does not transfer between perturbation types — defense evaluation must span multiple attack vectors
AI Impact Summary
Research demonstrates that adversarial robustness trained against one perturbation type (e.g., the L∞ norm) does not reliably transfer to other perturbation types, and can even degrade robustness against alternative attacks. Testing across 32 attacks spanning 5 perturbation categories on ImageNet-derived models reveals that optimizing robustness for a narrow threat model creates vulnerability to attack types outside that threat model. This finding directly impacts ML security practice: models certified robust within a specific perturbation bound may fail catastrophically against real-world adversarial inputs that use different attack mechanisms.
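The practical consequence is that robustness should be reported per attack vector rather than as a single number, with the worst case treated as the headline figure. Below is a minimal sketch of such an evaluation loop, assuming a PyTorch image classifier operating on 4D batches of inputs scaled to [0, 1]; the `pgd_attack` helper, the `robust_accuracy` function, and the epsilon budgets are illustrative choices, not the attack suite or bounds used in the study.

```python
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, norm="linf", eps=8 / 255, alpha=2 / 255, steps=10):
    """Projected gradient descent under an L-inf or L2 ball (illustrative only)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            if norm == "linf":
                # Step in the gradient sign direction, then project into the L-inf ball.
                x_adv = x_adv + alpha * grad.sign()
                x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
            else:
                # Normalize the gradient, step, then project into the L2 ball.
                # Assumes 4D image batches of shape (N, C, H, W).
                g = grad / (grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
                x_adv = x_adv + alpha * g
                delta = x_adv - x
                d_norm = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
                x_adv = x + delta * torch.clamp(eps / (d_norm + 1e-12), max=1.0)
            x_adv = torch.clamp(x_adv, 0.0, 1.0).detach()
    return x_adv


def robust_accuracy(model, loader, device, attacks):
    """Report accuracy separately per attack; the worst case, not the average, matters."""
    results = {}
    for name, kwargs in attacks.items():
        correct, total = 0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y, **kwargs)
            with torch.no_grad():
                correct += (model(x_adv).argmax(dim=1) == y).sum().item()
            total += y.size(0)
        results[name] = correct / total
    return results


# Illustrative configuration: the epsilon budgets are common defaults, not the
# bounds from the study. A thorough evaluation would also include attacks
# outside the L-p family (e.g., spatial or color perturbations).
attacks = {
    "pgd_linf": {"norm": "linf", "eps": 8 / 255, "alpha": 2 / 255, "steps": 10},
    "pgd_l2": {"norm": "l2", "eps": 1.0, "alpha": 0.2, "steps": 10},
}
```

A model that scores well on one entry of the resulting dictionary but poorly on another exhibits exactly the non-transfer failure described above, which is why the minimum across attack vectors is the meaningful summary statistic.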
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium