Anthropic expands model safety bug bounty program to test new mitigations
Action Required
Failure to address universal jailbreaks could lead to misuse of Anthropic's AI models with potentially severe consequences in high-risk domains.
AI Impact Summary
Anthropic is expanding its model safety bug bounty program to proactively identify and mitigate universal jailbreak attacks, particularly in high-risk domains such as CBRN and cybersecurity. The initiative focuses on testing a new, unreleased safety mitigation system and offers rewards of up to $15,000 for successful submissions. Addressing universal jailbreaks is critical to preventing misuse of AI models and ensuring responsible AI development, in line with broader industry commitments.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: high