OpenAI and Anthropic share findings from joint safety evaluation
AI Impact Summary
OpenAI and Anthropic conducted a joint safety evaluation in which each lab tested the other's publicly available models, revealing areas of progress and persistent challenges across key metrics such as misalignment, instruction following, and hallucination mitigation. The collaboration underscores the value of shared research and benchmarking in advancing AI safety practice, particularly around vulnerabilities such as jailbreaking. The findings will likely inform future development and red-teaming strategies for both organizations' models.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium