OpenAI and Anthropic share findings from joint safety evaluation
AI Impact Summary
OpenAI and Anthropic conducted a joint safety evaluation in which each lab tested the other's publicly available models, revealing areas of progress and persistent challenges across key metrics such as misalignment, instruction following, and hallucination mitigation. The collaboration underscores the value of shared research and benchmarking in advancing AI safety practice, particularly around vulnerabilities such as jailbreaking. The findings will likely inform future development and red-teaming strategies for both organizations' models.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium