Introducing SimpleQA — new factuality benchmark for language models

AI Impact Summary

SimpleQA is a new, focused benchmark for evaluating language model factuality on short, direct questions. Its narrow scope allows rapid iteration and comparison of model performance on one core capability: answering fact-seeking questions accurately. Teams can use SimpleQA to track changes in model accuracy and to identify where further training or prompt adjustments are needed.
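As a rough illustration of tracking accuracy over time, the sketch below summarizes a list of per-question grades into a few rates. The grade labels and the `summarize` function are hypothetical names chosen for this example, and the three-way split into correct, incorrect, and not attempted is an assumption about how answers are graded; the snippet is not taken from any official SimpleQA tooling.

```python
from collections import Counter

# Hypothetical grade labels; the three-way split is an assumption of this sketch.
GRADES = ("correct", "incorrect", "not_attempted")


def summarize(grades):
    """Turn a list of per-question grade labels into summary rates."""
    counts = Counter(grades)
    unknown = set(counts) - set(GRADES)
    if unknown:
        raise ValueError(f"unexpected grade labels: {sorted(unknown)}")

    total = len(grades)
    # Questions the model actually tried to answer.
    attempted = counts["correct"] + counts["incorrect"]
    return {
        "total": total,
        "correct_rate": counts["correct"] / total if total else 0.0,
        "not_attempted_rate": counts["not_attempted"] / total if total else 0.0,
        # Accuracy restricted to attempted questions.
        "correct_given_attempted": counts["correct"] / attempted if attempted else 0.0,
    }


if __name__ == "__main__":
    # Illustrative grades only, not real benchmark results.
    print(summarize(["correct", "correct", "incorrect", "not_attempted"]))
```

Comparing these rates between two model versions or two prompts on the same question set is one simple way to see whether a change helped.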

Affected Systems

SimpleQA

Business Impact

Teams can use SimpleQA to measure and track the factual accuracy of their language models objectively. The results can inform development priorities and, as accuracy improves, help build user trust.

Date: not specified
Change type: capability
Severity: medium
