
Big Bench Audio evaluates audio reasoning for GPT-4o and Gemini 1.5 — speech-to-speech 66% vs text 92%

AI Impact Summary

Artificial Analysis has released Big Bench Audio, a benchmark that measures audio-based reasoning by converting Big Bench Hard questions into audio across four categories. The results reveal a notable speech reasoning gap: GPT-4o scores 92% on the text-only version but only 66% in its Speech to Speech configuration, so voice-based reasoning lags text even for top models. A traditional pipeline that transcribes with Whisper, reasons with GPT-4o, and speaks with TTS-1 scores highest among the audio configurations, though it still trails pure text. Where reasoning quality is critical, this points to a need for either better native speech models or hybrid workflows that route the reasoning step through text.
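The headline gap is simple arithmetic: 92% minus 66% is 26 percentage points, roughly a 28% relative drop from the text score. The sketch below shows that comparison, assuming per-configuration results are available as correct/total counts. `ConfigResult` and `reasoning_gap` are hypothetical names, not part of Big Bench Audio or any Artificial Analysis tooling, and the counts out of 100 are illustrative stand-ins chosen to reproduce the reported percentages, not the benchmark's actual question counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigResult:
    """Hypothetical container: one model configuration's score on a benchmark."""
    name: str
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        # Percentage of questions answered correctly.
        return 100.0 * self.correct / self.total


def reasoning_gap(text: ConfigResult, audio: ConfigResult) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative drop as % of text accuracy)."""
    absolute = text.accuracy - audio.accuracy
    relative = 100.0 * absolute / text.accuracy
    return absolute, relative


if __name__ == "__main__":
    # Illustrative counts only: 92/100 and 66/100 mirror the reported 92% and 66%.
    text_only = ConfigResult("GPT-4o (text)", correct=92, total=100)
    speech_to_speech = ConfigResult("GPT-4o (Speech to Speech)", correct=66, total=100)
    gap, rel = reasoning_gap(text_only, speech_to_speech)
    print(f"{gap:.1f} points, {rel:.1f}% relative")  # prints "26.0 points, 28.3% relative"
```

Reporting both numbers matters because an absolute gap in percentage points and a relative drop can read very differently when comparing configurations with different baseline accuracies.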

Affected Systems

Big Bench Audio, GPT-4o

Date: not specified
Change type: capability
Severity: info


