TimeScope benchmark assesses long-video understanding for vision-language models
AI Impact Summary
TimeScope is an open-source benchmark that inserts short "needle" clips into base videos ranging from 1 minute to 8 hours to evaluate localized retrieval, information synthesis, and fine-grained temporal perception in vision-language models. It shows that many leading models struggle with true temporal comprehension and that simply scaling parameters does not extend the useful context horizon much beyond short clips. Gemini 2.5 Pro stands out by maintaining accuracy on videos longer than one hour, while other models such as Qwen2.5-VL and InternVL variants show task-specific strengths and weaknesses. The public Hugging Face Space and accompanying lmms_eval tooling should accelerate community benchmarking and highlight where training and data need to emphasize long-form temporal reasoning.
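The core needle-in-a-haystack construction can be sketched in a few lines. This is a minimal illustration, not TimeScope's actual implementation: frames are stand-in objects rather than decoded video, and the function name and signature are hypothetical.

```python
import random

def insert_needle(base_frames, needle_frames, seed=None):
    """Splice a short 'needle' clip into a longer base video.

    Returns the combined frame list and the (start, end) span where
    the needle was placed -- the region the evaluated model must
    localize or reason over.
    """
    rng = random.Random(seed)
    # Pick a random insertion point anywhere in the base video.
    start = rng.randint(0, len(base_frames))
    combined = base_frames[:start] + needle_frames + base_frames[start:]
    return combined, (start, start + len(needle_frames))

# Example: an 8-frame "haystack" and a 2-frame needle.
video, span = insert_needle(list("BBBBBBBB"), list("NN"), seed=0)
```

Longer base videos stress the model's context horizon while the needle length stays fixed, which is how the benchmark separates retrieval ability from raw context size.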
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info