
CinePile 2.0 uses Adversarial Refinement to strengthen long-video QA dataset quality

AI Impact Summary

CinePile 2.0 introduces an Adversarial Refinement pipeline that iteratively edits questions and answers until they defeat a "Deaf-Blind LLM", a model that must answer without access to the video or its dialogue. This reduces the extent to which questions can be answered from textual cues or answer bias alone. The pipeline combines open-source and proprietary models (LLaMA 3.1 70B for local question modification; GPT-4 for generation; Gemini, GPT-3.5, and Phi-1.5 as evaluators) to identify degenerate questions and guide the edits. Reported results show that 90.24% of the degenerate Q&A pairs in the test set were modified, with the remaining hard cases reviewed manually, which points to a measurable improvement in dataset quality and robustness. The approach offers scalable, repeatable dataset hardening that can improve evaluation reliability for long-video QA benchmarks and downstream vision-language models.
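The refinement loop summarized above can be sketched roughly as follows. This is a minimal Python illustration, not the CinePile authors' code: the names `QAPair`, `refine`, `toy_blind_answer`, and `toy_rewrite` are hypothetical, the round limit of 5 is an assumed parameter, and the two toy functions are simple stand-ins for the LLM calls (a context-free answerer and a question rewriter) so that the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class QAPair:
    """A multiple-choice question (hypothetical structure for illustration)."""
    question: str
    choices: List[str]
    answer_index: int


def refine(
    qa: QAPair,
    blind_answer: Callable[[QAPair], int],
    rewrite: Callable[[QAPair], QAPair],
    max_rounds: int = 5,  # assumed limit, not taken from the source
) -> Tuple[QAPair, bool]:
    """Rewrite `qa` until the context-free answerer gets it wrong.

    Returns the final pair and whether it now resists the blind answerer.
    Pairs that still fail would go to manual review.
    """
    for _ in range(max_rounds):
        if blind_answer(qa) != qa.answer_index:
            return qa, True  # no longer answerable without context
        qa = rewrite(qa)
    return qa, blind_answer(qa) != qa.answer_index


def toy_blind_answer(qa: QAPair) -> int:
    """Stand-in for a Deaf-Blind LLM: pick the first choice that shares a word
    with the question, otherwise fall back to choice 0."""
    q_words = set(qa.question.lower().split())
    for i, choice in enumerate(qa.choices):
        if q_words & set(choice.lower().split()):
            return i
    return 0


def toy_rewrite(qa: QAPair) -> QAPair:
    """Stand-in for the rewriting model: drop question words that leak
    into the correct choice."""
    leaked = set(qa.choices[qa.answer_index].lower().split())
    kept = [w for w in qa.question.split() if w.lower() not in leaked]
    return QAPair(" ".join(kept), qa.choices, qa.answer_index)


original = QAPair(
    question="What does the nervous pilot do?",
    choices=["Sings loudly", "Nervous laughter follows", "Leaves quietly"],
    answer_index=1,
)
refined, hardened = refine(original, toy_blind_answer, toy_rewrite)
print(refined.question, hardened)
```

In a real pipeline the two callables would wrap model calls, and the degeneracy check would likely aggregate several evaluators rather than rely on a single answerer.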

Affected Systems

CinePile 2.0 dataset
LLaMA 3.1 70B

Date: not specified
Change type: capability
Severity: info
