CinePile 2.0 uses Adversarial Refinement to strengthen long-video QA dataset quality
AI Impact Summary
CinePile 2.0 introduces an Adversarial Refinement pipeline that iteratively edits questions and answers until a Deaf-Blind LLM (a model that answers without access to the video or its dialogue) can no longer answer them correctly, reducing the dataset's reliance on textual cues and answer bias. The pipeline combines open-source and proprietary models: LLaMA 3.1 70B for local question modification, GPT-4 for generation, and Gemini, GPT-3.5, and Phi-1.5 as evaluators that identify degenerate questions and guide edits. Reported results show that 90.24% of the degenerate Q&A pairs in the test set were modified, and the remaining hard cases were reviewed manually, which points to a measurable improvement in dataset quality and robustness. The approach offers a scalable, repeatable way to harden datasets, which can make evaluation more reliable for long-video QA benchmarks and for downstream vision-language models.
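The refinement loop described above can be sketched roughly as follows. This is a minimal illustration, not the CinePile implementation: `QAPair`, `refine`, `toy_blind_answer`, `toy_rewrite`, and the `max_rounds=5` cap are all hypothetical names and values chosen for the example, and the two toy functions stand in for real LLM calls (the blind answerer and the question rewriter).

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class QAPair:
    """Hypothetical container for one multiple-choice question."""
    question: str
    choices: Tuple[str, ...]
    answer_index: int


# A blind answerer sees only the question and choices, and returns
# (predicted choice index, rationale for that prediction).
BlindAnswerer = Callable[[QAPair], Tuple[int, str]]
# A rewriter uses the rationale to edit the question and/or choices.
Rewriter = Callable[[QAPair, str], QAPair]


def refine(qa: QAPair, blind_answer: BlindAnswerer, rewrite: Rewriter,
           max_rounds: int = 5) -> Tuple[QAPair, bool]:
    """Edit `qa` until the blind answerer fails, or the round budget runs out.

    Returns the (possibly edited) pair and True if it is no longer
    answerable without context. `max_rounds` is an assumed cap.
    """
    for _ in range(max_rounds):
        predicted, rationale = blind_answer(qa)
        if predicted != qa.answer_index:
            return qa, True          # no longer degenerate
        qa = rewrite(qa, rationale)  # remove the cue the rationale exposed
    predicted, _ = blind_answer(qa)
    return qa, predicted != qa.answer_index


# --- Toy stand-ins for the LLM calls (assumptions, for illustration) ---

def toy_blind_answer(qa: QAPair) -> Tuple[int, str]:
    """Pick the first choice that repeats a word from the question."""
    q_words = set(qa.question.lower().replace("?", "").split())
    for i, choice in enumerate(qa.choices):
        overlap = q_words & set(choice.lower().split())
        if overlap:
            return i, "choice repeats question words: " + ", ".join(sorted(overlap))
    return -1, "no textual cue found"  # -1 means the answerer abstains


def toy_rewrite(qa: QAPair, rationale: str) -> QAPair:
    """Crudely drop the leaked words from the question text."""
    leaked = set(rationale.split(": ", 1)[1].split(", ")) if ": " in rationale else set()
    kept = [w for w in qa.question.split() if w.lower().strip("?") not in leaked]
    return replace(qa, question=" ".join(kept))


example = QAPair(
    question="Why does Anna hide the letter?",
    choices=("Her letter exposes a secret", "She is tired", "A train is leaving"),
    answer_index=0,
)
refined, ok = refine(example, toy_blind_answer, toy_rewrite)
```

In this toy run the original question leaks the word "letter", so the blind answerer guesses correctly; after one rewrite the cue is gone and the loop stops. Pairs for which the second return value is still `False` after the round budget would correspond to the hard cases that the summary says were reviewed manually.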
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info