FineVideo dataset: 43k annotated videos for training video understanding and diffusion models

AI Impact Summary

FineVideo aggregates 43k annotated videos (3.4k hours) with rich descriptions, narrative details, scene splits, and QA pairs. Its structured metadata gives teams a reusable data foundation for training video understanding, text-to-video diffusion, and computer vision models. The pipeline narrows 1.9M YouTube-Commons source videos to 600k dynamic videos, using language filtering, a multi-level taxonomy, and annotation with Llama 3.1 70B via TGI. Downloads and processing are orchestrated through Video2Dataset and cloud batch jobs, with results stored in S3. This enables rapid model development, but teams integrating the dataset into their workflows must plan for licensing, provenance, and bias risks, as well as cost and compute requirements.
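To put the figures in the summary in proportion, the following minimal Python sketch computes the retention rate of the filtering funnel and the average clip length, then illustrates what a language-plus-dynamism filter could look like. The `Candidate` record, its `motion_score` field, the `"en"` language value, and the `0.5` threshold are hypothetical names and values chosen for illustration; they are not taken from the FineVideo pipeline, whose actual schema and criteria may differ.

```python
from dataclasses import dataclass

# Rounded figures as quoted in the summary.
SOURCE_VIDEOS = 1_900_000   # YouTube-Commons source videos
DYNAMIC_VIDEOS = 600_000    # videos kept as "dynamic"
FINAL_VIDEOS = 43_000       # annotated videos in the released dataset
FINAL_HOURS = 3_400         # total duration of the released dataset


def retention(kept: int, total: int) -> float:
    """Fraction of items that survive a filtering stage."""
    return kept / total


def mean_minutes(total_hours: float, n_videos: int) -> float:
    """Average duration per video, in minutes."""
    return total_hours * 60 / n_videos


@dataclass
class Candidate:
    """Hypothetical record layout, for illustration only."""
    video_id: str
    language: str
    motion_score: float  # assumed 0..1 measure of how dynamic the video is


def filter_candidates(candidates, language, min_motion):
    """Keep candidates in the target language whose motion score meets the threshold."""
    return [
        c for c in candidates
        if c.language == language and c.motion_score >= min_motion
    ]


if __name__ == "__main__":
    print(f"Retained after filtering: {retention(DYNAMIC_VIDEOS, SOURCE_VIDEOS):.1%}")
    print(f"Mean video length: {mean_minutes(FINAL_HOURS, FINAL_VIDEOS):.1f} min")

    sample = [
        Candidate("a", "en", 0.9),
        Candidate("b", "fr", 0.8),
        Candidate("c", "en", 0.1),
    ]
    kept = filter_candidates(sample, language="en", min_motion=0.5)
    print([c.video_id for c in kept])
```

With the rounded numbers above, just under a third of the source videos pass the dynamic-content stage, and the released videos average a little under five minutes each. These ratios are a useful starting point for the storage and compute estimates mentioned in the summary.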

Affected Systems

FineVideo dataset, YouTube-Commons

Date: not specified
Change type: capability
Severity: info
