New tooling to build video datasets: video2dataset pipeline with Florence-2 and CogVideoX-5B
AI Impact Summary
The open video dataset tooling lays out a three-stage pipeline: acquisition (yt-dlp for downloading, Video to Scenes for splitting), pre-processing (OpenCV), and processing (Florence-2, with optional full-video captions via Qwen2.5). The design mirrors the established image-dataset workflow used by video2dataset. It supports both small-scale dataset curation and large-scale data generation for fine-tuning video generation models such as CogVideoX-5B, with configurable filters for watermarks, aesthetics, and OCR. A shared pipeline should make iteration faster and data preparation more standardized, but it requires compute resources and governance around licensing, NSFW filtering, and data quality across datasets.
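As an illustration of how the configurable filters might be applied once per-clip scores exist, here is a minimal sketch in plain Python. Every name in it (`ClipScores`, `FilterConfig`, `keep_clip`, `filter_clips`), the score ranges, and the default thresholds are hypothetical and chosen for illustration only; none of them come from the tooling itself, which may structure its filters quite differently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ClipScores:
    """Hypothetical per-clip scores, assumed to come from the processing stage."""
    path: str
    watermark_prob: float  # assumed range 0..1; higher means more likely watermarked
    aesthetic: float       # assumed scale where higher means more pleasing
    ocr_text_area: float   # assumed fraction (0..1) of the frame covered by text


@dataclass
class FilterConfig:
    """Illustrative thresholds; set a field to None to disable that filter."""
    max_watermark_prob: Optional[float] = 0.5
    min_aesthetic: Optional[float] = 5.0
    max_ocr_text_area: Optional[float] = 0.1


def keep_clip(clip: ClipScores, cfg: FilterConfig) -> bool:
    """Return True only if the clip passes every enabled filter."""
    if cfg.max_watermark_prob is not None and clip.watermark_prob > cfg.max_watermark_prob:
        return False
    if cfg.min_aesthetic is not None and clip.aesthetic < cfg.min_aesthetic:
        return False
    if cfg.max_ocr_text_area is not None and clip.ocr_text_area > cfg.max_ocr_text_area:
        return False
    return True


def filter_clips(clips: Iterable[ClipScores], cfg: FilterConfig) -> List[ClipScores]:
    """Apply the configured filters and keep the surviving clips in input order."""
    return [clip for clip in clips if keep_clip(clip, cfg)]
```

Calling `filter_clips(clips, FilterConfig())` applies all three filters, while `FilterConfig(min_aesthetic=None)` would skip the aesthetics check. Keeping each threshold optional is one way to let the same code serve both strict small-scale curation and looser large-scale generation.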
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info