Jupyter Agent enables LLMs to execute code inside notebooks for data science tasks (Qwen-3 Coder baseline)
AI Impact Summary
The Jupyter Agent project lets an LLM execute code inside a Jupyter notebook to carry out data analysis tasks, showing its multi-step reasoning alongside the runtime results of each cell. It builds on Qwen-3 Coder with a lean scaffolding of roughly 200 lines and a final_answer tool. On the DABStep benchmark this setup yields measurable gains on easy tasks, while hard tasks remain a gap. The data pipeline builds training material from Kaggle notebooks through large-scale deduplication, dataset linking via Datatrove, educational scoring, and QA generation with Qwen-3-32B, producing realistic, runnable training data.
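The scaffolding described above can be pictured as a short loop: the model proposes code, the code runs in a persistent namespace, the output is fed back, and the loop ends when the model calls final_answer. The following is a minimal sketch under stated assumptions, not the project's actual implementation. The names `execute_code`, `run_agent`, and `scripted_policy` are hypothetical, the notebook kernel is approximated by Python's built-in `exec`, and the LLM is replaced by a scripted stand-in so the example runs without any model.

```python
import contextlib
import io


def execute_code(code: str, namespace: dict) -> str:
    """Run code in a shared namespace (a stand-in for a notebook kernel).

    Returns captured stdout, followed by the error message if the code raised.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        return buffer.getvalue() + f"{type(exc).__name__}: {exc}"
    return buffer.getvalue()


def run_agent(policy, question: str, max_steps: int = 10):
    """Alternate between the policy (the model) and code execution.

    `policy` is any callable that maps the history to an action dict with
    keys "tool" ("execute_code" or "final_answer") and "argument".
    Returns the final answer, or None if the step budget runs out.
    """
    namespace: dict = {}              # state persists across steps, like notebook cells
    history = [("user", question)]
    for _ in range(max_steps):
        action = policy(history)
        if action["tool"] == "final_answer":
            return action["argument"]
        output = execute_code(action["argument"], namespace)
        history.append(("code", action["argument"]))
        history.append(("output", output))
    return None


def scripted_policy(history):
    """Hypothetical stand-in for the LLM: run one cell, then answer with its output."""
    outputs = [text for role, text in history if role == "output"]
    if not outputs:
        code = "values = [3, 5, 10]\nprint(sum(values) / len(values))"
        return {"tool": "execute_code", "argument": code}
    return {"tool": "final_answer", "argument": outputs[-1].strip()}


answer = run_agent(scripted_policy, "What is the mean of 3, 5 and 10?")
print(answer)
```

In a real setup the scripted policy would be replaced by a call to the model, and `exec` by a sandboxed notebook kernel. Running untrusted model-generated code with `exec` in the host process is unsafe outside of a toy example.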
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info