InfoCapability

Open-source DeepResearch reproduction uses code-based agents to reach 55.15% GAIA score

AI Impact Summary

Open-source reproduction of DeepResearch demonstrates that an end-to-end agent stack combining an LLM, a code-based action layer, and lightweight browser/tooling can achieve 55.15% GAIA validation on the benchmark, outperforming JSON-based action baselines. The effort leverages smolagents, Magentic-One tooling, and OpenAI Operator-like capabilities, illustrating a practical path to open, competitive agent frameworks. For business teams, this signals rising parity between open-source agent stacks and proprietary solutions, with potential cost and control benefits as internal implementations can reach comparable performance without vendor lock-in.

Affected Systems

Deep Researchsmolagents

Date: Date not specified
Change type: capability
Severity: info

Open-source DeepResearch reproduction uses code-based agents to reach 55.15% GAIA score

More from Hugging Face

Get alerts for Hugging Face