TextQuests: Evaluating LLMs in Interactive Fiction Games
AI Impact Summary
The TextQuests benchmark presents a novel approach to evaluating LLMs as autonomous agents by challenging them to play classic Infocom text-based adventure games. This test is particularly relevant because it demands sustained, long-context reasoning and exploration, areas where current LLMs frequently struggle, exhibiting issues like hallucination and repetitive actions as context grows. Successfully navigating these complex, interactive environments requires a deeper understanding of the game world and the ability to learn and adapt over extended gameplay sessions.
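The interaction pattern described here, an agent reading game text and issuing commands over a growing transcript, can be sketched as a simple loop. The sketch below is purely illustrative: the summary does not specify TextQuests' actual harness or game interface, so `ToyGame`, `agent_step`, and `play` are hypothetical stand-ins, with a scripted policy in place of a real LLM call.

```python
# Illustrative sketch of an agent loop for a text-adventure benchmark.
# All names (ToyGame, agent_step, play) are assumptions for this sketch;
# TextQuests' real harness and Infocom interface are not described above.

class ToyGame:
    """A two-room stand-in for an Infocom-style text game."""
    def __init__(self):
        self.location = "hall"
        self.done = False

    def observe(self):
        return f"You are in the {self.location}."

    def step(self, action):
        if action == "go north" and self.location == "hall":
            self.location = "library"
        elif action == "take lamp" and self.location == "library":
            self.done = True  # treat reaching the lamp as winning
        return self.observe(), self.done


def agent_step(history):
    """Scripted placeholder policy; a real run would query an LLM with
    the full transcript so it can reason over the long context."""
    last = history[-1]
    return "go north" if "hall" in last else "take lamp"


def play(game, max_turns=10):
    history = [game.observe()]      # the transcript grows every turn,
    for _ in range(max_turns):      # which is what stresses long-context reasoning
        action = agent_step(history)
        obs, done = game.step(action)
        history.extend([f"> {action}", obs])
        if done:
            break
    return history, game.done
```

Because the entire transcript is replayed to the policy each turn, context length grows linearly with the episode, which is where the summary notes models tend to hallucinate or repeat actions.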
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info