TextQuests: Evaluating LLMs in Interactive Fiction Games
AI Impact Summary
The TextQuests benchmark presents a novel approach to evaluating LLMs as autonomous agents by challenging them to play classic Infocom text-based adventure games. This test is particularly relevant because it demands sustained, long-context reasoning and exploration, areas where current LLMs frequently struggle, exhibiting issues like hallucination and repetitive actions as context grows. Successfully navigating these complex, interactive environments requires a deeper understanding of the game world and the ability to learn and adapt over extended gameplay sessions.
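The interaction pattern described here, an agent reading game text and issuing commands over a growing transcript, can be sketched as a simple loop. The sketch below is purely illustrative: the summary does not specify TextQuests' actual harness or game interface, so `ToyGame`, `agent_step`, and `play` are hypothetical stand-ins, with a scripted policy in place of a real LLM call.

```python
# Illustrative sketch of an agent loop for a text-adventure benchmark.
# All names (ToyGame, agent_step, play) are assumptions for this sketch;
# TextQuests' real harness and Infocom interface are not described above.

class ToyGame:
    """A two-room stand-in for an Infocom-style text game."""
    def __init__(self):
        self.location = "hall"
        self.done = False

    def observe(self):
        return f"You are in the {self.location}."

    def step(self, action):
        if action == "go north" and self.location == "hall":
            self.location = "library"
        elif action == "take lamp" and self.location == "library":
            self.done = True  # treat reaching the lamp as winning
        return self.observe(), self.done


def agent_step(history):
    """Scripted placeholder policy; a real run would query an LLM with
    the full transcript so it can reason over the long context."""
    last = history[-1]
    return "go north" if "hall" in last else "take lamp"


def play(game, max_turns=10):
    history = [game.observe()]      # the transcript grows every turn,
    for _ in range(max_turns):      # which is what stresses long-context reasoning
        action = agent_step(history)
        obs, done = game.step(action)
        history.extend([f"> {action}", obs])
        if done:
            break
    return history, game.done
```

Because the entire transcript is replayed to the policy each turn, context length grows linearly with the episode, which is where the summary notes models tend to hallucinate or repeat actions.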
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info