Overview
The survey aggregates many prototype systems and reproducible benchmarks. Patterns are clear, but most methods remain at experimental or research-grade maturity; productization needs engineering for latency, memory scaling, and robust evaluation.
Citations: 6
Evidence Strength: 0.70
Confidence: 0.85
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 3/4
Findings with evidence refs: 4/4
Results with explicit delta: 3/3
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 50%
Production readiness: 40%
Novelty: 60%
Why It Matters For Business
Game agents are a practical lab for building interactive AI: solutions for memory, robust reasoning, and hybrid control transfer to real automation, simulations, and multi-agent coordination systems used in product testing and virtual worlds.
Summary TLDR
This paper surveys research that uses large language models (LLMs) as the brain of game agents. It proposes a compact reference architecture with three single-agent modules—working & long-term memory, reasoning, and perception-action interfaces—and a complementary multi-agent layer for communication and organization. The authors map six game genres to concrete agent requirements and summarize practical techniques (context-extension, compression, chain-of-thought variants, reflective loops, code-as-policy, hybrid LLM+low-level controllers). The survey flags latency, memory structuring, and evaluation gaps as the main engineering hurdles.
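The three single-agent modules can be wired into one perception, reasoning, and action loop. The sketch below is a minimal illustration under stated assumptions, not the survey's implementation. `llm`, `perceive`, and `act` are hypothetical callables. The FIFO eviction from working memory into a long-term list stands in for whatever retrieval-backed store a real agent would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Memory:
    """Working memory (recent items) that spills its oldest entries into a long-term log."""
    working_size: int = 4
    working: List[str] = field(default_factory=list)
    long_term: List[str] = field(default_factory=list)

    def write(self, item: str) -> None:
        self.working.append(item)
        if len(self.working) > self.working_size:
            # FIFO eviction: the oldest working item moves to long-term storage.
            self.long_term.append(self.working.pop(0))


@dataclass
class GameAgent:
    llm: Callable[[str], str]        # hypothetical text-in/text-out model (reasoning module)
    perceive: Callable[[dict], str]  # hypothetical perception: game state -> text
    act: Callable[[str], str]        # hypothetical action interface: text -> game command
    memory: Memory = field(default_factory=Memory)

    def step(self, state: dict) -> str:
        observation = self.perceive(state)
        context = "\n".join(self.memory.working)
        prompt = f"Memory:\n{context}\nObservation: {observation}\nNext action:"
        command = self.act(self.llm(prompt))
        self.memory.write(f"obs={observation} act={command}")
        return command


# Usage with stubs in place of a real model and game.
agent = GameAgent(
    llm=lambda prompt: " attack ",
    perceive=lambda state: f"hp={state['hp']}",
    act=lambda text: text.strip().upper(),
)
commands = [agent.step({"hp": hp}) for hp in [10, 8, 6, 4, 2]]
```

After five steps, four recent entries remain in working memory and the first one has moved to the long-term log.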
Problem Statement
LLMs are powerful at language but are trained on static text and lack mechanisms for continuous, grounded interaction. Games provide a reproducible, diverse testbed for building and testing interactive LLM-based agents, but agent design is fragmented: how to add memory, reliable reasoning, perception–action grounding, and scalable multi-agent coordination remains unclear.
Main Contribution
A unified reference architecture for LLM-based game agents: memory, reasoning, perception-action interfaces, and a multi-agent extension for communication and organization.
A challenge-centered taxonomy linking six game genres (action, adventure, role-playing, strategy, simulation, sandbox) to concrete agent design requirements.
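For the multi-agent layer, the following minimal sketch assumes the simplest form of each piece. Communication goes through a shared blackboard that every agent reads, and organization is a fixed speaking order. `Blackboard`, `run_round`, and the stub agents are hypothetical names. Real systems may use role hierarchies, private channels, or dynamic turn-taking instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    sender: str
    content: str


@dataclass
class Blackboard:
    """Communication: a shared channel whose full history every agent can read."""
    messages: List[Message] = field(default_factory=list)

    def post(self, sender: str, content: str) -> None:
        self.messages.append(Message(sender, content))

    def transcript(self) -> str:
        return "\n".join(f"{m.sender}: {m.content}" for m in self.messages)


def run_round(agents: Dict[str, Callable[[str], str]], order: List[str], board: Blackboard) -> None:
    """Organization: agents speak in a fixed order, each seeing all earlier messages."""
    for name in order:
        board.post(name, agents[name](board.transcript()))


# Usage with hypothetical stub agents: a leader assigns a task, a worker acknowledges it.
agents = {
    "leader": lambda transcript: "scout the north gate",
    "worker": lambda transcript: "ack: " + transcript.splitlines()[-1],
}
board = Blackboard()
run_round(agents, ["leader", "worker"], board)
```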
Key Findings
Carrying the previous step's thought into the next prompt (LastThoughts) raises win rate and cuts short-term inconsistent actions.
Position-encoding and attention modifications extend LLM context windows by orders of magnitude.
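One concrete, well-documented position-encoding modification is linear position interpolation for rotary position embeddings (RoPE). It is chosen here for illustration and is not necessarily among the methods the survey covers. The method rescales positions so that a model trained on context length $L$ sees only rotation angles from its training range when run at a longer length $L'$. With head dimension $d$, base $b$, position $m$, and $i = 0, \dots, d/2 - 1$:

$$
\theta_i = b^{-2i/d}, \qquad \phi_i(m) = m\,\theta_i \;\;\longrightarrow\;\; \phi_i'(m) = \frac{m L}{L'}\,\theta_i, \qquad 0 \le m < L'.
$$

For example, with $L = 2048$ and $L' = 8192$, position $m = 8000$ maps to $8000 \cdot 2048 / 8192 = 2000$, which lies inside the trained range.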
Results
| Metric | LastThoughts | Baseline (plain LLM) | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Win rate (PokéLLMon) | 0.4667 | 0.4217 (GPT-4o) | +0.0450 (relative +10.7%) | PokéLLMon Battles | Table 2 reports per-method win rates | Table 2 (PokéLLMon) |
| Consecutive switch rate (short-term consistency; lower is better) | 0.0861 | 0.2442 | -0.1581 (relative -64.7%) | PokéLLMon Battles | Table 2 measures consecutive switches as a proxy for instability | Table 2 (PokéLLMon) |
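The following minimal sketch shows LastThoughts-style carryover. The `THOUGHT: ... | ACTION: ...` reply format, the `LastThoughtsAgent` class, and the stub model are assumptions for illustration. The exact prompt used in the PokéLLMon experiments is not given here.

```python
from typing import Callable, List, Optional


class LastThoughtsAgent:
    """Feeds the previous step's reasoning back into the next prompt (hypothetical helper)."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm
        self.last_thought: Optional[str] = None

    def build_prompt(self, observation: str) -> str:
        parts: List[str] = []
        if self.last_thought is not None:
            parts.append(f"Your previous thought: {self.last_thought}")
        parts.append(f"Observation: {observation}")
        # Assumed reply format, so the thought can be parsed out and carried forward.
        parts.append("Reply as 'THOUGHT: <reasoning> | ACTION: <action>'")
        return "\n".join(parts)

    def step(self, observation: str) -> str:
        reply = self.llm(self.build_prompt(observation))
        thought, _, action = reply.partition("| ACTION:")
        self.last_thought = thought.replace("THOUGHT:", "", 1).strip()
        return action.strip()


# Usage with a stub model that records every prompt it receives.
prompts: List[str] = []


def stub_llm(prompt: str) -> str:
    prompts.append(prompt)
    return "THOUGHT: stay in and keep attacking | ACTION: thunderbolt"


agent = LastThoughtsAgent(stub_llm)
first = agent.step("opponent sent out Pidgeot")
second = agent.step("opponent used Gust")
```

The first prompt has no carried thought. The second prompt starts with the previous step's reasoning. If the model omits the `| ACTION:` separator, `step` returns an empty action, so a real agent would need a fallback.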
What To Try In 7 Days
Add a simple step-to-step thought carryover (LastThoughts) to reduce inconsistent actions.
Implement a small long-term store (vector DB) plus importance-based write-back for episodic memory.
Separate high-level LLM planning from a low-level controller for latency-sensitive tasks and compare win rates.
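For the long-term store, below is a minimal sketch of importance-gated episodic memory under explicit assumptions. A toy bag-of-words `embed` replaces a real embedding model and vector database. `importance` is a hypothetical scorer; in practice this is often an LLM asked to rate an event. Only events at or above `threshold` are written back, and retrieval returns the top-`k` stored events by cosine similarity.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; an assumption standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class EpisodicMemory:
    importance: Callable[[str], float]  # hypothetical scorer returning a value in [0, 1]
    threshold: float = 0.5
    items: List[Tuple[str, Counter]] = field(default_factory=list)

    def write_back(self, event: str) -> bool:
        """Store an event only if its importance clears the threshold."""
        if self.importance(event) < self.threshold:
            return False
        self.items.append((event, embed(event)))
        return True

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def stub_importance(event: str) -> float:
    # Hypothetical scorer: anything mentioning a boss or a key counts as important.
    return 0.9 if ("boss" in event or "key" in event) else 0.1


memory = EpisodicMemory(importance=stub_importance)
stored = [memory.write_back(e) for e in
          ["walked east", "found the dungeon key", "boss is weak to fire", "picked a flower"]]
top = memory.retrieve("how to beat the boss", k=1)
```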
Agent Features: Memory, Planning, Tool Use
Is Agentic: Yes
Architectures: Collaboration
Optimization Features: Token Efficiency, Reproducibility
Risks & Boundaries
Limitations
Benchmarks are often templated and shallow, limiting tests of true open-ended generalization.
Many demonstrations require heavy compute and sequential API calls, raising cost and latency barriers.
When Not To Use
When you need strict real-time, frame-level control solely from an LLM (use hybrid controllers instead).
When the task cannot tolerate LLM hallucinations or inconsistent persona behavior without strong verification.
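The hybrid alternative can be sketched as a two-rate loop. A slow planner, standing in for an LLM call, sets a subgoal every `replan_every` ticks. A cheap deterministic controller acts on every tick. The grid world, `planner`, and `low_level_controller` are hypothetical. The point is that per-frame latency no longer depends on the model.

```python
from typing import Callable, List, Tuple

Pos = Tuple[int, int]


def low_level_controller(pos: Pos, goal: Pos) -> Pos:
    """Fast, deterministic per-frame policy: one grid step toward the current subgoal."""
    dx = (goal[0] > pos[0]) - (goal[0] < pos[0])
    dy = (goal[1] > pos[1]) - (goal[1] < pos[1])
    return (pos[0] + dx, pos[1] + dy)


def run_hybrid(planner: Callable[[Pos], Pos], start: Pos, ticks: int, replan_every: int) -> List[Pos]:
    """Call the slow planner only every `replan_every` ticks; the controller acts every tick."""
    pos, goal, path = start, start, [start]
    for t in range(ticks):
        if t % replan_every == 0:
            goal = planner(pos)  # high-level subgoal (a stand-in for an LLM call)
        pos = low_level_controller(pos, goal)
        path.append(pos)
    return path


# Usage with a stub planner that always targets (3, 0) and counts its own calls.
calls: List[Pos] = []


def stub_planner(pos: Pos) -> Pos:
    calls.append(pos)
    return (3, 0)


path = run_hybrid(stub_planner, (0, 0), ticks=6, replan_every=3)
```

Over six ticks the planner is called only twice, while the controller moves the agent every tick.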
Failure Modes
Short-term decision inconsistency (action flip-flopping) without active maintenance.
Role drift over long dialogues or multi-episode play unless role memory is enforced.
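One simple way to enforce role memory, assumed here rather than taken from the survey, is to pin a role card at the top of every prompt while the dialogue history is truncated. That way the persona can never scroll out of the context window. `RolePinnedContext` is a hypothetical helper.

```python
from collections import deque
from typing import Deque


class RolePinnedContext:
    """Re-inserts a fixed role card into every prompt while old turns are dropped (hypothetical helper)."""

    def __init__(self, role_card: str, max_turns: int = 3):
        self.role_card = role_card
        # deque(maxlen=...) discards the oldest turn automatically when full.
        self.history: Deque[str] = deque(maxlen=max_turns)

    def add_turn(self, turn: str) -> None:
        self.history.append(turn)

    def prompt(self, user_msg: str) -> str:
        # The role card always comes first, so truncating history can never evict it.
        return "\n".join([f"ROLE: {self.role_card}", *self.history, f"USER: {user_msg}"])


ctx = RolePinnedContext("You are Captain Mira, a terse pirate.", max_turns=2)
for turn in ["USER: hi", "MIRA: Aye.", "USER: where to?", "MIRA: South."]:
    ctx.add_turn(turn)
p = ctx.prompt("any plans?")
```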

