Overview
Production Readiness
0.4
Novelty Score
0.4
Cost Impact Score
0.3
Citation Count
6
Why It Matters For Business
LLM-driven agents can model multi-party dynamics (negotiations, markets, simulations) and improve decision-making, but measurement and domain alignment matter more than raw model size.
Summary TLDR
This short survey organizes work on using large language models (LLMs) for strategic reasoning—anticipating and influencing other agents in multi-player settings. It defines strategic reasoning, groups applications into societal, economic, game-theory, and gaming domains, reviews methods (prompting, modular agents, theory-of-mind, imitation/RL), and argues for unified benchmarks and mixed quantitative/qualitative evaluation. The paper flags gaps: missing standard benchmarks, uncertain scaling effects, and bias risks.
Problem Statement
Strategic reasoning means predicting and shaping others' actions in dynamic multi-agent settings. LLMs are now applied ad hoc across games, economics, and social simulation, but the field lacks a unified taxonomy, standardized benchmarks, and a clear understanding of which model sizes or methods reliably deliver human-like strategic abilities.
Main Contribution
Define strategic reasoning for LLMs and contrast it with other reasoning types.
Propose a taxonomy of application scenarios: societal simulation, economic simulation, game theory, and gaming.
Survey methods to improve strategic reasoning: prompt engineering, modular agents, theory-of-mind, and imitation/RL.
Review evaluation practices and call for unified benchmarks and mixed quantitative/qualitative metrics.
Identify open challenges and research directions, including benchmark design and limits of scaling.
Key Findings
LLM strategic work spans four scenario families: societal, economic, game-theory, and gaming.
A modular agent (OG-Narrator) is reported to achieve a tenfold profit increase over baselines in a bargaining setting.
Evaluations use outcome metrics (win/survival rates) and process metrics (opponent prediction accuracy) together.
There is no widely adopted unified benchmark for strategic reasoning.
Results
profitability (bargaining)
Who Should Care
What To Try In 7 Days
Run a small multi-agent simulation (e.g., auction or negotiation) with an off-the-shelf LLM and log win/profit outcomes.
Experiment with task-specific prompts and a simple deterministic module (price proposal or rule engine) to compare returns.
Evaluate both outcomes (win rate, profit) and process signals (opponent prediction accuracy, belief updates).
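The three steps above can be sketched as a toy bargaining loop. This is a minimal sketch under stated assumptions, not the survey's or any cited system's method: no real LLM is called, `noisy_policy` is a hypothetical random stand-in for free-form model offers, `ladder_policy` is a hypothetical deterministic price module, and the price range, step size, and believed reserve are illustrative values.

```python
import random

LIST_PRICE = 100   # assumed asking price; buyer profit is measured against it
MAX_ROUNDS = 8     # assumed cap on offers per episode


def noisy_policy(round_idx, rng):
    # Hypothetical stand-in for free-form LLM offers; no model is called here.
    return rng.randint(50, LIST_PRICE)


def ladder_policy(round_idx, rng):
    # Deterministic module: start low and raise the offer by a fixed step.
    return min(LIST_PRICE, 40 + 6 * round_idx)


def predict_accept(offer, believed_reserve=60):
    # Naive opponent model: expect acceptance once the offer reaches the believed reserve.
    return offer >= believed_reserve


def play_episode(policy, rng):
    reserve = rng.randint(40, 80)  # seller's hidden minimum acceptable price
    pred_hits = []
    for round_idx in range(MAX_ROUNDS):
        offer = policy(round_idx, rng)
        accepted = offer >= reserve
        # Process signal: did the buyer's prediction match the seller's response?
        pred_hits.append(predict_accept(offer) == accepted)
        if accepted:
            return {"deal": True, "profit": LIST_PRICE - offer, "pred_hits": pred_hits}
    return {"deal": False, "profit": 0, "pred_hits": pred_hits}


def evaluate(policy, episodes=500, seed=0):
    rng = random.Random(seed)
    logs = [play_episode(policy, rng) for _ in range(episodes)]
    hits = [hit for log in logs for hit in log["pred_hits"]]
    return {
        "deal_rate": sum(log["deal"] for log in logs) / len(logs),
        "mean_profit": sum(log["profit"] for log in logs) / len(logs),
        "prediction_accuracy": sum(hits) / len(hits),
    }


for name, policy in [("noisy", noisy_policy), ("ladder", ladder_policy)]:
    print(name, evaluate(policy))
```

Under these toy assumptions the ladder always closes a deal within eight rounds, because its last offer exceeds the highest possible reserve. To run the experiment with an actual model, replace `noisy_policy` with a function that queries the model and parses a price from its reply, keeping the logging unchanged so the two conditions stay comparable.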
Agent Features
Memory
- short-term dialog history
- retrieval memory (historical game logs)
- multi-frame summaries
Planning
- K-level reasoning
- chain-of-thought summaries
- multi-frame summarization
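K-level reasoning, listed under Planning above, can be illustrated with a "guess a fraction of the average" game. The sketch below is illustrative only and is not taken from the survey: it assumes a level-0 player guesses a fixed anchor (50), that a level-k player best-responds to a large population of level-(k-1) players, and that the target fraction `p` is 2/3. The function names are hypothetical.

```python
def level_k_guess(k, p=2 / 3, level0_guess=50.0):
    """Guess of a level-k player who assumes everyone else reasons at level k-1.

    In a large population one player's own guess barely moves the average,
    so the best response to opponents guessing g is approximately p * g.
    """
    guess = level0_guess
    for _ in range(k):
        guess = p * guess
    return guess


def respond_to(estimated_opponent_level, p=2 / 3):
    # Reason exactly one level above the estimated opponent level.
    return level_k_guess(estimated_opponent_level + 1, p=p)


for k in range(4):
    print(k, round(level_k_guess(k), 2))
```

In an LLM agent the recursion would typically be expressed in prompts rather than arithmetic: the agent first predicts what a level-(k-1) opponent would do, then chooses its own action in response to that prediction.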
Tool Use
- external retrieval
- deterministic submodules (quote generators)
- summarization modules
Frameworks
- Alympics
- LLMArena
- GTBench
- OpenToM
Is Agentic
true
Architectures
- LLM-based agents (GPT-family)
Collaboration
- multi-agent coordination
- opponent modeling / theory-of-mind
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- No standard unified benchmark across diverse strategic domains.
- Unclear mapping from model size/configuration to strategic ability.
- Potential social and political biases when simulating human interactions.
- Heterogeneous evaluation methods prevent direct comparisons.
When Not To Use
- High-stakes or safety-critical decisions that need verifiable guarantees.
- Real-time control where latency and sensor integration dominate.
- Environments where precise numeric optimization is required without human-readable reasoning.
Failure Modes
- Hallucinated strategies or incorrect parsing of action spaces.
- Bias amplification in social or political simulations.
- Brittleness to nonstationary or adversarial opponents.
- Overreliance on scaling rather than structured modules or feedback.
Core Entities
Models
- GPT-4
- general LLMs (GPT-family and similar)
Metrics
- win rate
- survival rate
- reward
- Normalized Relative Advantage (NRA)
- TrueSkill
- accuracy
Benchmarks
- GTBench
- LLMArena
- Alympics
- OpenToM
- BigToM
- WarAgent
- AucArena
- CompeteAI
Context Entities
Models
- ChessGPT
- Retroformer
- Thinker
- Suspicion-Agent

