Overview
Production Readiness
0.6
Novelty Score
0.5
Cost Impact Score
0.7
Citation Count
2
Why It Matters For Business
Better queries reduce hallucination and improve downstream answer quality; matching optimization to query types saves API cost and improves customer trust.
Summary TLDR
This survey organizes query optimization for LLMs into a five‑phase Query Optimization Lifecycle (Intent Recognition → Query Transformation → Retrieval → Evidence Integration → Response Synthesis). It introduces a two‑axis Query Complexity Taxonomy (explicit vs implicit evidence; single vs multiple sources) and reviews four atomic operations—Expansion, Decomposition, Disambiguation, Abstraction—mapping each to practical use cases. The paper synthesizes representative methods, highlights evaluation gaps (lack of query‑level annotations, retriever dependence, efficiency metrics), and recommends adaptive, feedback‑driven pipelines starting from simple expansion and escalating to agentic, MDP‑dr
Problem Statement
User queries often mismatch how retrieval systems index knowledge. This semantic and compositionality gap causes RAG systems to retrieve poor evidence and LLMs to hallucinate. The paper argues query optimization—transforming queries before retrieval—is critical to reliable, knowledge‑intensive LLM applications.
Main Contribution
Query Optimization Lifecycle (QOL): a five‑phase pipeline from intent recognition to response synthesis
Query Complexity Taxonomy: two axes (evidence type and quantity) producing four query classes with mapped strategies
Comprehensive survey of four atomic operations: Expansion, Decomposition, Disambiguation, Abstraction, with representative methods
Practical guidance and research roadmap: evaluation gaps, process reward models, efficiency and multi‑modal challenges
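The five phases of the lifecycle can be pictured as a chain of functions that each read and update a shared state object. The sketch below is only an illustration of that shape: `QueryState`, `run_lifecycle`, and every placeholder phase are hypothetical names invented here, and each phase body is a stub standing in for a real component (classifier, rewriter, retriever, reranker, generator).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueryState:
    """Hypothetical container passed between the five lifecycle phases."""
    raw_query: str
    intent: str = ""
    rewritten_queries: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    integrated_context: str = ""
    answer: str = ""


Phase = Callable[[QueryState], QueryState]


def run_lifecycle(raw_query: str, phases: List[Phase]) -> QueryState:
    """Apply the phases in order, threading the state through each one."""
    state = QueryState(raw_query=raw_query)
    for phase in phases:
        state = phase(state)
    return state


# Placeholder phases (assumptions): each one stands in for a real component.
def recognize_intent(state: QueryState) -> QueryState:
    state.intent = "factoid"
    return state


def transform_query(state: QueryState) -> QueryState:
    state.rewritten_queries = [state.raw_query, state.raw_query + " definition"]
    return state


def retrieve(state: QueryState) -> QueryState:
    state.evidence = [f"passage for '{q}'" for q in state.rewritten_queries]
    return state


def integrate_evidence(state: QueryState) -> QueryState:
    state.integrated_context = "\n".join(state.evidence)
    return state


def synthesize_response(state: QueryState) -> QueryState:
    state.answer = f"[{state.intent}] answer grounded in {len(state.evidence)} passages"
    return state


PLACEHOLDER_PHASES: List[Phase] = [
    recognize_intent,
    transform_query,
    retrieve,
    integrate_evidence,
    synthesize_response,
]
```

Calling `run_lifecycle("what is RAG", PLACEHOLDER_PHASES)` walks the query through all five phases. Keeping each phase behind the same signature makes it straightforward to swap one stub for a real implementation without touching the others.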
Key Findings
Query optimization is essential: retrieval quality strongly determines final answer quality in RAG.
Four operations cover most practical strategies: Expansion, Decomposition, Disambiguation, Abstraction.
Different query types benefit from different operations: Expansion best for simple factoid queries; Decomposition for multi‑hop; Disambiguation for implicit intent; Abstraction for complex analysis.
Iterative and agentic methods increasingly outperform single‑pass heuristics on complex tasks, but at higher cost.
Evaluation is fragmented: most benchmarks lack intermediate query‑level annotations and do not standardize efficiency metrics.
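The finding that different query types benefit from different operations suggests a simple routing table keyed on the two taxonomy axes. The sketch below is a minimal illustration, not the survey's method: the pairing of each taxonomy cell with one operation is an assumption inferred from the finding above, and `QueryProfile`, `ROUTING_TABLE`, and `choose_operation` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Operation(Enum):
    EXPANSION = "expansion"
    DECOMPOSITION = "decomposition"
    DISAMBIGUATION = "disambiguation"
    ABSTRACTION = "abstraction"


@dataclass(frozen=True)
class QueryProfile:
    """Position of a query on the two taxonomy axes."""
    implicit_evidence: bool  # True if the needed evidence is not stated outright
    multiple_sources: bool   # True if answering needs more than one source


# Assumed pairing of taxonomy cells with operations (illustrative only).
ROUTING_TABLE = {
    (False, False): Operation.EXPANSION,       # explicit, single source
    (False, True): Operation.DECOMPOSITION,    # explicit, multiple sources
    (True, False): Operation.DISAMBIGUATION,   # implicit, single source
    (True, True): Operation.ABSTRACTION,       # implicit, multiple sources
}


def choose_operation(profile: QueryProfile) -> Operation:
    """Look up the default operation for a profiled query."""
    return ROUTING_TABLE[(profile.implicit_evidence, profile.multiple_sources)]
```

In practice the profile would come from a classifier or an LLM prompt, and a production router would likely allow combining operations rather than picking exactly one.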
Who Should Care
What To Try In 7 Days
Profile incoming queries by the survey's taxonomy (explicit/implicit × single/multiple) to decide which pipeline each query should use
Add a simple expansion step (HyDE/Query2Doc) for short factoid queries and measure Recall@K uplift
Implement lightweight disambiguation (echo or rephrase) for ambiguous conversational queries before retrieval
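Measuring the Recall@K uplift from an expansion step only needs the ranked document IDs before and after expansion plus a set of relevant IDs per query. The sketch below assumes that data is already available; the expansion step itself (HyDE, Query2Doc, or anything else) is not shown, and the function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# One run = (ranked document IDs returned for a query, IDs judged relevant).
Run = Tuple[Sequence[str], Set[str]]


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs: List[Run], k: int) -> float:
    """Average Recall@K over a list of per-query runs."""
    if not runs:
        return 0.0
    return sum(recall_at_k(r, rel, k) for r, rel in runs) / len(runs)


def recall_uplift(baseline: List[Run], expanded: List[Run], k: int) -> float:
    """Mean Recall@K with expansion minus mean Recall@K without it."""
    return mean_recall_at_k(expanded, k) - mean_recall_at_k(baseline, k)
```

Both run lists should cover the same queries against the same retriever and index, so that the difference reflects the expansion step rather than a change in setup.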
Optimization Features
Token Efficiency
- Concatenate pseudo‑docs to improve single‑pass retrieval
- Use self-assessment tokens to avoid unnecessary retrieval
Infra Optimization
- Batching parallel sub-query retrievals
- Hybrid sparse/dense retrieval routing
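When decomposition yields sub-queries that do not depend on each other, their retrievals can be issued concurrently instead of one by one. The sketch below uses a thread pool from the Python standard library; `retrieve_parallel` is a hypothetical helper, and `fake_retrieve` is a stand-in for whatever I/O-bound retriever client is actually in use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def retrieve_parallel(
    sub_queries: List[str],
    retrieve: Callable[[str], List[str]],
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Run one retrieval per sub-query concurrently and key results by sub-query."""
    if not sub_queries:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with sub_queries.
        results = list(pool.map(retrieve, sub_queries))
    return dict(zip(sub_queries, results))


def fake_retrieve(query: str) -> List[str]:
    """Stand-in retriever (assumption): returns one synthetic document ID."""
    return [f"doc-for:{query}"]
```

This only applies to independent sub-queries. Sequential decompositions, where a later sub-query is built from an earlier answer, still have to run in order.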
System Optimization
- Process supervision (RAG-Gym, MDPs)
- Feedback-driven rewriters (AdaQR, MaFeRw)
- Retriever-aware optimization
Training Optimization
- Differentiable RAG reward training (RAG-DDR)
Inference Optimization
- Adaptive retrieval triggers (FLARE, DRAGIN, Self-RAG)
- Parallel vs sequential planning (Plan×RAG, QueryPlanner)
- Early termination and decision policies (MDP-based methods)
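Adaptive triggers and early termination can be combined in one loop: retrieve only while the draft looks uncertain, and stop once it is confident or a round budget is spent. The sketch below is loosely inspired by confidence-triggered retrieval and is not the implementation of FLARE, DRAGIN, or Self-RAG. The `draft` and `retrieve` callables, the 0.4 threshold, and the round budget are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# draft(question, context) -> (draft text, per-token probabilities)
DraftFn = Callable[[str, List[str]], Tuple[str, Sequence[float]]]
RetrieveFn = Callable[[str], List[str]]


def should_retrieve(token_probs: Sequence[float], threshold: float = 0.4) -> bool:
    """Trigger retrieval if any generated token falls below the threshold."""
    return any(p < threshold for p in token_probs)


def generate_adaptively(
    question: str,
    draft: DraftFn,
    retrieve: RetrieveFn,
    threshold: float = 0.4,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Return the final draft and the number of retrieval rounds used."""
    context: List[str] = []
    text, probs = draft(question, context)
    rounds = 0
    # Stop early when the draft is confident or the round budget is exhausted.
    while rounds < max_rounds and should_retrieve(probs, threshold):
        context.extend(retrieve(text))  # use the uncertain draft as the query
        text, probs = draft(question, context)
        rounds += 1
    return text, rounds
```

A confident first draft costs zero retrieval calls, which is where the savings come from. The `max_rounds` cap bounds worst-case latency for queries that never reach the confidence threshold.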
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Literature coverage ends in early 2026; methods published later may be missing
- Focuses mainly on text queries; multi‑modal optimization needs a dedicated survey
- No unified empirical comparisons due to heterogeneous setups across papers
When Not To Use
- Real‑time low‑latency apps where multi‑round decomposition would exceed latency budgets
- Small domains where a tuned sparse retriever and direct prompts suffice
- Cases where user interaction is impossible, or where multi‑branch clarification would confuse users
Failure Modes
- Error propagation in sequential decomposition (early subquery mistakes cascade)
- Hallucination from expansion: pseudo‑docs can be factually wrong even when they are semantically useful for retrieval
- Retriever mismatch: optimizations tuned to one retriever may underperform with another
- Over‑abstraction that misses domain specifics and produces overgeneralized answers
Core Entities
Models
- GPT-4
- Claude
- Gemini
- LLaMA
Metrics
- Recall@K
- MRR
- nDCG
- Precision@K
- Exact Match
- F1
- Accuracy
- ROUGE/BLEU
- LLM API Calls
- Retrieval Latency
- Token Usage
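For reference, two of the ranking metrics listed above can be computed in a few lines. The sketch below uses the standard definitions of reciprocal rank and DCG with a log2 discount. One simplification is assumed: the ideal ranking for nDCG is built from the gains of the returned list only, whereas a full evaluation would use all judged documents for the query. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mrr(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean reciprocal rank over (retrieved, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the same gains sorted in descending order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

Here `gains` is the list of graded relevance labels in the order the retriever returned the documents, for example `[0, 1]` when only the second result is relevant.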
Datasets
- NaturalQuestions
- TriviaQA
- WebQuestions
- HotpotQA
- 2WikiMultiHopQA
- MuSiQue
- QReCC
- TopiOCQA
- RAD-Bench
- RAG-QA Arena
Benchmarks
- HotpotQA
- NaturalQuestions
- MuSiQue
- RAD-Bench
- RAG-QA Arena
Context Entities
Models
- Contriever
- DPR
- ANCE
Metrics
- Token Budgeting
- Total Wall-Clock Time
Datasets
- 2WikiMultiHopQA
- HotpotQA
- Web corpora
- Wikipedia snapshots
Benchmarks
- Sub-question coverage benchmarks

