Overview
The paper consolidates many recent methods and practical patterns, but it is a literature synthesis rather than a unified empirical comparison; apply its recommendations while measuring retriever‑ and deployment‑specific costs yourself.
Citations: 2
Evidence Strength: 0.65
Confidence: 0.85
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 1/5
Findings with evidence refs: 5/5
Results with explicit delta: 0/0
Reproducibility
Status: No open assets linked
Open source: Unknown
At A Glance
Cost impact: 70%
Production readiness: 60%
Novelty: 50%
Why It Matters For Business
Better queries reduce hallucination and improve downstream answer quality; matching optimization to query types saves API cost and improves customer trust.
Summary TLDR
This survey organizes query optimization for LLMs into a five‑phase Query Optimization Lifecycle (Intent Recognition → Query Transformation → Retrieval → Evidence Integration → Response Synthesis). It introduces a two‑axis Query Complexity Taxonomy (explicit vs implicit evidence; single vs multiple sources) and reviews four atomic operations—Expansion, Decomposition, Disambiguation, Abstraction—mapping each to practical use cases. The paper synthesizes representative methods, highlights evaluation gaps (lack of query‑level annotations, retriever dependence, efficiency metrics), and recommends adaptive, feedback‑driven pipelines starting from simple expansion and escalating to agentic, MDP‑dr
Problem Statement
User queries often do not match the way retrieval systems index knowledge. This semantic and compositional gap causes RAG systems to retrieve poor evidence and LLMs to hallucinate. The paper argues that query optimization, meaning the transformation of queries before retrieval, is critical to reliable, knowledge‑intensive LLM applications.
Main Contribution
Query Optimization Lifecycle (QOL): a five‑phase pipeline from intent recognition to response synthesis
Query Complexity Taxonomy: two axes (evidence type and quantity) producing four query classes with mapped strategies
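The two‑axis taxonomy above can be sketched as a small data structure: crossing the evidence axis with the source axis yields the four query classes. This is a minimal illustration, not the paper's implementation. The names `Evidence`, `Sources`, `query_classes`, `pick_strategy`, and the contents of `EXAMPLE_TABLE` are assumptions made here for illustration; the survey's actual class‑to‑strategy mapping may differ.

```python
from enum import Enum
from itertools import product


class Evidence(Enum):
    EXPLICIT = "explicit"
    IMPLICIT = "implicit"


class Sources(Enum):
    SINGLE = "single"
    MULTIPLE = "multiple"


def query_classes():
    """Enumerate the four classes produced by crossing the two axes."""
    return list(product(Evidence, Sources))


def pick_strategy(evidence, sources, table, default="expansion"):
    """Look up a strategy for a query class, falling back to a default."""
    return table.get((evidence, sources), default)


# Placeholder routing table: an assumption for illustration only,
# NOT the mapping given in the survey.
EXAMPLE_TABLE = {
    (Evidence.EXPLICIT, Sources.SINGLE): "expansion",
    (Evidence.EXPLICIT, Sources.MULTIPLE): "decomposition",
    (Evidence.IMPLICIT, Sources.SINGLE): "disambiguation",
    (Evidence.IMPLICIT, Sources.MULTIPLE): "abstraction",
}

if __name__ == "__main__":
    for evidence, sources in query_classes():
        print(evidence.value, sources.value,
              pick_strategy(evidence, sources, EXAMPLE_TABLE))
```

Keeping the routing table separate from the class enumeration lets the mapping be replaced with whatever a team's own measurements support.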
Key Findings
Query optimization is essential: retrieval quality strongly determines final answer quality in RAG.
A small set of four operations covers most practical strategies: Expansion, Decomposition, Disambiguation, Abstraction.
What To Try In 7 Days
Profile incoming queries by the survey's taxonomy (explicit/implicit × single/multiple) to decide which pipeline each query type should use
Add a simple expansion step (HyDE/Query2Doc) for short factoid queries and measure Recall@K uplift
Implement lightweight disambiguation (echo or rephrase) for ambiguous conversational queries before retrieval
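To measure the Recall@K uplift mentioned above, one option is to run the same labeled queries through the retriever with and without the expansion step and compare mean Recall@K. The sketch below assumes each query has a ranked list of retrieved document ids and a set of known relevant ids; the function names and the toy ids are hypothetical, and the retriever and expansion step themselves are out of scope.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant must contain at least one id")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs, relevant_sets, k):
    """Average Recall@K over queries; runs[i] is the ranked list for query i."""
    if len(runs) != len(relevant_sets) or not runs:
        raise ValueError("runs and relevant_sets must be non-empty and aligned")
    scores = [recall_at_k(r, rel, k) for r, rel in zip(runs, relevant_sets)]
    return sum(scores) / len(scores)


def recall_uplift(baseline_runs, expanded_runs, relevant_sets, k):
    """Mean Recall@K with expansion minus mean Recall@K without it."""
    return (mean_recall_at_k(expanded_runs, relevant_sets, k)
            - mean_recall_at_k(baseline_runs, relevant_sets, k))


if __name__ == "__main__":
    # Toy data (made up): two queries, ranked ids before and after expansion.
    relevant_sets = [{"x"}, {"c"}]
    baseline_runs = [["a", "b"], ["c", "d"]]
    expanded_runs = [["x", "a"], ["y", "c"]]
    print(recall_uplift(baseline_runs, expanded_runs, relevant_sets, k=2))
```

A positive value means expansion retrieved more of the relevant documents within the top K; since the survey notes that results depend on the retriever, the comparison is only meaningful when both runs use the same retriever and index.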
Risks & Boundaries
Limitations
Literature coverage ends in early 2026; methods published later may be missing
Focuses mainly on text queries; multi‑modal optimization needs a dedicated survey
When Not To Use
Real‑time low‑latency apps where multi‑round decomposition would exceed latency budgets
Small domains where a tuned sparse retriever and direct prompts suffice
Failure Modes
Error propagation in sequential decomposition (early subquery mistakes cascade)
Hallucination from expansion: generated pseudo‑documents can be factually wrong even when they are semantically useful for retrieval
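The error‑propagation failure mode in sequential decomposition can be made concrete with a back‑of‑the‑envelope calculation. The sketch below assumes, as a simplification not taken from the paper, that every subquery step succeeds independently with the same probability; the function names and the example numbers are hypothetical.

```python
def chain_success(p_step, n_steps):
    """Probability that all n sequential steps succeed, assuming independence."""
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


def max_steps(p_step, target):
    """Largest n such that p_step**n stays at or above the target probability."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    if not 0.0 <= p_step < 1.0:
        raise ValueError("p_step must be in [0, 1) for a finite answer")
    n, prob = 0, 1.0
    while prob * p_step >= target:
        prob *= p_step
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative numbers only: 90% per-step success, three chained subqueries.
    print(round(chain_success(0.9, 3), 3))  # 0.729
    print(max_steps(0.9, 0.7))              # 3
```

Under this simplified model, even reliable individual steps compound quickly, which is one reason to cap decomposition depth or validate intermediate answers before continuing.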

