Survey of LLM-based text-to-SQL with a focus on Retrieval-Augmented and Graph RAG solutions

Overview

Decision SnapshotNeeds Validation

The survey compiles many recent systems and datasets and makes a case for Graph RAG; the ideas are practical but often lack large-scale published numbers and incur latency and implementation complexity.

Citations2

Evidence Strength0.60

Confidence0.85

Risk Signals12

Trust Signals

Findings with numeric evidence: 1/6

Findings with evidence refs: 6/6

Results with explicit delta: 0/0

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 50%

Production readiness: 60%

Novelty: 70%

Authors

Ali Mohammadjafari, Anthony S. Maida, Raju Gottumukkala

Links

Abstract / PDF

Why It Matters For Business

RAG and especially Graph RAG let teams query unfamiliar databases without costly fine-tuning, but they increase system complexity and latency.

Who Should Care

CTO Product Manager ML Engineer Data Scientist Engineering Lead

Summary TLDR

This 15-page survey traces text-to-SQL systems from rule-based parsers to modern LLMs and retrieval-augmented generation (RAG). It catalogs datasets and metrics, compares in-context prompting, fine-tuning, and RAG approaches, and argues that Graph RAG—indexing schema and documents as a graph—improves schema linking, multi-hop reasoning, and cross-domain generalization while trading off latency and system complexity.

Problem Statement

Non-technical users need database answers but writing SQL is hard. Existing text-to-SQL models struggle with schema linking, ambiguous or multi-turn queries, rare SQL constructs, and cross-domain generalization. The survey asks: how do LLMs plus retrieval systems (especially Graph RAG) reduce these gaps, and what are the practical trade-offs?

Main Contribution

A concise history of text-to-SQL methods from rule-based to LLM-era pipelines.

A taxonomy that compares in-context learning, fine-tuning, and RAG-based systems for text-to-SQL.

Key Findings

RAG improves cross-domain generalization and zero-/few-shot performance by fetching schema-specific context.

Practical UsePrototype a simple vector DB+retriever for unseen schemas before building costly fine-tuning; expect better zero-shot results on new databases.

Evidence RefSec II-B, Sec III-A (cross-domain discussion)

Graph RAG organizes schema and documents into a knowledge graph to improve schema linking and multihop reasoning.

Practical UseIf your application needs reliable joins, nested queries, or cross-table reasoning, invest in a graph-indexed retrieval layer to reduce schema ambiguity.

Evidence RefSec IV (Graph RAG novelty and modularity discussions)

What To Try In 7 Days

Build a small prototype: wire a vector DB to an off-the-shelf LLM and evaluate execution accuracy on a sample schema.

Measure retriever recall for schema metadata and tune the retriever before changing prompts or models.

If queries need cross-table reasoning, test a simple graph index (table/column nodes + FK edges) to see schema-link improvements.

Optimization Features

System Optimization

Index pre-partitioning for graph retrieval (modularity/Leiden)Cache retrieved schema snippets

Reproducibility

Code AvailableNo

Data AvailableYes

Open Source StatusPartial

LicenseUnknown

Risks & Boundaries

Limitations

High runtime cost and latency from retrieval and iterative loops.

Most systems struggle to adapt to frequent schema changes without retraining or incremental updates.

When Not To Use

Low-latency applications that cannot tolerate retrieval latency.

Very small or static schemas where fine-tuning would be cheaper and faster.

Failure Modes

Incorrect schema linking causing wrong joins or wrong tables returned.

Retriever returns irrelevant or stale context, leading to hallucinated SQL.

Core Entities

Models

GPT-4GPT-3.5CodexLLaMALLaMA2DataGpt-SQL-7BCodex/Code models (general)

Metrics

AccuracyExact Match (EM)Component Matching (CM)Valid Efficiency Score (VES)

Datasets

SpiderWikiSQLCoSQLSParCBIRDSpider-DKKaggleDBAADVETAGeoQueryATISmSpiderDuSQL

Benchmarks

SpiderWikiSQLBIRDCoSQLSParCADVETAReal-World/Industry benchmarks (Financial SQL, ClinicalDB)

Context Entities

Models

GPT-family in-context promptingFine-tuned LLMs (DELlM, SQL-GEN variants)

Metrics

Recall/Precision of retrieverLatency and efficiency measures

Datasets

Spider-variants (e.g., Spider-Syn)Benchmarks used for cross-lingual/multi-lingual testing

Benchmarks

Table retrieval benchmarks (TARGET)Domain-specific benchmarks (Financial SQL, Nibiru)

Overview

Trust Signals

Reproducibility

At A Glance

Authors

Links

Why It Matters For Business

Who Should Care

Summary TLDR

Problem Statement

Main Contribution

Key Findings

RAG improves cross-domain generalization and zero-/few-shot performance by fetching schema-specific context.

Graph RAG organizes schema and documents into a knowledge graph to improve schema linking and multihop reasoning.

What To Try In 7 Days

Optimization Features

Reproducibility

Risks & Boundaries

Limitations

When Not To Use

Failure Modes

Core Entities

Models

Metrics

Datasets

Benchmarks

Context Entities

Models

Metrics

Datasets

Benchmarks

You May Also Want to Read

Turn an LLM output into a mini knowledge graph, check each fact with an NLI model, and get explainable hallucination flags

Key finding

Combine LLMs with a medical knowledge graph to get more accurate, verifiable scientific answers

Key finding

Use a personal causal graph so an LLM recommends foods that better lower your post-meal glucose

Key finding

A practical survey showing how knowledge graphs can make LLMs better at complex question answering

Key finding

MindMap: prompt LLMs with knowledge-graph evidence to produce explicit graph-style reasoning and reduce hallucination

Key finding