AP-SQL: combine a small fine-tuned schema filter, example retrieval, and thought-style prompts to run Text-to-SQL with lower cost

June 4, 2025 · 6 min

Overview

Decision Snapshot: Needs Validation

The paper shows practical improvements on Spider using a clear pipeline, but evidence is limited to benchmark runs and a single ablation-like configuration.

Citations: 0

Evidence Strength: 0.60

Confidence: 0.72

Risk Signals: 8

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 8/8

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 70%

Production readiness: 60%

Novelty: 50%

Authors

Zetong Tang, Qian Ma, Di Wu

Links

Abstract / PDF / Data

Why It Matters For Business

AP-SQL offers a practical way to run reliable Text-to-SQL with smaller models and lower inference cost by pruning schema context, reusing examples, and using structured prompts.

Who Should Care

Summary TLDR

AP-SQL is a modular Text-to-SQL pipeline that targets low-resource settings. It fine-tunes a small Qwen model to filter schemas, retrieves the Top-K example NL–SQL pairs (K=3) for in-context help, then uses prompt-driven schema linking and two prompt styles (Chain-of-Thought for simple queries, Graph-of-Thought for complex queries) to generate SQL. On the Spider benchmark, AP-SQL yields small but consistent gains in Execution Accuracy (EX) and Test-Suite accuracy (TS) over prior prompt-based systems across several LLMs. The system reduces prompt size by keeping only the top-3 tables and top-3 columns per table, and it decouples schema linking from generation to lower inference cost.
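The Top-K example retrieval step can be illustrated with a minimal sketch. The summary does not say which similarity measure or retriever AP-SQL uses, so the bag-of-words cosine similarity below is a stand-in assumption, and `EXAMPLE_LIBRARY`, `retrieve_top_k`, and `build_few_shot_block` are hypothetical names introduced only for illustration.

```python
import re
from collections import Counter
from math import sqrt
from typing import List, Tuple

Example = Tuple[str, str]  # (natural-language question, SQL)

# Hypothetical example library; a real one would be much larger.
EXAMPLE_LIBRARY: List[Example] = [
    ("How many singers are there?", "SELECT COUNT(*) FROM singer"),
    ("List the names of all singers.", "SELECT name FROM singer"),
    ("What is the average age of singers?", "SELECT AVG(age) FROM singer"),
    ("How many concerts are there?", "SELECT COUNT(*) FROM concert"),
    ("Show the stadium with the highest capacity.",
     "SELECT name FROM stadium ORDER BY capacity DESC LIMIT 1"),
]


def tokenize(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_top_k(question: str, library: List[Example], k: int = 3) -> List[Example]:
    """Return the k library examples whose questions are most similar."""
    q = tokenize(question)
    ranked = sorted(library, key=lambda ex: cosine(q, tokenize(ex[0])), reverse=True)
    return ranked[:k]


def build_few_shot_block(question: str, library: List[Example], k: int = 3) -> str:
    """Format the retrieved examples as text to prepend to a prompt."""
    shots = retrieve_top_k(question, library, k)
    return "\n\n".join(f"Q: {nl}\nSQL: {sql}" for nl, sql in shots)


if __name__ == "__main__":
    print(build_few_shot_block("How many singers are in the database?", EXAMPLE_LIBRARY))
```

In practice the token-overlap scorer would likely be replaced by an embedding model; the ranking and truncation to K=3 stay the same.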

Problem Statement

Text-to-SQL needs accurate schema grounding and reasoning, but deploying high-performing systems in constrained environments is hard: large closed models are costly and opaque, and small open models lack robust schema linking and multi-step reasoning.

Main Contribution

A modular pipeline that separates schema filtering, retrieval-augmented example prompting, schema linking, and final SQL generation.

A supervised fine-tuned Qwen model (reported as Qwen3B / Qwen-7B variants) used as a fast schema filter to select top-3 tables and top-3 columns per table, reducing prompt length.
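The pruning rule (keep the top-3 tables and the top-3 columns per kept table) can be sketched as follows. This assumes the fine-tuned filter emits a relevance score per table and per column. That output format is an assumption, not something the summary specifies, and `prune_schema`, `render_schema`, and the demo scores are hypothetical.

```python
from typing import Dict, List

Schema = Dict[str, List[str]]

# Hypothetical relevance scores, as a fine-tuned filter model might produce.
TABLE_SCORES: Dict[str, float] = {
    "singer": 0.95,
    "concert": 0.80,
    "stadium": 0.40,
    "singer_in_concert": 0.65,
}
COLUMN_SCORES: Dict[str, Dict[str, float]] = {
    "singer": {"singer_id": 0.9, "name": 0.8, "age": 0.3, "country": 0.2},
    "concert": {"concert_id": 0.7, "year": 0.6, "theme": 0.1, "stadium_id": 0.5},
    "stadium": {"stadium_id": 0.5, "name": 0.4, "capacity": 0.3},
    "singer_in_concert": {"singer_id": 0.9, "concert_id": 0.85},
}


def prune_schema(
    table_scores: Dict[str, float],
    column_scores: Dict[str, Dict[str, float]],
    top_tables: int = 3,
    top_columns: int = 3,
) -> Schema:
    """Keep the highest-scoring tables, then the highest-scoring columns in each."""
    kept_tables = sorted(table_scores, key=lambda t: table_scores[t], reverse=True)[:top_tables]
    pruned: Schema = {}
    for table in kept_tables:
        cols = column_scores.get(table, {})
        pruned[table] = sorted(cols, key=lambda c: cols[c], reverse=True)[:top_columns]
    return pruned


def render_schema(schema: Schema) -> str:
    """Serialize the pruned schema compactly for inclusion in a prompt."""
    return "\n".join(f"{table}({', '.join(cols)})" for table, cols in schema.items())


if __name__ == "__main__":
    print(render_schema(prune_schema(TABLE_SCORES, COLUMN_SCORES)))
```

Because the limits are fixed, prompt length is bounded by at most nine columns regardless of database size, which is where the token saving comes from.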

Key Findings

AP-SQL gives consistent EX and TS gains on Spider across evaluated LLMs.

Numbers: GPT-4o: EX 89.7% vs E-SQL 88.6% (+1.1); TS 82.6% vs 79.4% (+3.2)

Practical Use: If you must run Text-to-SQL with an LLM, AP-SQL can raise correctness slightly and reduce test-suite failures, especially improving robustness measured by TS.

Evidence Ref: Table 1 (Results section)

Smaller models also benefit, though gains are smaller.

Numbers: Qwen-7B: EX 68.3% vs E-SQL 67.8% (+0.5); TS 60.8% vs 60.4% (+0.4)

Practical Use: Teams using mid-size open models can get modest accuracy gains without moving to much larger costly models.

Evidence Ref: Table 1 (Results section)

Results

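| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
| --- | --- | --- | --- | --- | --- | --- |
| Accuracy (EX) | 68.3% | E-SQL Qwen-7B 67.8% | +0.5 pp | Spider (eval set) | | Table 1, Qwen-7B row |
| Accuracy (TS) | 60.8% | E-SQL Qwen-7B 60.4% | +0.4 pp | Spider (eval set) | | Table 1, Qwen-7B row |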

What To Try In 7 Days

Build a simple schema filter: fine-tune a small Qwen model on a few thousand annotated question-schema pairs and test top-3 table/column pruning.

Create a small NL–SQL example library and implement Top-K=3 retrieval to prepend to prompts.

Prototype CoT prompts for simple queries and a graph-style prompt for multi-table examples and compare EX/TS on a validation subset of Spider-like queries.
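For the comparison step, a simplified Execution Accuracy check can be run against a SQLite copy of each database. The sketch below treats a prediction as correct when it returns the same multiset of rows as the gold query. This is an assumption-laden simplification: it ignores row order, so it is not a substitute for the official Spider evaluation or the Test-Suite metric. The demo database and the two prediction sets (`VARIANT_A`, `VARIANT_B`) are hypothetical stand-ins for the outputs of two prompt styles.

```python
import sqlite3
from collections import Counter
from typing import List, Optional, Tuple

Pair = Tuple[str, str]  # (predicted SQL, gold SQL)


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Ben', 45), (3, 'Cy', 27);
        """
    )
    return conn


def run_query(conn: sqlite3.Connection, sql: str) -> Optional[Counter]:
    """Return result rows as a multiset, or None if the SQL fails to execute."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn: sqlite3.Connection, pairs: List[Pair]) -> float:
    """Fraction of predictions whose result rows match the gold query's rows."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        if gold_rows is not None and run_query(conn, predicted) == gold_rows:
            correct += 1
    return correct / len(pairs)


# Hypothetical predictions from two prompt variants on the same two questions.
GOLD = ["SELECT COUNT(*) FROM singer", "SELECT name FROM singer WHERE age > 30"]
VARIANT_A: List[Pair] = [
    ("SELECT COUNT(*) FROM singer", GOLD[0]),
    ("SELECT name FROM singer WHERE age > 40", GOLD[1]),  # wrong threshold
]
VARIANT_B: List[Pair] = [
    ("SELECT COUNT(singer_id) FROM singer", GOLD[0]),
    ("SELECT name FROM singer WHERE age >= 31", GOLD[1]),
]

if __name__ == "__main__":
    db = make_demo_db()
    print("variant A EX:", execution_accuracy(db, VARIANT_A))
    print("variant B EX:", execution_accuracy(db, VARIANT_B))
```

Predictions that fail to parse count as wrong rather than raising, so a single malformed query does not abort the whole evaluation run.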

Optimization Features

Token Efficiency: Context Compression, Token Budgeting
Infra Optimization: fits on 2x4090 GPUs
Model Optimization: efficient_finetuning
System Optimization: RAG with Top-K examples
Training Optimization: supervised_finetuning_small_model
Inference Optimization: decoupled_schema_linking, prompt_pruning

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

Spider dataset (public benchmark)

Risks & Boundaries

Limitations

Evaluation only on Spider; no cross-dataset generalization shown.

No public code or configuration links provided to reproduce results exactly.

When Not To Use

If you can run very large closed models directly and cost is not a concern.

If you need an end-to-end learned parser trained on paired NL–SQL data for a specific production schema, without prompt engineering.

Failure Modes

The schema filter can miss relevant tables or columns, which breaks the final SQL.

Retrieved examples can mislead the model if the example library is low quality.
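One way to watch for the first failure mode is to measure, on a labeled validation set, how often every table needed by the gold SQL survives pruning. The sketch below is illustrative only: `filter_recall`, `missed_cases`, and the demo cases are hypothetical, and it assumes the gold table sets are already known for each question.

```python
from typing import List, Set, Tuple

Case = Tuple[Set[str], Set[str]]  # (tables needed by gold SQL, tables kept by filter)


def filter_recall(cases: List[Case]) -> float:
    """Fraction of questions for which the filter kept every required table."""
    if not cases:
        return 0.0
    covered = sum(1 for gold, kept in cases if gold <= kept)
    return covered / len(cases)


def missed_cases(cases: List[Case]) -> List[Set[str]]:
    """For each failing question, the required tables that the filter dropped."""
    return [gold - kept for gold, kept in cases if not gold <= kept]


# Hypothetical validation cases with a top-3 table limit.
DEMO_CASES: List[Case] = [
    ({"singer"}, {"singer", "concert", "stadium"}),
    ({"singer", "concert"}, {"singer", "concert", "singer_in_concert"}),
    ({"stadium"}, {"singer", "concert", "singer_in_concert"}),
    # Needs four tables, so no top-3 selection can cover it.
    ({"singer", "concert", "stadium", "singer_in_concert"},
     {"singer", "concert", "stadium"}),
]

if __name__ == "__main__":
    print("filter recall:", filter_recall(DEMO_CASES))
    print("dropped tables:", missed_cases(DEMO_CASES))
```

The last demo case shows a structural limit: any query that joins more than three tables cannot be fully covered by a top-3 table cutoff, whatever the quality of the filter.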

Core Entities

Models

Qwen-7B, Qwen3B (fine-tuned filter), Llama-8B, GPT-4o-mini, GPT-4o

Metrics

Accuracy

Datasets

Spider

Benchmarks

Spider

Context Entities

Models

E-SQL, ACT-SQL, C3-SQL, DIN-SQL