Overview
The paper shows practical improvements on Spider using a clear pipeline, but evidence is limited to benchmark runs and a single ablation-like configuration.
Citations: 0
Evidence Strength: 0.60
Confidence: 0.72
Risk Signals: 8
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 8/8
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 70%
Production readiness: 60%
Novelty: 50%
Why It Matters For Business
AP-SQL offers a practical way to run reliable Text-to-SQL with smaller models and lower inference cost by pruning schema context, reusing examples, and using structured prompts.
Summary TLDR
AP-SQL is a modular Text-to-SQL pipeline that targets low-resource settings. It fine-tunes a small Qwen model to filter schemas, retrieves the Top-K example NL–SQL pairs (K=3) for in-context help, then applies prompt-driven schema linking and one of two prompt styles to generate SQL: Chain-of-Thought for simple queries and Graph-of-Thought for complex ones. On the Spider benchmark, AP-SQL yields small but consistent gains in Execution Accuracy (EX) and Test Suite (TS) accuracy over prior prompt-based systems across several LLMs. The system reduces prompt size by keeping only the top-3 tables and the top-3 columns per table, and it decouples schema linking from generation to lower inference cost.
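The choice between the two prompt styles can be pictured with a minimal sketch. This summary does not say how a query is judged simple or complex, so the rule used here (more than one linked table means complex) is an assumption. The template wording and the names `choose_prompt_style` and `build_generation_prompt` are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Hypothetical prompt templates; the paper's actual wording is not given here.
COT_TEMPLATE = (
    "Schema:\n{schema}\n\n"
    "Question: {question}\n"
    "Think step by step, then write one SQL query."
)
GOT_TEMPLATE = (
    "Schema:\n{schema}\n\n"
    "Question: {question}\n"
    "List the tables as nodes and the join conditions as edges, "
    "then write one SQL query that follows that graph."
)


def choose_prompt_style(linked_schema: Dict[str, List[str]]) -> str:
    """Assumed rule: a query touching more than one table counts as complex."""
    return "graph_of_thought" if len(linked_schema) > 1 else "chain_of_thought"


def build_generation_prompt(question: str, linked_schema: Dict[str, List[str]]) -> str:
    """Render the linked schema as table(col, ...) lines and fill the chosen template."""
    schema_text = "\n".join(
        f"{table}({', '.join(columns)})" for table, columns in linked_schema.items()
    )
    style = choose_prompt_style(linked_schema)
    template = GOT_TEMPLATE if style == "graph_of_thought" else COT_TEMPLATE
    return template.format(schema=schema_text, question=question)
```

Keeping the routing rule in its own function makes it easy to swap the table-count heuristic for a learned classifier or a keyword check without touching the templates.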
Problem Statement
Text-to-SQL needs accurate schema grounding and reasoning, but deploying high-performing systems in constrained environments is hard: large closed models are costly and opaque, and small open models lack robust schema linking and multi-step reasoning.
Main Contribution
A modular pipeline that separates schema filtering, retrieval-augmented example prompting, schema linking, and final SQL generation.
A supervised fine-tuned Qwen model (reported as Qwen3B / Qwen-7B variants) used as a fast schema filter to select top-3 tables and top-3 columns per table, reducing prompt length.
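The top-3 pruning step can be sketched as follows, assuming the fine-tuned filter has already produced a relevance score for each table and column. The score dictionaries and the function name `prune_schema` are hypothetical stand-ins for the filter model's output; this summary does not describe the actual interface.

```python
from typing import Dict, List


def prune_schema(
    table_scores: Dict[str, float],
    column_scores: Dict[str, Dict[str, float]],
    top_tables: int = 3,
    top_columns: int = 3,
) -> Dict[str, List[str]]:
    """Keep the highest-scoring tables, and the highest-scoring columns in each."""
    kept_tables = sorted(table_scores, key=table_scores.get, reverse=True)[:top_tables]
    pruned: Dict[str, List[str]] = {}
    for table in kept_tables:
        scores = column_scores.get(table, {})
        pruned[table] = sorted(scores, key=scores.get, reverse=True)[:top_columns]
    return pruned


# Example scores (made up for illustration only).
example_tables = {"singer": 0.9, "concert": 0.8, "stadium": 0.4, "album": 0.1}
example_columns = {
    "singer": {"name": 0.9, "age": 0.7, "country": 0.2, "singer_id": 0.5},
    "concert": {"concert_id": 0.6, "year": 0.8},
    "stadium": {"name": 0.3},
}
pruned_schema = prune_schema(example_tables, example_columns)
```

With a cap of three tables and three columns each, the pruned schema holds at most nine columns no matter how large the original database is, which is where the prompt-length saving comes from.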
Key Findings
AP-SQL gives consistent EX and TS gains on Spider across evaluated LLMs.
Smaller models also benefit, though gains are smaller.
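For readers unfamiliar with the EX metric, it is generally computed by running the predicted and the reference SQL against the same database and comparing the returned rows. Below is a minimal sketch using Python's built-in `sqlite3`, assuming an order-insensitive comparison of rows. The official Spider evaluation scripts apply their own rules, and the helper names `execution_match` and `execution_accuracy` are hypothetical.

```python
import sqlite3
from collections import Counter
from typing import List, Tuple


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same multiset of rows (row order ignored)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose execution results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, predicted, gold) for predicted, gold in pairs)
    return hits / len(pairs)
```

Ignoring row order is a simplification: it would wrongly accept a prediction that drops a required `ORDER BY`, so a stricter comparison is needed for ordered queries.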
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | 68.3% | E-SQL Qwen-7B 67.8% | +0.5 pp | Spider (eval set) | Table 1, Qwen-7B row | n/a |
| Accuracy | 60.8% | E-SQL Qwen-7B 60.4% | +0.4 pp | Spider (eval set) | Table 1, Qwen-7B row | n/a |
What To Try In 7 Days
Build a simple schema filter: fine-tune a small Qwen model on a few thousand annotated question-schema pairs and test top-3 table/column pruning.
Create a small NL–SQL example library and implement Top-K=3 retrieval to prepend to prompts.
Prototype CoT prompts for simple queries and a graph-style prompt for multi-table examples and compare EX/TS on a validation subset of Spider-like queries.
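A first version of the Top-K=3 retrieval step can be quite small. The sketch below ranks library entries by Jaccard overlap of question tokens, which is an assumed stand-in: this summary does not say which similarity measure or retriever the paper uses, and an embedding-based retriever could replace it. The function names and the example library are hypothetical.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_top_k(
    question: str, library: List[Tuple[str, str]], k: int = 3
) -> List[Tuple[str, str]]:
    """Return the k (question, SQL) pairs most similar to the input question."""
    query_tokens = tokenize(question)
    ranked = sorted(
        library,
        key=lambda pair: jaccard(query_tokens, tokenize(pair[0])),
        reverse=True,
    )
    return ranked[:k]


def prepend_examples(question: str, examples: List[Tuple[str, str]]) -> str:
    """Format retrieved pairs as in-context examples ahead of the new question."""
    shots = "\n\n".join(f"Question: {q}\nSQL: {sql}" for q, sql in examples)
    return f"{shots}\n\nQuestion: {question}\nSQL:"


# Made-up example library for illustration only.
example_library = [
    ("List the names of all stadiums.", "SELECT name FROM stadium"),
    ("How many singers are there in total?", "SELECT count(*) FROM singer"),
    ("How many concerts are there in 2014?", "SELECT count(*) FROM concert WHERE year = 2014"),
    ("What is the average age of singers?", "SELECT avg(age) FROM singer"),
]
top_examples = retrieve_top_k("How many singers are there?", example_library)
```

Token overlap is cheap and needs no model, which makes it a reasonable baseline to beat before investing in a learned retriever.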
Risks & Boundaries
Limitations
Evaluation only on Spider; no cross-dataset generalization shown.
No public code or configuration links provided to reproduce results exactly.
When Not To Use
If you can run very large closed models directly and cost is not a concern.
If you need an end-to-end learned parser trained on paired SQL for a specific production schema, without prompt engineering.
Failure Modes
The schema filter can miss relevant tables or columns, which breaks the final SQL.
Retrieved examples can mislead the model if the example library is low quality.
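The first failure mode can be monitored on any set with reference SQL by checking whether every table the gold query uses survived pruning. The sketch below is a rough heuristic under stated assumptions: it finds table names by simple token matching rather than by parsing SQL, so a column that shares a table's name would be counted as a table reference. The function names are hypothetical.

```python
import re
from typing import Iterable, Set


def referenced_tables(sql: str, all_tables: Iterable[str]) -> Set[str]:
    """Table names (lowercased) that appear as standalone identifiers in the SQL."""
    identifiers = set(re.findall(r"[a-z_][a-z0-9_]*", sql.lower()))
    return {table.lower() for table in all_tables if table.lower() in identifiers}


def missing_tables(gold_sql: str, kept_tables: Iterable[str], all_tables: Iterable[str]) -> Set[str]:
    """Tables the gold query needs that the schema filter dropped."""
    needed = referenced_tables(gold_sql, all_tables)
    kept = {table.lower() for table in kept_tables}
    return needed - kept


# Made-up example: the filter kept two tables but the gold query joins a third.
dropped = missing_tables(
    "SELECT T2.name FROM concert AS T1 JOIN stadium AS T2 ON T1.stadium_id = T2.stadium_id",
    kept_tables=["singer", "concert"],
    all_tables=["singer", "concert", "stadium", "singer_in_concert"],
)
```

A non-empty result means the generator could not have produced the gold query from the pruned schema, so the share of examples with a non-empty result puts a ceiling on downstream accuracy.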

