Overview
The system integrates with common inference engines (vLLM, SGLang) and is evaluated on public benchmarks, including Spider. Results show a Time To First Token (TTFT) reduction of up to 3.62× and ablations that isolate the contribution of individual components, but evaluation is limited to Text-to-SQL workloads and tuned models.
Citations: 0
Evidence Strength: 0.80
Confidence: 0.80
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 3/3
Reproducibility
Status: Partial assets available
Open source: Unknown
At A Glance
Cost impact: 70%
Production readiness: 70%
Novelty: 60%
Why It Matters For Business
TableCache cuts Text-to‑SQL response latency by precomputing and reusing table caches, improving user experience and lowering repeated GPU compute costs in applications where users query shared tables.
Summary TLDR
TableCache precomputes key-value (KV) cache entries per database table offline, preserving primary–foreign key (PFK) attention between related tables. It stores these table caches in a CPU-resident Table Trie and loads only the needed caches into GPU memory at inference time. Combined with query reranking (to group queries with similar table accesses) and a CPU→GPU prefetch pipeline, TableCache reduces Time To First Token (TTFT) substantially (reported up to 3.62×) while keeping SQL execution accuracy nearly unchanged. The method integrates with common serving engines (vLLM, SGLang) and targets Text-to-SQL workloads with repeated table access patterns.
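The reranking idea can be illustrated with a small sketch. This is not the paper's algorithm: it only sorts pending queries by their table sets so that queries touching the same tables run back to back and can share loaded caches. The function `rerank_by_tables` and the example queries are hypothetical.

```python
from typing import Iterable

Query = tuple[str, frozenset[str]]  # (query id, tables the query needs)

def rerank_by_tables(queries: Iterable[Query]) -> list[Query]:
    """Order queries so that those sharing a table set are adjacent.

    Assumption: each query already carries the set of tables it needs.
    Sorting on the sorted table names is a simple stand-in for the
    reranking step; the paper's actual policy may differ.
    """
    return sorted(queries, key=lambda q: tuple(sorted(q[1])))

pending = [
    ("q1", frozenset({"orders", "users"})),
    ("q2", frozenset({"products"})),
    ("q3", frozenset({"users", "orders"})),
]
ordered = rerank_by_tables(pending)
print([name for name, _ in ordered])  # q1 and q3 end up next to each other
```

Because Python's sort is stable, queries with identical table sets keep their arrival order relative to each other.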
Problem Statement
LLM-based Text-to‑SQL systems include large database schemas in prompts. This makes prefill (prefix) computation long and slow. Existing KV cache reuse needs exact prefix matches and fails when table order varies, causing redundant cache recomputation and high latency. The paper seeks a practical way to reuse table-level cache across queries while retaining inter-table attention.
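A toy example makes the prefix-matching problem concrete. The token lists below are hypothetical stand-ins for tokenized table definitions; the sketch only measures how many leading tokens two prompts share, which is what an exact-prefix KV cache can reuse.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count the leading tokens that two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Hypothetical schema segments (not real tokenizer output).
users = ["users", "(", "id", ",", "name", ")"]
orders = ["orders", "(", "id", ",", "user_id", ")"]

prompt_a = users + orders
prompt_b = orders + users  # same tables, different order

reusable_same = shared_prefix_len(prompt_a, prompt_a)
reusable_swapped = shared_prefix_len(prompt_a, prompt_b)
print(reusable_same, reusable_swapped)  # prints "12 0"
```

Both prompts contain identical schema content, yet swapping the table order leaves no reusable prefix, so the whole schema would be prefilled again.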
Main Contribution
TableCache: offline precomputation of per-table KV caches that preserve primary–foreign key attention relationships.
Table Trie: a token-level trie for fast table-name matching to retrieve precomputed table caches at inference.
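A minimal sketch of a token-level trie for table-name matching follows. It assumes table names are already split into tokens (plain strings here, not real tokenizer output) and that each terminal node stores a table ID that could key a precomputed cache. The class and method names are illustrative, not the paper's API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TrieNode:
    children: dict[str, "TrieNode"] = field(default_factory=dict)
    table_id: Optional[str] = None  # set on the node that ends a table name

class TableTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, name_tokens: list[str], table_id: str) -> None:
        node = self.root
        for tok in name_tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.table_id = table_id

    def find_tables(self, tokens: list[str]) -> list[str]:
        """Return table IDs in order of first appearance, preferring the longest match."""
        found: list[str] = []
        i = 0
        while i < len(tokens):
            node, j, best, best_end = self.root, i, None, i
            while j < len(tokens) and tokens[j] in node.children:
                node = node.children[tokens[j]]
                j += 1
                if node.table_id is not None:
                    best, best_end = node.table_id, j
            if best is None:
                i += 1  # no table name starts at this position
            else:
                if best not in found:
                    found.append(best)
                i = best_end  # continue after the matched name
        return found

trie = TableTrie()
trie.insert(["order"], "order")
trie.insert(["order", "_", "items"], "order_items")
trie.insert(["users"], "users")

prompt = ["table", "order", "_", "items", "joins", "users", "and", "order"]
print(trie.find_tables(prompt))  # ['order_items', 'users', 'order']
```

Preferring the longest match keeps `order_items` from being misread as `order`; whether the paper resolves overlapping names this way is an assumption.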
Key Findings
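TableCache reduces prefill latency, measured as TTFT, by a reported ~2.7–3.6× on Text-to-SQL benchmarks; on Spider dev, TTFT drops from ≈98.13 s to 36.23 s.
Accuracy is largely preserved for tuned Text-to-SQL models (76.9% on Spider); removing the PFK-guided representation lowers accuracy to 72.0% in the ablation.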
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| TTFT (Time To First Token) | 36.23 s (TableCache on Spider dev) | ≈98.13 s (baseline transformers on Spider dev) | ≈ −61.9 s (≈2.7× lower than this baseline; ~2.7–3.6× across baselines) | Spider dev | Table 2; Sec.5.3 | Table 2, Sec.5.3 |
| Accuracy | 76.9% (TableCache on Spider, tuned backbone) | 72.0% (w/o PFK-guided representation in ablation) | +4.9 pp (absolute) | Spider (PFTR ablation, Table 3) | Table 3 (PFTR ablation) | Table 3 |
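The deltas in the table can be checked with a few lines of arithmetic. The inputs are the values from the two rows above; nothing else is assumed.

```python
baseline_ttft_s = 98.13    # baseline transformers, Spider dev (from the table)
tablecache_ttft_s = 36.23  # TableCache, Spider dev (from the table)

delta_s = tablecache_ttft_s - baseline_ttft_s
speedup = baseline_ttft_s / tablecache_ttft_s

acc_with_pfk = 76.9     # TableCache, tuned backbone (from the table)
acc_without_pfk = 72.0  # ablation without PFK-guided representation
acc_gain_pp = acc_with_pfk - acc_without_pfk

print(f"delta = {delta_s:.1f} s, speedup = {speedup:.2f}x, "
      f"accuracy gain = {acc_gain_pp:.1f} pp")
```

Against this particular baseline the ratio is about 2.71×, the low end of the reported range; the 3.62× figure refers to a different comparison.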
What To Try In 7 Days
Profile your Text-to‑SQL traffic to find hot tables and repeated table sets.
Precompute table-level KV caches and store them in CPU memory for quick lookup.
Build a simple Table Trie to map tokenized queries to table IDs and fetch caches by match order.
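The profiling step above can be sketched as a simple count over a query log. The log format (one list of table names per query) and the function name `profile_table_usage` are assumptions; in practice the table names would be extracted from your own traffic.

```python
from collections import Counter
from typing import Iterable

def profile_table_usage(query_tables: Iterable[list[str]]):
    """Count how often each table, and each exact table set, is accessed."""
    table_counts: Counter = Counter()
    set_counts: Counter = Counter()
    for tables in query_tables:
        table_counts.update(set(tables))       # count each table once per query
        set_counts[frozenset(tables)] += 1     # order-insensitive table set
    return table_counts, set_counts

# Hypothetical log: tables referenced by four queries.
log = [["users", "orders"], ["orders", "users"], ["products"], ["users"]]
table_counts, set_counts = profile_table_usage(log)

print(table_counts.most_common(2))  # hottest individual tables
print(set_counts.most_common(1))    # most frequently repeated table set
```

Tables and table sets at the top of these counts are the natural first candidates for precomputed caches.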
Risks & Boundaries
Limitations
Designed and evaluated only for Text‑to‑SQL; extension to unstructured QA is non-trivial.
Requires static or slowly changing schemas; frequent schema changes raise precompute cost.
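One way to detect when a precomputed table cache has gone stale is to fingerprint the schema it was built from. The sketch below is an assumption about how this could be done, not something the paper describes; `schema_fingerprint` and the example schemas are hypothetical.

```python
import hashlib
import json

def schema_fingerprint(
    table_name: str,
    columns: list[tuple[str, str]],       # (column name, type), in schema order
    foreign_keys: list[tuple[str, str]],  # (local column, referenced table.column)
) -> str:
    """Hash the parts of a table definition that a cached prompt segment depends on."""
    payload = json.dumps(
        {"table": table_name, "columns": columns, "fks": sorted(foreign_keys)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

cached_fp = schema_fingerprint(
    "orders", [("id", "INT"), ("user_id", "INT")], [("user_id", "users.id")]
)
current_fp = schema_fingerprint(
    "orders",
    [("id", "INT"), ("user_id", "INT"), ("status", "TEXT")],
    [("user_id", "users.id")],
)

needs_recompute = cached_fp != current_fp
print(needs_recompute)  # True: a column was added, so the cache is stale
```

Since the caches preserve attention between PFK-linked tables, a change to one table may also require recomputing caches for tables linked to it; the sketch does not track that dependency.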
When Not To Use
When schemas change rapidly and precomputation cannot be amortized.
For open-ended language tasks without clear table boundaries.
Failure Modes
If the primary–foreign key graph is incomplete, inter-table attention reconstruction may be incorrect.
Mismatched tokenization or table-naming variants can reduce Table Trie match rates and lower the cache hit ratio.
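The naming-variant failure mode is easy to reproduce: several spellings can refer to the same table, yet a trie keyed on one spelling matches only that one. The sketch below shows a hypothetical normalization pass applied before lookup; the rules in `normalize_table_name` are illustrative assumptions, not part of TableCache.

```python
import re

def normalize_table_name(raw: str) -> str:
    """Map common spelling variants of a table name to one canonical key."""
    name = raw.strip().strip('`"[]')     # drop surrounding quoting
    name = name.split(".")[-1]           # drop a schema prefix such as "public."
    name = name.strip('`"[]')            # drop quoting left on the last part
    name = re.sub(r"[\s\-]+", "_", name) # unify spaces and hyphens
    return name.lower()

variants = ["Order_Items", '"order_items"', "public.order_items", "order items"]
canonical = {normalize_table_name(v) for v in variants}
print(canonical)  # {'order_items'}
```

Without such a step, each variant would miss the trie entry and fall back to recomputing the table's KV cache.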

