Overview
DataFactory shows consistent, statistically significant accuracy gains on TableQA benchmarks but raises token and engineering costs; adapt call limits and knowledge-graph (KG) scope before deploying it in production.
Citations: 0
Evidence Strength: 0.70
Confidence: 0.85
Risk Signals: 12
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 4/4
Reproducibility
Status: No open assets linked
Open source: Partial
At A Glance
Cost impact: 50%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
DataFactory trades higher query cost for much better accuracy and explainability on complex table queries, making it useful for teams that need reliable multi-hop analytics and traceable evidence from enterprise tables.
Summary TLDR
DataFactory is a multi-agent TableQA system: a central Data Leader (using ReAct-style reasoning) coordinates a Database Team (SQL) and a Knowledge Graph Team (Cypher/Neo4j). It builds a knowledge graph from tables, stores historical QA examples in a vector database for retrieval, and uses context-engineered prompts to reduce hallucination. Evaluated on TabFact, WikiTableQuestions, and FeTaQA with eight LLMs, DataFactory reports large average gains over the other methods (+20.2% on TabFact, +23.9% on WikiTableQuestions) and accepts higher token cost in exchange for clearer multi-step reasoning and explainable provenance.
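The coordination pattern described above can be sketched as a ReAct-style loop in which the Data Leader alternates thought, action (delegating to a team), and observation until it can answer. This is a minimal sketch, not DataFactory's implementation: the scripted `policy` stands in for the LLM, the Database Team is a plain `sqlite3` connection, and the Knowledge Graph Team is a hypothetical in-memory adjacency map rather than Neo4j/Cypher. All names (`data_leader`, `scripted_policy`, `LOCATED_IN`, the `people` table) are illustrative.

```python
import sqlite3
from typing import Callable

# One completed step: (thought, tool, argument, observation)
Trace = list[tuple[str, str, str, list]]
Policy = Callable[[str, Trace], tuple[str, str, str]]


def make_database_team(conn: sqlite3.Connection) -> Callable[[str], list]:
    """Database Team stand-in: executes a SQL string and returns the rows."""
    def run_sql(sql: str) -> list:
        return conn.execute(sql).fetchall()
    return run_sql


def make_kg_team(edges: dict[str, list[tuple[str, str]]]) -> Callable[[str], list]:
    """Hypothetical KG Team stand-in: 'node|REL' -> sorted neighbours via REL."""
    def hop(query: str) -> list:
        node, rel = query.split("|")
        return sorted(dst for r, dst in edges.get(node, []) if r == rel)
    return hop


def data_leader(question: str, policy: Policy,
                teams: dict[str, Callable[[str], list]], max_steps: int = 3):
    """ReAct-style loop: think, act (delegate to a team), observe, repeat."""
    trace: Trace = []
    for _ in range(max_steps):
        thought, tool, arg = policy(question, trace)
        if tool == "finish":
            return arg, trace
        observation = teams[tool](arg)
        trace.append((thought, tool, arg, observation))
    return None, trace  # step budget exhausted without an answer


def scripted_policy(question: str, trace: Trace) -> tuple[str, str, str]:
    """Hard-coded plan standing in for the LLM's reasoning (hypothetical)."""
    if not trace:
        return ("Look up Alice's employer", "sql",
                "SELECT employer FROM people WHERE name = 'Alice'")
    if len(trace) == 1:
        employer = trace[0][3][0][0]  # first column of the first SQL row
        return ("Follow the employer to its city", "kg", f"{employer}|LOCATED_IN")
    return ("Answer from the last observation", "finish", trace[-1][3][0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, employer TEXT)")
conn.execute("INSERT INTO people VALUES ('Alice', 'Acme')")
teams = {"sql": make_database_team(conn),
         "kg": make_kg_team({"Acme": [("LOCATED_IN", "Paris")]})}
answer, trace = data_leader("Which city is Alice's employer in?",
                            scripted_policy, teams)
```

The answer needs one SQL step and one graph hop before the leader can finish; lowering `max_steps` below 3 makes the loop return `None`, which is the accuracy/cost trade-off the call-limit advice in What To Try In 7 Days addresses.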
Problem Statement
Current LLM-based TableQA struggles with limited context length, hallucinations, and weak multi-hop relational reasoning. Single-agent pipelines mix tasks (query generation, retrieval, analysis) and lack specialization, which makes complex questions that combine table lookups with relationships between entities unreliable and hard to trace.
Main Contribution
A tripartite multi-agent architecture: Data Leader (planner), Database Team (SQL), and Knowledge Graph Team (Cypher/Neo4j) for complementary skills.
A formal data-to-knowledge-graph mapping (𝒯: D×S×R→G) and practical algorithms for entity extraction, ID generation, merging, and relationship discovery.
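The four steps named in the mapping (entity extraction, ID generation, merging, relationship discovery) can be illustrated on toy tables. This sketch does not reproduce the paper's 𝒯 or its algorithms: it assumes a hypothetical per-table `schema` that names an entity type, a key column, and link columns whose values reference entities of another type, and it derives IDs by hashing a normalized `type:name` string.

```python
import hashlib


def entity_id(entity_type: str, name: str) -> str:
    """Deterministic ID from entity type + normalized name (hypothetical scheme)."""
    normalized = " ".join(name.lower().split())
    return hashlib.sha1(f"{entity_type}:{normalized}".encode()).hexdigest()[:12]


def build_graph(tables: dict[str, list[dict]], schema: dict[str, dict]):
    """Map table rows to (nodes, edges).

    schema[table] = {"type": entity type, "key": key column,
                     "links": {column: (relation, target entity type)}}
    """
    nodes: dict[str, dict] = {}
    edges: set[tuple[str, str, str]] = set()
    for table, rows in tables.items():
        spec = schema[table]
        for row in rows:
            # Entity extraction + ID generation from the key column
            nid = entity_id(spec["type"], row[spec["key"]])
            # Merging: rows whose names normalize identically share one node;
            # the first value seen for each property wins.
            node = nodes.setdefault(
                nid, {"type": spec["type"], "name": row[spec["key"]], "props": {}})
            for col, val in row.items():
                if col != spec["key"] and col not in spec["links"]:
                    node["props"].setdefault(col, val)
            # Relationship discovery via the declared link columns
            for col, (rel, target_type) in spec["links"].items():
                if row.get(col):
                    tid = entity_id(target_type, row[col])
                    nodes.setdefault(
                        tid, {"type": target_type, "name": row[col], "props": {}})
                    edges.add((nid, rel, tid))
    return nodes, edges


tables = {
    "people": [{"name": "Alice", "employer": "Acme", "age": 30},
               {"name": "alice ", "employer": "ACME", "age": 31}],
    "companies": [{"name": "Acme", "city": "Paris"}],
}
schema = {
    "people": {"type": "Person", "key": "name",
               "links": {"employer": ("WORKS_AT", "Company")}},
    "companies": {"type": "Company", "key": "name", "links": {}},
}
nodes, edges = build_graph(tables, schema)
```

The two Alice rows collapse into one node and the conflicting `age` values are silently resolved to the first one seen. The same normalization would also merge genuinely distinct entities that share a name, which is the over-merge risk listed under Limitations.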
Key Findings
Multi-agent DataFactory significantly improves accuracy on standard TableQA benchmarks versus baselines.
Adding the Knowledge Graph Team to SQL-only pipelines yields consistent gains.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | 84.0% | other methods average | ↑20.2% vs baselines | TabFact | Table 3 shows DataFactory average 84.0% with +20.2% improvement | Table 3; RQ1 |
| Accuracy | 72.8% | other methods average | ↑23.9% vs baselines | WikiTableQuestions | Table 3 shows DataFactory average 72.8% with +23.9% improvement | Table 3; RQ1 |
What To Try In 7 Days
Run DataFactory on one critical table: compare SQL-only answers to DataFactory output for 20 representative queries.
Build a tiny knowledge graph (1–3 tables) and run a set of multi-hop questions to measure KG gains.
Log token usage, set a default limit of 1–3 agent calls, and measure accuracy against cost to choose a production stopping rule.
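The stopping-rule experiment can be scripted once each evaluation run logs, per query, the call limit used, whether the answer was correct, and the tokens consumed. The sketch below assumes that hypothetical log format (`RunRecord`) and picks the smallest call limit whose accuracy is within a chosen tolerance of the best observed; the tolerance is a product decision, not a value from the paper.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    call_limit: int  # maximum agent calls allowed for this run
    correct: bool    # answer matched the reference answer
    tokens: int      # prompt + completion tokens logged for the query


def summarize(records: list[RunRecord]) -> dict[int, tuple[float, float]]:
    """Per call limit: (accuracy, mean tokens per query), sorted by limit."""
    grouped: dict[int, list[RunRecord]] = {}
    for r in records:
        grouped.setdefault(r.call_limit, []).append(r)
    return {
        limit: (sum(r.correct for r in rs) / len(rs),
                sum(r.tokens for r in rs) / len(rs))
        for limit, rs in sorted(grouped.items())
    }


def choose_call_limit(records: list[RunRecord], tolerance: float = 0.02) -> int:
    """Smallest call limit whose accuracy is within `tolerance` of the best."""
    stats = summarize(records)
    best = max(acc for acc, _ in stats.values())
    # stats is ordered by ascending limit, so the first match is the cheapest
    return next(limit for limit, (acc, _) in stats.items()
                if acc >= best - tolerance)
```

Feeding it the records from the 20-query comparison yields both an accuracy/cost table (via `summarize`) and a default limit; the mean-token column shows what each extra call costs.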
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
Infra Optimization
System Optimization
Inference Optimization
Reproducibility
Risks & Boundaries
Limitations
Higher token and interaction cost compared to prompt-only methods
Automated KG construction can fail or over-merge on noisy, inconsistent tables
When Not To Use
When single-step SQL answers are sufficient and latency/cost must be minimal
When strict data privacy forbids cloud LLM APIs and no local LLM is viable
Failure Modes
Excessive multi-agent calls causing error accumulation and performance collapse
Specification violations when role boundaries are not enforced
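One way to keep role boundaries explicit is to route every tool call through a dispatcher that checks a per-role allowlist. The role and tool names below are hypothetical labels for the three DataFactory roles, not identifiers from the paper, and the tools themselves are supplied by the caller.

```python
from typing import Callable


class RoleViolation(Exception):
    """Raised when an agent calls a tool outside its assigned role."""


# Hypothetical mapping from each role to the only tools it may use.
ROLE_TOOLS: dict[str, frozenset[str]] = {
    "data_leader": frozenset({"delegate_sql", "delegate_kg", "finish"}),
    "database_team": frozenset({"run_sql"}),
    "kg_team": frozenset({"run_cypher"}),
}


def dispatch(role: str, tool: str, arg: str,
             tools: dict[str, Callable[[str], str]]) -> str:
    """Run `tool` for `role`, refusing calls outside the role's allowlist."""
    if tool not in ROLE_TOOLS.get(role, frozenset()):
        raise RoleViolation(f"{role!r} is not allowed to call {tool!r}")
    return tools[tool](arg)
```

A violation surfaces as an exception the Data Leader can log or retry, instead of a Database Team agent silently issuing Cypher or finishing the answer on its own.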

