Compile NL queries into DAG plans to orchestrate parallel, auditable QA across SQL and vector stores

March 15, 2026 · 7 min read

Overview

Decision Snapshot: Ready For Pilot

A.DOT is a well-specified engineering prototype. It shows strong gains on the HybridQA dev split and includes concrete modules for validation, caching, and remediation, but it remains a research prototype pending large-scale enterprise tests and user studies.

Citations: 0

Evidence Strength: 0.78

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 4/4

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 66%

Authors

Kirushikesh D B, Manish Kesarwani, Nishtha Madaan, Sameep Mehta, Aldrin Dennis, Siddarth Ajay, Rakesh B R, Renu Rajagopal, Sudheesh Kairali


Why It Matters For Business

A.DOT reduces over-retrieval and records verifiable evidence for each step. This cuts unnecessary data exposure and improves multi-step QA accuracy, which matters most for compliance-heavy enterprise queries.

Who Should Care

Summary (TL;DR)

A.DOT compiles a user's natural-language question into a directed acyclic graph (DAG) of atomic subqueries, each targeting either a relational database or a vector store. A single LLM pass generates the plan, which is then validated, cached, and executed with independent nodes running in parallel. The executor passes only the needed IDs (not full payloads) between steps, records evidence trails, and invokes DataOps agents to diagnose and repair plan errors. On the HybridQA dev set, A.DOT improves answer correctness by 14.8 percentage points and completeness by 10.7 points over a Standard RAG baseline (Table 1). The system is a research prototype still under enterprise evaluation.
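The plan-then-execute flow above can be sketched in Python. This is a minimal illustration under stated assumptions, not A.DOT's code: `PlanNode`, `run_subquery`, `execute_plan`, and the toy IDs are hypothetical, and `run_subquery` is a stub standing in for the NL-to-SQL or NL-to-Vector tool call. The executor repeatedly launches every node whose dependencies are already bound, runs that batch concurrently with `asyncio.gather`, and hands each node only the ID lists of its parents.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """One atomic subquery in a hypothetical DAG plan."""
    name: str
    source: str                                   # "sql" or "vector" (assumed labels)
    query: str
    depends_on: list[str] = field(default_factory=list)


async def run_subquery(node: PlanNode, inputs: dict[str, list[int]]) -> list[int]:
    """Stub tool call that returns only row/document IDs.

    Toy behavior (an assumption): leaf nodes return fixed IDs; downstream nodes
    keep the first two upstream IDs, standing in for a filter or join.
    """
    await asyncio.sleep(0.01)                     # stand-in for I/O latency
    upstream = sorted({i for ids in inputs.values() for i in ids})
    return upstream[:2] if upstream else [101, 102, 103]


async def execute_plan(nodes: list[PlanNode]) -> dict[str, list[int]]:
    by_name = {n.name: n for n in nodes}
    bindings: dict[str, list[int]] = {}           # variable-binding store: node -> IDs
    pending = set(by_name)
    while pending:
        # Every pending node whose parents are all bound can run now.
        ready = [by_name[n] for n in pending
                 if all(d in bindings for d in by_name[n].depends_on)]
        if not ready:
            raise ValueError("plan has a cycle or an unknown dependency")
        # Independent nodes in this batch execute concurrently.
        results = await asyncio.gather(*(
            run_subquery(node, {d: bindings[d] for d in node.depends_on})
            for node in ready))
        for node, ids in zip(ready, results):
            bindings[node.name] = ids
            pending.discard(node.name)
    return bindings
```

For a three-node plan where `answer` depends on `players` (SQL) and `bios` (vector), the two leaves run concurrently and `answer` starts only after both have bound their IDs. This level-by-level scheduling is the simplest correct variant; a finer-grained scheduler could start each node as soon as its own parents finish.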

Problem Statement

Enterprises need natural-language question answering over mixed data lakes that combine relational tables with document vectors. Existing RAG or tool-calling pipelines tend to over-retrieve, leak data beyond what a query needs, and fail on multi-hop queries that alternate between structured and unstructured sources. A practical system must plan precise cross-source queries, run them efficiently, and produce verifiable evidence.

Main Contribution

A.DOT: an agentic DAG planner that compiles NL queries into atomic, source-targeted subqueries and executes them according to DAG dependencies.

Schema-aware plan validation plus a DataOps loop (diagnose, fix, replan) to detect and repair plan or execution errors.
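A minimal sketch of validate-then-repair, assuming a plan is a list of steps with `table` and `columns` keys and the schema is a table-to-columns map. `validate_plan`, `repair_loop`, and `make_column_dropper` are hypothetical names. In A.DOT the Diagnoser, Fixer, and Replanner are agents; here they are deterministic stubs passed in as callables so the control flow is easy to follow.

```python
from typing import Callable

# Assumed shapes for illustration only.
Schema = dict[str, set[str]]   # table name -> column names
Plan = list[dict]              # each step: {"table": str, "columns": [str, ...]}


def validate_plan(plan: Plan, schema: Schema) -> list[str]:
    """Schema check: report steps that reference unknown tables or columns."""
    errors = []
    for i, step in enumerate(plan):
        table = step.get("table")
        if table not in schema:
            errors.append(f"step {i}: unknown table {table!r}")
            continue
        for col in step.get("columns", []):
            if col not in schema[table]:
                errors.append(f"step {i}: unknown column {table}.{col}")
    return errors


def repair_loop(plan: Plan, schema: Schema,
                fix: Callable[[Plan, list[str]], Plan],
                replan: Callable[[], Plan],
                max_fixes: int = 2) -> Plan:
    """Diagnose, apply targeted fixes, and fall back to a full replan if needed."""
    for _ in range(max_fixes):
        errors = validate_plan(plan, schema)      # Diagnoser role
        if not errors:
            return plan
        plan = fix(plan, errors)                  # Fixer role
    if not validate_plan(plan, schema):
        return plan
    plan = replan()                               # Replanner role
    if validate_plan(plan, schema):
        raise RuntimeError("plan still invalid after fixes and replanning")
    return plan


def make_column_dropper(schema: Schema) -> Callable[[Plan, list[str]], Plan]:
    """Hypothetical Fixer stub: drop columns that the schema does not contain."""
    def fix(plan: Plan, errors: list[str]) -> Plan:
        return [{**step, "columns": [c for c in step.get("columns", [])
                                     if c in schema.get(step.get("table"), set())]}
                for step in plan]
    return fix
```

The bounded retry budget keeps a bad plan from looping forever: cheap targeted fixes are tried first, and a full replan is the fallback when they do not converge.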

Key Findings

A.DOT improves answer correctness and completeness on the HybridQA dev split compared with Standard RAG.

Numbers: Correctness +14.8 pts, Completeness +10.7 pts (Table 1)

Practical Use: If you convert mixed data into relational and vector stores, DAG planning with validation can materially raise multi-hop QA quality compared with single-pass RAG.

Evidence Ref: Table 1

The DataOps system is critical: removing it drops correctness sharply.

Numbers: Correctness 60.0% without DataOps vs 71.8% for full A.DOT (Table 2)

Practical Use: Deploy a remediation loop (diagnose, fix, replan) to recover from runtime and plan errors; validation alone is insufficient.

Evidence Ref: Table 2

Results

| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Answer Correctness | A.DOT 71.0 | Standard RAG 56.2 | +14.8 | HybridQA dev | Table 1: A.DOT vs baselines | Table 1 |
| Answer Completeness | A.DOT 73.0 | Standard RAG 62.3 | +10.7 | HybridQA dev | Table 1: A.DOT vs baselines | Table 1 |

What To Try In 7 Days

Prototype a DAG plan for 5 common cross-source NL queries in your data lake.

Add a light schema check that rejects plans referencing missing columns.

Log variable bindings (IDs) instead of full payloads when passing results across steps to reduce data leakage and token use.
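For the last item, one way to log bindings instead of payloads is a small store that keeps full rows out of both the audit log and any LLM prompt. `BindingStore` and its methods are hypothetical, and the `id` key name is an assumption; the point is that downstream steps and logs see a variable name, a source table, and primary-key IDs, while rows are fetched only when a step genuinely needs them.

```python
import json
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("plan_executor")


@dataclass
class BindingStore:
    """Maps plan variables (e.g. "$players") to (table, ids) instead of full rows."""
    bindings: dict[str, tuple[str, list[int]]] = field(default_factory=dict)

    def bind(self, var: str, table: str, rows: list[dict], key: str = "id") -> None:
        ids = [row[key] for row in rows]
        self.bindings[var] = (table, ids)
        # The audit log records only the variable, source table, and IDs, never row contents.
        logger.info(json.dumps({"var": var, "table": table, "ids": ids}))

    def ids(self, var: str) -> list[int]:
        return self.bindings[var][1]

    def as_prompt_context(self) -> str:
        """Compact text a planner could see: variable names, tables, and IDs only."""
        return "\n".join(f"{v} = {t}[{', '.join(map(str, ids))}]"
                         for v, (t, ids) in self.bindings.items())
```

Binding `$players` to two rows that each carry a long `bio` field yields the prompt context `$players = players[7, 9]` when the rows have IDs 7 and 9; the biography text never leaves the store.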

Agent Features

Memory
paraphrase-aware plan cache (LRU); variable-binding store for intermediate IDs
Planning
DAG-based planning; single-LLM plan generation; paraphrase-aware plan caching; parallel execution of independent nodes
Tool Use
NL-to-SQL tool; NL-to-Vector tool (Milvus); LLM-as-planner; LLM-as-judge
Frameworks
LangGraph; LLaMA-3-70B
Is Agentic

Yes

Architectures
DAG planner; agentic modular pipeline (PlanGen, Validator, DataOps, Executor)
Collaboration
DataOps agents (Diagnoser, Fixer, Replanner); concurrent node execution with dependency coordination
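The paraphrase-aware LRU plan cache listed under Memory can be approximated as below. This is a sketch under loud assumptions: the summary does not specify how A.DOT matches paraphrases, so a crude normalization (lowercasing, dropping a small hypothetical stopword list, sorting the remaining tokens) stands in for it, and `PlanCache` and `normalize` are hypothetical names. Eviction uses `collections.OrderedDict` with `move_to_end` and `popitem(last=False)`.

```python
import re
from collections import OrderedDict

# Hypothetical stopword list; a real system would more likely compare embeddings.
STOPWORDS = {"the", "a", "an", "of", "for", "in", "is", "are", "what", "which", "show", "me", "list"}


def normalize(query: str) -> str:
    """Crude paraphrase key: lowercase word set minus stopwords, sorted."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(sorted({t for t in tokens if t not in STOPWORDS}))


class PlanCache:
    """LRU cache from normalized query keys to previously validated DAG plans."""

    def __init__(self, capacity: int = 128) -> None:
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, query: str):
        key = normalize(query)
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, query: str, plan) -> None:
        key = normalize(query)
        self._data[key] = plan
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used plan
```

With this key, "What is the team of player X?" and "team of player X" hit the same cached plan. The trade-off is that a bag-of-words key also merges queries that differ only in word order, so a real system would need a stricter match before reusing a plan.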

Optimization Features

Token Efficiency
propagate minimal keys/IDs instead of full text payloads
System Optimization
schema validation avoids costly failed executions; paraphrase-aware template matching speeds re-execution
Inference Optimization
parallel node execution reduces end-to-end latency; plan caching avoids regenerating DAGs
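The latency claim can be made concrete. With enough workers and no scheduling overhead, a DAG's end-to-end time equals its critical path (the slowest dependency chain), whereas sequential execution pays the sum of all node times. The sketch below computes both for a toy plan; the node names and latencies are made up for illustration, not measurements from the paper, and the input is assumed to be an already validated, acyclic plan.

```python
from functools import lru_cache


def critical_path(latency: dict[str, float], deps: dict[str, list[str]]) -> float:
    """Earliest finish time of the whole DAG when independent nodes run in parallel.

    Assumes deps describes an acyclic graph; a cycle would recurse without end.
    """

    @lru_cache(maxsize=None)
    def finish(node: str) -> float:
        # A node can start only after its slowest dependency has finished.
        start = max((finish(d) for d in deps.get(node, [])), default=0.0)
        return start + latency[node]

    return max(finish(n) for n in latency)


# Hypothetical per-node latencies in seconds.
latency = {"players_sql": 0.8, "bios_vector": 1.2, "awards_sql": 0.5, "join": 0.3}
deps = {"join": ["players_sql", "bios_vector", "awards_sql"]}

sequential = sum(latency.values())
parallel = critical_path(latency, deps)
print(f"sequential {sequential:.1f}s vs parallel {parallel:.1f}s")
# sequential 2.8s vs parallel 1.5s
```

Here the three leaf subqueries overlap, so only the slowest one (1.2 s) plus the final join (0.3 s) sits on the critical path.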

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

HybridQA (Chen et al. 2020)

Risks & Boundaries

Limitations

Evaluation limited to the HybridQA dev split; production datasets not yet reported.

The prototype relies on LLaMA-3-70B, which is compute-intensive for real-time deployment.

When Not To Use

Simple single-source Q&A where single-pass RAG already suffices.

Very low-compute environments that cannot host 70B models.

Failure Modes

Intermediate subquery failures, when an NL-to-SQL or NL-to-Vector module call fails.

Incorrect data-source assignment by the planner.

Core Entities

Models

LLaMA-3-70B; Mistral Large 2 (LLM-as-judge)

Metrics

Answer Correctness (LLM-as-judge); Answer Completeness (graded Very Bad–Excellent)

Datasets

HybridQA (dev split, 3,466 QA pairs)

Benchmarks

HybridQA