Compile NL queries into DAG plans to orchestrate parallel, auditable QA across SQL and vector stores

Overview

Decision Snapshot: Ready For Pilot

A.DOT is a well-specified engineering prototype: it shows strong gains on the HybridQA dev set and has concrete modules for validation, caching, and remediation, but large-scale enterprise tests and user studies are still pending.

Citations: 0

Evidence Strength: 0.78

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 4/4

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 70%

Novelty: 66%

Authors

Kirushikesh D B, Manish Kesarwani, Nishtha Madaan, Sameep Mehta, Aldrin Dennis, Siddarth Ajay, Rakesh B R, Renu Rajagopal, Sudheesh Kairali


Why It Matters For Business

A.DOT reduces over-retrieval and exposes verifiable evidence for each step. This cuts unnecessary data exposure and improves multi-step QA accuracy, which helps with compliance-heavy enterprise queries.

Who Should Care

Product Manager, ML Engineer, Data Scientist, Engineering Lead, CTO

Summary TLDR

A.DOT compiles a user's natural-language question into a directed acyclic graph (DAG) of atomic subqueries, each targeting either a relational database or a vector store. A single LLM pass generates the plan, which is then validated, cached, and executed with independent nodes running in parallel. The executor passes only the needed IDs (not full payloads) between steps, records evidence trails, and invokes DataOps agents to diagnose and repair plan errors. On the HybridQA dev set, A.DOT improves answer correctness by 14.8 percentage points and completeness by 10.7 percentage points over a Standard RAG baseline. The system is a research prototype under enterprise evaluation.
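The plan-then-execute flow described above can be illustrated with a minimal sketch. It assumes a plan represented as a dictionary of nodes with dependencies; the names `PLAN`, `run_node`, and `execute` are hypothetical and not taken from A.DOT, and `run_node` is a stand-in for the real NL-to-SQL or NL-to-Vector calls. It uses only the Python standard library (`graphlib` requires Python 3.9 or later).

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical plan: node -> (target source, upstream dependencies).
PLAN = {
    "q1": ("sql", []),
    "q2": ("vector", []),
    "q3": ("sql", ["q1", "q2"]),
}

def run_node(name, source, inputs):
    # Stand-in for an NL-to-SQL or NL-to-Vector call; returns a label only.
    return f"{source}:{name}({','.join(sorted(inputs))})"

def execute(plan):
    sorter = TopologicalSorter({n: deps for n, (_, deps) in plan.items()})
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    results, waves = {}, []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # nodes whose dependencies are done
            waves.append(ready)
            futures = {
                n: pool.submit(run_node, n, plan[n][0], plan[n][1]) for n in ready
            }
            for n, fut in futures.items():
                results[n] = fut.result()
                sorter.done(n)
    return results, waves

results, waves = execute(PLAN)
print(waves)  # independent nodes share a wave; dependent nodes run later
```

This sketch runs nodes wave by wave, which is simpler than fully dependency-driven scheduling but can leave a worker idle while a slow node in the same wave finishes.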

Problem Statement

Enterprises need natural-language question answering over mixed data lakes (relational tables + document vectors). Existing RAG or tool-calling pipelines over-retrieve, leak data, and fail at multi-hop queries that alternate between structured and unstructured sources. A practical system must plan precise cross-source queries, run them efficiently, and produce verifiable evidence.

Main Contribution

A.DOT: an agentic DAG planner that compiles NL queries into atomic, source-targeted subqueries and executes them according to DAG dependencies.

Schema-aware plan validation plus a DataOps loop (diagnose, fix, replan) to detect and repair plan or execution errors.
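A diagnose, fix, replan loop of the kind named above might look roughly like the following sketch. Everything here is an assumption for illustration: the executor, the `PlanError` type, the repair rule, and the retry limit are hypothetical stand-ins, not A.DOT's actual agents.

```python
class PlanError(Exception):
    """Raised by the stand-in executor when a step fails."""

def execute(plan):
    # Stand-in executor: fails on any step that still carries a known-bad column.
    for step in plan:
        if "bad_col" in step:
            raise PlanError(step)
    return "ok"

def diagnose(error):
    # Stand-in Diagnoser: classify the failure from the failing step.
    return {"step": str(error), "kind": "unknown_column"}

def fix(plan, diagnosis):
    # Stand-in Fixer: patch only the failing step (hypothetical repair rule).
    return [s.replace("bad_col", "good_col") if s == diagnosis["step"] else s
            for s in plan]

def run_with_remediation(plan, max_attempts=3):
    history = []
    for attempt in range(1, max_attempts + 1):
        try:
            return execute(plan), history
        except PlanError as err:
            diagnosis = diagnose(err)
            history.append((attempt, diagnosis["kind"]))
            plan = fix(plan, diagnosis)  # replan with the repaired step
    raise RuntimeError(f"plan not repaired after {max_attempts} attempts")

outcome, history = run_with_remediation(["SELECT bad_col FROM t", "SELECT id FROM u"])
print(outcome, history)
```

Bounding the number of attempts keeps a plan that cannot be repaired from looping indefinitely, and the history list doubles as a simple audit trail of what was diagnosed.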

Key Findings

A.DOT improves answer correctness and completeness on HybridQA dev vs Standard RAG.

Numbers: Correctness +14.8 points, Completeness +10.7 points (Table 1)

Practical Use: If you convert mixed data into relational + vector stores, using DAG planning with validation can materially raise multi-hop QA quality versus single-pass RAG.

Evidence Ref: Table 1

The DataOps system is critical: removing it drops correctness to 60.0%.

Numbers: Correctness 60.0% without DataOps vs 71.0% for full A.DOT (Table 2)

Practical Use: Deploy a remediation loop (diagnose/fix/replan) to recover from runtime and plan errors; validation alone is insufficient.

Evidence Ref: Table 2

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Answer Correctness	A.DOT 71.0	Standard RAG 56.2	+14.8	HybridQA dev	Table 1: A.DOT vs baselines	Table 1
Answer Completeness	A.DOT 73.0	Standard RAG 62.3	+10.7	HybridQA dev	Table 1: A.DOT vs baselines	Table 1

What To Try In 7 Days

Prototype a DAG plan for 5 common cross-source NL queries in your data lake.

Add a light schema check that rejects plans referencing missing columns.

Log variable bindings (IDs) instead of full payloads when passing results across steps to reduce data leakage and token use.
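As a starting point for the schema check suggested above, here is a minimal sketch. The schema, the plan format (one table and a column list per step), and the function name `validate_plan` are assumptions made for illustration.

```python
# Hypothetical schema: table -> set of known columns.
SCHEMA = {
    "players": {"player_id", "name", "team_id"},
    "teams": {"team_id", "city"},
}

def validate_plan(plan, schema):
    """Return a list of human-readable problems; an empty list means the plan passes."""
    problems = []
    for step in plan:
        table = step["table"]
        if table not in schema:
            problems.append(f"{step['id']}: unknown table {table!r}")
            continue
        for column in sorted(set(step["columns"]) - schema[table]):
            problems.append(f"{step['id']}: unknown column {table}.{column}")
    return problems

plan = [
    {"id": "q1", "table": "players", "columns": ["player_id", "salary"]},
    {"id": "q2", "table": "teams", "columns": ["city"]},
]
problems = validate_plan(plan, SCHEMA)
print(problems)
```

Returning every problem, rather than stopping at the first, gives a downstream repair step more to work with in a single pass.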

Agent Features

Memory

paraphrase-aware plan cache (LRU)

variable-binding store for intermediate IDs
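One way to picture a paraphrase-aware LRU plan cache is the sketch below. The paraphrase matching here is a deliberately crude stand-in (lowercasing, dropping a small stopword list, sorting tokens); how A.DOT actually matches paraphrases is not specified in this summary, and `PlanCache`, `normalize`, and `STOPWORDS` are hypothetical names.

```python
import re
from collections import OrderedDict

STOPWORDS = {"the", "a", "an", "of", "is", "what", "which", "for", "me", "show"}

def normalize(question):
    # Crude paraphrase key: lowercase, drop punctuation and stopwords, sort tokens.
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return " ".join(sorted(t for t in tokens if t not in STOPWORDS))

class PlanCache:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self._plans = OrderedDict()

    def get(self, question):
        key = normalize(question)
        if key not in self._plans:
            return None
        self._plans.move_to_end(key)  # mark as most recently used
        return self._plans[key]

    def put(self, question, plan):
        key = normalize(question)
        self._plans[key] = plan
        self._plans.move_to_end(key)
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # evict least recently used

cache = PlanCache(capacity=2)
cache.put("What is the revenue of Acme?", "plan-A")
hit = cache.get("Show me Acme revenue")
print(hit)
```

A bag-of-words key like this will also merge questions that share words but differ in meaning, so a real deployment would likely need a stricter matcher before reusing a cached plan.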

Planning

DAG-based planning

single-LLM plan generation

paraphrase-aware plan caching

parallel execution of independent nodes

Tool Use

NL-to-SQL tool

NL-to-Vector tool (Milvus)

LLM-as-planner

LLM-as-judge

Frameworks

LangGraph

LLaMA-3-70B

Is Agentic

Yes

Architectures

DAG planner

agentic modular pipeline (PlanGen, Validator, DataOps, Executor)

Collaboration

DataOps agents (Diagnoser, Fixer, Replanner)

concurrent node execution with dependency coordination

Optimization Features

Token Efficiency

propagate minimal keys/IDs instead of full text payloads
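The idea of propagating only keys can be sketched as follows. The row shape, the variable name `$q1.doc_id`, and the helper functions are hypothetical; the filter string is written in the style of a Milvus boolean expression, which is an assumption about how a downstream vector step might consume the IDs.

```python
# Hypothetical result of an upstream SQL step: full rows with bulky text fields.
rows = [
    {"doc_id": 7, "title": "Q3 report", "body": "x" * 5000},
    {"doc_id": 9, "title": "Q4 report", "body": "y" * 5000},
]

def bind_ids(rows, key):
    # Keep only the join key; titles and bodies never leave this step.
    return [row[key] for row in rows]

def downstream_filter(bindings, variable):
    # Build the next step's filter from IDs alone.
    ids = ", ".join(str(i) for i in bindings[variable])
    return f"doc_id in [{ids}]"

bindings = {"$q1.doc_id": bind_ids(rows, "doc_id")}
expr = downstream_filter(bindings, "$q1.doc_id")
print(expr)
```

Here the two 5,000-character bodies are never copied into the binding store, so neither the next step's prompt nor the logs carry them.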

System Optimization

schema validation avoids costly failed executions

paraphrase-aware template matching speeds re-execution

Inference Optimization

parallel node execution reduces end-to-end latency

plan caching avoids regenerating DAGs

Reproducibility

Code Available: No

Data Available: Yes

Open Source Status: Partial

License: Unknown

Data URLs

HybridQA (Chen et al. 2020)

Risks & Boundaries

Limitations

Evaluation limited to the HybridQA dev split; production datasets not yet reported.

Prototype relies on LLaMA-3-70B, which is compute-intensive for real-time deployment.

When Not To Use

Simple single-source Q&A where single-pass RAG already suffices.

Very low-compute environments that cannot host 70B models.

Failure Modes

Intermediate subquery failures (NL-to-SQL or NL-to-Vector modules fail).

Incorrect data-source assignment by the planner.

Core Entities

Models

LLaMA-3-70B

Mistral Large 2 (LLM-as-judge)

Metrics

Answer Correctness (LLM-as-judge)

Answer Completeness (graded Very Bad–Excellent)

Datasets

HybridQA (dev split, 3,466 QA pairs)

Benchmarks

HybridQA

