Compile NL queries into DAG plans to orchestrate parallel, auditable QA across SQL and vector stores

March 15, 2026 · 7 min

Overview

Production Readiness

0.7

Novelty Score

0.66

Cost Impact Score

0.6

Citation Count

0

Authors

Kirushikesh D B, Manish Kesarwani, Nishtha Madaan, Sameep Mehta, Aldrin Dennis, Siddarth Ajay, Rakesh B R, Renu Rajagopal, Sudheesh Kairali

Why It Matters For Business

A.DOT reduces over-retrieval and exposes verifiable evidence for each step. This cuts unnecessary data exposure and improves multi-step QA accuracy, which helps with compliance-heavy enterprise queries.

Summary TLDR

A.DOT compiles a user's natural-language question into a directed acyclic graph (DAG) of atomic subqueries, each targeting either a relational database or a vector store. A single LLM pass generates the plan, which is then validated, cached, and executed with independent nodes running in parallel. The executor passes only the needed IDs (not full payloads) between steps, records evidence trails, and invokes DataOps agents to diagnose and repair plan errors. On the HybridQA dev set, A.DOT improves answer correctness by 14.8 percentage points and completeness by 10.7 percentage points over a Standard RAG baseline. The system is a research prototype under enterprise evaluation.

Problem Statement

Enterprises need natural-language question answering over mixed data lakes (relational tables + document vectors). Existing RAG or tool-calling pipelines over-retrieve, leak data, and fail at multi-hop queries that alternate between structured and unstructured sources. A practical system must plan precise cross-source queries, run them efficiently, and produce verifiable evidence.

Main Contribution

A.DOT: an agentic DAG planner that compiles NL queries into atomic, source-targeted subqueries and executes them according to DAG dependencies.

Schema-aware plan validation plus a DataOps loop (diagnose, fix, replan) to detect and repair plan or execution errors.

Paraphrase-aware plan caching and template caching to reuse prior plans and speed repeated queries.

Parallel execution of independent DAG nodes with variable-binding that passes minimal identifiers to reduce payload and leakage.

Plan-driven evidence and lineage logging so every intermediate step can be audited and verified.
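The planning and execution ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not A.DOT's implementation: `PlanNode`, `execute_plan`, and the `run_node` callback are hypothetical names, the example subqueries are invented, and each node is assumed to return a list of IDs rather than full rows or passages. Independent nodes are dispatched to a thread pool in waves, and each node receives only the ID lists of its dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Tuple


@dataclass
class PlanNode:
    """One atomic subquery in the plan (hypothetical structure)."""
    node_id: str
    source: str                      # "sql" or "vector"
    subquery: str
    depends_on: List[str] = field(default_factory=list)


Bindings = Dict[str, List[str]]      # node_id -> IDs produced by that node


def execute_plan(
    nodes: List[PlanNode],
    run_node: Callable[[PlanNode, Bindings], List[str]],
    max_workers: int = 4,
) -> Tuple[Bindings, List[List[str]]]:
    """Run nodes wave by wave; nodes in the same wave run concurrently."""
    by_id = {n.node_id: n for n in nodes}
    sorter = TopologicalSorter({n.node_id: set(n.depends_on) for n in nodes})
    sorter.prepare()                 # raises graphlib.CycleError if not a DAG
    bindings: Bindings = {}
    waves: List[List[str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            waves.append(ready)
            futures = {
                nid: pool.submit(
                    run_node,
                    by_id[nid],
                    # Pass only the ID lists of direct dependencies.
                    {dep: bindings[dep] for dep in by_id[nid].depends_on},
                )
                for nid in ready
            }
            for nid, future in futures.items():
                bindings[nid] = future.result()
                sorter.done(nid)
    return bindings, waves


def stub_run_node(node: PlanNode, upstream: Bindings) -> List[str]:
    """Stand-in for the NL-to-SQL / NL-to-Vector tools: returns fake IDs."""
    if not upstream:
        return [f"{node.node_id}-1", f"{node.node_id}-2"]
    return sorted(i for ids in upstream.values() for i in ids)


plan = [
    PlanNode("players", "sql", "players drafted in 2010"),
    PlanNode("bios", "vector", "passages mentioning an MVP award"),
    PlanNode("answer", "sql", "join on shared IDs", depends_on=["players", "bios"]),
]
bindings, waves = execute_plan(plan, stub_run_node)
```

Waiting for a whole wave to finish before starting the next is a simplification; a finer-grained scheduler would start each node as soon as its own dependencies complete.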

Key Findings

A.DOT improves answer correctness and completeness on HybridQA dev vs Standard RAG.

Numbers: Correctness +14.8 pp, Completeness +10.7 pp (Table 1)

DataOps system is critical: removing it drops correctness by 11.8 points.

Numbers: Correctness 60.0% without DataOps vs 71.8% full A.DOT (Table 2)

Enabling plan validation and DataOps together outperforms disabling both.

Numbers: Ablation: 67.9% with both validator and DataOps disabled vs 71.8% full A.DOT (Table 2)

Results

Answer Correctness

Value: A.DOT 71.0

Baseline: Standard RAG 56.2

Answer Completeness

Value: A.DOT 73.0

Baseline: Standard RAG 62.3

Ablation - Answer Correctness

Value: A.DOT 71.8

Baseline: Without DataOps 60.0

Ablation - Answer Completeness

Value: A.DOT 74.3

Baseline: Without DataOps 61.8
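Because these scores are percentages, the gains are differences in percentage points, which is not the same as relative improvement. The short calculation below uses only the values listed above to make both explicit; the helper name `gains` and the dictionary keys are illustrative.

```python
from typing import Dict, Tuple


def gains(value: float, baseline: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = value - baseline
    relative = absolute / baseline * 100
    return round(absolute, 1), round(relative, 1)


results: Dict[str, Tuple[float, float]] = {
    "correctness vs Standard RAG": gains(71.0, 56.2),
    "completeness vs Standard RAG": gains(73.0, 62.3),
    "correctness vs no DataOps": gains(71.8, 60.0),
    "completeness vs no DataOps": gains(74.3, 61.8),
}

for name, (points, percent) in results.items():
    print(f"{name}: +{points} pp ({percent}% relative)")
```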

Who Should Care

What To Try In 7 Days

Prototype a DAG plan for 5 common cross-source NL queries in your data lake.

Add a light schema check that rejects plans referencing missing columns.

Log variable bindings (IDs) instead of full payloads when passing results across steps to reduce data leakage and token use.
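A light schema check of the kind suggested above might look like the following sketch. It assumes each SQL plan step already declares its table and columns explicitly, so no SQL parsing is needed; the step layout, `validate_plan`, and the example schema are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set

Schema = Dict[str, Set[str]]   # table name -> known column names


def validate_plan(steps: List[dict], schema: Schema) -> List[str]:
    """Return one error per step that references a missing table or column."""
    errors: List[str] = []
    for step in steps:
        table = step["table"]
        if table not in schema:
            errors.append(f"{step['id']}: unknown table '{table}'")
            continue
        missing = sorted(set(step["columns"]) - schema[table])
        if missing:
            errors.append(f"{step['id']}: unknown columns {missing} in '{table}'")
    return errors


schema: Schema = {"players": {"player_id", "name", "draft_year"}}

good_plan = [
    {"id": "s1", "table": "players", "columns": ["player_id", "draft_year"]},
]
bad_plan = [
    {"id": "s1", "table": "players", "columns": ["player_id", "team"]},
    {"id": "s2", "table": "awards", "columns": ["award_id"]},
]

# An empty list means the plan passed; otherwise reject it before execution.
good_errors = validate_plan(good_plan, schema)
bad_errors = validate_plan(bad_plan, schema)
```

Returning the full error list, rather than stopping at the first problem, gives a repair step more to work with in a single pass.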

Agent Features

Memory

  • paraphrase-aware plan cache (LRU)
  • variable-binding store for intermediate IDs
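As a rough sketch of the plan cache, the class below combines LRU eviction with a crude paraphrase key. The source does not say how paraphrases are matched, so the normalization here (lowercasing, dropping a small stopword list, sorting tokens) is purely an assumption standing in for whatever matching A.DOT uses; `PlanCache`, `normalize`, and `STOPWORDS` are hypothetical names.

```python
import re
from collections import OrderedDict
from typing import Optional

# Assumed stopword list, chosen only for this illustration.
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "is", "are", "what", "which", "me", "show", "please"}
)


def normalize(question: str) -> str:
    """Map simple rewordings of a question to the same cache key."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return " ".join(sorted(t for t in tokens if t not in STOPWORDS))


class PlanCache:
    """LRU cache keyed by the normalized question."""

    def __init__(self, capacity: int = 128) -> None:
        self.capacity = capacity
        self._store: "OrderedDict[str, object]" = OrderedDict()

    def get(self, question: str) -> Optional[object]:
        key = normalize(question)
        if key not in self._store:
            return None
        self._store.move_to_end(key)        # mark as most recently used
        return self._store[key]

    def put(self, question: str, plan: object) -> None:
        key = normalize(question)
        self._store[key] = plan
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used


cache = PlanCache(capacity=2)
cache.put("What is the revenue of Acme?", "plan-A")
hit = cache.get("the revenue of acme is what")
```

Sorting tokens discards word order, so questions that differ only in order would collide; an embedding-based similarity check would be a more robust choice for real workloads.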

Planning

  • DAG-based planning
  • single-LLM plan generation
  • paraphrase-aware plan caching
  • parallel execution of independent nodes

Tool Use

  • NL-to-SQL tool
  • NL-to-Vector tool (Milvus)
  • LLM-as-planner
  • LLM-as-judge

Frameworks

  • LangGraph
  • LLaMA-3-70B

Is Agentic

true

Architectures

  • DAG planner
  • agentic modular pipeline (PlanGen, Validator, DataOps, Executor)

Collaboration

  • DataOps agents (Diagnoser, Fixer, Replanner)
  • concurrent node execution with dependency coordination
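One way the Diagnoser, Fixer, and Replanner roles could fit together is a bounded repair loop. The control flow below is an assumption, since the source names the agents but does not spell out how they interact; `run_with_dataops` and all four callbacks are hypothetical, and the stubs stand in for LLM-backed agents.

```python
from typing import Any, Callable, List, Optional, Tuple

Plan = dict


def run_with_dataops(
    plan: Plan,
    execute: Callable[[Plan], Any],
    diagnose: Callable[[Plan, Exception], str],
    fix: Callable[[Plan, str], Optional[Plan]],
    replan: Callable[[Plan, str], Plan],
    max_repairs: int = 2,
) -> Tuple[Any, List[str]]:
    """Execute a plan; on failure diagnose, try a fix, else replan, then retry."""
    history: List[str] = []
    for attempt in range(max_repairs + 1):
        try:
            return execute(plan), history
        except Exception as exc:
            if attempt == max_repairs:
                raise RuntimeError(
                    f"plan still failing after {max_repairs} repairs"
                ) from exc
            diagnosis = diagnose(plan, exc)
            history.append(diagnosis)       # keep a trail for auditing
            patched = fix(plan, diagnosis)
            plan = patched if patched is not None else replan(plan, diagnosis)
    raise AssertionError("unreachable")


# Stub agents for illustration only.
def execute(plan: Plan) -> List[str]:
    if plan["column"] != "draft_year":
        raise KeyError(plan["column"])
    return ["p1", "p7"]


def diagnose(plan: Plan, exc: Exception) -> str:
    return "missing_column" if isinstance(exc, KeyError) else "unknown"


def fix(plan: Plan, diagnosis: str) -> Optional[Plan]:
    if diagnosis == "missing_column":
        return {**plan, "column": "draft_year"}
    return None


def replan(plan: Plan, diagnosis: str) -> Plan:
    return {"column": "draft_year"}


result, history = run_with_dataops({"column": "drafted"}, execute, diagnose, fix, replan)
```

Capping the number of repairs keeps a persistently failing plan from looping forever, and the diagnosis history doubles as an audit record.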

Optimization Features

Token Efficiency

  • propagate minimal keys/IDs instead of full text payloads
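To illustrate ID-only propagation, the sketch below keeps just the join key from an upstream result and logs a compact evidence record for the step. The row layout, `bind_ids`, and `evidence_record` are hypothetical, and the size comparison uses serialized character counts as a rough proxy for tokens.

```python
import json
from typing import Dict, List


def bind_ids(rows: List[Dict[str, str]], key: str) -> List[str]:
    """Keep only the join key from each row; every other field is dropped."""
    return [row[key] for row in rows]


def evidence_record(step_id: str, source: str, subquery: str, ids: List[str]) -> dict:
    """Compact lineage entry: what ran, where, and which IDs it produced."""
    return {
        "step": step_id,
        "source": source,
        "subquery": subquery,
        "ids": ids,
        "count": len(ids),
    }


# Fake upstream result with a bulky text field (illustrative data only).
rows = [
    {"player_id": "p1", "name": "A. Example", "bio": "long text " * 50},
    {"player_id": "p7", "name": "B. Example", "bio": "long text " * 50},
]

ids = bind_ids(rows, "player_id")
record = evidence_record("s1", "sql", "players drafted in 2010", ids)

full_size = len(json.dumps(rows))    # what passing whole rows would cost
bound_size = len(json.dumps(ids))    # what passing IDs costs
```

A downstream step can fetch full rows or passages by ID only when it needs them, so sensitive fields never travel through intermediate prompts or logs.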

System Optimization

  • schema validation avoids costly failed executions
  • paraphrase-aware template matching speeds re-execution

Inference Optimization

  • parallel node execution reduces end-to-end latency
  • plan caching avoids regenerating DAGs

Reproducibility

Data Urls

  • HybridQA (Chen et al. 2020)

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Evaluation limited to the HybridQA dev split; production datasets not yet reported.
  • Prototype relies on LLaMA-3-70B, which is compute-intensive for real-time deployment.
  • Plan caching effectiveness not quantified on diverse enterprise workloads within this paper.

When Not To Use

  • Simple single-source Q&A where single-pass RAG already suffices.
  • Very low-compute environments that cannot host 70B models.
  • Corpora without reliable cross-links between structured rows and document IDs.

Failure Modes

  • Intermediate subquery failures (NL-to-SQL or NL-to-Vector modules fail).
  • Incorrect data-source assignment by the planner.
  • Insufficient commonsense knowledge across hops leading to wrong joins.
  • The Plan Validator may halt execution on an invalid plan; without DataOps there is no recovery path.

Core Entities

Models

  • LLaMA-3-70B
  • Mistral Large 2 (LLM-as-judge)

Metrics

  • Answer Correctness (LLM-as-judge)
  • Answer Completeness (graded Very Bad–Excellent)

Datasets

  • HybridQA (dev split, 3,466 QA pairs)

Benchmarks

  • HybridQA