Overview
Production Readiness
0.7
Novelty Score
0.66
Cost Impact Score
0.6
Citation Count
0
Why It Matters For Business
A.DOT reduces over-retrieval and exposes verifiable evidence for each step. This limits unnecessary data exposure and improves multi-step QA accuracy, which helps with compliance-heavy enterprise queries.
Summary TLDR
A.DOT compiles a user's natural-language question into a directed acyclic graph (DAG) of atomic subqueries, each targeting either a relational database or a vector store. A single LLM pass generates the plan, which is then validated, cached, and executed with independent nodes running in parallel. The executor passes only the identifiers that downstream steps need (not full payloads), records evidence trails, and invokes DataOps agents to diagnose and repair plan errors. On the HybridQA dev set, A.DOT improves answer correctness by 14.8% and completeness by 10.7% over a strong RAG baseline. The system is a research prototype under enterprise evaluation.
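To make the plan structure concrete, here is a minimal sketch of a DAG plan of atomic subqueries and a well-formedness check that returns a dependency-respecting order. The `SubQuery` fields, the `"sql"`/`"vector"` source labels, and the `execution_order` helper are hypothetical illustrations, not the paper's plan format; cycle detection uses Python's standard `graphlib`.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter

SOURCES = ("sql", "vector")  # the two stores a subquery may target (assumed labels)


@dataclass(frozen=True)
class SubQuery:
    """One atomic plan step (hypothetical field names, not the paper's schema)."""
    node_id: str
    source: str                       # "sql" or "vector"
    text: str                         # atomic sub-question; {name} marks a bound input
    depends_on: tuple[str, ...] = ()


def execution_order(plan: list[SubQuery]) -> list[str]:
    """Check basic DAG well-formedness and return a dependency-respecting order."""
    known = {q.node_id for q in plan}
    graph: dict[str, set[str]] = {}
    for q in plan:
        if q.source not in SOURCES:
            raise ValueError(f"{q.node_id}: unknown source {q.source!r}")
        missing = [d for d in q.depends_on if d not in known]
        if missing:
            raise ValueError(f"{q.node_id}: depends on unknown node(s) {missing}")
        graph[q.node_id] = set(q.depends_on)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # CycleError carries the offending cycle as its second argument
        raise ValueError(f"plan is not a DAG, cycle: {err.args[1]}") from err


plan = [
    SubQuery("q1", "vector", "passages that mention the 2012 torch relay"),
    SubQuery("q2", "sql", "names of athletes whose doc_id is in {q1}", ("q1",)),
    SubQuery("q3", "sql", "home countries of the athletes in {q2}", ("q2",)),
]
print(execution_order(plan))  # ['q1', 'q2', 'q3']
```

Rejecting dangling dependencies and cycles before execution is one cheap way to fail fast; the paper's validator is also schema-aware, which this sketch does not attempt.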
Problem Statement
Enterprises need natural-language question answering over mixed data lakes (relational tables + document vectors). Existing RAG or tool-calling pipelines over-retrieve, leak data, and fail at multi-hop queries that alternate between structured and unstructured sources. A practical system must plan precise cross-source queries, run them efficiently, and produce verifiable evidence.
Main Contribution
A.DOT: an agentic DAG planner that compiles NL queries into atomic, source-targeted subqueries and executes them according to DAG dependencies.
Schema-aware plan validation plus a DataOps loop (diagnose, fix, replan) to detect and repair plan or execution errors.
Paraphrase-aware plan caching and template caching to reuse prior plans and speed repeated queries.
Parallel execution of independent DAG nodes with variable-binding that passes minimal identifiers to reduce payload and leakage.
Plan-driven evidence and lineage logging so every intermediate step can be audited and verified.
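The evidence and lineage item can be illustrated with an append-only log that records, for each plan node, which upstream IDs it consumed and which IDs it produced, so any answer can be traced back through its steps. This is a minimal sketch; the `EvidenceRecord` layout, `LineageLog`, and JSON Lines export are hypothetical choices, not the paper's logging format.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class EvidenceRecord:
    """What one plan step consumed and produced (hypothetical record layout)."""
    node_id: str
    source: str                       # "sql" or "vector"
    query: str                        # the executed subquery
    inputs: dict[str, list[str]]      # upstream node_id -> IDs this step consumed
    outputs: list[str]                # IDs this step produced
    timestamp: float = field(default_factory=time.time)


class LineageLog:
    """Append-only evidence trail keyed by plan node."""

    def __init__(self) -> None:
        self.records: dict[str, EvidenceRecord] = {}

    def record(self, rec: EvidenceRecord) -> None:
        if rec.node_id in self.records:
            raise ValueError(f"{rec.node_id} already logged; records are append-only")
        self.records[rec.node_id] = rec

    def trace(self, node_id: str) -> list[str]:
        """Return node_id and every upstream step it depended on, ancestors first."""
        seen: set[str] = set()
        order: list[str] = []

        def visit(n: str) -> None:
            if n in seen:
                return
            seen.add(n)
            for parent in self.records[n].inputs:
                visit(parent)
            order.append(n)

        visit(node_id)
        return order

    def to_jsonl(self) -> str:
        """Serialize all records, one JSON object per line, for an audit store."""
        return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in self.records.values())


log = LineageLog()
log.record(EvidenceRecord("q1", "vector", "torch relay passages", {}, ["doc_7", "doc_9"]))
log.record(EvidenceRecord("q2", "sql", "SELECT name ... WHERE doc_id IN (?, ?)",
                          {"q1": ["doc_7", "doc_9"]}, ["row_3"]))
print(log.trace("q2"))  # ['q1', 'q2']
```

Logging IDs rather than payloads keeps the audit trail small and avoids copying sensitive text into logs; an auditor can re-fetch the underlying rows or passages by ID when needed.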
Key Findings
A.DOT improves answer correctness (+14.8%) and completeness (+10.7%) over a standard RAG baseline on the HybridQA dev split.
The DataOps system is critical: removing it sharply lowers answer correctness.
Enabling plan validation and DataOps together outperforms disabling both.
Results
Charts (not reproduced): Answer Correctness; Answer Completeness; Ablation: Answer Correctness; Ablation: Answer Completeness.
What To Try In 7 Days
Prototype a DAG plan for 5 common cross-source NL queries in your data lake.
Add a light schema check that rejects plans referencing missing columns.
Log variable bindings (IDs) instead of full payloads when passing results across steps to reduce data leakage and token use.
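A light schema check of the kind suggested above could look like the following minimal sketch. It assumes each SQL step lists the `table.column` references it uses under a hypothetical `"columns"` key; parsing the generated SQL itself would be more robust but is out of scope here. `check_columns` is an illustrative name, not part of A.DOT.

```python
Schema = dict[str, set[str]]  # table name -> column names


def check_columns(steps: list[dict], schema: Schema) -> list[str]:
    """Return one error per unknown table/column reference; [] means the plan passes.

    Assumes each SQL step lists the "table.column" references it uses under
    the hypothetical key "columns"; vector steps are skipped.
    """
    errors: list[str] = []
    for step in steps:
        if step.get("source") != "sql":
            continue
        for ref in step.get("columns", []):
            table, dot, column = ref.partition(".")
            if not dot or not column:
                errors.append(f"{step['id']}: reference {ref!r} is not table.column")
            elif table not in schema:
                errors.append(f"{step['id']}: unknown table {table!r}")
            elif column not in schema[table]:
                errors.append(f"{step['id']}: unknown column {ref!r}")
    return errors


schema = {"athletes": {"name", "country", "doc_id"}}
steps = [
    {"id": "q1", "source": "vector"},
    {"id": "q2", "source": "sql", "columns": ["athletes.name", "athletes.medal"]},
]
print(check_columns(steps, schema))  # ["q2: unknown column 'athletes.medal'"]
```

Returning every error at once, instead of stopping at the first, gives a repair step (or a human) the full list to fix in one pass.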
Agent Features
Memory
- paraphrase-aware plan cache (LRU)
- variable-binding store for intermediate IDs
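As a rough illustration of a paraphrase-aware LRU plan cache, the sketch below keys cached plans by a question's lower-cased word set and returns a hit when Jaccard similarity to a stored question clears a threshold. The token-set similarity is a deliberately crude stand-in; the paper's paraphrase-matching method is not reproduced here, and `ParaphrasePlanCache` and its parameters are hypothetical.

```python
import re
from collections import OrderedDict


def _tokens(question: str) -> frozenset[str]:
    """Lower-cased word set: a crude stand-in for real paraphrase detection."""
    return frozenset(re.findall(r"[a-z0-9]+", question.lower()))


def _jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class ParaphrasePlanCache:
    """LRU cache of plans, looked up by the most similar previously seen question."""

    def __init__(self, capacity: int = 128, threshold: float = 0.8) -> None:
        self.capacity = capacity
        self.threshold = threshold              # minimum similarity for a hit
        self._entries: OrderedDict = OrderedDict()  # token set -> plan, oldest first

    def get(self, question: str):
        key = _tokens(question)
        best_key, best_score = None, 0.0
        for cached in self._entries:            # linear scan; fine for a small cache
            score = _jaccard(key, cached)
            if score > best_score:
                best_key, best_score = cached, score
        if best_key is None or best_score < self.threshold:
            return None                         # miss: caller generates a fresh plan
        self._entries.move_to_end(best_key)     # mark as most recently used
        return self._entries[best_key]

    def put(self, question: str, plan) -> None:
        key = _tokens(question)
        self._entries[key] = plan
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the least recently used plan


cache = ParaphrasePlanCache()
cache.put("Which country did the 2012 torch bearer represent?", "plan-A")
print(cache.get("Which country did the torch bearer in 2012 represent?"))  # plan-A
print(cache.get("How tall is the torch bearer?"))  # None
```

One caveat of word sets: they ignore order, so "Did A beat B?" and "Did B beat A?" would collide. A production cache would need embedding similarity or templates with argument slots to avoid returning a plan with swapped entities.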
Planning
- DAG-based planning
- single-LLM plan generation
- paraphrase-aware plan caching
- parallel execution of independent nodes
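Dependency-aware parallel execution can be sketched with the standard library: `graphlib.TopologicalSorter` hands out nodes whose inputs are ready, and a thread pool runs them concurrently. This is a minimal sketch, assuming each node is a Python callable that receives its upstream results; `run_dag` and `fake_node` are hypothetical, and the paper's executor (built with LangGraph) is not shown.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_dag(deps: dict[str, set[str]],
            run_node: Callable[[str, dict[str, Any]], Any],
            max_workers: int = 4) -> dict[str, Any]:
    """Run every node once its dependencies finish; ready nodes run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                          # raises graphlib.CycleError on a cyclic plan
    results: dict[str, Any] = {}
    running = {}                              # future -> node_id
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            for node in sorter.get_ready():   # all nodes whose inputs are available
                upstream = {d: results[d] for d in deps.get(node, ())}
                running[pool.submit(run_node, node, upstream)] = node
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                node = running.pop(fut)
                results[node] = fut.result()  # re-raises the node's exception, if any
                sorter.done(node)             # may unlock downstream nodes
    return results


def fake_node(node: str, upstream: dict[str, Any]) -> list[str]:
    # Pretend each step merges its inputs' IDs and emits one new ID.
    merged = sorted(i for ids in upstream.values() for i in ids)
    return merged + [f"{node}-id"]


deps = {"q1": set(), "q2": set(), "q3": {"q1", "q2"}}
print(run_dag(deps, fake_node)["q3"])  # ['q1-id', 'q2-id', 'q3-id']
```

Here `q1` and `q2` have no dependencies, so both are submitted in the first round; `q3` starts only after both have been marked done.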
Tool Use
- NL-to-SQL tool
- NL-to-Vector tool (Milvus)
- LLM-as-planner
- LLM-as-judge
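The cross-source hop that the NL-to-SQL and NL-to-Vector tools enable can be sketched as a dispatcher that routes each step by its target source. In this sketch the LLM translation is assumed to have already happened (steps carry ready-made SQL and a query vector), an in-memory dict with made-up embeddings stands in for Milvus, and SQLite stands in for the relational store. `run_step`, `vector_tool`, and `sql_tool` are hypothetical names.

```python
import math
import sqlite3

# Toy stand-in for a vector store such as Milvus: doc_id -> made-up embedding.
VECTORS = {"doc_1": [0.9, 0.1, 0.0], "doc_2": [0.1, 0.9, 0.0], "doc_3": [0.8, 0.2, 0.1]}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_tool(query_vec: list[float], k: int) -> list[str]:
    """Return only the IDs of the k nearest documents, never their text."""
    ranked = sorted(VECTORS, key=lambda d: cosine(query_vec, VECTORS[d]), reverse=True)
    return ranked[:k]


def sql_tool(conn: sqlite3.Connection, sql: str, params: list[str]) -> list[str]:
    return [row[0] for row in conn.execute(sql, params)]


def run_step(step: dict, conn: sqlite3.Connection, bindings: dict[str, list[str]]) -> list[str]:
    """Route one step to the tool named by its (hypothetical) 'source' field."""
    if step["source"] == "vector":
        return vector_tool(step["query_vec"], step.get("k", 2))
    if step["source"] == "sql":
        ids = bindings[step["input"]]
        # Only '?' placeholders are formatted in; the IDs travel as bound parameters.
        sql = step["sql"].format(ids=", ".join("?" * len(ids)))
        return sql_tool(conn, sql, ids)
    raise ValueError(f"unknown source {step['source']!r}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athletes (name TEXT, doc_id TEXT)")
conn.executemany("INSERT INTO athletes VALUES (?, ?)",
                 [("Ana", "doc_1"), ("Ben", "doc_2"), ("Cai", "doc_3")])

steps = [
    {"id": "q1", "source": "vector", "query_vec": [1.0, 0.0, 0.0], "k": 2},
    {"id": "q2", "source": "sql", "input": "q1",
     "sql": "SELECT name FROM athletes WHERE doc_id IN ({ids}) ORDER BY name"},
]
bindings: dict[str, list[str]] = {}
for step in steps:
    bindings[step["id"]] = run_step(step, conn, bindings)
print(bindings)  # {'q1': ['doc_1', 'doc_3'], 'q2': ['Ana', 'Cai']}
```

Binding the upstream IDs as query parameters, instead of splicing them into the SQL text, keeps the structured hop safe from injection even when IDs come from retrieved content.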
Frameworks
- LangGraph
- LLaMA-3-70B
Is Agentic
true
Architectures
- DAG planner
- agentic modular pipeline (PlanGen, Validator, DataOps, Executor)
Collaboration
- DataOps agents (Diagnoser, Fixer, Replanner)
- concurrent node execution with dependency coordination
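The Diagnoser, Fixer, and Replanner roles can be read as a bounded repair loop: execute, and on failure either patch the plan locally or regenerate it. The sketch below shows that control flow only; the callables, the `"local"` diagnosis label, the `max_rounds` bound, and the `difflib`-based column repair are hypothetical stand-ins for the paper's LLM-driven agents.

```python
import difflib
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Attempt:
    plan: dict
    error: Optional[str]
    action: str                       # "ok", "fix" or "replan"


def dataops_loop(plan: dict,
                 execute: Callable[[dict], Any],
                 diagnose: Callable[[dict, Exception], str],
                 fix: Callable[[dict, Exception], dict],
                 replan: Callable[[dict, Exception], dict],
                 max_rounds: int = 3) -> tuple[Any, list[Attempt]]:
    """Execute a plan; on failure, diagnose, then patch locally or regenerate it."""
    history: list[Attempt] = []
    for _ in range(max_rounds):
        try:
            result = execute(plan)
        except Exception as err:      # broad on purpose: any step failure enters repair
            if diagnose(plan, err) == "local":
                action, new_plan = "fix", fix(plan, err)
            else:
                action, new_plan = "replan", replan(plan, err)
            history.append(Attempt(plan, str(err), action))
            plan = new_plan
            continue
        history.append(Attempt(plan, None, "ok"))
        return result, history
    raise RuntimeError(f"plan still failing after {max_rounds} rounds")


KNOWN_COLUMNS = {"name", "country", "medal"}


def execute(plan: dict) -> str:
    if plan["column"] not in KNOWN_COLUMNS:
        raise ValueError(f"unknown column {plan['column']}")
    return f"SELECT {plan['column']} FROM athletes"


def diagnose(plan: dict, err: Exception) -> str:
    return "local" if "unknown column" in str(err) else "global"


def fix(plan: dict, err: Exception) -> dict:
    # Toy local repair: swap in the closest known column name.
    match = difflib.get_close_matches(plan["column"], sorted(KNOWN_COLUMNS), n=1)
    return {**plan, "column": match[0]} if match else plan


def replan(plan: dict, err: Exception) -> dict:
    return {"column": "name"}         # toy fallback: regenerate a simpler plan


result, history = dataops_loop({"column": "medals"}, execute, diagnose, fix, replan)
print(result)                         # SELECT medal FROM athletes
print([a.action for a in history])    # ['fix', 'ok']
```

Bounding the loop with `max_rounds` keeps a persistently broken plan from retrying forever, and the `history` list doubles as evidence of what was repaired and why.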
Optimization Features
Token Efficiency
- propagate minimal keys/IDs instead of full text payloads
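Passing keys instead of payloads can be sketched as a binding store that keeps full step outputs locally and substitutes only their ID field into downstream prompts. `BindingStore`, the `"id"` key, and the `$name` placeholder syntax (via the standard `string.Template`) are hypothetical choices for illustration, not the paper's variable-binding mechanism.

```python
from string import Template


class BindingStore:
    """Holds full step outputs locally; downstream prompts see only their key field."""

    def __init__(self, key: str = "id") -> None:
        self.key = key
        self._rows: dict[str, list[dict]] = {}

    def bind(self, name: str, rows: list[dict]) -> None:
        self._rows[name] = rows

    def ids(self, name: str) -> list[str]:
        return [row[self.key] for row in self._rows[name]]

    def render(self, template: str) -> str:
        """Fill $name placeholders with comma-separated IDs of the bound results."""
        return Template(template).substitute({n: ", ".join(self.ids(n)) for n in self._rows})


rows = [
    {"id": "doc_7", "text": "Full passage about the relay route. " * 20},
    {"id": "doc_9", "text": "Another long passage about the torch bearers. " * 20},
]
store = BindingStore()
store.bind("q1", rows)
prompt = store.render("Find athletes whose doc_id is in ($q1)")
print(prompt)  # Find athletes whose doc_id is in (doc_7, doc_9)
print(len(prompt), "chars vs", len(str(rows)), "chars for the full payload")
```

The downstream LLM call sees only the two IDs, so both token count and the amount of document text leaving the store shrink; the full rows remain available locally if a later step or an auditor needs them.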
System Optimization
- schema validation avoids costly failed executions
- paraphrase-aware template matching speeds re-execution
Inference Optimization
- parallel node execution reduces end-to-end latency
- plan caching avoids regenerating DAGs
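Why parallel node execution cuts end-to-end latency can be shown with a small calculation: with unlimited workers and no scheduling overhead, a DAG finishes after its longest (critical) dependency path, not after the sum of all node latencies. The per-node timings below are made up for illustration and are not results from the paper; `dag_latency` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def dag_latency(deps: dict[str, set[str]], latency_ms: dict[str, int]) -> tuple[int, int]:
    """Return (sequential, parallel) latency; parallel assumes unlimited workers."""
    finish: dict[str, int] = {}
    for node in TopologicalSorter(deps).static_order():
        # A node can start only when its slowest dependency has finished.
        start = max((finish[d] for d in deps.get(node, ())), default=0)
        finish[node] = start + latency_ms[node]
    return sum(latency_ms.values()), max(finish.values())


deps = {"q1": set(), "q2": set(), "q3": {"q1", "q2"}}
latency_ms = {"q1": 1200, "q2": 800, "q3": 500}   # made-up per-node timings
print(dag_latency(deps, latency_ms))  # (2500, 1700)
```

The saving comes entirely from overlapping `q1` and `q2`; a plan that is a single chain of dependent hops gains nothing from parallelism, which is where plan caching matters more.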
Reproducibility
Data URLs
- HybridQA (Chen et al., 2020)
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Evaluation is limited to the HybridQA dev split; results on production datasets are not yet reported.
- The prototype relies on LLaMA-3-70B, which is compute-intensive for real-time deployment.
- Plan-caching effectiveness on diverse enterprise workloads is not quantified in this paper.
When Not To Use
- Simple single-source Q&A where single-pass RAG already suffices.
- Very low-compute environments that cannot host 70B models.
- Corpora without reliable cross-links between structured rows and document IDs.
Failure Modes
- Intermediate subquery failures (NL-to-SQL or NL-to-Vector modules fail).
- Incorrect data-source assignment by the planner.
- Insufficient commonsense knowledge across hops leading to wrong joins.
- The Plan Validator may halt execution on an invalid plan; without DataOps, there is no path to recovery.
Core Entities
Models
- LLaMA-3-70B
- Mistral Large 2 (LLM-as-judge)
Metrics
- Answer Correctness (LLM-as-judge)
- Answer Completeness (graded Very Bad–Excellent)
Datasets
- HybridQA (dev split, 3,466 QA pairs)
Benchmarks
- HybridQA

