An open-source agent that switches between graph and vector search to improve literature review accuracy

Overview

Decision Snapshot: Needs Validation

Open-source pipeline and synthetic benchmark show promising gains. Evidence is limited to a small, synthetic test and bootstrapped estimates; expect additional engineering (Cypher translation, OCR, domain tuning) before deployment in high-stakes settings.

Citations: 0

Evidence Strength: 0.60

Confidence: 0.80

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 11/11

Reproducibility

Status: Partial assets available

Open source: Yes

At A Glance

Cost impact: 50%

Production readiness: 70%

Novelty: 60%

Authors

Aditya Nagori, Ricardo Accorsi Casonatto, Ayush Gautam, Abhinav Manikantha Sai Cheruvu, Rishikesan Kamaleswaran

Links

Abstract / PDF / Code

Why It Matters For Business

Automating literature review with a system that picks the right retrieval mode reduces manual search time and improves the relevance of extracted evidence. This matters for teams that need fast, evidence-grounded summaries across many papers (R&D, clinical review, IP) and want an auditable pipeline.

Who Should Care

Product Manager, ML Engineer, Founder, Data Scientist

Summary TLDR

The authors built and open-sourced an agentic Retrieval-Augmented Generation (RAG) system that stores literature in both a Neo4j knowledge graph and a FAISS vector store, and dynamically picks GraphRAG or VectorRAG per query. Instruction tuning plus Direct Preference Optimization (DPO) improves grounding and retrieval. On a synthetic scientific benchmark the agentic+DPO setup raised vector-store context recall by +0.63 and overall context precision by +0.56 versus a non-agentic baseline. Code is on GitHub.

Problem Statement

Static RAG pipelines (one fixed retrieval path) miss many scientific information needs. Researchers need a system that combines structured metadata (citations, authors) and full-text semantics, picks the right retrieval strategy per question, and reports uncertainty.

Main Contribution

An open-source Python pipeline that ingests papers from PubMed, ArXiv, and Google Scholar, then builds a Neo4j knowledge graph and a FAISS vector store of full-text chunks.

An agentic orchestration layer (LLaMA-3.3-70B-versatile) that dynamically selects between GraphRAG (Cypher queries) and VectorRAG (BM25 + dense search + reranker) per prompt.
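To make the routing idea concrete, here is a minimal sketch of per-query tool selection. It is not the authors' planner: the paper uses an LLM (LLaMA-3.3-70B-versatile) to choose the tool, while this stand-in uses a hypothetical keyword list (`GRAPH_CUES`) so the example runs without a model. The tool names `graph_rag` and `vector_rag` are also assumptions.

```python
from dataclasses import dataclass

# Hypothetical cue words that suggest a metadata question.
# The described system lets an LLM planner decide instead.
GRAPH_CUES = ("author", "cite", "citation", "published", "journal", "year")


@dataclass
class RoutedQuery:
    question: str
    tool: str  # "graph_rag" (metadata via Cypher) or "vector_rag" (full text)


def route(question: str) -> RoutedQuery:
    """Send metadata-style questions to the graph, everything else to vectors."""
    lowered = question.lower()
    is_metadata = any(cue in lowered for cue in GRAPH_CUES)
    return RoutedQuery(question, "graph_rag" if is_metadata else "vector_rag")


print(route("Which authors published on sepsis prediction?").tool)
print(route("What mechanisms explain the reported effect?").tool)
```

A keyword router like this is only useful as a baseline to compare against the LLM planner; it cannot handle questions that mix metadata and content.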

Key Findings

Agentic system with DPO substantially increases vector-store context recall.

Numbers: VS Context Recall +0.63 vs baseline

Practical Use: If you add agentic routing plus DPO, expect substantially better recall for answers that require full-text retrieval. Practical step: route queries likely needing semantic content to VectorRAG.

Evidence Ref: Abstract; Results section; Figure 5

Overall context precision improves meaningfully under agentic+DPO control.

Numbers: Overall Context Precision +0.56 vs baseline

Practical Use: Dynamic retrieval selection cuts irrelevant context returned. Use agentic selection to improve the relevance of evidence shown to users.

Evidence Ref: Abstract; Results section

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
VS Context Recall	+0.63	Non-agentic RAG	+0.63	Synthetic benchmark (VectorRAG items)	Results section; Abstract	Results; Figure 5
Overall Context Precision	+0.56	Non-agentic RAG	+0.56	Synthetic benchmark (combined)	Results section; Abstract	Results

What To Try In 7 Days

Clone the repo and run the pipeline on a small topic using PubMed/ArXiv API keys to see ingest and KG/VS construction.

Compare answers for a handful of domain questions between a static vector-search pipeline and the agentic pipeline to observe differences in retrieved evidence.

Add 10–20 human preference pairs (DPO style) for your domain to quickly test gains in faithfulness.
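For the preference-pair step above, one possible way to store pairs is a JSONL file with one record per line. The field names `prompt`, `chosen`, and `rejected` follow a common convention in preference-tuning tooling and are an assumption here, not something the source specifies; the helper names and the example strings are placeholders, not data from the paper.

```python
import json


def make_pair(question: str, context: str, chosen: str, rejected: str) -> dict:
    """Bundle one preference pair: same prompt, a preferred and a dispreferred answer."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def to_jsonl(pairs: list[dict]) -> str:
    """Serialize pairs as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


# Placeholder content for illustration only.
pairs = [
    make_pair(
        question="What does the passage conclude?",
        context="<retrieved passage text>",
        chosen="Answer that stays within the retrieved context.",
        rejected="Answer that adds claims absent from the context.",
    )
]
print(to_jsonl(pairs))
```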

Agent Features

Memory

Retrieval memory only: Neo4j KG (structured metadata) and FAISS VS (embedded full text)

Planning

Dynamic selection of retrieval mode per query

Few-shot examples guide NL→Cypher translation and tool choice

Decompose user query into tool calls
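The few-shot NL→Cypher step mentioned above can be sketched as a prompt builder. The graph schema used in the examples (`Author`, `Paper`, `WROTE`, `CITES`, and the `name` and `title` properties) is hypothetical; the real schema lives in the repository and may differ. The function only assembles text and does not call any model.

```python
# Hypothetical schema and examples, for illustration only.
FEW_SHOT = [
    (
        "Which papers did Jane Doe write?",
        "MATCH (a:Author {name: 'Jane Doe'})-[:WROTE]->(p:Paper) RETURN p.title",
    ),
    (
        "Which papers cite the paper titled 'Example Title'?",
        "MATCH (p:Paper)-[:CITES]->(q:Paper {title: 'Example Title'}) RETURN p.title",
    ),
]


def build_cypher_prompt(question: str, examples=FEW_SHOT) -> str:
    """Assemble a few-shot prompt that ends where the model should write Cypher."""
    parts = ["Translate each question into a read-only Cypher query."]
    for nl, cypher in examples:
        parts.append(f"Question: {nl}\nCypher: {cypher}")
    parts.append(f"Question: {question}\nCypher:")
    return "\n\n".join(parts)


print(build_cypher_prompt("Who wrote the paper titled 'Example Title'?"))
```

Because few-shot translation can fail on complex questions, it is worth logging the generated query next to the answer so reviewers can audit it.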

Tool Use

Cypher queries over Neo4j (GraphRAG)

BM25 + dense embeddings + FAISS + reranker (VectorRAG)

Mistral-7B-Instruct for generation

Cohere reranker for passage re-ranking

Frameworks

Neo4j

FAISS

Docker

GitHub pipeline

Is Agentic

Yes

Architectures

LLM-based planner (LLaMA-3.3-70B-versatile)

Tool-calling workflow (GraphRAG and VectorRAG functions)

Collaboration

Supports human-in-the-loop review; encourages oversight for low-confidence outputs

Optimization Features

Token Efficiency

Chunking text into 2024-character segments with 50-character overlap (reduces redundant context)
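A minimal sketch of fixed-size chunking with overlap, using the 2024-character size and 50-character overlap stated above. The function name and the exact boundary handling are assumptions; the repository may split text differently (for example on sentence boundaries).

```python
def chunk_text(text: str, size: int = 2024, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reached the end
            break
    return chunks


chunks = chunk_text("x" * 5000)
print(len(chunks))  # 3
```

Each chunk begins with the last 50 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.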

Infra Optimization

GPU acceleration reduces latency from ~2 minutes on consumer hardware to ~10 seconds on server GPUs

System Optimization

Dockerizable pipeline for reproducible deployment

Training Optimization

Instruction tuning of Mistral-7B-Instruct

Direct Preference Optimization (DPO) with 15 preference pairs
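For readers new to DPO, the per-pair objective can be written in a few lines. This sketch computes the standard DPO loss from sequence log-probabilities under the policy being trained and a frozen reference model. The `beta` value of 0.1 is an assumption; the source does not state the hyperparameters used, and real training would run this over batches in a deep learning framework.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not from the source
) -> float:
    """Return -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss falls as the policy favors the chosen answer more than the reference does.
print(dpo_loss(-1.0, -5.0, -3.0, -3.0) < dpo_loss(-5.0, -1.0, -3.0, -3.0))  # True
```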

Inference Optimization

Agentic routing to pick the most suitable retriever per query

Ensemble retrieval and reranking to prioritize high-quality passages
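The source does not say how the BM25 and dense result lists are merged before reranking. One widely used option is reciprocal rank fusion (RRF), sketched below as a swapped-in technique rather than the authors' method. The constant `k = 60` is the value commonly used for RRF and is an assumption here, as are the document IDs.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each appearance adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["a", "b", "c"]   # hypothetical lexical ranking
dense_hits = ["b", "c", "a"]  # hypothetical embedding ranking
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))  # ['b', 'a', 'c']
```

The fused list would then be passed to the reranker, which rescores only the top candidates.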

Reproducibility

Code Available: Yes

Data Available: No

Open Source Status: Yes

License: Unknown

Code URLs

https://github.com/Kamaleswaran-Lab/Agentic-Hybrid-Rag

Risks & Boundaries

Limitations

Evaluation relies on a synthetic benchmark of 40 QA pairs; may not reflect complex real-world literature queries.

NL→Cypher translation uses few-shot prompting and can mis-translate complex queries.
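With only 40 QA pairs, any metric estimate carries wide uncertainty, which is why bootstrapped intervals matter when reading the results. The sketch below shows a percentile bootstrap for the mean of per-question scores. It is a generic procedure, not the authors' evaluation code, and the scores are placeholders rather than values from the paper.

```python
import random


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Placeholder per-question scores (40 items), not data from the paper.
placeholder_scores = [0.0, 1.0] * 20
print(bootstrap_ci(placeholder_scores))
```

Running the same function on your own per-question scores gives a quick sense of how much a reported delta could move with a different sample of questions.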

When Not To Use

Do not rely on this system where absolute formal guarantees are required (legal/clinical decisions) without human verification.

Avoid for corpora composed mostly of scanned PDFs until OCR is integrated.

Failure Modes

Mistaken Cypher generation leads to wrong or missing KG answers.

If retrieval fails (both KG and VS), the generator can hallucinate despite DPO.

Core Entities

Models

LLaMA-3.3-70B-versatile

Mistral-7B-Instruct-v0.3

all-MiniLM-L6-v2

Cohere rerank-english-v3.0

Metrics

Faithfulness

Answer relevance

Context precision

Context recall

Datasets

PubMed (via API)

ArXiv (via API)

Google Scholar (via API)

Synthetic RAG benchmark (40 QA pairs)

Benchmarks

Custom synthetic VectorRAG/GraphRAG benchmark (20 vector-store questions, 20 knowledge-graph questions)
