An open-source agent that switches between graph and vector search to improve literature review accuracy

July 30, 2025 · 9 min read

Overview

Decision Snapshot: Needs Validation

Open-source pipeline and synthetic benchmark show promising gains. Evidence is limited to a small, synthetic test and bootstrapped estimates; expect additional engineering (Cypher translation, OCR, domain tuning) before deployment in high-stakes settings.

Citations: 0

Evidence Strength: 0.60

Confidence: 0.80

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 11/11

Reproducibility

Status: Partial assets available

Open source: Yes

At A Glance

Cost impact: 50%

Production readiness: 70%

Novelty: 60%

Authors

Aditya Nagori, Ricardo Accorsi Casonatto, Ayush Gautam, Abhinav Manikantha Sai Cheruvu, Rishikesan Kamaleswaran

Links

Abstract / PDF / Code

Why It Matters For Business

Automating literature review with a system that picks the right retrieval mode reduces manual search time and improves the relevance of extracted evidence. This matters for teams that need fast, evidence-grounded summaries across many papers (R&D, clinical review, IP) and want an auditable pipeline.

Who Should Care

Summary TLDR

The authors built and open-sourced an agentic Retrieval-Augmented Generation (RAG) system that stores literature in both a Neo4j knowledge graph (KG) and a FAISS vector store (VS), and dynamically picks GraphRAG or VectorRAG per query. Instruction tuning plus Direct Preference Optimization (DPO) improves grounding and retrieval. On a synthetic scientific benchmark, the agentic+DPO setup raised VS context recall by 0.63 and overall context precision by 0.56 relative to a non-agentic baseline. Code is on GitHub.

Problem Statement

Static RAG pipelines (one fixed retrieval path) miss many scientific information needs. Researchers need a system that combines structured metadata (citations, authors) and full-text semantics, picks the right retrieval strategy per question, and reports uncertainty.

Main Contribution

An open-source Python pipeline that ingests papers from PubMed, ArXiv, and Google Scholar and builds a Neo4j knowledge graph plus a FAISS vector store of full-text chunks.

An agentic orchestration layer (LLaMA-3.3-70B-versatile) that dynamically selects between GraphRAG (Cypher queries) and VectorRAG (BM25 + dense search + reranker) per prompt.
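Per-query routing can be pictured as a tool dispatch table. The minimal Python sketch below is illustrative only: in the paper the planner is LLaMA-3.3-70B-versatile choosing tools via prompting, whereas here a hypothetical keyword heuristic (`choose_tool`, `METADATA_CUES`) stands in for the LLM, and `graph_rag` / `vector_rag` are placeholder functions rather than real Neo4j or FAISS calls.

```python
from typing import Callable, Dict, List, Tuple


# Placeholder tools: the real pipeline runs Cypher against Neo4j (GraphRAG)
# and hybrid BM25 + dense search over FAISS (VectorRAG).
def graph_rag(query: str) -> List[str]:
    return [f"[KG] structured-metadata result for: {query}"]


def vector_rag(query: str) -> List[str]:
    return [f"[VS] full-text passages for: {query}"]


TOOLS: Dict[str, Callable[[str], List[str]]] = {
    "GraphRAG": graph_rag,
    "VectorRAG": vector_rag,
}

# Hypothetical cue words suggesting a metadata question (authors, citations, dates).
METADATA_CUES = ("author", "citation", "cited", "published", "journal", "how many", "year")


def choose_tool(query: str) -> str:
    """Keyword heuristic standing in for the LLM planner's tool choice."""
    q = query.lower()
    return "GraphRAG" if any(cue in q for cue in METADATA_CUES) else "VectorRAG"


def route(query: str) -> Tuple[str, List[str]]:
    """Pick a retriever for the query and return its name with the retrieved context."""
    tool = choose_tool(query)
    return tool, TOOLS[tool](query)


print(route("Which authors published on sepsis in 2023?")[0])  # GraphRAG
print(route("What mechanisms link lactate to septic shock?")[0])  # VectorRAG
```

Replacing `choose_tool` with an LLM call that returns a tool name leaves the dispatch table untouched, which is the main appeal of a tool-calling design: new retrievers can be added without rewriting the routing loop.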

Key Findings

Agentic system with DPO substantially increases vector-store context recall.

Numbers: VS Context Recall +0.63 vs baseline

Practical use: Adding agentic routing plus DPO should substantially improve recall on questions that require full-text retrieval. In practice, route queries that need semantic content to VectorRAG.

Evidence Ref: Abstract; Results section; Figure 5

Overall context precision improves meaningfully under agentic+DPO control.

Numbers: Overall Context Precision +0.56 vs baseline

Practical use: Dynamic retrieval selection reduces the amount of irrelevant context returned. Use agentic selection to improve the relevance of evidence shown to users.

Evidence Ref: Abstract; Results section

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
VS Context Recall | +0.63 | Non-agentic RAG | +0.63 | Synthetic benchmark (VectorRAG items) | Results section; Abstract | Results; Figure 5
Overall Context Precision | +0.56 | Non-agentic RAG | +0.56 | Synthetic benchmark (combined) | Results section; Abstract | Results

What To Try In 7 Days

Clone the repo and run the pipeline on a small topic with PubMed/ArXiv API keys to observe ingestion and KG/VS construction.

Compare answers for a handful of domain questions between a static vector-search pipeline and the agentic pipeline to observe differences in retrieved evidence.

Add 10–20 human preference pairs (DPO style) for your domain to quickly test gains in faithfulness.
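Each DPO preference pair holds a prompt, a preferred (grounded) answer, and a rejected answer. The sketch below shows one hypothetical pair and the standard per-pair DPO loss computed from summed answer log-probabilities under the trainable policy and a frozen reference model. `beta=0.1` is an assumed default; the source does not state which training library or hyperparameters the authors used.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # grounded answer a reviewer preferred
    rejected: str  # less faithful answer a reviewer rejected


# Hypothetical example pair; real pairs would come from your domain reviewers.
example = PreferencePair(
    prompt="What does the retrieved abstract say about lactate in sepsis?",
    chosen="The abstract reports elevated lactate as a severity marker; it does not claim causation.",
    rejected="Lactate causes septic shock in most patients.",
)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    # Log-ratios of the trainable policy against the frozen reference model.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow in exp().
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

With identical policy and reference log-probabilities the loss is log 2 (about 0.693); it drops below that as the policy raises the chosen answer's likelihood relative to the rejected one, and rises above it in the opposite case.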

Agent Features

Memory
Retrieval memory only: Neo4j KG (structured metadata) and FAISS VS (embedded full text)
Planning
Dynamic selection of retrieval mode per query; few-shot examples guide NL→Cypher translation and tool choice; user query decomposed into tool calls
Tool Use
Cypher queries over Neo4j (GraphRAG); BM25 + dense embeddings + FAISS + reranker (VectorRAG); Mistral-7B-Instruct for generation; Cohere reranker for passage re-ranking
Frameworks
Neo4j, FAISS, Docker, GitHub pipeline
Is Agentic

Yes

Architectures
LLM-based planner (LLaMA-3.3-70B-versatile); tool-calling workflow (GraphRAG and VectorRAG functions)
Collaboration
Supports human-in-the-loop review; encourages oversight for low-confidence outputs
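The VectorRAG path combines a BM25 ranking with a dense (FAISS) ranking before reranking. The source does not say how the two lists are merged; one common choice, assumed here, is weighted reciprocal rank fusion (RRF) with the conventional constant `k=60`. The sketch fuses ranked lists of hypothetical document IDs and does not reproduce the Cohere reranking step.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]],
                           weights: Optional[Sequence[float]] = None,
                           k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking via weighted RRF."""
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need one weight per ranking")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Rank 1 contributes weight / (k + 1); lower ranks contribute less.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties keep first-seen order (Python's sort is stable).
    return sorted(scores, key=lambda d: scores[d], reverse=True)


bm25_hits = ["p3", "p1", "p7"]   # hypothetical lexical ranking
dense_hits = ["p1", "p5", "p3"]  # hypothetical embedding ranking
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))  # ['p1', 'p3', 'p5', 'p7']
```

RRF uses only ranks, so it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales; a document ranked well by both retrievers rises to the top.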

Optimization Features

Token Efficiency
Chunking text into 2024-character segments with 50-character overlap (reduces redundant context)
Infra Optimization

GPU acceleration reduces latency from ~2 minutes on consumer hardware to ~10 seconds on server GPUs

System Optimization
Dockerizable pipeline for reproducible deployment
Training Optimization
Instruction tuning of Mistral-7B-Instruct; Direct Preference Optimization (DPO) with 15 preference pairs
Inference Optimization
Agentic routing to pick the most suitable retriever per query; ensemble retrieval and reranking to prioritize high-quality passages
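The chunking setting above (2024-character segments, 50-character overlap) amounts to a sliding window whose stride is the chunk size minus the overlap. The sketch below is a plain fixed-width splitter under that reading; the authors' pipeline may instead split on sentence or paragraph boundaries, which the source does not specify.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 2024, overlap: int = 50) -> List[str]:
    """Split text into fixed-width character windows that overlap by `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # stride between chunk starts (1974 with the defaults)
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Because consecutive windows share 50 characters, any run of up to 50 characters lies wholly inside at least one chunk, so short phrases cut at a window boundary remain retrievable.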

Reproducibility

Code available: Yes
Data available: No
Open-source status: Yes
License: Unknown

Risks & Boundaries

Limitations

Evaluation relies on a synthetic benchmark of 40 QA pairs, which may not reflect complex real-world literature queries.

NL→Cypher translation uses few-shot prompting and can mis-translate complex queries.

When Not To Use

Do not rely on this system where absolute formal guarantees are required (legal/clinical decisions) without human verification.

Avoid for corpora composed mostly of scanned PDFs until OCR is integrated.

Failure Modes

Mistaken Cypher generation leads to wrong or missing KG answers.

If retrieval fails (both KG and VS), the generator can hallucinate despite DPO.
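Both failure modes can be partly mitigated with guards around the tools. The sketch below is a hypothetical illustration, not a feature described in the source: a keyword check that rejects generated Cypher containing write clauses before it reaches Neo4j, and an abstain path when neither the KG nor the VS returns context. The schema names in the test query (`Paper`, `Author`, `AUTHORED_BY`) are invented, and keyword filtering is coarse: it cannot catch a query that is read-only but semantically wrong.

```python
import re
from typing import Callable, List

# Cypher clauses that modify the graph; generated queries containing them are rejected.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b", re.IGNORECASE
)
READ_PREFIXES = ("MATCH", "OPTIONAL MATCH", "WITH", "RETURN")
ABSTAIN = "Insufficient evidence retrieved to answer this question."


def is_safe_read_query(cypher: str) -> bool:
    """Heuristic check that LLM-generated Cypher looks read-only."""
    query = cypher.strip()
    if not query:
        return False
    return query.upper().startswith(READ_PREFIXES) and not WRITE_CLAUSES.search(query)


def answer_or_abstain(kg_results: List[str], vs_results: List[str],
                      generate: Callable[[List[str]], str]) -> str:
    """Call the generator only when at least one retriever returned context."""
    context = kg_results + vs_results
    if not context:
        return ABSTAIN  # refuse rather than let the generator answer ungrounded
    return generate(context)
```

Returning an explicit abstention also gives the human-in-the-loop reviewer a clear signal to check the query manually.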

Core Entities

Models

LLaMA-3.3-70B-versatile, Mistral-7B-Instruct-v0.3, all-MiniLM-L6-v2, Cohere rerank-english-v3.0

Metrics

Faithfulness, answer relevance, context precision, context recall

Datasets

PubMed (via API), ArXiv (via API), Google Scholar (via API), synthetic RAG benchmark (40 QA pairs)

Benchmarks

Custom synthetic VectorRAG/GraphRAG benchmark (20 VS Q, 20 KG Q)
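The source names context precision and context recall without defining them. The sketch below follows the rank-aware definitions popularized by RAG evaluation toolkits such as RAGAS, under a simplifying assumption: per-chunk relevance labels and per-claim support labels are already available (in practice an LLM judge or a human annotator produces them). Whether the authors computed their scores exactly this way is not stated.

```python
from typing import Sequence


def context_precision(relevance: Sequence[bool]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant chunk (0.0 if none)."""
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision@k at this relevant rank
    return total / hits if hits else 0.0


def context_recall(claim_supported: Sequence[bool]) -> float:
    """Fraction of reference-answer claims supported by the retrieved context."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


# Hypothetical labels for one question: chunks ranked 1..3, reference answer with 4 claims.
print(round(context_precision([True, False, True]), 3))  # 0.833
print(context_recall([True, True, False, True]))  # 0.75
```

If the reported deltas use these 0-to-1 scales, +0.63 and +0.56 are absolute differences in score rather than relative percentages.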