Overview
An open-source pipeline shows promising gains on a synthetic benchmark. The evidence is limited to a small synthetic test set and bootstrapped estimates; expect additional engineering (Cypher translation, OCR, domain tuning) before deployment in high-stakes settings.
Citations: 0
Evidence Strength: 0.60
Confidence: 0.80
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 11/11
Reproducibility
Status: Partial assets available
Open source: Yes
At A Glance
Cost impact: 50%
Production readiness: 70%
Novelty: 60%
Why It Matters For Business
Automating literature review with a system that picks the right retrieval mode per query can reduce manual search time and improve the relevance of extracted evidence. This matters for teams that need fast, evidence-grounded summaries across many papers (R&D, clinical review, IP) and want an auditable pipeline.
Summary TLDR
The authors built and open-sourced an agentic Retrieval-Augmented Generation (RAG) system that stores literature in both a Neo4j knowledge graph and a FAISS vector store, and dynamically picks GraphRAG or VectorRAG per query. Instruction tuning plus Direct Preference Optimization (DPO) improves grounding and retrieval. On a synthetic scientific benchmark the agentic+DPO setup raised vector-store context recall by +0.63 and overall context precision by +0.56 versus a non-agentic baseline. Code is on GitHub.
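For readers unfamiliar with DPO, the per-pair objective can be sketched as below. This is the generic DPO loss as usually stated in the literature, not the authors' training code; the function name, the `beta` default, and the example log-probabilities are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_gain - rejected_gain))).

    Each *_logp is the summed log-probability of a full answer under the
    trainable policy or the frozen reference model.
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Illustrative numbers only: the policy prefers the chosen answer more than
# the reference model does, so the loss falls below log(2).
print(dpo_loss(-12.0, -20.0, -15.0, -15.0))
```

The loss is log(2) when the policy and reference agree, and shrinks as the policy shifts probability toward the preferred (better grounded) answer.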
Problem Statement
Static RAG pipelines (one fixed retrieval path) miss many scientific information needs. Researchers need a system that combines structured metadata (citations, authors) and full-text semantics, picks the right retrieval strategy per question, and reports uncertainty.
Main Contribution
An open-source Python pipeline that ingests literature from PubMed, ArXiv, and Google Scholar and builds both a Neo4j knowledge graph and a FAISS vector store of full-text chunks.
An agentic orchestration layer (LLaMA-3.3-70B-versatile) that dynamically selects between GraphRAG (Cypher queries) and VectorRAG (BM25 + dense search + reranker) per prompt.
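The per-prompt selection between GraphRAG and VectorRAG can be pictured with a minimal routing sketch. In the described system an LLM makes this choice; the keyword heuristic here is a deliberately simple stand-in, and `GRAPH_CUES`, `choose_mode`, `retrieve`, and the stub retrievers are all hypothetical names, not part of the authors' code.

```python
from typing import Callable, Dict, List, Tuple

Retriever = Callable[[str], List[str]]

# Hypothetical cue words hinting at a structured-metadata question.
GRAPH_CUES = ("author", "cited", "citation", "published", "journal", "year")

def choose_mode(question: str) -> str:
    """Return "graph" for metadata-style questions, otherwise "vector"."""
    q = question.lower()
    return "graph" if any(cue in q for cue in GRAPH_CUES) else "vector"

def retrieve(question: str, retrievers: Dict[str, Retriever]) -> Tuple[str, List[str]]:
    """Dispatch the question to whichever retriever choose_mode selects."""
    mode = choose_mode(question)
    return mode, retrievers[mode](question)

# Stub retrievers standing in for Cypher queries and hybrid vector search.
stubs: Dict[str, Retriever] = {
    "graph": lambda q: ["kg-row"],
    "vector": lambda q: ["text-chunk"],
}

print(retrieve("Which papers cited Smith 2020?", stubs))
print(retrieve("How does the method handle noisy labels?", stubs))
```

Keeping the router separate from the retrievers makes it straightforward to swap the heuristic for an LLM call and to log which path served each question, which supports the auditability goal.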
Key Findings
The agentic system with DPO raises vector-store context recall by +0.63 over the non-agentic baseline on the synthetic benchmark.
Overall context precision improves by +0.56 over the same baseline under agentic+DPO control.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| VS Context Recall | +0.63 | Non-agentic RAG | +0.63 | Synthetic benchmark (VectorRAG items) | Results section; Abstract | Results; Figure 5 |
| Overall Context Precision | +0.56 | Non-agentic RAG | +0.56 | Synthetic benchmark (combined) | Results section; Abstract | Results |
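Because the deltas come from a small benchmark with bootstrapped estimates, it can help to see how an interval around such a delta might be computed. The sketch below uses a paired percentile bootstrap, which is an assumption about the procedure rather than the authors' documented method; the per-question scores are made-up placeholders, not the paper's data.

```python
import random
from statistics import mean
from typing import List, Tuple

def bootstrap_delta_ci(baseline: List[float], treated: List[float],
                       n_resamples: int = 2000, alpha: float = 0.05,
                       seed: int = 0) -> Tuple[float, float, float]:
    """Return (mean delta, lower bound, upper bound) for paired scores."""
    if len(baseline) != len(treated) or not baseline:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    diffs = [t - b for b, t in zip(baseline, treated)]
    n = len(diffs)
    # Resample questions with replacement and record each resample's mean delta.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return mean(diffs), lower, upper

# Placeholder per-question scores (hypothetical, for illustration only).
baseline_scores = [0.2, 0.4, 0.3, 0.5, 0.1, 0.3]
agentic_scores = [0.8, 0.9, 0.7, 1.0, 0.6, 0.9]
print(bootstrap_delta_ci(baseline_scores, agentic_scores))
```

With only a few dozen questions the interval can be wide, which is one reason to treat the reported gains as indicative rather than settled.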
What To Try In 7 Days
Clone the repo and run the pipeline on a small topic using PubMed/ArXiv API keys to see ingest and KG/VS construction.
Compare answers for a handful of domain questions between a static vector-search pipeline and the agentic pipeline to observe differences in retrieved evidence.
Add 10–20 human preference pairs (DPO style) for your domain to quickly test gains in faithfulness.
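For the preference-pair step, a small helper can keep the pairs well formed before any training run. The `prompt`/`chosen`/`rejected` layout below is a common convention for DPO datasets and is assumed here; the repository may expect a different schema, and the example pair is invented for illustration.

```python
import json
from typing import Dict, Iterable

REQUIRED_KEYS = ("prompt", "chosen", "rejected")

def validate_pair(pair: Dict[str, str]) -> None:
    """Raise ValueError if a preference pair is incomplete or degenerate."""
    for key in REQUIRED_KEYS:
        value = pair.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
    if pair["chosen"].strip() == pair["rejected"].strip():
        raise ValueError("chosen and rejected answers must differ")

def to_jsonl(pairs: Iterable[Dict[str, str]]) -> str:
    """Serialize validated pairs as one JSON object per line."""
    lines = []
    for pair in pairs:
        validate_pair(pair)
        row = {key: pair[key] for key in REQUIRED_KEYS}
        lines.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(lines)

# Invented example: the chosen answer stays within the retrieved evidence,
# the rejected one overclaims.
example = {
    "prompt": "What does the retrieved abstract report about drug X?",
    "chosen": "The abstract reports a reduction in symptom Y [source 1].",
    "rejected": "Drug X cures the disease in all patients.",
}
print(to_jsonl([example]))
```

Pairing a grounded answer with an overclaiming one for the same prompt is what lets preference tuning target faithfulness specifically.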
Agent Features
Is Agentic
Yes
Optimization Features
Infra Optimization
GPU acceleration reduces latency from ~2 minutes on consumer hardware to ~10 seconds on server GPUs.
Risks & Boundaries
Limitations
Evaluation relies on a synthetic benchmark of 40 QA pairs, so results may not reflect complex real-world literature queries.
NL→Cypher translation uses few-shot prompting and can mis-translate complex queries.
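A guard on generated Cypher cannot fix a mistranslated query, but it can catch some malformed or destructive outputs before they reach the graph database. The sketch below is one conservative approach, assumed for illustration and not taken from the repository; the `Paper` label and `CITES` relationship in the example are a hypothetical schema.

```python
import re

# Cypher clauses that modify data or load external files.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b",
    re.IGNORECASE,
)
HAS_MATCH = re.compile(r"\bMATCH\b", re.IGNORECASE)
HAS_RETURN = re.compile(r"\bRETURN\b", re.IGNORECASE)

def is_safe_read_query(cypher: str) -> bool:
    """Accept only queries that look like read-only MATCH ... RETURN."""
    text = cypher.strip()
    if not text or WRITE_CLAUSES.search(text):
        return False
    return bool(HAS_MATCH.search(text)) and bool(HAS_RETURN.search(text))

# Hypothetical schema used only for this example.
generated = "MATCH (p:Paper)-[:CITES]->(q:Paper) RETURN q.title LIMIT 5"
print(is_safe_read_query(generated))
print(is_safe_read_query("MATCH (p:Paper) DETACH DELETE p"))
```

The check is intentionally strict: a read query whose string literal happens to contain a word like "set" would also be rejected, which errs toward falling back to vector retrieval rather than running a doubtful query.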
When Not To Use
Do not rely on this system where absolute formal guarantees are required (legal/clinical decisions) without human verification.
Avoid using it on corpora composed mostly of scanned PDFs until OCR is integrated.
Failure Modes
Mistaken Cypher generation leads to wrong or missing KG answers.
If retrieval fails (both KG and VS), the generator can hallucinate despite DPO.
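One way to limit hallucination when both stores come back empty is to abstain instead of calling the generator. The sketch below assumes a graph-first, vector-second fallback order and uses stub callables; the function names and the abstention message are hypothetical and do not describe the authors' implementation.

```python
from typing import Callable, List

Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]

ABSTAIN = "No supporting evidence was retrieved, so no answer is given."

def grounded_answer(question: str, graph_retrieve: Retriever,
                    vector_retrieve: Retriever, generate: Generator) -> str:
    """Try the knowledge graph, fall back to the vector store, else abstain."""
    passages = graph_retrieve(question)
    if not passages:
        passages = vector_retrieve(question)
    if not passages:
        # Skip generation entirely: there is nothing to ground an answer in.
        return ABSTAIN
    return generate(question, passages)

# Stubs standing in for the real retrievers and the LLM.
def empty(_question: str) -> List[str]:
    return []

def one_chunk(_question: str) -> List[str]:
    return ["chunk about topic Z"]

def echo(question: str, passages: List[str]) -> str:
    return f"Answer based on {len(passages)} passage(s)."

print(grounded_answer("What is known about Z?", empty, one_chunk, echo))
print(grounded_answer("What is known about Z?", empty, empty, echo))
```

Returning an explicit abstention also gives reviewers a clear signal that a question needs manual search, which matters in the high-stakes settings noted above.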

