DEPSRAG: an agentic RAG system that builds dependency knowledge graphs and uses a critic loop to improve dependency reasoning

Overview

Decision Snapshot: Needs Validation

The system is an engineering prototype: code and data are available, and results are measured but small-scale. Expect integration work, and plan on a stronger LLM, before production use.

Citations: 1

Evidence Strength: 0.60

Confidence: 0.80

Risk Signals: 8

Trust Signals

Findings with numeric evidence: 2/2

Findings with evidence refs: 2/2

Results with explicit delta: 1/2

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 50%

Production readiness: 40%

Novelty: 50%

Authors

Mohannad Alhanahnah, Yazan Boshmaf

Links

Abstract / PDF / Code / Data

Why It Matters For Business

DepsRAG automates dependency analysis and vulnerability lookup, cutting manual checks that delay library approvals and enabling faster, evidence-backed decisions.

Who Should Care

Engineering Lead, CTO, Product Manager, ML Engineer

Summary TLDR

DepsRAG is a multi-agent assistant that builds a dependency knowledge graph (KG) for a given package, augments queries with retrieval (KG + web + vulnerability DB), and uses an Agent–Critic loop to iteratively refine answers. In a proof-of-concept using GPT-4-Turbo and Llama-3, adding the Critic-Agent raised answer precision from 13.3% to 40% (threefold). The system is implemented in Python with Langroid, Neo4j, and the Deps.Dev API; code and demo are published.

Problem Statement

Developers need faster, more reliable tools to reason about direct and transitive software dependencies, security risks, and maintainability before importing third-party packages. Existing tools are fragmented (security, visualization, manual checks) and miss issues like circular dependencies, transitive risks, and up-to-date vulnerability context, creating approval bottlenecks.

Main Contribution

Design of DEPSRAG: a multi-agent, retrieval-augmented framework for reasoning about software dependencies.

A dependency Knowledge Graph builder that captures direct and transitive package relations via Deps.Dev and Neo4j.
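To illustrate what "direct and transitive package relations" means in practice, the following minimal sketch walks a dependency graph held in a plain dictionary. It does not use Deps.Dev or Neo4j; the package names, the `DIRECT_DEPS` map, and both helper functions are hypothetical stand-ins chosen for illustration, not the authors' implementation.

```python
from collections import deque

# Hypothetical direct-dependency map (package -> packages it depends on).
# In DepsRAG this data comes from Deps.Dev and lives in Neo4j; a dict stands in here.
DIRECT_DEPS = {
    "app": ["web", "auth"],
    "web": ["http", "templating"],
    "auth": ["crypto", "http"],
    "http": ["sockets"],
    "templating": ["web"],  # deliberate cycle: web -> templating -> web
    "crypto": [],
    "sockets": [],
}


def transitive_deps(graph, root):
    """Return the set of all packages reachable from root (breadth-first)."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen


def find_cycle(graph, root):
    """Return one dependency cycle reachable from root as a list, or None."""
    path, on_path, done = [], set(), set()

    def visit(pkg):
        if pkg in on_path:
            # pkg is already on the current path, so the path closes a cycle.
            return path[path.index(pkg):] + [pkg]
        if pkg in done:
            return None
        path.append(pkg)
        on_path.add(pkg)
        for dep in graph.get(pkg, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(pkg)
        done.add(pkg)
        return None

    return visit(root)


all_deps = transitive_deps(DIRECT_DEPS, "app")
cycle = find_cycle(DIRECT_DEPS, "app")
```

Here `all_deps` contains the four packages that `app` pulls in only indirectly in addition to its two direct dependencies, and `cycle` reports the circular chain through `web` and `templating`. In a graph database the same questions would be expressed as path queries rather than Python loops.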

Key Findings

Adding a Critic-Agent raised answer precision from 13.3% to 40% on evaluated tasks.

Numbers: 13.3% → 40% precision (ten iterations, three tasks)

Practical Use: Add a critic-style feedback step to multi-agent dependency QA to improve correctness. The roughly threefold gain was measured on only three tasks, so treat it as indicative rather than guaranteed for other multi-step tasks.

Evidence Ref: Section 5.2.2, Figure 3
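The critic-style feedback step can be sketched as a bounded loop in which one function proposes an answer and another accepts it or returns feedback. This is a minimal illustration under stated assumptions: only the cap of 10 iterations comes from this summary, while `Review`, `run_with_critic`, and the toy assistant and critic are hypothetical names that stand in for LLM-backed agents.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MAX_CRITIC_ITERATIONS = 10  # cap reported in the summary; the rest is illustrative


@dataclass
class Review:
    accepted: bool
    feedback: str = ""


def run_with_critic(
    question: str,
    answer_fn: Callable[[str, Optional[str]], str],
    critic_fn: Callable[[str, str], Review],
    max_iterations: int = MAX_CRITIC_ITERATIONS,
) -> Tuple[str, bool, int]:
    """Retry answer_fn with the critic's feedback until accepted or capped.

    Returns (last_answer, accepted, iterations_used).
    """
    feedback: Optional[str] = None
    answer = ""
    for iteration in range(1, max_iterations + 1):
        answer = answer_fn(question, feedback)
        review = critic_fn(question, answer)
        if review.accepted:
            return answer, True, iteration
        feedback = review.feedback
    # Cap reached without acceptance: return the last answer, flagged as unaccepted.
    return answer, False, max_iterations


def toy_assistant(question: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM call: the answer improves only after feedback arrives.
    if feedback is None:
        return "direct dependencies only"
    return "direct and transitive dependencies"


def toy_critic(question: str, answer: str) -> Review:
    # Stand-in for a critic agent: insists that transitive dependencies are covered.
    if "transitive" in answer:
        return Review(accepted=True)
    return Review(accepted=False, feedback="Also cover transitive dependencies.")


result = run_with_critic("Which dependencies does package X have?", toy_assistant, toy_critic)
```

Returning the `accepted` flag alongside the answer lets a caller distinguish a validated answer from one that merely survived until the iteration cap.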

GPT-4-Turbo generated correct Cypher queries on first attempt for all test questions; Llama-3 needed retries and produced an incorrect final answer for one question.

Numbers: GPT-4-Turbo: 0 retries; Llama-3: up to 2 retries, 1 incorrect final answer

Practical Use: Prefer stronger LLMs or add schema-retrieval + retry logic when mapping natural language to DB queries to avoid wrong graph answers.

Evidence Ref: Section 5.2.1, Table 1, Listing 1
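One way to picture "schema retrieval + retry logic" is a wrapper that passes the graph schema to the query generator and feeds any database error back into the next attempt. The sketch below is an assumption-laden illustration: the `SCHEMA` text, the `query_with_retries` helper, the default of two retries, and the fake generator and database are all hypothetical, and no real Neo4j driver or LLM is called.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical schema description; the real DepsRAG graph schema may differ.
SCHEMA = "Nodes: (:Package {name, version}). Relationships: (:Package)-[:DEPENDS_ON]->(:Package)."


def query_with_retries(
    question: str,
    generate_cypher: Callable[[str, str, Optional[str]], str],
    run_query: Callable[[str], list],
    max_retries: int = 2,
) -> Tuple[Optional[list], List[str]]:
    """Generate a query, run it, and pass any error text to the next attempt.

    Returns (rows or None if every attempt failed, list of attempted queries).
    """
    attempts: List[str] = []
    error: Optional[str] = None
    for _ in range(max_retries + 1):
        cypher = generate_cypher(question, SCHEMA, error)
        attempts.append(cypher)
        try:
            return run_query(cypher), attempts
        except Exception as exc:  # a real system would catch the driver's own error type
            error = str(exc)
    return None, attempts


def fake_generate(question: str, schema: str, error: Optional[str]) -> str:
    # Stand-in for an LLM: the first attempt has a syntax error, the retry fixes it.
    if error is None:
        return "MATCH (p:Package)-[:DEPENDS_ON]->(d:Package) RETURN count(d AS direct_deps"
    return "MATCH (p:Package)-[:DEPENDS_ON]->(d:Package) RETURN count(d) AS direct_deps"


def fake_run(cypher: str) -> list:
    # Stand-in for a database: rejects unbalanced parentheses, else returns a toy row.
    if cypher.count("(") != cypher.count(")"):
        raise ValueError("syntax error: unbalanced parentheses")
    return [{"direct_deps": 3}]


rows, attempts = query_with_retries("How many direct dependencies?", fake_generate, fake_run)
```

Note that retries of this kind only catch queries the database rejects. A syntactically valid but overly general query still returns rows, so it needs a separate check on the result itself.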

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Answer precision (with vs without Critic-Agent)	40% (with Critic)	13.3% (without Critic)	≈3×	Three multi-step tasks, ten iterations (GPT-4-Turbo)	Section 5.2.2, Figure 3	—
Cypher query generation trials	GPT-4-Turbo: 0 retries; Llama-3: up to 2 retries	—	—	Questions on Chainlit v1.1.200 dependency KG	Section 5.2.1, Table 1, Listing 1	—
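For readers who want the delta in the first row spelled out, the short calculation below derives both the relative and the absolute change from the two reported precision values. The variable names are illustrative; only the figures 13.3% and 40% come from the table.

```python
# Reported precision values from the results table.
with_critic = 0.40
without_critic = 0.133

# Relative improvement: how many times higher precision is with the critic.
ratio = round(with_critic / without_critic, 2)

# Absolute improvement in percentage points.
gain_pp = round((with_critic - without_critic) * 100, 1)

print(f"relative: {ratio}x, absolute: {gain_pp} percentage points")
```

The relative figure matches the "≈3×" delta in the table, while the absolute gain of under 27 percentage points is the more conservative way to state the same result.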

What To Try In 7 Days

Run DEPSRAG on a critical package to generate a dependency KG and identify top-risk transitive dependencies.

Integrate Critic-Agent style validation into existing LLM QA flows to reduce wrong answers.

Add schema retrieval + retry logic when converting natural language to DB queries.

Agent Features

Memory

retrieval memory (KG + web + vulnerability DB)

Planning

task decomposition, subtask orchestration

Tool Use

graph DB queries (Neo4j), web search, vulnerability DB lookups

Frameworks

Langroid

Is Agentic

Yes

Architectures

multi-agent, agent-critic

Collaboration

agent orchestration, inter-agent routing

Optimization Features

Token Efficiency

LLM Minimization Principle (avoid LLM where deterministic code suffices)
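A small example of avoiding the LLM where deterministic code suffices: parse well-formed package specifications with a regular expression and fall back to a model only for free-form text. This is a hypothetical sketch of the general idea, not the authors' code; `PACKAGE_SPEC`, `resolve_package`, and the `llm_extract` callable are assumed names.

```python
import re
from typing import Callable, Dict

# Matches specs such as "chainlit==1.1.200" or "chainlit@1.1.200".
PACKAGE_SPEC = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9_.\-]+)\s*(?:==|@)\s*(?P<version>[0-9][A-Za-z0-9_.\-]*)\s*$"
)


def resolve_package(text: str, llm_extract: Callable[[str], Dict[str, str]]) -> Dict[str, object]:
    """Extract a package name and version, calling the LLM only when parsing fails."""
    match = PACKAGE_SPEC.match(text)
    if match:
        # Deterministic path: no tokens spent, no risk of a hallucinated version.
        return {"name": match["name"], "version": match["version"], "used_llm": False}
    # Fallback path: free-form input is delegated to the (hypothetical) LLM extractor.
    parsed = llm_extract(text)
    return {**parsed, "used_llm": True}


def stub_llm(text: str) -> Dict[str, str]:
    # Stand-in for a model call so the example runs offline.
    return {"name": "chainlit", "version": "unknown"}


structured = resolve_package("chainlit==1.1.200", stub_llm)
free_form = resolve_package("the Chainlit release we use in production", stub_llm)
```

The `used_llm` flag makes it easy to measure how often the deterministic path is enough, which is a rough proxy for the token savings.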

System Optimization

limit critic iterations to 10 to prevent infinite loops

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Yes

License: Unknown

Code URLs

https://github.com/Mohannadcse/DepsRAG

Data URLs

https://deps.dev

Risks & Boundaries

Limitations

LLM fragility: incorrect DB-query translation can yield wrong graph answers.

The Critic-Agent validates only final answers in this work, and each critic exchange adds token cost and runtime.

When Not To Use

For trivial dependency checks where existing tools suffice and LLM cost is unjustified.

Where strict low-latency or low-cost constraints prohibit multiple agent exchanges.

Failure Modes

Unproductive critic–agent loops leading to termination after iteration cap.

Hallucinated or overly general Cypher queries that return irrelevant results.

Core Entities

Models

GPT-4-Turbo, Llama-3

Metrics

answer precision, Cypher query trials

Datasets

Deps.Dev API

