DEPSRAG: an agentic RAG system that builds dependency knowledge graphs and uses a critic loop to improve dependency reasoning

May 30, 2024 (6 min read)

Overview

Production Readiness

0.4

Novelty Score

0.5

Cost Impact Score

0.5

Citation Count

1

Authors

Mohannad Alhanahnah, Yazan Boshmaf


Why It Matters For Business

DepsRAG automates dependency analysis and vulnerability lookup, cutting manual checks that delay library approvals and enabling faster, evidence-backed decisions.

Summary TLDR

DepsRAG is a multi-agent assistant that builds a dependency knowledge graph (KG) for a given package, augments queries with retrieval (KG + web + vulnerability DB), and uses an Agent–Critic loop to iteratively refine answers. In a proof-of-concept using GPT-4-Turbo and Llama-3, adding the Critic-Agent raised answer precision from 13.3% to 40% (threefold). The system is implemented in Python with Langroid, Neo4j, and the Deps.Dev API; code and demo are published.

Problem Statement

Developers need faster, more reliable tools to reason about direct and transitive software dependencies, security risks, and maintainability before importing third-party packages. Existing tools are fragmented (security, visualization, manual checks) and miss issues like circular dependencies, transitive risks, and up-to-date vulnerability context, creating approval bottlenecks.
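The gap between direct and transitive dependencies, and the circular dependencies mentioned above, can be made concrete with a small graph traversal. This is not DepsRAG code. It is a sketch that assumes the dependency graph is already available as a plain Python dict mapping each package to its direct dependencies (the `Graph` alias is hypothetical). It computes the transitive closure with an iterative depth-first search and reports one cycle if any exists.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical input shape: package -> its direct dependencies.
Graph = dict[str, list[str]]

def transitive_deps(graph: Graph, root: str) -> set[str]:
    """All packages reachable from root (direct and transitive), excluding root."""
    seen: set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(graph.get(pkg, []))
    seen.discard(root)  # root can reappear only through a cycle
    return seen

def find_cycle(graph: Graph) -> Optional[list[str]]:
    """Return one circular dependency chain, or None if the graph is acyclic."""
    state = defaultdict(int)  # 0 = unvisited, 1 = on current path, 2 = done
    path: list[str] = []

    def visit(pkg: str) -> Optional[list[str]]:
        state[pkg] = 1
        path.append(pkg)
        for dep in graph.get(pkg, []):
            if state[dep] == 1:  # back edge: dep is already on the current path
                return path[path.index(dep):] + [dep]
            if state[dep] == 0:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[pkg] = 2
        return None

    for pkg in list(graph):
        if state[pkg] == 0:
            cycle = visit(pkg)
            if cycle:
                return cycle
    return None
```

The recursive `find_cycle` is adequate for typical package graphs. Very deep dependency chains would need an iterative version to stay under Python's recursion limit.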

Main Contribution

Design of DEPSRAG: a multi-agent, retrieval-augmented framework for reasoning about software dependencies.

A dependency Knowledge Graph builder that captures direct and transitive package relations via Deps.Dev and Neo4j.

An Agent–Critic interaction pattern that iteratively validates and refines LLM answers to improve correctness.

Proof-of-concept implementation (Langroid + Neo4j + Deps.Dev) and evaluation with GPT-4-Turbo and Llama-3; code published.
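One way to load Deps.Dev output into Neo4j is to flatten it into parameterized Cypher `MERGE` statements. This is a sketch under assumptions, not the DepsRAG builder. The input shape is modeled on the deps.dev v3 dependency-graph response and should be checked against the API documentation: a `nodes` list whose entries carry a `versionKey` with `system`, `name`, and `version`, and an `edges` list of integer `fromNode`/`toNode` indices plus a `requirement` string. The `Package` label, the `DEPENDS_ON` relationship type, and the `build_statements` helper are hypothetical names.

```python
def build_statements(response: dict) -> list[tuple[str, dict]]:
    """Turn a deps.dev-style dependency graph into (Cypher, parameters) pairs."""
    nodes = [
        {"system": n["versionKey"]["system"],
         "name": n["versionKey"]["name"],
         "version": n["versionKey"]["version"]}
        for n in response["nodes"]
    ]
    # Resolve integer node indices to concrete package versions up front,
    # so the Cypher side never has to index into a list.
    edges = [
        {"src": nodes[e["fromNode"]], "dst": nodes[e["toNode"]],
         "req": e.get("requirement", "")}
        for e in response["edges"]
    ]
    node_query = (
        "UNWIND $nodes AS n "
        "MERGE (:Package {system: n.system, name: n.name, version: n.version})"
    )
    edge_query = (
        "UNWIND $edges AS e "
        "MATCH (a:Package {system: e.src.system, name: e.src.name, version: e.src.version}) "
        "MATCH (b:Package {system: e.dst.system, name: e.dst.name, version: e.dst.version}) "
        "MERGE (a)-[:DEPENDS_ON {requirement: e.req}]->(b)"
    )
    # Nodes first, so the edge query's MATCH clauses can find both endpoints.
    return [(node_query, {"nodes": nodes}), (edge_query, {"edges": edges})]
```

Each `(query, params)` pair can then be executed with the official `neo4j` Python driver, for example `session.run(query, params)`. Because `MERGE` is idempotent, rebuilding the graph for the same package does not duplicate nodes or edges.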

Key Findings

Adding a Critic-Agent raised answer precision from 13.3% to 40% on evaluated tasks.

Numbers: 13.3% → 40% precision (ten iterations, three tasks)

GPT-4-Turbo generated correct Cypher queries on first attempt for all test questions; Llama-3 needed retries and produced an incorrect final answer for one question.

Numbers: GPT-4-Turbo: 0 retries; Llama-3: up to 2 retries, 1 incorrect final answer
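The threefold improvement follows from the two reported percentages. The summary does not give raw counts. If each of the ten iterations over three tasks is scored as one right-or-wrong answer (an assumption), then 4 and 12 correct answers out of 30 are the counts that reproduce the reported figures, as this hypothetical check shows.

```python
def precision(correct: int, total: int) -> float:
    """Fraction of answers judged correct."""
    return correct / total

# Hypothetical breakdown, not reported in the source: 10 iterations x 3 tasks.
TOTAL = 10 * 3
baseline = precision(4, TOTAL)      # without the Critic-Agent
with_critic = precision(12, TOTAL)  # with the Critic-Agent
print(f"{baseline:.1%} -> {with_critic:.1%}, {with_critic / baseline:.1f}x")
```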

Results

Answer precision (with vs without Critic-Agent)

Value: 40% (with Critic)

Baseline: 13.3% (without Critic)

Cypher query generation trials

Value: GPT-4-Turbo: 0 retries; Llama-3: up to 2 retries


What To Try In 7 Days

Run DEPSRAG on a critical package to generate a dependency KG and identify top-risk transitive dependencies.

Integrate Critic-Agent style validation into existing LLM QA flows to reduce wrong answers.

Add schema retrieval + retry logic when converting natural language to DB queries.
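The schema-retrieval-plus-retry suggestion can be sketched as a small loop. It puts the graph schema in the prompt, runs the generated Cypher, and feeds any database error back for another attempt. `generate` and `run` are hypothetical stand-ins for an LLM call and a Neo4j query. The default of two retries simply mirrors the most Llama-3 needed in the reported trials. None of this is the DepsRAG or Langroid API.

```python
from typing import Callable, Optional

def nl_to_cypher(question: str, schema: str,
                 generate: Callable[[str], str],   # hypothetical LLM call
                 run: Callable[[str], list],       # hypothetical DB call, raises on bad queries
                 max_retries: int = 2) -> tuple[str, list]:
    """Translate a question to Cypher, retrying with the error message on failure."""
    prompt = f"Graph schema:\n{schema}\n\nWrite one Cypher query answering: {question}"
    last_error: Optional[Exception] = None
    for _ in range(max_retries + 1):
        if last_error is not None:
            # Feed the failure back so the model can correct its previous query.
            prompt += f"\n\nYour previous query failed with: {last_error}. Fix it."
        query = generate(prompt)
        try:
            return query, run(query)
        except Exception as err:  # e.g. a syntax error reported by the database
            last_error = err
    raise RuntimeError(f"no valid query after {max_retries + 1} attempts: {last_error}")
```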

Agent Features

Memory

  • retrieval memory (KG + web + vulnerability DB)

Planning

  • task decomposition
  • subtask orchestration

Tool Use

  • graph DB queries (Neo4j)
  • web search
  • vulnerability DB lookups
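The summary does not say which vulnerability database DepsRAG queries. As a stand-in, the sketch below uses the public OSV.dev query endpoint (`POST https://api.osv.dev/v1/query` with a package name, ecosystem, and version). Swap in whichever source a deployment actually uses. The helper names are hypothetical. Only the pure helpers work offline; `lookup` performs a real network call.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"

def osv_payload(name: str, version: str, ecosystem: str = "PyPI") -> dict:
    """Request body for OSV's single-package query endpoint."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}

def vuln_ids(response: dict) -> list[str]:
    """Extract advisory IDs; OSV returns an object without "vulns" when none are known."""
    return [v["id"] for v in response.get("vulns", [])]

def lookup(name: str, version: str, ecosystem: str = "PyPI") -> list[str]:
    """Query OSV over the network and return matching advisory IDs."""
    req = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(osv_payload(name, version, ecosystem)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return vuln_ids(json.load(resp))
```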

Frameworks

  • Langroid

Is Agentic

true

Architectures

  • multi-agent
  • agent-critic

Collaboration

  • agent orchestration
  • inter-agent routing
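Orchestration and inter-agent routing can be pictured as a planner that splits a question into subtasks, each tagged with the specialist that should handle it, followed by a step that combines the answers. This illustrative sketch uses hypothetical names (`Subtask`, `orchestrate`, and agent keys such as `graph` or `vulns`). In DepsRAG these roles are played by Langroid agents, whose actual API is not shown here.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subtask:
    agent: str  # key of the specialist agent that should handle this step
    query: str

def orchestrate(question: str,
                plan: Callable[[str], list[Subtask]],        # e.g. an LLM planner (hypothetical)
                agents: dict[str, Callable[[str], str]],     # specialist agents by key
                combine: Callable[[str, list[str]], str]) -> str:
    """Decompose the question, route each subtask to its agent, then merge the answers."""
    answers = []
    for task in plan(question):
        if task.agent not in agents:
            raise KeyError(f"no agent registered for {task.agent!r}")
        answers.append(agents[task.agent](task.query))
    return combine(question, answers)
```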

Optimization Features

Token Efficiency

  • LLM Minimization Principle (avoid LLM where deterministic code suffices)
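In practice, the LLM Minimization Principle means answering fully structured questions with ordinary code and reserving the model for open-ended ones. The regex fast path and the `answer` function below are hypothetical illustrations of that idea, not DepsRAG's routing logic.

```python
import re
from typing import Callable

# Hypothetical pattern for one fully structured question type.
DIRECT_DEPS = re.compile(
    r"^\s*what are the direct dependencies of ([\w.\-]+)\??\s*$", re.IGNORECASE
)

def answer(question: str, graph: dict[str, list[str]],
           llm: Callable[[str], str]) -> str:
    """Answer deterministically when possible; fall back to the LLM otherwise."""
    m = DIRECT_DEPS.match(question)
    if m:  # deterministic path: a dict lookup, no tokens spent
        deps = sorted(graph.get(m.group(1), []))
        return ", ".join(deps) if deps else "none"
    return llm(question)  # only open-ended questions reach the model
```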

System Optimization

  • limit critic iterations to 10 to prevent infinite loops
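The Agent–Critic loop and its 10-iteration cap can be sketched as follows. `agent` and `critic` are hypothetical callables; in practice both would be LLM-backed agents. The loop passes the critic's feedback into the next attempt and stops when the critic accepts an answer or the cap is reached. In the latter case it returns the last answer, flagged as not accepted.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    ok: bool
    feedback: str = ""

def agent_critic_loop(question: str,
                      agent: Callable[[str, Optional[str]], str],  # hypothetical answering agent
                      critic: Callable[[str, str], Verdict],       # hypothetical validator
                      max_iters: int = 10) -> tuple[str, bool, int]:
    """Return (last answer, accepted by critic?, rounds used)."""
    feedback: Optional[str] = None
    answer = ""
    for i in range(1, max_iters + 1):
        answer = agent(question, feedback)
        verdict = critic(question, answer)
        if verdict.ok:
            return answer, True, i
        feedback = verdict.feedback  # guide the next attempt
    return answer, False, max_iters  # cap reached: stop instead of looping forever
```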

Reproducibility

Code Available

Data Available

Open Source Status

  • yes

Risks & Boundaries

Limitations

  • LLM fragility: incorrect DB-query translation can yield wrong graph answers.
  • In this work the Critic-Agent validates only final answers, not intermediate steps, and each critique round adds token cost and runtime.
  • Proof-of-concept evaluation is small-scale and focused on Chainlit and a few tasks.

When Not To Use

  • For trivial dependency checks where existing tools suffice and LLM cost is unjustified.
  • Where strict low-latency or low-cost constraints prohibit multiple agent exchanges.

Failure Modes

  • Unproductive critic–agent loops that end only when the iteration cap is reached.
  • Hallucinated or overly general Cypher queries that return irrelevant results.
  • Open-source LLMs (e.g., Llama-3) may fail to follow orchestration and produce incorrect outputs.
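A cheap mitigation for hallucinated or overly broad Cypher is to vet each generated query before running it: reject anything that writes to the graph and cap the number of returned rows. This keyword check is a coarse, hypothetical heuristic. It can flag harmless text such as a property named `set`, and it only recognizes integer-literal limits. Running queries in a read-only transaction, for example with the `neo4j` driver's `session.execute_read`, is a more robust way to block writes.

```python
import re

# Clauses that modify the graph; a generated QA query should never need them.
WRITE_CLAUSES = re.compile(r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.IGNORECASE)
HAS_LIMIT = re.compile(r"\bLIMIT\s+\d+\s*$", re.IGNORECASE)

def guard_cypher(query: str, max_rows: int = 100) -> str:
    """Reject write clauses and append a LIMIT when the query has none."""
    q = query.strip().rstrip(";").rstrip()
    if WRITE_CLAUSES.search(q):
        raise ValueError("generated query tries to modify the graph")
    if not HAS_LIMIT.search(q):
        q += f" LIMIT {max_rows}"
    return q
```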

Core Entities

Models

  • GPT-4-Turbo
  • Llama-3

Metrics

  • answer precision
  • Cypher query trials

Datasets

  • Deps.Dev API