Overview
The method is well-specified and evaluated on multiple datasets and two GraphRAG systems, but relies on eliciting structured responses and specific LLM/backbone behavior; defenses and deployment differences will affect real-world impact.
Citations: 0
Evidence Strength: 0.80
Confidence: 0.80
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 4/4
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 40%
Novelty: 60%
Why It Matters For Business
Graph-structured retrieval can leak reusable entity-relation graphs with surprisingly few queries; operators should treat structured retrieval as a privacy risk and add monitoring, response filtering, or query limits.
Who Should Care
Summary TLDR
This paper shows an attacker can reconstruct large parts of a GraphRAG system's hidden knowledge graph using a small number of queries. The authors introduce AGEA: an agentic loop that alternates novelty-driven exploration and targeted exploitation, keeps a graph memory, and filters LLM-extracted entities/edges before committing them. On two GraphRAG systems and several domains, AGEA recovers much more graph structure per query than prior attacks (e.g., up to ≈90% node/edge recovery on medium graphs with 1,000 queries), while keeping high precision. The attack relies on eliciting structured outputs and LLM-based filtering, and it weakens as graphs grow much larger or when victims restrict or
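Below is a minimal Python sketch of the explore/exploit loop described above. Every detail is an illustrative assumption and does not reproduce AGEA's actual policy. `GraphMemory`, `run_attack`, and `make_victim` are hypothetical names. Exploration is modeled as querying a random seed term, and exploitation as expanding the highest-degree node not yet queried. The novelty signal is simply the number of new nodes and edges a query adds, and it nudges the exploration rate up when queries stop paying off. The LLM filtering step is left out, and the toy victim returns raw triples instead of free text.

```python
import random
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (head, relation, tail)


@dataclass
class GraphMemory:
    """Attacker-side memory of everything extracted so far."""
    nodes: set[str] = field(default_factory=set)
    edges: set[Triple] = field(default_factory=set)

    def commit(self, triples: list[Triple]) -> int:
        """Store triples and return how many nodes/edges were new (the novelty signal)."""
        new = 0
        for head, rel, tail in triples:
            for node in (head, tail):
                if node not in self.nodes:
                    self.nodes.add(node)
                    new += 1
            if (head, rel, tail) not in self.edges:
                self.edges.add((head, rel, tail))
                new += 1
        return new

    def degree(self, node: str) -> int:
        return sum(1 for h, _, t in self.edges if node in (h, t))


def run_attack(query_fn, budget: int, seed_terms: list[str],
               epsilon: float = 0.3, seed: int = 0) -> GraphMemory:
    """Hypothetical budgeted explore/exploit loop (not the paper's exact policy)."""
    rng = random.Random(seed)
    memory = GraphMemory()
    queried: set[str] = set()
    for _ in range(budget):
        frontier = sorted(n for n in memory.nodes if n not in queried)
        if not frontier or rng.random() < epsilon:
            target = rng.choice(seed_terms)            # explore: probe a fresh topic
        else:
            target = max(frontier, key=memory.degree)  # exploit: expand the best-known hub
        queried.add(target)
        gain = memory.commit(query_fn(target))
        # Novelty feedback: explore more when queries stop yielding new structure.
        epsilon = min(0.9, epsilon + 0.1) if gain == 0 else max(0.05, epsilon - 0.05)
    return memory


def make_victim(hidden: list[Triple], k: int = 3):
    """Toy stand-in for a GraphRAG endpoint: returns up to k triples touching the entity."""
    def query(entity: str) -> list[Triple]:
        return [tr for tr in hidden if entity in (tr[0], tr[2])][:k]
    return query


if __name__ == "__main__":
    hidden = [(f"e{i}", "links_to", f"e{i + 1}") for i in range(20)]
    memory = run_attack(make_victim(hidden), budget=30, seed_terms=["e0", "e10"])
    print(f"{len(memory.edges)}/{len(hidden)} edges recovered")
```

The budget is enforced by the loop itself: the victim is called exactly `budget` times, so leakage per query follows directly from `len(memory.edges) / budget`.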
Problem Statement
Can a black-box attacker, limited to a fixed number of queries, reconstruct the internal entity–relation graph used by GraphRAG systems? The difficulty is noisy, mixed-format responses, no direct graph access, and a strict query budget that forces a trade-off between exploring new areas and exploiting known hubs.
Main Contribution
Formalize budgeted, black-box graph-level extraction attacks against GraphRAG systems.
Propose AGEA: an agentic, novelty-guided explore/exploit attacker with graph memory and a two-stage (regex discovery + LLM filtering) extraction pipeline.
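The two-stage pipeline can be sketched as a high-recall regex pass followed by a pluggable filter. The two response formats matched below, `(head, relation, tail)` tuples and `head -[relation]-> tail` arrows, are assumptions about what a victim might emit. `heuristic_judge` is a hypothetical stand-in for the LLM filter. A real filter would send each candidate and the response to an LLM and ask whether the relation is supported.

```python
import re
from typing import Callable

Triple = tuple[str, str, str]

# Stage 1 patterns. These two shapes are assumptions; real victims mix formats.
TUPLE_RE = re.compile(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")
ARROW_RE = re.compile(r"^[ \t]*(.+?)[ \t]*-\[[ \t]*(.+?)[ \t]*\]->[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def discover(response: str) -> list[Triple]:
    """Stage 1: cheap, high-recall regex discovery of candidate triples."""
    found = TUPLE_RE.findall(response) + ARROW_RE.findall(response)
    return [(h.strip(), r.strip(), t.strip()) for h, r, t in found]


def heuristic_judge(triple: Triple, response: str) -> bool:
    """Stage 2 placeholder for the LLM filter (an LLM judge would use `response` as context)."""
    head, rel, tail = triple
    if not (head and rel and tail):
        return False
    if head.casefold() == tail.casefold():  # self-loops are usually parsing noise
        return False
    # Very long fields are usually prose that happened to fit a pattern.
    return len(rel.split()) <= 4 and max(len(head.split()), len(tail.split())) <= 8


def extract(response: str, judge: Callable[[Triple, str], bool] = heuristic_judge) -> list[Triple]:
    """Run discovery, drop case-insensitive duplicates, and keep only judged-valid triples."""
    seen: set[Triple] = set()
    kept: list[Triple] = []
    for triple in discover(response):
        key = tuple(part.casefold() for part in triple)
        if key in seen:
            continue
        seen.add(key)
        if judge(triple, response):
            kept.append(triple)
    return kept
```

Keeping the judge as a parameter means the cheap regex stage runs on every response, while the expensive filter, whether an LLM call or this heuristic, sees only deduplicated candidates.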
Key Findings
AGEA recovers a large fraction of the hidden graph within 1,000 queries on medium graphs: 87.09% of nodes and 80.16% of edges on M-GraphRAG Medical (Table 1).
On LightRAG, AGEA achieves even higher coverage and precision than on M-GraphRAG.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Leak(N) (node leakage) | 87.09% | Baseline extraction attacks | Best among compared methods | M-GraphRAG, Medical (T = 1,000 queries) | Table 1 reports AGEA node leakage of 87.09% at 1,000 queries | Table 1 |
| Leak(E) (edge leakage) | 80.16% | Baseline extraction attacks | Best among compared methods | M-GraphRAG, Medical (T = 1,000 queries) | Table 1 reports AGEA edge leakage of 80.16% at 1,000 queries | Table 1 |
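Leak(N) and Leak(E) read as recall-style ratios over the victim's true graph. The sketch below assumes Leak = |recovered ∩ true| / |true| and precision = |recovered ∩ true| / |recovered|, with exact matching after case-folding. The paper's matching rules, for example entity normalization or whether edge direction counts, may differ, and `leakage_report` is a hypothetical helper. Under this reading, Leak(N) = 87.09% means roughly 87 of every 100 true nodes in the Medical graph were recovered within 1,000 queries.

```python
def _norm(items) -> set:
    """Case-fold and strip every string so 'Aspirin ' and 'aspirin' match."""
    out = set()
    for item in items:
        if isinstance(item, str):
            out.add(item.strip().casefold())
        else:
            out.add(tuple(part.strip().casefold() for part in item))
    return out


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def leakage_report(true_nodes, true_edges, got_nodes, got_edges) -> dict[str, float]:
    """Recall-style leakage and precision for recovered nodes and (head, relation, tail) edges."""
    tn, te, gn, ge = _norm(true_nodes), _norm(true_edges), _norm(got_nodes), _norm(got_edges)
    hit_n, hit_e = len(gn & tn), len(ge & te)
    return {
        "leak_n": _ratio(hit_n, len(tn)),  # Leak(N): share of true nodes recovered
        "leak_e": _ratio(hit_e, len(te)),  # Leak(E): share of true edges recovered
        "prec_n": _ratio(hit_n, len(gn)),  # share of recovered nodes that are real
        "prec_e": _ratio(hit_e, len(ge)),  # share of recovered edges that are real
    }
```

Reporting precision next to leakage matters because an attacker that emits many guesses can inflate leakage while committing spurious structure.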
What To Try In 7 Days
Run a red-team extraction using a novelty-driven agent to measure your system's structured leakage under realistic query budgets.
Enable structured-output controls: block or sanitize machine-readable entity/relation lists in LLM responses.
Add retrieval-time checks or rate limits on repeated hub-focused queries and log novelty-like metrics to detect agentic probing.
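Here is a sketch of the monitoring idea in the last item. It assumes the operator can see which graph entities each query retrieves. `ProbeMonitor` and all of its thresholds are hypothetical placeholders.

It raises three alerts:

- a plain per-client query cap;
- repeated retrieval of the same hub entity;
- a sustained high novelty ratio, meaning most retrieved entities are new to that client.

The intuition behind the third alert is that a systematic crawler keeps reaching new territory, while ordinary users tend to revisit familiar entities.

```python
from collections import Counter, defaultdict


class ProbeMonitor:
    """Hypothetical per-client detector for agentic graph probing (thresholds are placeholders)."""

    def __init__(self, max_queries: int = 200, max_hub_hits: int = 20,
                 novelty_alert: float = 0.6, window_size: int = 50):
        self.max_queries = max_queries
        self.max_hub_hits = max_hub_hits
        self.novelty_alert = novelty_alert
        self.window_size = window_size
        self.queries: Counter = Counter()                     # client -> query count
        self.entity_hits: defaultdict = defaultdict(Counter)  # client -> entity -> hits
        self.novelty: defaultdict = defaultdict(list)         # client -> recent novelty ratios

    def observe(self, client: str, retrieved_entities: list[str]) -> list[str]:
        """Record the entities one query retrieved and return any alerts it triggers."""
        entities = set(retrieved_entities)
        hits = self.entity_hits[client]
        new = sum(1 for e in entities if hits[e] == 0)
        for e in entities:
            hits[e] += 1
        self.queries[client] += 1

        window = self.novelty[client]
        window.append(new / len(entities) if entities else 0.0)
        del window[:-self.window_size]  # keep only the most recent ratios

        alerts = []
        if self.queries[client] > self.max_queries:
            alerts.append("rate_limit")
        if any(hits[e] > self.max_hub_hits for e in entities):
            alerts.append("hub_focus")
        if len(window) == self.window_size and sum(window) / len(window) >= self.novelty_alert:
            alerts.append("high_novelty")
        return alerts
```

The novelty alert fires only once a full window has been observed, so a single exploratory session does not trip it on its first few queries.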
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic: Yes
Architectures
Optimization Features
Token Efficiency
Reproducibility
Risks & Boundaries
Limitations
Relies on the victim producing machine-structured outputs; output-restriction policies can blunt the attack.
Does not model active deployment defenses like query rewriting, monitoring, or rate-limiting.
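The first limitation points to a cheap defense: refuse to return machine-readable graph dumps. A minimal line-based sanitizer is sketched below, and `sanitize` and its parameters are hypothetical. It recognizes three line shapes, all of them assumptions:

- parenthesized triples;
- `-[relation]->` arrows;
- JSON-like objects with `source`, `target`, or `relation` keys.

An attacker can ask for other formats, so this narrows the channel rather than closing it.

```python
import re

# Assumed shapes of machine-readable graph lines; extend for formats your system emits.
STRUCTURED_LINE = re.compile(
    r"\s*(?:[-*]\s*)?"                                 # optional bullet marker
    r"(?:\([^()]*,[^()]*,[^()]*\)"                     # (head, relation, tail)
    r"|.+?-\[.+?\]->.+"                                # head -[relation]-> tail
    r'|\{[^{}]*"(?:source|target|relation)"[^{}]*\})'  # JSON-like edge object
    r"\s*,?\s*"                                        # optional trailing comma
)


def sanitize(response: str, max_structured_lines: int = 2) -> str:
    """Withhold graph-dump lines when a response contains more than a few of them."""
    kept, withheld = [], 0
    for line in response.splitlines():
        if STRUCTURED_LINE.fullmatch(line):
            withheld += 1
        else:
            kept.append(line)
    if withheld <= max_structured_lines:
        return response  # a couple of structured lines is treated as normal output
    kept.append(f"[{withheld} structured graph lines withheld]")
    return "\n".join(kept)
```

The threshold lets ordinary answers that mention one or two relations pass unchanged, and redacts only responses that look like bulk exports.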
When Not To Use
If the target system enforces strict output formatting or forbids structured extraction commands.
When deployment includes effective traversal-aware monitoring or strict rate limits.
Failure Modes
Hallucinated hubs: the LLM filter may let widespread spurious connections through if its prompts are too lenient.
Backbone sensitivity: relation precision varies widely across backbone LLMs.
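One hypothetical hardening for the hallucinated-hub failure mode (it is not the paper's filter) is to commit an edge only after several distinct queries have reported it. `CorroboratedGraph` below sketches that idea. It trades some recall for precision, and it will not catch hallucinations that the backbone repeats consistently.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


class CorroboratedGraph:
    """Hypothetical graph memory that commits an edge only after independent corroboration."""

    def __init__(self, min_support: int = 2):
        self.min_support = min_support
        self.support: defaultdict = defaultdict(set)  # edge -> ids of queries that reported it
        self.committed: set[Triple] = set()

    def report(self, query_id: str, edges: list[Triple]) -> list[Triple]:
        """Record edges seen in one query's response; return the edges committed just now."""
        newly = []
        for edge in edges:
            self.support[edge].add(query_id)  # repeats within one query count once
            if edge not in self.committed and len(self.support[edge]) >= self.min_support:
                self.committed.add(edge)
                newly.append(edge)
        return newly
```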

