Overview
The system is mature in one specific enterprise setting (SAP). The authors report strong practical gains, but the evidence comes from internal deployments, with no public code or data and no external replication.
Citations: 0
Evidence Strength: 0.65
Confidence: 0.75
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 5/5
Findings with evidence refs: 5/5
Results with explicit delta: 5/6
Reproducibility
Status: No open assets linked
Open source: No
At A Glance
Cost impact: 80%
Production readiness: 70%
Novelty: 60%
Why It Matters For Business
Automates time-consuming test-document work, preserves traceability for regulated enterprise projects, and can shrink timelines and costs if you can manage integration and KB upkeep.
Summary TLDR
This paper builds an enterprise software-testing system that combines retrieval-augmented generation (RAG) with a relationship-aware graph and multiple specialized LLM agents. In an SAP migration case study, the authors report accuracy rising from ~65% (basic RAG) to 94.8% (Agentic RAG), an 85% reduction in artifact creation time, 35% cost savings, and improved traceability. The system uses a dual store (vector database + TigerGraph), multi-layer prompts, and dynamic model routing (Mistral 7B for cheap tasks, Gemini Pro for hard cases). The results are promising but come from internal deployments and proprietary integrations, so expect effort to adapt the system and to maintain the hybrid KB and integration plumbing.
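The dynamic model routing mentioned above can be pictured with a minimal sketch. The summary does not say how the authors score task difficulty, so the `Task` fields, the weights, the threshold, and the model labels below are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass

# Hypothetical labels: the summary names Mistral 7B and Gemini Pro,
# but does not describe how either model is invoked.
CHEAP_MODEL = "mistral-7b"
STRONG_MODEL = "gemini-pro"


@dataclass
class Task:
    prompt: str
    linked_entities: int    # assumed: number of graph entities the task touches
    needs_compliance: bool  # assumed: whether a compliance check is involved


def complexity_score(task: Task) -> float:
    """Toy heuristic in [0, 1]; the weights are illustrative, not from the paper."""
    length_part = min(len(task.prompt.split()) / 200.0, 1.0)
    entity_part = min(task.linked_entities / 10.0, 1.0)
    compliance_part = 1.0 if task.needs_compliance else 0.0
    return 0.3 * length_part + 0.4 * entity_part + 0.3 * compliance_part


def route(task: Task, threshold: float = 0.5) -> str:
    """Send low-scoring tasks to the cheap model and the rest to the strong one."""
    return STRONG_MODEL if complexity_score(task) >= threshold else CHEAP_MODEL
```

A short prompt touching one entity would go to the cheap model under these weights, while a compliance-relevant task touching many entities would go to the strong one. In practice the threshold would need tuning against observed cost and accuracy.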
Problem Statement
Quality engineers spend ~30–40% of their time writing test artifacts. Traditional RAG loses business relationships during retrieval. Manual methods don’t scale for enterprise systems (e.g., SAP) and lack traceability across requirements, tests, and results.
Main Contribution
Hybrid vector-graph knowledge system that combines semantic search (vectors) with relationship-aware graph traversal to preserve business context.
Multi-agent orchestration layer with specialized agents for legacy analysis, mapping changes, integration points, test case creation, and compliance checks.
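The hybrid vector-graph idea can be sketched in a few lines: rank items by embedding similarity, then pull in their graph neighbors so related requirements, tests, and results travel together. This is an in-memory illustration under stated assumptions, not the authors' implementation: plain dictionaries stand in for the vector database and TigerGraph, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, embeddings, graph, top_k=2, hops=1):
    """Return the top_k semantic hits followed by their graph neighbors.

    embeddings: dict of item id -> vector (stand-in for a vector store)
    graph: dict of item id -> list of related item ids (stand-in for a graph DB)
    """
    # Step 1: semantic search over the embeddings.
    ranked = sorted(
        embeddings,
        key=lambda item_id: cosine(query_vec, embeddings[item_id]),
        reverse=True,
    )
    seeds = ranked[:top_k]

    # Step 2: breadth-first expansion along graph relationships.
    result = list(seeds)
    seen = set(seeds)
    frontier = seeds
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    result.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return result
```

Expanding after the semantic step, rather than before it, keeps the vector search cheap and limits graph traversal to items that are already relevant.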
Key Findings
Agentic multi-agent RAG improves test artifact accuracy over Basic RAG, from ~65% to 94.8% in the reported evaluation.
Artifact creation time dropped by a reported 85% in the authors' deployments.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | 94.8% | Basic RAG 65.2% | +29.6pp | Aggregated evaluation (synthetic + enterprise) | Sec II.G; Sec IV.B; Table I | Table I |
| Accuracy | 94.8% | Basic RAG 65% | +29.8pp | Reported test plan task | Sec IV.B | Sec IV.B |
What To Try In 7 Days
Run a small pilot: index 100–500 legacy test items into a vector DB and TigerGraph and measure retrieval relevance.
Implement a single specialized agent (e.g., Modernized Test Case Agent) to generate and validate 50 test cases from real requirements.
Set up a basic traceability matrix for one module and compare coverage before/after automated generation.
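For the pilot steps above, two simple measurements are usually enough to start: precision at k for retrieval relevance, and requirement coverage from the traceability matrix. The sketch below assumes relevance labels and trace links are available as plain Python collections; the function names are illustrative and not taken from the paper.

```python
def precision_at_k(retrieved, relevant, k):
    """Share of the first k retrieved items that are labeled relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k


def coverage(requirements, trace_links):
    """Fraction of requirements with at least one linked test case.

    trace_links: dict of requirement id -> list of test case ids.
    """
    if not requirements:
        return 0.0
    covered = sum(1 for req in requirements if trace_links.get(req))
    return covered / len(requirements)


def coverage_delta(requirements, before, after):
    """Change in coverage, in percentage points, between two trace matrices."""
    return 100.0 * (coverage(requirements, after) - coverage(requirements, before))
```

Reporting the coverage change in percentage points, as the Results table does for accuracy, avoids confusion with relative improvement.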
Agent Features
Is Agentic: Yes
Risks & Boundaries
Limitations
The system is specialized for SAP, employee, and finance systems; adapting it to other domains may need extra work.
Hybrid KB requires ongoing maintenance as business processes change.
When Not To Use
Small projects where integration overhead outweighs automation benefits.
Environments that cannot support a dual-database architecture or strict data redaction policies.
Failure Modes
Context fragmentation if graph relationships are incomplete or poorly modeled.
Quality drop if enhanced contextualization or conflict-resolution rules are not tuned.
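One cheap guard against the context-fragmentation failure mode is to flag artifacts that have no relationships at all before they reach retrieval. The sketch below assumes the graph can be exported as a node list and an edge list; that export format and the function name are assumptions, not something the summary describes.

```python
def find_isolated(nodes, edges):
    """Return nodes that appear in no edge, sorted for stable reporting.

    nodes: iterable of artifact ids
    edges: iterable of (source id, target id) pairs
    """
    connected = set()
    for src, dst in edges:
        connected.add(src)
        connected.add(dst)
    return sorted(n for n in nodes if n not in connected)
```

An isolated test case or requirement is a likely sign of a missing or mis-modeled relationship, so such a list is a reasonable input to regular KB maintenance.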

