Use multi-agent RAG plus a hybrid vector-graph memory to auto-generate traceable test plans and cases, cutting test-document work by ~85% in

Overview

Decision Snapshot: Needs Validation

System is mature in a specific enterprise setting (SAP). Strong practical gains reported, but evidence is from internal deployments and lacks public code/data and external replication.

Citations: 0

Evidence Strength: 0.65

Confidence: 0.75

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 5/6

Reproducibility

Status: No open assets linked

Open source: No

At A Glance

Cost impact: 80%

Production readiness: 70%

Novelty: 60%

Authors

Mohanakrishnan Hariharan, Satish Arvapalli, Seshu Barma, Evangeline Sheela


Why It Matters For Business

Automates time-consuming test-document work, preserves traceability for regulated enterprise projects, and can shrink timelines and costs if you can manage integration and KB upkeep.

Who Should Care

CTO, Product Manager, Engineering Lead, ML Engineer, Founder

Summary TLDR

This paper builds an enterprise software-testing system that combines retrieval-augmented generation (RAG) with a relationship-aware graph, plus multiple specialized LLM agents. In an SAP migration case study the authors report accuracy rising from ~65% (basic RAG) to 94.8% (Agentic RAG), an 85% reduction in artifact creation time, 35% cost savings, and improved traceability. The system uses a dual-store (vector + TigerGraph), multi-layer prompts, and dynamic model routing (Mistral 7B for cheap tasks, Gemini Pro for hard cases). Results are promising but come from internal deployments and proprietary integrations, so expect work to adapt and maintain the hybrid KB and integration plumbing.

Problem Statement

Quality engineers spend ~30–40% of their time writing test artifacts. Traditional RAG loses business relationships during retrieval. Manual methods don't scale for enterprise systems (e.g., SAP) and lack traceability across requirements, tests, and results.

Main Contribution

Hybrid vector-graph knowledge system that combines semantic search (vectors) with relationship-aware graph traversal to preserve business context.

Multi-agent orchestration layer with specialized agents for legacy analysis, mapping changes, integration points, test case creation, and compliance checks.
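
The hybrid retrieval idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: in-memory dicts stand in for the vector store and the graph database, and the toy embeddings, item IDs, and the `hybrid_retrieve` function are all hypothetical.

```python
import math

# Toy stand-ins (assumptions): a real system would use an embedding model
# plus dedicated vector and graph databases.
EMBEDDINGS = {
    "REQ-1": [1.0, 0.0, 0.0],
    "TEST-1": [0.9, 0.1, 0.0],
    "TEST-2": [0.0, 1.0, 0.0],
    "RESULT-1": [0.0, 0.0, 1.0],
}
# Relationship edges, e.g. requirement <-> test <-> result.
EDGES = {
    "REQ-1": ["TEST-1"],
    "TEST-1": ["REQ-1", "RESULT-1"],
    "TEST-2": [],
    "RESULT-1": ["TEST-1"],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query_vec, top_k=1, hops=1):
    """Pick the top-k items by similarity, then expand along graph edges."""
    ranked = sorted(
        EMBEDDINGS, key=lambda k: cosine(query_vec, EMBEDDINGS[k]), reverse=True
    )
    found = list(ranked[:top_k])
    frontier = list(found)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in EDGES.get(node, []):
                if neighbor not in found:
                    found.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return found

context = hybrid_retrieve([1.0, 0.0, 0.0], top_k=1, hops=2)
seed_only = hybrid_retrieve([1.0, 0.0, 0.0], top_k=1, hops=0)
```

With `hops=0` only the semantically closest item comes back; the graph expansion is what pulls in the linked test and result that pure vector search would miss.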

Key Findings

Agentic multi-agent RAG improves test artifact accuracy compared to Basic RAG.

Numbers: Basic RAG 65.2% -> Agentic RAG 94.8%

Practical Use: Adopt the hybrid+agentic stack incrementally (vector -> hybrid -> agentic) to raise automated test-plan/case accuracy from mid-60s to mid-90s on evaluated datasets.

Evidence Ref: Sec II.G; Sec IV.B; Table I

Artifact creation time dropped dramatically in the reported deployments.

Numbers: Time reduced 85% (240h -> 36h per project phase)

Practical Use: Expect large labor savings on test-documentation work, but plan for integration effort to connect enterprise systems and maintain knowledge bases.

Evidence Ref: Sec IV.C
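
As a quick check on the reported time figures, the reduction can be recomputed from the two hour counts. The helper name `percent_reduction` is hypothetical; only the 240h and 36h values come from the summary.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100

hours_before, hours_after = 240, 36  # figures reported in the summary
reduction = percent_reduction(hours_before, hours_after)
hours_saved = hours_before - hours_after

print(f"{reduction:.0f}% reduction, {hours_saved}h saved per project phase")
```

The same helper can be applied to a pilot's own before/after hours to see whether the reported saving carries over.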

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Accuracy	94.8%	Basic RAG 65.2%	+29.6pp	Aggregated evaluation (synthetic + enterprise)	Sec II.G; Sec IV.B; Table I	Table I
Accuracy	94.8%	Basic RAG 65%	+29.8pp	Reported test plan task	Sec IV.B	Sec IV.B

What To Try In 7 Days

Run a small pilot: index 100–500 legacy test items into a vector DB and TigerGraph and measure retrieval relevance.

Implement a single specialized agent (e.g., Modernized Test Case Agent) to generate and validate 50 test cases from real requirements.

Set up a basic traceability matrix for one module and compare coverage before/after automated generation.
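
For the traceability comparison in the last step, one simple measure is the share of requirements linked to at least one test case. The sketch below assumes the matrix is a mapping from requirement ID to test case IDs; the IDs, the data, and the `traceability_coverage` function are hypothetical.

```python
def traceability_coverage(requirements, links):
    """Percentage of requirements linked to at least one test case."""
    if not requirements:
        return 0.0
    covered = [req for req in requirements if links.get(req)]
    return 100.0 * len(covered) / len(requirements)

requirements = ["REQ-1", "REQ-2", "REQ-3", "REQ-4"]

# Hypothetical matrices: requirement ID -> list of test case IDs.
before = {"REQ-1": ["TC-1"], "REQ-2": []}
after = {"REQ-1": ["TC-1"], "REQ-2": ["TC-7"], "REQ-3": ["TC-8", "TC-9"]}

coverage_before = traceability_coverage(requirements, before)
coverage_after = traceability_coverage(requirements, after)

# Requirements that still have no linked test after generation.
uncovered = [req for req in requirements if not after.get(req)]
```

Listing the uncovered requirements alongside the percentage makes the gaps actionable rather than just measurable.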

Agent Features

Memory

Hybrid retrieval memory (vector store for semantics, graph for relationships)

Bidirectional traceability as persistent links

Planning

Task decomposition into specialized agents

Strategy synthesis from objectives and history

Tool Use

TigerGraph for relationship traversal

Vector DB (Single Store) for semantic search

Sentence Transformer for embeddings

Kubernetes/Docker for orchestration

Frameworks

Multi-layer prompt engineering (context, spec, template, validation, enhancement)

Seven-layer context validation pipeline
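
One way to picture the multi-layer prompt is as named sections assembled in a fixed order. The summary names the five layers but not their contents, so the layer text, the section formatting, and the `build_prompt` function below are assumptions for illustration only.

```python
LAYER_ORDER = ["context", "spec", "template", "validation", "enhancement"]

def build_prompt(layers):
    """Join the layers in a fixed order, failing loudly if one is missing."""
    missing = [name for name in LAYER_ORDER if not layers.get(name)]
    if missing:
        raise ValueError(f"missing prompt layers: {missing}")
    sections = [
        f"## {name.upper()}\n{layers[name].strip()}" for name in LAYER_ORDER
    ]
    return "\n\n".join(sections)

# Hypothetical layer contents.
layers = {
    "context": "Legacy module: order-to-cash. Target platform: SAP S/4HANA.",
    "spec": "Generate test cases for the changed pricing step.",
    "template": "Fields: id, preconditions, steps, expected result.",
    "validation": "Every test case must cite a requirement ID.",
    "enhancement": "Add edge cases for currency rounding.",
}
prompt = build_prompt(layers)
```

Rejecting a prompt with a missing layer keeps an incomplete context from silently reaching the model.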

Is Agentic

Yes

Architectures

Multi-agent orchestration layer

Dual-database (vector + graph) hybrid architecture

Dynamic model routing across LLMs

Collaboration

Agent-to-agent orchestration and handoff

Conflict resolution with rule-based priority system
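
A rule-based priority scheme for conflicting agent outputs might look like the sketch below. The priority ranking, agent labels, field names, and the `resolve_conflicts` function are hypothetical; the summary does not describe the actual rules.

```python
# Hypothetical ranking: a lower number wins a conflict.
AGENT_PRIORITY = {"compliance": 0, "integration": 1, "test_case": 2}

def resolve_conflicts(proposals):
    """Keep, per field, the value proposed by the highest-priority agent.

    proposals: list of (agent, field, value) tuples.
    Returns a dict mapping field -> (agent, value).
    """
    best = {}
    for agent, field, value in proposals:
        # Unknown agents rank below every known agent.
        rank = AGENT_PRIORITY.get(agent, len(AGENT_PRIORITY))
        current = best.get(field)
        if current is None or rank < current[0]:
            best[field] = (rank, agent, value)
    return {field: (agent, value) for field, (_, agent, value) in best.items()}

proposals = [
    ("test_case", "retention_days", 30),
    ("compliance", "retention_days", 90),
    ("integration", "endpoint", "/v2/orders"),
]
resolved = resolve_conflicts(proposals)
```

Here the compliance agent overrides the test case agent on the contested field, while uncontested fields pass through unchanged.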

Optimization Features

Infra Optimization

Containerized deployment (Docker + Kubernetes)

Distributed vector store and TigerGraph Cloud

System Optimization

Microservices architecture with health monitoring and auto-failover

Horizontal scaling for vector and graph stores

Inference Optimization

Dynamic model routing (use smaller LLMs for simple tasks, larger for complex reasoning)
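
Dynamic routing can be sketched as a complexity score compared against a threshold. The scoring heuristic, the threshold, and the function names are assumptions, and the returned strings are plain labels for the two models named in the summary, not real API identifiers.

```python
def estimate_complexity(task):
    """Crude heuristic (an assumption): longer text and more links score higher."""
    score = len(task["text"].split()) / 100
    score += 0.2 * len(task.get("linked_items", []))
    return score

def route_model(task, threshold=1.0):
    """Return the small model label for simple tasks, the large one otherwise."""
    if estimate_complexity(task) >= threshold:
        return "gemini-pro"
    return "mistral-7b"

simple_task = {"text": "Generate a login test"}
complex_task = {
    "text": "word " * 50,
    "linked_items": ["REQ-1", "REQ-2", "TEST-1", "TEST-2"],
}

simple_choice = route_model(simple_task)
complex_choice = route_model(complex_task)
```

In practice the score would more likely come from signals such as retrieval depth or past failure rates; the threshold then becomes the main knob for trading cost against quality.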

Reproducibility

Code Available: No

Data Available: No

Open Source Status: No

License: Unknown

Risks & Boundaries

Limitations

The system is specialized for SAP, employee, and finance systems; adapting it to other domains may need extra work.

Hybrid KB requires ongoing maintenance as business processes change.

When Not To Use

Small projects where integration overhead outweighs automation benefits.

Environments that cannot support a dual-database architecture or cannot enforce strict data redaction policies.

Failure Modes

Context fragmentation if graph relationships are incomplete or poorly modeled.

Quality drop if enhanced contextualization or conflict-resolution rules are not tuned.

Core Entities

Models

Gemini Pro

Mistral 7B

GPT-4

Sentence Transformer

Metrics

Accuracy

Traceability coverage (%)

Time reduction (%)

Cost savings (%)

Functional coverage (%)

Defect detection change (%)

Datasets

Synthetic Test Dataset (5,000 scenarios)

Enterprise SAP S/4HANA dataset (1,000 cases requiring transformation)

25,000 generated test cases (deployment output)

Context Entities

Models

Fusion-in-Decoder (related work)

Dense Passage Retrieval (related work)

