Agentic AI pipelines that generate test scenarios and search software project documents

Overview

Decision Snapshot: Needs Validation

Authors provide working deployments and engineering details but no formal evaluations or public code, so practical readiness is moderate while empirical evidence is limited.

Citations: 0

Evidence Strength: 0.40

Confidence: 0.80

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/0

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 45%

Authors

Marian Kica, Lukas Radosky, David Slivka, Karin Kubinova, Daniel Dovhun, Tomas Uhercik, Erik Bircak, Ivan Polasek


Why It Matters For Business

Agentic pipelines can automate repetitive software engineering (SE) tasks such as test-scenario creation and document search, cutting manual labor and speeding up onboarding. The systems are deployed internally but lack formal benchmarks.

Who Should Care

Product Manager, CTO, Engineering Lead, ML Engineer, Founder

Summary TLDR

The authors present two working agent-based LLM systems for software engineering tasks: (1) a test-scenario generator using a 6-agent star topology (a supervisor plus specialized workers) that preprocesses functional specification documents (FSDs), writes scenarios, fact-checks them, translates them, and exports them to Excel; (2) a document-processing pipeline with a Delegator agent and four dedicated LLM agents (Search, Q&A, Trace, Reading) backed by a Qdrant document database. Both systems use LangChain/LangGraph, handle images with a vision model, and are in daily use at a medium-sized software engineering company. No formal benchmark or quantitative evaluation is reported.

Problem Statement

Writing test scenarios from long natural-language requirements is slow and costly. Finding and tracking information across many evolving software development lifecycle (SDLC) documents is hard for newcomers and established teams alike. The paper aims to automate both tasks with agentic LLM pipelines to reduce manual effort and speed up information discovery.

Main Contribution

A practical agentic architecture for automatic test scenario generation: 6 agents in a star topology with a supervisor coordinating specialized workers.

A document-processing agent pipeline for SDLC documents: a Delegator plus four LLM agents (Search, Q&A, Trace, Reading) using a shared Qdrant database.

Key Findings

Test scenario generator implemented as a 6-agent star with a supervisor and specialized workers.

Numbers: 6 agents; star topology described in Sec. 3.1

Practical Use: You can prototype automated test creation by wiring a supervisor agent to small, task-specific worker agents and keeping artifacts outside the agents' context.

Evidence Ref: Sec. 3.1 (architecture and Fig. 1)
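A minimal, framework-free sketch of the star pattern, assuming each worker is a plain Python function and the supervisor follows a fixed order (in the paper the agents are LLM-based and built with LangChain/LangGraph). The worker functions and their trivial logic are hypothetical placeholders; what the sketch shows is that only the supervisor calls workers, and each worker sees only the payload it is handed.

```python
from typing import Callable, Dict, List

# Hypothetical workers: each stands in for an LLM agent and sees only its own input.
def preprocess(text: str) -> str:
    return " ".join(text.split())  # normalize whitespace in the requirement text

def write_scenarios(text: str) -> str:
    # Placeholder for an LLM call that drafts one scenario per requirement sentence
    return "\n".join(f"Scenario: verify '{s.strip()}'" for s in text.split(".") if s.strip())

def fact_check(text: str) -> str:
    return text  # placeholder: a real checker would compare scenarios with the source

WorkerFn = Callable[[str], str]

class Supervisor:
    """Hub of the star: workers never call each other; only the supervisor calls them."""

    def __init__(self, workers: Dict[str, WorkerFn], order: List[str]) -> None:
        self.workers = workers
        self.order = order          # invocation order enforced by the supervisor
        self.trace: List[str] = []  # which worker ran, in sequence (for auditing)

    def run(self, document: str) -> str:
        payload = document
        for name in self.order:
            self.trace.append(name)
            payload = self.workers[name](payload)
        return payload

sup = Supervisor(
    workers={"preprocess": preprocess, "write": write_scenarios, "check": fact_check},
    order=["preprocess", "write", "check"],
)
result = sup.run("User can log in.  User can   reset password.")
print(result)
```

Swapping in real agents would mean replacing each placeholder with an LLM call; the supervisor loop and the `trace` list stay the same, which keeps the invocation order auditable.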

Document-processing system supports four explicit use cases via dedicated agents and a shared vector DB.

Numbers: 4 use cases (Search, Q&A, Trace, Reading); Qdrant DB used

Practical Use: Set up a per-project Qdrant index and assign specialized LLM agents per use case to keep responses focused and auditable.

Evidence Ref: Sec. 3.2 (architecture and Fig. 6)
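Per-project indexing and search can be sketched with a dependency-free stand-in for Qdrant: one in-memory collection per project and cosine similarity over a toy bag-of-words "embedding". `ProjectIndex`, `embed`, and the sample documents are hypothetical; a real setup would use an embedding model and one Qdrant collection per project instead of this class.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ProjectIndex:
    """Hypothetical in-memory stand-in for one vector collection per project."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[Tuple[str, Counter, str]]] = {}

    def add(self, project: str, doc_id: str, text: str) -> None:
        self.collections.setdefault(project, []).append((doc_id, embed(text), text))

    def search(self, project: str, query: str, k: int = 2) -> List[Tuple[str, float]]:
        # Only the requested project's collection is searched, keeping results scoped
        q = embed(query)
        hits = [(doc_id, cosine(q, vec)) for doc_id, vec, _ in self.collections.get(project, [])]
        return sorted(hits, key=lambda h: h[1], reverse=True)[:k]

index = ProjectIndex()
index.add("billing", "fsd-login", "Login requires username and password")
index.add("billing", "fsd-invoice", "Invoice export to PDF and Excel")
index.add("crm", "crm-notes", "Customer login history dashboard")
print(index.search("billing", "how does invoice export work", k=1))
```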

What To Try In 7 Days

Index one project’s documents in Qdrant and run a simple Search agent to surface key specs.

Prototype the 6-agent star on a single FSD chapter, with the supervisor invoking workers in order: retriever → writer → fact-checker → translator → Excel export.

Add a fact-checker step to any LLM output and log mismatches to estimate hallucination rates.
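The fact-check-and-log idea can be sketched without an LLM by assuming that generated scenarios cite requirement IDs in a hypothetical `REQ-<n>` format; a scenario citing an ID absent from the source is logged as a mismatch. This is a crude, deterministic proxy, not the paper's fact-checker agent, and the resulting rate only approximates how often outputs drift from the source.

```python
import logging
import re
from typing import List

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("fact_check")

REQ_ID = re.compile(r"REQ-\d+")  # hypothetical requirement-ID format

def fact_check(source: str, scenarios: List[str]) -> float:
    """Log scenarios that cite IDs absent from the source; return the mismatch rate."""
    known = set(REQ_ID.findall(source))
    mismatches = 0
    for i, scenario in enumerate(scenarios):
        unknown = set(REQ_ID.findall(scenario)) - known
        if unknown:
            mismatches += 1
            log.warning("scenario %d cites unknown IDs: %s", i, sorted(unknown))
    return mismatches / len(scenarios) if scenarios else 0.0

source_fsd = "REQ-1 Users log in with SSO. REQ-2 Admins can lock accounts."
generated = [
    "Verify SSO login (REQ-1)",
    "Verify admin lock (REQ-2)",
    "Verify password expiry (REQ-7)",  # REQ-7 does not exist in the source
]
rate = fact_check(source_fsd, generated)
print(f"mismatch rate: {rate:.2f}")
```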

Agent Features

Memory

Per-agent context and history

External artifact storage to keep the supervisor's context small
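Keeping the supervisor's context small by storing artifacts outside it can be sketched as a handle-based store: workers write full outputs to the store, and the supervisor's history records only a short handle. The `ArtifactStore` class and its content-hash handle scheme are assumptions for illustration; the summary does not say how the authors store artifacts.

```python
import hashlib
from typing import Dict, List

class ArtifactStore:
    """Holds full artifacts outside any agent's context; agents exchange short handles."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def put(self, content: str) -> str:
        # Content-addressed handle: identical content always maps to the same handle
        handle = "art-" + hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        self._blobs[handle] = content
        return handle

    def get(self, handle: str) -> str:
        return self._blobs[handle]

store = ArtifactStore()
supervisor_history: List[str] = []  # supervisor context: short notes and handles only

# A large worker output goes to the store, not into the supervisor's history
scenarios = "\n".join(f"Scenario {i}: check field {i}" for i in range(500))
handle = store.put(scenarios)
supervisor_history.append(f"writer finished -> {handle}")

print(supervisor_history[-1])
print(f"{len(scenarios)} chars stored, {len(supervisor_history[-1])} chars in supervisor context")
```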

Planning

Ordered worker invocation enforced by the supervisor

Worker input validation and feedback loops
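Input validation with a feedback loop can be sketched as follows, assuming a worker that rejects bad input with a feedback message and a supervisor that revises the input and retries a bounded number of times. `translator`, the `TODO` check, and the string-replace revision are hypothetical stand-ins for LLM behavior.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkerResult:
    output: Optional[str]
    feedback: Optional[str]  # set when the worker rejects its input

def translator(text: str) -> WorkerResult:
    """Hypothetical worker that refuses empty input or input with unresolved markers."""
    if not text.strip():
        return WorkerResult(None, "input is empty")
    if "TODO" in text:
        return WorkerResult(None, "input contains unresolved TODO markers")
    return WorkerResult(f"[translated] {text}", None)

def supervise(text: str, max_attempts: int = 3) -> WorkerResult:
    """Call the worker; on rejection, revise the input from the feedback and retry."""
    result = WorkerResult(None, "not run")
    for _ in range(max_attempts):
        result = translator(text)
        if result.feedback is None:
            return result
        # Placeholder revision step; an LLM supervisor would rewrite the input instead
        text = text.replace("TODO", "").strip()
    return result  # last rejection, so the caller sees why it failed

print(supervise("Verify login TODO").output)
print(supervise("   ").feedback)
```

Bounding the retries keeps a persistently invalid input from looping forever; the final feedback is returned so the failure remains visible.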

Tool Use

VLM (vision-language model) for image processing

Qdrant vector DB

Excel writer (non-LLM)
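The non-LLM export step can be sketched as a plain function with a fixed column layout. Writing CSV with Python's standard `csv` module is an assumption made to keep the sketch dependency-free (the paper's tool writes Excel, which would need a library such as openpyxl for .xlsx), and the column names are hypothetical. The fixed `COLUMNS` list shows why a hard-coded writer is predictable but inflexible.

```python
import csv
import io
from typing import Dict, List

# Fixed column layout: the "hard-coded" part that the LLM agents cannot change
COLUMNS = ["ID", "Title", "Preconditions", "Steps", "Expected Result"]

def export_scenarios(scenarios: List[Dict[str, str]]) -> str:
    """Deterministically serialize scenarios to CSV text with a fixed header."""
    buffer = io.StringIO()
    # Unknown keys are dropped; missing columns are written as empty cells
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore", restval="")
    writer.writeheader()
    for row in scenarios:
        writer.writerow(row)
    return buffer.getvalue()

rows = [
    {"ID": "TS-1", "Title": "Login with SSO", "Steps": "Open app; click SSO", "Expected Result": "Dashboard shown"},
    {"ID": "TS-2", "Title": "Lock account", "Preconditions": "Admin role", "Notes": "dropped"},
]
print(export_scenarios(rows))
```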

Frameworks

LangChain

LangGraph

Is Agentic

Yes

Architectures

Star topology (supervisor + workers)

Delegator-based multi-agent pipeline

Collaboration

Supervisor/Delegator mediates all communication

Workers are unaware of each other
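Delegator-mediated routing can be sketched with keyword rules standing in for whatever routing the Delegator actually performs (likely LLM-based; the summary does not say). The agent functions, the rule table, and the default-to-Q&A choice are hypothetical. What the sketch preserves is the communication pattern: each request goes to exactly one agent, and agents never call one another.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the four dedicated LLM agents
AGENTS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"search results for: {q}",
    "qa": lambda q: f"answer to: {q}",
    "trace": lambda q: f"trace links for: {q}",
    "reading": lambda q: f"summary of: {q}",
}

# Crude keyword rules standing in for intent classification, checked in order
RULES = [
    ("trace", ("trace", "depends on", "linked to")),
    ("reading", ("summarize", "summary", "read")),
    ("search", ("find", "where is", "locate")),
]

def delegate(request: str) -> str:
    """Route to exactly one agent; agents only ever talk to the Delegator."""
    text = request.lower()
    for agent, keywords in RULES:
        if any(k in text for k in keywords):
            return AGENTS[agent](request)
    return AGENTS["qa"](request)  # default: treat the request as a question

print(delegate("Find the payment FSD"))
print(delegate("What is the session timeout?"))
```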

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Unknown

License: Unknown

Risks & Boundaries

Limitations

No formal quantitative evaluation or benchmark results are reported.

Excel writer is hard-coded and not LLM-driven, limiting flexibility.

When Not To Use

For tasks needing provable correctness or regulatory guarantees.

When you cannot supply any project-specific documents for indexing.

Failure Modes

LLM hallucinations leading to incorrect scenarios despite the fact-checker.

Supervisor misordering workers or issuing incorrect prompts, causing worker errors.

Core Entities

Models

On-premises and cloud LLMs (not named in the paper)

Context Entities

Models

GPT-3.5 (related work)

GPT-4 (related work)

LLaMA, Mistral (related literature)
