Agentic AI pipelines that generate test scenarios and search software project documents

February 4, 2026 · 7 min

Overview

Decision Snapshot: Needs Validation

The authors describe working deployments and engineering details but report no formal evaluation and release no public code, so practical readiness is moderate and empirical evidence is limited.

Citations: 0

Evidence Strength: 0.40

Confidence: 0.80

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/0

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 45%

Authors

Marian Kica, Lukas Radosky, David Slivka, Karin Kubinova, Daniel Dovhun, Tomas Uhercik, Erik Bircak, Ivan Polasek

Why It Matters For Business

Agentic pipelines can automate repetitive SE tasks (test-scenario creation and document search), cut manual labor, and speed onboarding; the systems are deployed internally but lack formal benchmarks.

Summary TLDR

The authors present two working agent-based LLM systems for software engineering tasks: (1) a test-scenario generator built as a 6-agent star topology (a supervisor plus specialized workers) that preprocesses FSDs, writes scenarios, fact-checks them, translates them, and exports them to Excel; (2) a document-processing pipeline with a Delegator agent and four dedicated LLM agents (Search, Q&A, Trace, Reading) backed by a Qdrant document database. Both systems are built on LangChain/LangGraph, process images with a vision model, and are in daily use at a medium-sized software engineering company. No formal benchmark or quantitative evaluation is reported.

Problem Statement

Writing test scenarios from long natural-language requirements is slow and costly. Finding and tracking information across many evolving SDLC documents is hard for newcomers and teams. The paper aims to automate both tasks using agentic LLM pipelines to reduce manual effort and speed information discovery.

Main Contribution

A practical agentic architecture for automatic test scenario generation: 6 agents in a star topology with a supervisor coordinating specialized workers.

A document-processing agent pipeline for SDLC documents: a Delegator plus four LLM agents (Search, Q&A, Trace, Reading) using a shared Qdrant database.

Key Findings

Test scenario generator implemented as a 6-agent star with a supervisor and specialized workers.

Numbers: 6 agents; star topology described in Sec. 3.1

Practical Use: You can prototype automated test creation by wiring a supervisor agent to small, task-specific worker agents and keep artifacts outside agent context.

Evidence Ref: Sec. 3.1 (architecture and Fig. 1)
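
The wiring described in this finding can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `ArtifactStore`, `Supervisor`, and the three stub workers are hypothetical names, only three of the worker roles are shown, and a real worker would call an LLM instead of editing strings.

```python
from typing import Callable, Dict, List


class ArtifactStore:
    """Hypothetical external store: holds full artifacts so agents pass only keys."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def put(self, key: str, value: str) -> str:
        self._items[key] = value
        return key

    def get(self, key: str) -> str:
        return self._items[key]


# A worker reads one artifact by key, writes a new artifact, and returns its key.
Worker = Callable[[ArtifactStore, str], str]


def preprocess(store: ArtifactStore, key: str) -> str:
    # Stub: a real worker would clean and chunk the specification text.
    return store.put("preprocessed", store.get(key).strip())


def write_scenarios(store: ArtifactStore, key: str) -> str:
    # Stub: a real worker would prompt an LLM to draft test scenarios.
    return store.put("scenarios", "Scenario: verify that " + store.get(key))


def fact_check(store: ArtifactStore, key: str) -> str:
    # Stub: a real worker would compare each scenario against the source text.
    return store.put("checked", store.get(key) + " [checked]")


class Supervisor:
    """Hub of the star: workers never call each other, only the supervisor does."""

    def __init__(self, workers: List[Worker]) -> None:
        self.workers = workers
        self.log: List[str] = []  # the supervisor keeps keys, not document bodies

    def run(self, store: ArtifactStore, input_key: str) -> str:
        key = input_key
        for worker in self.workers:  # fixed order enforced by the supervisor
            key = worker(store, key)
            self.log.append(key)
        return key


store = ArtifactStore()
store.put("fsd", "  the user can reset a password  ")
supervisor = Supervisor([preprocess, write_scenarios, fact_check])
final_key = supervisor.run(store, "fsd")
print(store.get(final_key))
```

Because the supervisor records only artifact keys, its own context stays small no matter how long the source document is, which is the point of keeping artifacts outside agent context.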

Document-processing system supports four explicit use cases via dedicated agents and a shared vector DB.

Numbers: 4 use cases (Search, Q&A, Trace, Reading); Qdrant DB used

Practical Use: Set up a per-project Qdrant index and assign specialized LLM agents per use case to keep responses focused and auditable.

Evidence Ref: Sec. 3.2 (architecture and Fig. 6)
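
The Delegator pattern in this finding can be sketched as a router over four agent functions. Everything here is an assumption made for illustration: the in-memory `DOCS` dictionary and word-overlap `score` function stand in for a Qdrant index and vector similarity, the use case is passed as an explicit label rather than inferred by an LLM, and what each stub agent does is a guess, since this summary does not describe the agents' internals.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# In-memory stand-in for a per-project document index (the paper uses Qdrant).
DOCS: Dict[str, str] = {
    "spec.md": "login requires a password and a one time code",
    "design.md": "the payment service retries failed requests three times",
}


def score(query: str, text: str) -> int:
    """Count shared words; a crude substitute for vector similarity."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    return sum((q & t).values())


def search_agent(query: str) -> str:
    ranked: List[Tuple[int, str]] = sorted(
        ((score(query, text), name) for name, text in DOCS.items()), reverse=True
    )
    return ranked[0][1]  # name of the best-matching document


def qa_agent(query: str) -> str:
    # Stub: a real agent would pass the retrieved text to an LLM.
    return "Answer based on " + search_agent(query)


def trace_agent(query: str) -> str:
    # Stub (assumed behavior): list every document that mentions a query word.
    return ", ".join(sorted(n for n, t in DOCS.items() if score(query, t) > 0))


def reading_agent(query: str) -> str:
    # Stub (assumed behavior): return the text of the best-matching document.
    return DOCS[search_agent(query)]


AGENTS: Dict[str, Callable[[str], str]] = {
    "search": search_agent,
    "qa": qa_agent,
    "trace": trace_agent,
    "reading": reading_agent,
}


def delegator(use_case: str, query: str) -> str:
    """Route each request to exactly one agent, chosen by use case."""
    if use_case not in AGENTS:
        raise ValueError(f"unknown use case: {use_case}")
    return AGENTS[use_case](query)


print(delegator("search", "how does login work"))
print(delegator("trace", "payment retries"))
```

Keeping one function per use case makes each response path easy to audit, because a request only ever passes through the Delegator and a single agent.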

What To Try In 7 Days

Index one project’s documents in Qdrant and run a simple Search agent to surface key specs.

Prototype the 6-agent star for a single FSD chapter: retriever → writer → fact-checker → translator → Excel export.

Add a fact-checker step to any LLM output and log mismatches to measure hallucination rates.
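
The last item above, logging fact-check mismatches, could start from something like the sketch below. It is a toy under stated assumptions: `fact_check` uses verbatim substring matching as a placeholder for an LLM-based or entailment-based checker, and the `CheckResult` type, sample source text, and sample claims are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    claim: str
    supported: bool


def fact_check(claims: List[str], source: str) -> List[CheckResult]:
    """Toy checker: a claim counts as supported if it appears verbatim in the source.

    A real fact-checker would use an LLM or an entailment model instead.
    """
    src = source.lower()
    return [CheckResult(c, c.lower() in src) for c in claims]


def mismatch_rate(results: List[CheckResult]) -> float:
    """Fraction of claims the checker could not support (0.0 if there are none)."""
    if not results:
        return 0.0
    return sum(not r.supported for r in results) / len(results)


source = "The user can reset a password. Sessions expire after 30 minutes."
claims = [
    "the user can reset a password",
    "sessions expire after 30 minutes",
    "passwords expire after 90 days",  # not in the source
]

results = fact_check(claims, source)
for r in results:
    if not r.supported:
        print("MISMATCH:", r.claim)  # in practice, write this to a log file
rate = mismatch_rate(results)
print(f"mismatch rate: {rate:.2f}")
```

The resulting rate is only a proxy for the hallucination rate, since it also reflects the checker's own false positives and false negatives.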

Agent Features

Memory: per-agent context and history; external artifact storage to keep supervisor context small
Planning: ordered worker invocation enforced by supervisor; worker input validation and feedback loops
Tool Use: VLM for image processing; Qdrant vector DB; Excel writer (non-LLM)
Frameworks: LangChain, LangGraph
Is Agentic: Yes
Architectures: star topology (supervisor + workers); delegator-based multi-agent pipeline
Collaboration: supervisor/Delegator mediates all communication; workers unaware of each other
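
The planning features listed above (ordered invocation, worker input validation, feedback loops) can be illustrated with a small scheduling sketch. This summary does not say how the authors implement validation, so the `REQUIRES` table, the worker names, and the repair strategy below are assumptions chosen for illustration only.

```python
from typing import Dict, List, Set

# Hypothetical preconditions: which steps must finish before each worker can run.
REQUIRES: Dict[str, Set[str]] = {
    "preprocess": set(),
    "write": {"preprocess"},
    "fact_check": {"write"},
    "translate": {"fact_check"},
    "export": {"translate"},
}


def validate(worker: str, done: Set[str]) -> List[str]:
    """Worker-side check: return the missing prerequisites (empty list means OK)."""
    return sorted(REQUIRES[worker] - done)


def run_with_feedback(plan: List[str]) -> List[str]:
    """Supervisor loop: if a worker rejects its input, run what it asks for first."""
    done: Set[str] = set()
    executed: List[str] = []
    queue = list(plan)
    while queue:
        worker = queue.pop(0)
        if worker in done:
            continue  # already ran as a prerequisite of another worker
        missing = validate(worker, done)
        if missing:
            # Feedback: reschedule the missing steps ahead of the rejected worker.
            queue = missing + [worker] + queue
            continue
        done.add(worker)
        executed.append(worker)
    return executed


# A deliberately misordered plan is repaired by the feedback loop.
print(run_with_feedback(["write", "export", "preprocess"]))
```

In this sketch the worker only reports what is missing, and the supervisor decides how to recover, so all coordination still flows through the hub.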

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

No formal quantitative evaluation or benchmark results are reported.

Excel writer is hard-coded and not LLM-driven, limiting flexibility.

When Not To Use

For tasks needing provable correctness or regulatory guarantees.

When you cannot supply any project-specific documents for indexing.

Failure Modes

LLM hallucinations that produce incorrect scenarios despite the fact-checker.

Supervisor misordering or incorrect prompts causing worker errors.

Core Entities

Models

on-premise and cloud LLMs (unnamed)

Context Entities

Models

GPT-3.5 (related work)
GPT-4 (related work)
LLaMA, Mistral (related literature)