Overview
Production Readiness
0.6
Novelty Score
0.45
Cost Impact Score
0.6
Citation Count
0
Why It Matters For Business
Agentic pipelines can automate repetitive software engineering (SE) tasks such as test-scenario creation and document search, reduce manual labor, and speed up onboarding. The systems are deployed internally but have not been formally benchmarked.
Summary TLDR
The authors present two working agent-based LLM systems for software engineering tasks. The first is a test-scenario generator built as a 6-agent star topology (a supervisor plus specialized workers) that preprocesses FSDs, writes scenarios, fact-checks them, translates them, and exports the result to Excel. The second is a document-processing pipeline with a Delegator agent and four dedicated LLM agents (Search, Q&A, Trace, Reading) backed by a Qdrant document DB. Both systems use LangChain/LangGraph, handle images with a vision model, and are used daily in a medium-sized SE company. No formal benchmark or quantitative evaluation is reported.
Problem Statement
Writing test scenarios from long natural-language requirements is slow and costly. Finding and tracking information across many evolving SDLC documents is hard for newcomers and teams. The paper aims to automate both tasks using agentic LLM pipelines to reduce manual effort and speed information discovery.
Main Contribution
A practical agentic architecture for automatic test scenario generation: 6 agents in a star topology with a supervisor coordinating specialized workers.
A document-processing agent pipeline for SDLC documents: a Delegator plus four LLM agents (Search, Q&A, Trace, Reading) using a shared Qdrant database.
Engineering-level design choices: per-agent context/history, external artifact storage, fact-checker to reduce hallucinations, VLM for images, Excel writer for export.
A live deployment in a medium-sized software company and a plan to collect usage data and run formal benchmark evaluations in future work.
Key Findings
Test scenario generator implemented as a 6-agent star with a supervisor and specialized workers.
Document-processing system supports four explicit use cases via dedicated agents and a shared vector DB.
The pipeline includes a fact-checker agent to reduce hallucination, plus translation and export steps to match client needs.
Both systems are deployed and used daily in a medium-sized SE company, but no formal evaluation is provided.
Who Should Care
What To Try In 7 Days
Index one project’s documents in Qdrant and run a simple Search agent to surface key specs.
Prototype the 6-agent star for a single FSD chapter: retriever → writer → fact-checker → translator → Excel export.
Add a fact-checker step to any LLM output and log mismatches to measure hallucination rates.
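The third suggestion can be sketched in a few lines. This is a minimal illustration, not the authors' fact-checker: `FactCheckLog`, `naive_support_check`, and `check_claims` are hypothetical names, and the word-overlap check is only a stand-in for whatever LLM or entailment call would do the real verification.

```python
from dataclasses import dataclass, field


@dataclass
class FactCheckLog:
    """Collects fact-check verdicts so a mismatch rate can be computed later."""
    records: list = field(default_factory=list)

    def record(self, claim: str, supported: bool) -> None:
        self.records.append({"claim": claim, "supported": supported})

    def mismatch_rate(self) -> float:
        # Share of logged claims that the checker could not support.
        if not self.records:
            return 0.0
        unsupported = sum(1 for r in self.records if not r["supported"])
        return unsupported / len(self.records)


def naive_support_check(claim: str, source_text: str) -> bool:
    """Placeholder verifier (assumption): every word of the claim must
    occur in the source text. Replace with an LLM or entailment check."""
    source_words = set(source_text.lower().split())
    return all(word in source_words for word in claim.lower().split())


def check_claims(claims, source_text, log):
    """Check each claim against the source, log the verdict, return the rate."""
    for claim in claims:
        log.record(claim, naive_support_check(claim, source_text))
    return log.mismatch_rate()
```

Logging every verdict, rather than only the failures, keeps the denominator available, so the rate can be tracked over time as prompts or models change.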
Agent Features
Memory
- per-agent context and history
- external artifact storage to keep supervisor context small
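One way to picture these two memory choices is sketched below. The source does not describe the storage backend or the history format, so `ArtifactStore` and `Agent` are hypothetical, and an in-memory dictionary stands in for the external store.

```python
import uuid


class ArtifactStore:
    """In-memory stand-in for external artifact storage (assumption)."""

    def __init__(self):
        self._items = {}

    def put(self, content: str) -> str:
        # Store the content and hand back a short reference to it.
        artifact_id = uuid.uuid4().hex
        self._items[artifact_id] = content
        return artifact_id

    def get(self, artifact_id: str) -> str:
        return self._items[artifact_id]


class Agent:
    """Each agent keeps its own history; nothing is shared between agents."""

    def __init__(self, name: str):
        self.name = name
        self.history = []

    def remember(self, role: str, text: str) -> None:
        self.history.append((role, text))


store = ArtifactStore()
supervisor = Agent("supervisor")
writer = Agent("writer")

# The writer produces a large draft and stores it externally.
draft = "scenario text " * 1000
draft_id = store.put(draft)
writer.remember("assistant", draft)

# The supervisor records only the short reference, not the draft itself.
supervisor.remember("tool", f"writer finished; artifact={draft_id}")
```

The supervisor's history grows by one short line per worker call regardless of how large the worker's output is, which is the point of keeping artifacts outside its context.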
Planning
- ordered worker invocation enforced by supervisor
- worker input validation and feedback loops
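The two planning behaviours above can be sketched as a plain loop. This is an illustration under assumptions: the step names in `ORDER` follow the summary's description of the test-scenario pipeline, `run_pipeline` is a hypothetical name, and validation is modelled as a check on each worker's output before it becomes the next worker's input. The source does not say where that check actually lives.

```python
# Assumed worker order, taken from the summary of the test-scenario pipeline.
ORDER = ["preprocess", "write", "fact_check", "translate", "export"]


def run_pipeline(workers, validators, initial_input, max_retries=1):
    """Invoke workers in a fixed order and retry a worker with feedback.

    workers[name](payload, feedback) returns the worker's output.
    validators[name](output) returns "" to accept, or a feedback message.
    """
    payload = initial_input
    trace = []
    for name in ORDER:
        feedback = ""
        for attempt in range(max_retries + 1):
            output = workers[name](payload, feedback)
            feedback = validators[name](output)
            trace.append((name, attempt, feedback == ""))
            if not feedback:
                break  # output accepted, move on to the next worker
        else:
            # Every attempt was rejected by the validator.
            raise RuntimeError(f"{name} failed after {max_retries + 1} attempts")
        payload = output
    return payload, trace
```

Because the order is fixed in the supervisor rather than decided by the workers, a worker can never be called before its input exists; the retry loop is the feedback channel.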
Tool Use
- VLM for image processing
- Qdrant vector DB
- Excel writer (non-LLM)
Frameworks
- LangChain
- LangGraph
Is Agentic
true
Architectures
- star topology (supervisor + workers)
- delegator-based multi-agent pipeline
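The delegator-based architecture can be sketched as a router in front of four handlers. Everything here is a simplification: the real Delegator is an LLM agent, whereas this sketch assumes a hypothetical `intent: body` request format, and the four handlers are stubs because the summary does not describe what each agent does internally.

```python
def parse_request(request):
    """Split a request of the assumed form 'intent: body'."""
    intent, _, body = request.partition(":")
    return intent.strip().lower(), body.strip()


def make_stub(name):
    """Build a stub handler that stands in for an LLM agent."""
    def handler(body):
        return f"{name} agent received: {body}"
    return handler


# One handler per agent named in the summary.
AGENTS = {name: make_stub(name) for name in ("search", "qa", "trace", "reading")}


def delegate(request, agents=AGENTS):
    """Route a request to exactly one agent; agents never call each other."""
    intent, body = parse_request(request)
    if intent not in agents:
        return f"delegator: no agent for intent '{intent}'"
    return agents[intent](body)
```

For example, `delegate("search: login timeout spec")` reaches only the Search stub. Adding a fifth use case means adding one entry to `AGENTS`, without touching the other handlers.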
Collaboration
- supervisor/Delegator mediates all communication
- workers unaware of each other
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- No formal quantitative evaluation or benchmark results are reported.
- Excel writer is hard-coded and not LLM-driven, limiting flexibility.
- The models used are unnamed and may change, which makes the results hard to reproduce.
- Authors note hallucination risk and rely on human/agent fact-checking.
When Not To Use
- For tasks needing provable correctness or regulatory guarantees.
- When you cannot supply any project-specific documents for indexing.
- If you require fully open-source, reproducible pipelines (code not provided).
Failure Modes
- LLM hallucinations leading to incorrect scenarios despite the fact-checker.
- Supervisor misordering or incorrect prompts causing worker errors.
- Poor retrieval quality from the DB yields irrelevant or missing facts.
- Context window limits can cause content to be dropped unless documents are processed in blocks with careful note management.
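The context-window item can be illustrated with a small block processor that carries running notes from one block to the next. This is a sketch under assumptions: `split_into_blocks` and `process_with_notes` are hypothetical names, character counts stand in for token counts, and `summarize` is a placeholder for an LLM call.

```python
def split_into_blocks(text, max_chars):
    """Group paragraphs into blocks of roughly max_chars characters.

    Paragraphs are never split, so a paragraph longer than max_chars
    becomes its own oversize block.
    """
    blocks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            blocks.append(current)
            current = para
    if current:
        blocks.append(current)
    return blocks


def process_with_notes(blocks, summarize, max_note_chars=200):
    """Process blocks in order, passing bounded running notes to each call.

    summarize(block, notes) returns the output for one block.
    """
    notes = ""
    outputs = []
    for block in blocks:
        outputs.append(summarize(block, notes))
        # Keep only the most recent characters so notes never outgrow the budget.
        notes = (notes + " " + outputs[-1]).strip()[-max_note_chars:]
    return outputs, notes
```

Truncating the notes to a fixed budget is the simplest policy; it loses early context, which is one way this failure mode can show up in practice.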
Core Entities
Models
- on-premise and cloud LLMs (unnamed)
Context Entities
Models
- GPT-3.5 (related work)
- GPT-4 (related work)
- LLaMA, Mistral (related literature)

