Hierarchical Agentic RAG: small LMs + prompt pools to boost forecasting, anomaly detection, and imputation

Overview

Decision Snapshot: Needs Validation

The method shows consistent gains on public benchmarks using concrete recipes (QLoRA, DPO, prompt pools), but it requires moderate compute for fine-tuning and careful prompt-pool construction; generalization beyond traffic and listed industrial datasets is promising but not proven.

Citations: 4

Evidence Strength: 0.70

Confidence: 0.75

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 4/4

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 50%

Production readiness: 50%

Novelty: 60%

Authors

Chidaksh Ravuru, Sagar Srinivas Sakhinana, Venkataramana Runkana

Links

Abstract / PDF / Data

Why It Matters For Business

A modular Agentic-RAG can reduce forecasting errors and improve anomaly detection on operational time-series (traffic, industrial telemetry), enabling better planning and faster incident detection while allowing independent updates to sub-modules.

Who Should Care

CTO, Product Manager, ML Engineer, Data Scientist, Engineering Lead

Summary TLDR

The paper introduces an agentic Retrieval-Augmented Generation (Agentic-RAG) system for time-series tasks. A master agent routes queries to task-specialized sub-agents. Each sub-agent is a small pre-trained language model (Gemma or Llama variants) fine-tuned with instruction tuning and Direct Preference Optimization (DPO). Sub-agents retrieve relevant entries from key-value "prompt pools" (stored historical pattern snippets) by cosine similarity, and they concatenate the retrieved prompts with the input before projection. Experiments on traffic and industrial benchmarks (PeMSD*, METR-LA, PEMS-BAY, SWaT, WADI, SMAP, MSL, TEP, HAI, ETT) show consistent gains. For example, PEMS-BAY horizon@3 RMSE drops to 1.62 (Agentic-RAG w/ Llama-8B), compared with 2.69 for the DGCRN baseline.
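The retrieval step can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' code:

- Each pool entry is assumed to pair one key vector with a value prompt made of `p` embedding vectors.
- The query is assumed to be an embedding of the current input window.
- The names `retrieve_prompts`, `condition_input` and `top_k` are hypothetical.

The projection layer that follows concatenation is omitted. So is the training of the keys and values, even though the summary describes the pools as differentiable.

```python
import numpy as np

def cosine_scores(query: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector (d,) and each key row (n, d)."""
    q = query / (np.linalg.norm(query) + 1e-8)
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
    return k @ q

def retrieve_prompts(query, keys, values, top_k=2):
    """Return indices and value prompts of the top_k most similar keys, best first.

    keys:   (n, d) key vectors, one per stored pattern (hypothetical layout)
    values: (n, p, d) value prompts, p prompt vectors of width d per pattern
    """
    scores = cosine_scores(query, keys)
    idx = np.argsort(-scores)[:top_k]
    return idx, values[idx]

def condition_input(x_tokens, prompts):
    """Prepend the retrieved prompt vectors to the input embeddings (t, d)."""
    flat = prompts.reshape(-1, prompts.shape[-1])  # (top_k * p, d)
    return np.concatenate([flat, x_tokens], axis=0)

# Toy data: 5 stored patterns, 3 prompt vectors each, 4 input tokens, width 8.
rng = np.random.default_rng(0)
d, n, p, t = 8, 5, 3, 4
keys = rng.normal(size=(n, d))
values = rng.normal(size=(n, p, d))
x_tokens = rng.normal(size=(t, d))
query = keys[3] + 0.01 * rng.normal(size=d)  # a window that closely resembles pattern 3

idx, prompts = retrieve_prompts(query, keys, values, top_k=2)
conditioned = condition_input(x_tokens, prompts)
print(idx[0], conditioned.shape)
```

Retrieved prompts come first, so the model reads historical context before the current window. The conditioned sequence grows by `top_k * p` vectors.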

Problem Statement

Time-series models struggle with high dimensionality, non-stationarity and the fixed-window assumption. Small pretrained language models can be cheaply adapted but lack time-series knowledge. Existing methods either use task-specific architectures or fixed-length history windows that fail under distribution shifts.

Main Contribution

Agentic-RAG: a hierarchical architecture in which a master agent routes tasks (forecasting, anomaly detection, imputation, classification) to specialized sub-agents.

Differentiable dynamic prompt pools: key-value prompt repositories that store distilled historical patterns and are retrieved by similarity.
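The routing idea can be illustrated with a plain-Python registry. In the paper the master agent is itself a language model that decides where to send a request. In this sketch that decision is replaced by an explicit task label. `SubAgent`, `MasterAgent` and the two toy handlers are hypothetical stand-ins for the fine-tuned small LMs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SubAgent:
    """A task-specialized worker; `run` stands in for a fine-tuned small LM."""
    name: str
    run: Callable[[List[float]], object]

class MasterAgent:
    """Routes each request to the sub-agent registered for its task."""

    def __init__(self) -> None:
        self.registry: Dict[str, SubAgent] = {}

    def register(self, task: str, agent: SubAgent) -> None:
        self.registry[task] = agent

    def handle(self, task: str, series: List[float]):
        if task not in self.registry:
            raise ValueError(f"no sub-agent registered for task {task!r}")
        return self.registry[task].run(series)

# Toy handlers: a naive last-value forecast and a fixed-threshold anomaly flag.
master = MasterAgent()
master.register("forecasting", SubAgent("forecaster", lambda s: [s[-1]] * 3))
master.register("anomaly_detection", SubAgent("detector", lambda s: [abs(v) > 3.0 for v in s]))

print(master.handle("forecasting", [1.0, 2.0, 2.5]))        # [2.5, 2.5, 2.5]
print(master.handle("anomaly_detection", [0.1, 4.2, -0.3]))  # [False, True, False]
```

Each sub-agent sits behind its own registry entry. One can therefore be retrained or swapped without touching the others, which is the independent-update property claimed for the design.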

Key Findings

Agentic-RAG reduces forecasting error on traffic benchmarks.

Numbers: PEMS-BAY horizon@3 RMSE 1.62 vs DGCRN 2.69 (Table 4)

Practical Use: Expect substantially lower short-horizon RMSE on traffic-style datasets by combining small LMs with prompt pools and PEFT fine-tuning.

Evidence Ref: Table 4

Agentic-RAG improves anomaly detection F1 across industrial benchmarks.

Numbers: SWaT F1 92.59% vs GRELEN 89.10% (Table 5)

Practical Use: Use Agentic-RAG to increase anomaly detection recall/precision on telemetry-style datasets, reducing missed or false alarms.

Evidence Ref: Table 5

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Forecasting RMSE (PEMS-BAY horizon@3)	1.62 (Agentic-RAG w/Llama-8B)	DGCRN 2.69	-1.07	PEMS-BAY (Table 4)	Agentic-RAG shows lower RMSE on evaluated traffic benchmarks	Table 4
Anomaly detection F1 (SWaT)	92.59%	GRELEN 89.10%	+3.49pp	SWaT (Table 5)	Agentic-RAG variants show higher precision/recall and F1 across anomaly datasets	Table 5
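The Delta column is a plain difference from the baseline. It is in absolute RMSE units for forecasting and in percentage points for F1. The sketch below gives a textbook RMSE and a point-wise binary F1, then recomputes the two deltas from the table. The summary does not say whether the paper's F1 uses any extra evaluation protocol (for example, point adjustment), so treat `f1_score` as an assumption.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over all points."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def f1_score(y_true, y_pred):
    """Point-wise binary F1 from boolean anomaly labels (True = anomaly)."""
    y_true, y_pred = np.asarray(y_true, bool), np.asarray(y_pred, bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

# Deltas from the table (lower RMSE is better; higher F1 is better).
rmse_delta = round(1.62 - 2.69, 2)     # -1.07
f1_delta_pp = round(92.59 - 89.10, 2)  # +3.49 percentage points
print(rmse_delta, f1_delta_pp)
```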

What To Try In 7 Days

Prototype a single sub-agent: fine-tune a small LM (Gemma or Llama-8B) on one time-series task using QLoRA and instruction tuning.

Build a small prompt pool of historical patterns (key vectors + value snippets) and implement top-K cosine retrieval to condition the model.

Run an ablation: compare model with and without prompt retrieval and with/without DPO to measure impact on your dataset.
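The ablation in the last step is a 2 × 2 grid. Below is a minimal harness. It assumes a hypothetical `evaluate(use_retrieval, use_dpo)` function, which you supply, that trains or loads each variant and returns a validation error. The `fake_evaluate` scores are placeholders for exercising the harness, not results.

```python
from itertools import product
from typing import Callable, Dict, Tuple

def run_ablation(evaluate: Callable[[bool, bool], float]) -> Dict[Tuple[bool, bool], float]:
    """Evaluate every (use_retrieval, use_dpo) combination; lower scores are better."""
    return {(r, d): evaluate(r, d) for r, d in product([False, True], repeat=2)}

def report(results: Dict[Tuple[bool, bool], float], baseline=(False, False)) -> None:
    """Print each variant's score and its difference from the baseline variant."""
    base = results[baseline]
    for (r, d), score in sorted(results.items()):
        print(f"retrieval={r!s:5} dpo={d!s:5} rmse={score:.3f} delta={score - base:+.3f}")

# Hypothetical stand-in: replace with a function that trains/loads the variant
# and returns its validation RMSE on your own data. These numbers are placeholders.
def fake_evaluate(use_retrieval: bool, use_dpo: bool) -> float:
    return 2.0 - 0.3 * use_retrieval - 0.1 * use_dpo

results = run_ablation(fake_evaluate)
report(results)
```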

Agent Features

Memory

differentiable prompt pools (retrieval memory: key-value prompts)

Planning

master agent orchestrates and routes tasks; supports chaining sub-agents for multi-step tasks (not exercised in the reported experiments)

Tool Use

retrieval from prompt pools; ReAct prompting for stepwise reasoning; external tools implemented as sub-agents

Frameworks

ReAct, Agentic-RAG

Is Agentic

Yes

Architectures

hierarchical master + specialized sub-agents

Collaboration

sub-agents specialize by task and are orchestrated by master agent

Optimization Features

Token Efficiency

grouped/neighbor attention (SelfExtend) to extend the context window without fine-tuning

Infra Optimization

NVIDIA GPUs; reporting of GPU-hours and carbon estimates

Model Optimization

LoRA

System Optimization

gradient accumulation and small batch sizes to fit GPUs
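Gradient accumulation lets a small per-step batch mimic a larger one. Gradients from several micro-batches are averaged before a single update, so the effective batch size is `micro_batch * accum_steps`.

Below is a framework-free sketch on a toy least-squares model; the sizes and learning rate are arbitrary. It checks that the accumulated update matches the full-batch update. With PEFT, the same pattern would apply to the adapter gradients.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
w = np.zeros(4)
micro_batch, accum_steps, lr = 8, 4, 0.1  # effective batch = 8 * 4 = 32

# Accumulate averaged gradients over equal-sized micro-batches, then update once.
acc = np.zeros_like(w)
for step in range(accum_steps):
    sl = slice(step * micro_batch, (step + 1) * micro_batch)
    acc += grad_mse(w, X[sl], y[sl]) / accum_steps
w_accum = w - lr * acc

# The same update computed on the full batch at once.
w_full = w - lr * grad_mse(w, X, y)
print(np.allclose(w_accum, w_full))  # True
```

For a mean loss over equal-sized micro-batches, dividing each micro-batch gradient by `accum_steps` reproduces the full-batch gradient exactly. Peak memory depends only on `micro_batch`.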

Training Optimization

instruction tuning with PEFT; Direct Preference Optimization (DPO) for preference alignment
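For reference, the standard DPO objective on a preference pair (x, y_w, y_l) is:

-log sigmoid(beta * [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))])

Here pi_ref is a frozen reference model, typically the instruction-tuned checkpoint. A NumPy sketch of that loss follows. The summary does not give the paper's beta or explain how preferred and rejected time-series outputs are built, so `beta=0.1` and the toy log-probabilities are assumptions.

```python
import numpy as np

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Mean DPO loss over a batch of preference pairs.

    Each argument is an array of summed token log-probabilities log pi(y|x)
    for the chosen (preferred) or rejected response, under either the
    trainable policy or the frozen reference model.
    """
    chosen_ratio = np.asarray(policy_chosen) - np.asarray(ref_chosen)
    rejected_ratio = np.asarray(policy_rejected) - np.asarray(ref_rejected)
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)), computed stably via logaddexp.
    return float(np.mean(np.logaddexp(0.0, -margin)))

# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss([-5.0], [-6.0], [-5.0], [-6.0]))  # log(2), about 0.693
```

The loss falls below log(2) only when the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model.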

Inference Optimization

SelfExtend long-context technique to handle longer inputs
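SelfExtend is only named here. As commonly described, it uses two kinds of position for attention:

- Nearby tokens (neighbor attention) keep their exact relative positions.
- Distant tokens (grouped attention) get floor-divided group positions, shifted so the two regimes meet at the window edge.

This keeps every relative position inside the pretraining range without fine-tuning. The sketch shows only that position mapping. The window and group sizes are hypothetical rather than the paper's settings, and a real implementation also merges the two attention maps.

```python
def selfextend_rel_pos(distance: int, neighbor_window: int, group_size: int) -> int:
    """Map a query-key distance to the relative position the model actually sees.

    Inside the neighbor window the true distance is kept; beyond it, the
    distance is floor-divided by the group size and shifted so the mapping
    is continuous at the window boundary.
    """
    if distance < neighbor_window:
        return distance
    return distance // group_size + neighbor_window - neighbor_window // group_size

def max_context(pretrain_len: int, neighbor_window: int, group_size: int) -> int:
    """Longest input whose mapped positions stay within the pretraining range."""
    return (pretrain_len - neighbor_window) * group_size + neighbor_window

w, g = 512, 4  # hypothetical neighbor window and group size
print([selfextend_rel_pos(dist, w, g) for dist in (0, 511, 512, 516, 4096)])
print(max_context(8192, w, g))
```

The example assumes an 8,192-token pretraining window, as for Llama 3-8B. Under that assumption, `max_context` gives the longest input whose largest mapped distance stays in range for the chosen window and group size.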

Reproducibility

Code Available: No

Data Available: Yes

Open Source Status: Partial

License: Unknown

Data URLs

PeMSD* (PeMS datasets); METR-LA; PEMS-BAY; SWaT, WADI, SMAP, MSL, TEP, HAI; ETT (ETTh1/ETTh2/ETTm1/ETTm2)

Risks & Boundaries

Limitations

Needs substantial fine-tuning and prompt-pool construction effort per task/dataset.

Performance degrades as the missing rate increases; accuracy drops notably once more than roughly 30–50% of values are missing.
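To measure this degradation on your own data, one common protocol is to hide a random fraction of observed values, impute them, and score only the hidden positions. Below is a minimal sketch. It uses a hypothetical linear-interpolation imputer as a stand-in for the model. The masking protocol is an assumption and not necessarily the one used in the paper.

```python
import numpy as np

def mask_at_rate(x, rate, rng):
    """Boolean mask hiding roughly `rate` of the entries (True = hidden)."""
    return rng.random(x.shape) < rate

def interp_impute(x, hidden):
    """Stand-in imputer: linear interpolation over the visible points of a 1-D series."""
    t = np.arange(len(x))
    visible = ~hidden
    return np.interp(t, t[visible], x[visible])

def masked_rmse(x_true, x_hat, hidden):
    """RMSE computed only on the positions that were hidden from the imputer."""
    return float(np.sqrt(np.mean((x_true[hidden] - x_hat[hidden]) ** 2)))

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 8 * np.pi, 500)) + 0.1 * rng.normal(size=500)
for rate in (0.1, 0.3, 0.5):
    hidden = mask_at_rate(series, rate, rng)
    print(rate, round(masked_rmse(series, interp_impute(series, hidden), hidden), 3))
```

Scoring only the hidden positions matters. Including visible points, which the imputer copies unchanged, would make every missing rate look better than it is.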

When Not To Use

When latency or model size is strictly limited (for example, real-time edge deployment with very little compute).

When datasets are extremely sparse (>50% missing) without strong external context.

Failure Modes

Retrieval of wrong or irrelevant prompts, leading to biased or incorrect outputs.

Overfitting to prompt pool patterns and failing on unseen regime shifts.

Core Entities

Models

Gemma-2B, Gemma-7B, Llama-8B (Llama 3-8B / SelfExtend), SelfExtend long-context technique

Metrics

MAE, RMSE, MAPE, Accuracy, Precision, Recall, F1-score, Fault Detection Rate (FDR)

Datasets

PeMSD3, PeMSD4, PeMSD7, PeMSD7(M), PeMSD8, METR-LA, PEMS-BAY, SWaT, WADI, SMAP, MSL, TEP (Tennessee Eastman), HAI, ETTh1/ETTh2/ETTm1/ETTm2

Benchmarks

traffic forecasting (PeMS*, METR-LA, PEMS-BAY); multivariate anomaly detection (SWaT, WADI, SMAP, MSL, HAI, TEP); missing data imputation; time-series classification
