Hierarchical Agentic RAG: small LMs + prompt pools to boost forecasting, anomaly detection, and imputation

August 18, 2024 · 8 min

Overview

Decision Snapshot: Needs Validation

The method shows consistent gains on public benchmarks using concrete recipes (QLoRA, DPO, prompt pools), but it requires moderate compute for fine-tuning and careful prompt-pool construction; generalization beyond traffic and listed industrial datasets is promising but not proven.

Citations: 4

Evidence Strength: 0.70

Confidence: 0.75

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 4/4

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 50%

Production readiness: 50%

Novelty: 60%

Authors

Chidaksh Ravuru, Sagar Srinivas Sakhinana, Venkataramana Runkana

Why It Matters For Business

A modular Agentic-RAG can reduce forecasting errors and improve anomaly detection on operational time-series (traffic, industrial telemetry), enabling better planning and faster incident detection while allowing independent updates to sub-modules.

Who Should Care

Summary TLDR

The paper introduces an agentic Retrieval-Augmented Generation (Agentic-RAG) system for time-series tasks. A master agent routes queries to task-specialized sub-agents; each sub-agent is a small pre-trained language model (Gemma or Llama variants) fine-tuned with instruction tuning and Direct Preference Optimization (DPO). Sub-agents retrieve relevant key-value "prompt pools" (historical pattern snippets) via cosine similarity and concatenate retrieved prompts with input before projection. Experiments on traffic and industrial benchmarks (PeMSD*, METR-LA, PEMS-BAY, SWaT, WADI, SMAP, MSL, TEP, HAI, ETT) show consistent gains: for example, PEMS-BAY horizon@3 RMSE drops to 1.62 (Agentic-RAG w/

Problem Statement

Time-series models struggle with high dimensionality, non-stationarity and the fixed-window assumption. Small pretrained language models can be cheaply adapted but lack time-series knowledge. Existing methods either use task-specific architectures or fixed-length history windows that fail under distribution shifts.

Main Contribution

Agentic-RAG: hierarchical master + specialized sub-agents that route tasks (forecasting, anomaly detection, imputation, classification).

Differentiable dynamic prompt pools: key-value prompt repositories that store distilled historical patterns and are retrieved by similarity.
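
The retrieval step behind a key-value prompt pool can be sketched in a few lines. This is a minimal, non-differentiable illustration in plain Python, not the paper's implementation: the pool contents, the function names `cosine` and `retrieve_top_k`, and the use of text snippets as values are all assumptions made for the example. In the paper the keys and values are learned and the retrieved prompts are concatenated with the input before projection.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query, pool, k=2):
    """Return the values of the k pool entries whose keys are most similar to the query."""
    ranked = sorted(pool, key=lambda kv: cosine(query, kv[0]), reverse=True)
    return [value for _, value in ranked[:k]]

# Hypothetical pool of (key vector, value snippet) pairs; real keys/values would be learned.
pool = [
    ([1.0, 0.0, 0.0], "morning rush-hour ramp"),
    ([0.0, 1.0, 0.0], "overnight flat baseline"),
    ([0.9, 0.1, 0.0], "weekday commute peak"),
]
query = [1.0, 0.05, 0.0]  # stand-in for an embedded input window
retrieved = retrieve_top_k(query, pool, k=2)

# Retrieved prompts are placed ahead of the input; the placeholder string is illustrative.
conditioned_input = retrieved + ["<input window>"]
print(conditioned_input)
```

A learned pool would replace the hard sort with a differentiable selection so that gradients reach the keys; the sketch only shows the similarity ranking.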

Key Findings

Agentic-RAG reduces forecasting error on traffic benchmarks.

Numbers: PEMS-BAY Horizon@3 RMSE 1.62 vs DGCRN 2.69 (Table 4)

Practical Use: Expect substantially lower short-horizon RMSE on traffic-style datasets by combining small LMs with prompt pools and PEFT fine-tuning.

Evidence Ref: Table 4

Agentic-RAG improves anomaly detection F1 across industrial benchmarks.

Numbers: SWaT F1 92.59% vs GRELEN 89.10% (Table 5)

Practical Use: Use Agentic-RAG to increase anomaly detection recall/precision on telemetry-style datasets, reducing missed or false alarms.

Evidence Ref: Table 5

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Forecasting RMSE (PEMS-BAY horizon@3) | 1.62 (Agentic-RAG w/Llama-8B) | DGCRN 2.69 | -1.07 | PEMS-BAY (Table 4) | Agentic-RAG shows lower RMSE on evaluated traffic benchmarks | Table 4
Anomaly detection F1 (SWaT) | 92.59% | GRELEN 89.10% | +3.49pp | SWaT (Table 5) | Agentic-RAG variants show higher precision/recall and F1 across anomaly datasets | Table 5
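
The deltas in the results rows are simple differences against the baseline, and can be checked directly. The short sketch below recomputes them from the reported values; the relative change for RMSE is derived here for convenience and is not a figure reported in the summary. Function names are illustrative.

```python
def absolute_delta(value, baseline):
    """Difference between the reported value and the baseline, rounded to 2 decimals."""
    return round(value - baseline, 2)

def relative_change_pct(value, baseline):
    """Change relative to the baseline, in percent, rounded to 1 decimal."""
    return round(100.0 * (value - baseline) / baseline, 1)

# PEMS-BAY horizon@3 RMSE (lower is better): Agentic-RAG 1.62 vs DGCRN 2.69
rmse_delta = absolute_delta(1.62, 2.69)
rmse_rel = relative_change_pct(1.62, 2.69)  # derived, not reported in the table

# SWaT F1 in percent (higher is better): Agentic-RAG 92.59 vs GRELEN 89.10
f1_delta_pp = absolute_delta(92.59, 89.10)  # percentage points

print(rmse_delta, rmse_rel, f1_delta_pp)
```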

What To Try In 7 Days

Prototype a single sub-agent: fine-tune a small LM (Gemma or Llama-8B) on one time-series task using QLoRA and instruction tuning.

Build a small prompt pool of historical patterns (key vectors + value snippets) and implement top-K cosine retrieval to condition the model.

Run an ablation: compare model with and without prompt retrieval and with/without DPO to measure impact on your dataset.
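
The ablation in the last step is a 2x2 grid over retrieval and DPO. One way to organize it is sketched below, assuming a hypothetical `evaluate` callable that trains or loads the matching variant and returns a single score such as RMSE. The stand-in evaluator returns made-up numbers purely so the harness runs; it says nothing about the paper's results.

```python
from itertools import product

def run_ablation(evaluate, factors=("retrieval", "dpo")):
    """Evaluate every on/off combination of the factors; returns {flag tuple: score}."""
    results = {}
    for flags in product([False, True], repeat=len(factors)):
        config = dict(zip(factors, flags))
        results[flags] = evaluate(**config)
    return results

# Hypothetical stand-in: replace with a real train/eval run on your dataset.
def fake_evaluate(retrieval, dpo):
    return 3.0 - 0.5 * retrieval - 0.2 * dpo  # synthetic placeholder scores

results = run_ablation(fake_evaluate)
for (retrieval, dpo), score in sorted(results.items()):
    print(f"retrieval={retrieval!s:5} dpo={dpo!s:5} score={score:.2f}")
```

Keeping all four cells, rather than only "full model vs. nothing", separates the contribution of the prompt pool from that of preference tuning.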

Agent Features

Memory
differentiable prompt pools (retrieval memory: key-value prompts)
Planning
master agent orchestrates and routes tasks; supports chaining sub-agents for multi-step tasks (not exercised here)
Tool Use
retrieval from prompt pools; ReAct prompting for stepwise reasoning; external tools implemented as sub-agents
Frameworks
ReAct; Agentic-RAG
Is Agentic

Yes

Architectures
hierarchical master + specialized sub-agents
Collaboration
sub-agents specialize by task and are orchestrated by master agent
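
The master/sub-agent split amounts to a dispatcher in front of task-specific handlers. The sketch below shows that shape only, under simplifying assumptions: the task name is passed in explicitly (in the paper the master agent itself decides the routing), the sub-agents are placeholder functions rather than fine-tuned LMs, and the `MasterAgent` class and its methods are hypothetical names.

```python
from typing import Callable, Dict, List

SubAgent = Callable[[List[float]], str]

class MasterAgent:
    """Routes a task name to a registered sub-agent (hypothetical interface)."""

    def __init__(self) -> None:
        self._agents: Dict[str, SubAgent] = {}

    def register(self, task: str, agent: SubAgent) -> None:
        self._agents[task] = agent

    def route(self, task: str, series: List[float]) -> str:
        if task not in self._agents:
            raise KeyError(f"no sub-agent registered for task {task!r}")
        return self._agents[task](series)

# Placeholder sub-agents; each would be a fine-tuned small LM in the described system.
master = MasterAgent()
master.register("forecasting", lambda s: f"forecast from {len(s)} points")
master.register("anomaly_detection", lambda s: f"anomaly scan over {len(s)} points")

print(master.route("forecasting", [1.0, 2.0, 3.0]))
```

Because sub-agents sit behind a registry, one can be retrained or swapped without touching the others, which is the independent-update property the summary highlights.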

Optimization Features

Token Efficiency
use of grouped/neighbor attention (SelfExtend) to extend context without full finetuning
Infra Optimization
NVIDIA GPUs; reporting of GPU-hours and carbon estimates
Model Optimization
LoRA
System Optimization
gradient accumulation and small batch sizes to fit GPUs
Training Optimization
instruction tuning with PEFT; Direct Preference Optimization (DPO) for preference alignment
Inference Optimization
SelfExtend long-context technique to handle longer inputs
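
For readers unfamiliar with DPO, the per-pair objective is compact enough to write out. The sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities under the policy and a frozen reference model. The argument names and the `beta=0.1` default are assumptions for illustration; the summary does not state the value used in the paper.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities.

    pi_*  : log-probabilities under the policy being trained
    ref_* : log-probabilities under the frozen reference model
    beta  : strength of the implicit KL constraint (assumed value)
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log(sigmoid(z)) written as log(1 + exp(-z)), with z = beta * margin
    return math.log1p(math.exp(-beta * margin))

# When the policy matches the reference, the margin is zero and the loss is log(2).
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
# When the policy favors the chosen response more than the reference does, the loss drops.
print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```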

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

PeMSD* (PeMS datasets); METR-LA; PEMS-BAY; SWaT, WADI, SMAP, MSL, TEP, HAI; ETT (ETTh1/ETTh2/ETTm1/ETTm2)

Risks & Boundaries

Limitations

Needs substantial fine-tuning and prompt-pool construction effort per task/dataset.

Performance degrades as the missing rate increases; missingness above roughly 30–50% reduces accuracy notably.

When Not To Use

When latency or model size is strictly limited (real-time edge with tiny compute).

When datasets are extremely sparse (>50% missing) without strong external context.
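
A quick pre-check on missingness can tell you which side of these boundaries a dataset falls on. The sketch below is a hedged illustration: the `warn=0.30` and `stop=0.50` thresholds mirror the 30–50% range quoted above, while the function names and the three labels are invented for the example.

```python
import math

def missing_rate(series):
    """Fraction of entries that are None or NaN (0.0 for an empty series)."""
    if not series:
        return 0.0
    missing = sum(
        1 for x in series
        if x is None or (isinstance(x, float) and math.isnan(x))
    )
    return missing / len(series)

def sparsity_flag(series, warn=0.30, stop=0.50):
    """Label a series 'ok', 'caution' (above warn) or 'avoid' (above stop)."""
    rate = missing_rate(series)
    if rate > stop:
        return "avoid"
    if rate > warn:
        return "caution"
    return "ok"

print(sparsity_flag([1.0, None, float("nan"), 4.0]))  # half of the entries are missing
```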

Failure Modes

Retrieval of wrong or irrelevant prompts, leading to biased or incorrect outputs.

Overfitting to prompt pool patterns and failing on unseen regime shifts.

Core Entities

Models

Gemma-2B, Gemma-7B, Llama-8B (Llama 3-8B / SelfExtend), SelfExtend long-context technique

Metrics

MAE, RMSE, MAPE, Accuracy, Precision, Recall, F1-score, Fault Detection Rate (FDR)
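
Most of these metrics follow standard definitions, sketched below in plain Python for reference. Two choices here are assumptions rather than details from the paper: MAPE skips points whose true value is zero (it would raise if all true values were zero), and labels are taken as 0/1 with 1 meaning anomaly. FDR is omitted because the summary does not define it.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; zero true values are skipped (assumption)."""
    pairs = [(a, b) for a, b in zip(y_true, y_pred) if a != 0]
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in pairs) / len(pairs)

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(mae([2.0, 4.0], [1.0, 6.0]), rmse([2.0, 4.0], [1.0, 6.0]), mape([2.0, 4.0], [1.0, 6.0]))
print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```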

Datasets

PeMSD3, PeMSD4, PeMSD7, PeMSD7(M), PeMSD8, METR-LA, PEMS-BAY, SWaT, WADI, SMAP, MSL, TEP (Tennessee Eastman), HAI, ETTh1/ETTh2/ETTm1/ETTm2

Benchmarks

traffic forecasting (PeMS*, METR-LA, PEMS-BAY); multivariate anomaly detection (SWaT, WADI, SMAP, MSL, HAI, TEP); missing data imputation; time-series classification