Overview
The approach is practical in focus: it unifies heterogeneous formats into a single reversible sequence (HSEQ) and runs learned, budget-constrained selection over it. Empirical gains are shown across multiple benchmarks with detailed ablations. Results are robustly reported but depend on supervised fine-tuning (SFT), careful calibration of sufficiency thresholds, and the engineering of HSEQ.
Citations: 0
Evidence Strength: 0.80
Confidence: 0.80
Risk Signals: 11
Trust Signals
Findings with numeric evidence: 3/4
Findings with evidence refs: 4/4
Results with explicit delta: 8/8
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 70%
Why It Matters For Business
RELOOP reduces wasted retrieval work and provides explicit provenance. This yields more accurate multi-step answers across mixed data formats while keeping latency and token/tool costs predictable.
Summary TLDR
RELOOP turns retrieval into short, guided iterations over a single, reversible hierarchical sequence that encodes text, tables and knowledge-graph triples. A lightweight planner (head) gives a retrieval prior, a trained iterator picks small windows of segments and predicts when evidence is sufficient, and a canonicalizer packages provenance for the answerer. The system improves multi-hop accuracy across text/table/KG QA while keeping short, predictable loops and explicit evidence provenance.
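The budgeted loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `budgeted_loop`, `select_window`, `sufficiency`, `max_steps`, and `threshold` are hypothetical names, and the two callables stand in for the trained iterator and its sufficiency head.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class LoopResult:
    evidence: List[int] = field(default_factory=list)  # selected segment ids, in order
    steps: int = 0                                      # iterations actually run
    stopped_early: bool = False                         # True if the stop signal fired


def budgeted_loop(
    num_segments: int,
    select_window: Callable[[Set[int]], List[int]],  # hypothetical: proposes segment ids
    sufficiency: Callable[[List[int]], float],       # hypothetical: score in [0, 1]
    max_steps: int = 4,                              # assumed hard iteration budget
    threshold: float = 0.8,                          # assumed stop threshold
) -> LoopResult:
    result = LoopResult()
    seen: Set[int] = set()
    for _ in range(max_steps):
        # Keep only valid, not-yet-selected segment ids from the proposal.
        window = [i for i in select_window(seen)
                  if 0 <= i < num_segments and i not in seen]
        if not window:
            break  # nothing new to read
        seen.update(window)
        result.evidence.extend(window)
        result.steps += 1
        # Stop as soon as the collected evidence is judged sufficient.
        if sufficiency(result.evidence) >= threshold:
            result.stopped_early = True
            break
    return result


def demo(max_steps: int) -> LoopResult:
    """Toy run: evidence is sufficient once segments 1 and 4 are both selected."""
    needed = {1, 4}

    def select_window(seen: Set[int]) -> List[int]:
        return [i for i in range(6) if i not in seen][:2]

    def score(evidence: List[int]) -> float:
        return len(needed & set(evidence)) / len(needed)

    return budgeted_loop(6, select_window, score, max_steps=max_steps, threshold=1.0)
```

The hard `max_steps` cap is what keeps cost predictable in this sketch: even if the sufficiency score never crosses the threshold, the loop ends after a fixed number of windows.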
Problem Statement
Current RAG pipelines either run a single big retrieval step that misses multi-step evidence chains, or use unconstrained agentic loops that explode tool/token costs and lack a clear stop rule. Different data formats (text, tables, KGs) also force separate retrievers and controllers, complicating deployment and audit.
Main Contribution
HSEQ: a reversible hierarchical sequence that linearizes text, table rows, and KG triples into typed segments with parent pointers and offsets for provenance.
A budget-aware iterator (RELOOP-I) that selects small windows of segments, expands structure-aware neighborhoods, and predicts a sufficiency stop signal.
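One way to picture an HSEQ-style sequence is a flat list of typed segments that carry parent pointers and offsets. The sketch below is an assumption-laden illustration rather than the paper's format: the `Segment` fields, the segment kinds, and the tab-separated encoding (which assumes cell values contain no tabs) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    seg_id: int
    kind: str              # "doc", "paragraph", "table", "row", or "triple"
    parent: Optional[int]  # seg_id of the containing segment, None for roots
    offset: int            # char offset (paragraph) or index (row, triple)
    content: str


def linearize(paragraphs: List[str], header: List[str], rows: List[List[str]],
              triples: List[Tuple[str, str, str]]) -> List[Segment]:
    """Flatten text, one table, and KG triples into a single typed sequence."""
    segs: List[Segment] = []

    def add(kind: str, parent: Optional[int], offset: int, content: str) -> int:
        segs.append(Segment(len(segs), kind, parent, offset, content))
        return segs[-1].seg_id

    doc = add("doc", None, 0, "")
    char_pos = 0
    for para in paragraphs:
        add("paragraph", doc, char_pos, para)
        char_pos += len(para) + 1  # +1 for the newline that joins paragraphs
    table = add("table", None, 0, "\t".join(header))
    for i, row in enumerate(rows):
        add("row", table, i, "\t".join(row))
    for i, (subj, pred, obj) in enumerate(triples):
        add("triple", None, i, "\t".join((subj, pred, obj)))
    return segs


def restore_rows(segs: List[Segment], table_id: int) -> List[List[str]]:
    """Reverse the table linearization: rebuild rows in their original order."""
    rows = sorted((s for s in segs if s.kind == "row" and s.parent == table_id),
                  key=lambda s: s.offset)
    return [s.content.split("\t") for s in rows]


def provenance(segs: List[Segment], seg_id: int) -> List[Tuple[str, int]]:
    """Walk parent pointers from a segment up to its root."""
    by_id = {s.seg_id: s for s in segs}
    chain: List[Tuple[str, int]] = []
    cur: Optional[Segment] = by_id[seg_id]
    while cur is not None:
        chain.append((cur.kind, cur.offset))
        cur = by_id[cur.parent] if cur.parent is not None else None
    return chain
```

Because each segment keeps its offset and parent, a selected row or paragraph can be traced back to its exact place in the source, which is the property the provenance claims rest on.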
Key Findings
RELOOP yields higher QA accuracy than strong baselines across heterogeneous benchmarks.
Guided, budgeted iteration keeps loops short while retaining multi-hop power.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | 66.4 | HippoRAG 65.8 (best RAG baseline) | +0.6 pp vs. top baseline | HybridQA test | Table 2: RELOOP (best) achieves 66.4 accuracy | Table 2 |
| F1 | 72.1 | HippoRAG 72.4 (slightly higher) | -0.3 pp vs. top F1 | HybridQA test | Table 2: RELOOP F1 72.1 | Table 2 |
What To Try In 7 Days
Convert a small mixed corpus to HSEQ: record paragraphs, table rows, and triples with simple offsets and parent pointers.
Fine-tune a small iterator LLM via LoRA to emit compact JSON actions and a sufficiency score over a held-out dev set.
Add a tiny planner that emits short, cached guidance templates and test iteration depth vs accuracy to pick a production pair.
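For the second step above, it helps to validate the iterator's JSON output before acting on it. The source does not specify the action schema, so the one below is an assumption: a `select` list of segment ids plus a `sufficiency` score in [0, 1]. `parse_action` and `max_window` are likewise hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class IteratorAction:
    select: List[int]    # segment ids to read next
    sufficiency: float   # model's estimate that evidence is already enough


def parse_action(raw: str, num_segments: int, max_window: int = 4) -> IteratorAction:
    """Parse and validate one model-emitted action (assumed schema)."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("action must be a JSON object")

    select = obj.get("select")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(select, list) or not all(
        isinstance(i, int) and not isinstance(i, bool) for i in select
    ):
        raise ValueError("'select' must be a list of integers")
    if len(select) > max_window:
        raise ValueError(f"window larger than budget ({max_window})")
    if any(i < 0 or i >= num_segments for i in select):
        raise ValueError("segment id out of range")

    score = obj.get("sufficiency")
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("'sufficiency' must be a number")
    if not 0.0 <= score <= 1.0:
        raise ValueError("'sufficiency' must lie in [0, 1]")

    # Drop duplicate ids while preserving the model's ordering.
    return IteratorAction(select=list(dict.fromkeys(select)), sufficiency=float(score))
```

Rejecting malformed actions early keeps a bad generation from silently consuming the iteration budget.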
Agent Features
Is Agentic: Yes
Risks & Boundaries
Limitations
Sufficiency judge can fail under noisy or partial evidence; authors note hallucination risk in the sufficiency head.
The framework assumes HSEQ metadata (offsets, row indices, triple fields) is available; this metadata is costly to produce for arbitrary corpora.
When Not To Use
For trivial single-hop queries where LLM-only or single-pass retrieval is enough and latency is critical.
When you cannot construct reversible segment metadata (offsets, schema, parent pointers).
Failure Modes
Sufficiency false positives cause premature stopping and wrong answers.
Poor guidance (cache miss or bad planner output) can force extra iterations or miss supporting evidence.
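Premature stopping can be measured on a labeled dev set before a threshold is fixed. The sketch below is one possible calibration procedure, not one taken from the source: it assumes each dev record pairs a predicted sufficiency score with a ground-truth flag saying whether the evidence really was sufficient. `premature_stop_rate`, `pick_threshold`, and `max_rate` are hypothetical names.

```python
from typing import Iterable, Optional, Sequence, Tuple

Record = Tuple[float, bool]  # (predicted sufficiency score, evidence truly sufficient?)


def premature_stop_rate(records: Sequence[Record], threshold: float) -> float:
    """Share of insufficient-evidence cases whose score would still trigger a stop."""
    negatives = [score for score, ok in records if not ok]
    if not negatives:
        return 0.0
    return sum(score >= threshold for score in negatives) / len(negatives)


def pick_threshold(records: Sequence[Record], candidates: Iterable[float],
                   max_rate: float = 0.05) -> Optional[float]:
    """Lowest candidate threshold whose premature-stop rate stays within max_rate."""
    for t in sorted(candidates):
        if premature_stop_rate(records, t) <= max_rate:
            return t
    return None  # no candidate is strict enough
```

Choosing the lowest acceptable threshold keeps loops as short as possible while capping the false-positive stops that lead to wrong answers.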

