Overview
The scores reflect a practical, engineered system with promising internal results, but without a public code or dataset release those results cannot be fully validated across enterprises.
Citations: 1
Evidence Strength: 0.60
Confidence: 0.80
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 3/3
Reproducibility
Status: No open assets linked
Open source: Unknown
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 50%
Why It Matters For Business
A weighted RAG plus self-evaluation can cut misdiagnoses and speed resolution on large enterprise knowledge bases, improving service SLAs and reducing human time-to-fix.
Summary TLDR
This paper presents a practical Retrieval-Augmented Generation (RAG) system that assigns context-dependent weights to multiple enterprise data sources (product manuals, FAQs, guides, internal KBs), uses FAISS + all-MiniLM-L6-v2 for dense search, and validates outputs with a LLaMA-based self-evaluator. On the authors' enterprise dataset the full pipeline reaches 90.8% accuracy and 0.89 relevance versus 85.2%/0.75 for a standard (equal-weight) RAG and 76.1%/0.61 for keyword search. The design focuses on modular source weighting, threshold filtering to reduce hallucination, and a final self-check step; it is intended for single-agent troubleshooting services rather than multi-agent workflows.
Problem Statement
Enterprise troubleshooting needs fast, accurate answers from many scattered sources. Keyword search misses context and manuals; static RAG treats all sources equally. The result is slower, less precise fixes. The paper proposes a dynamically weighted RAG that prioritizes sources by query context and validates outputs to reduce hallucinations.
Main Contribution
A dynamic weighting mechanism that adjusts retrieval importance per data source based on query context (e.g., boost manuals for SKU queries).
A threshold-based filtering and multi-index aggregation pipeline over FAISS indices to reduce weak matches before generation.
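The two contributions above can be sketched together as a single merge step. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Hit` type, the `aggregate` function, the source names, and all weights and thresholds are hypothetical, and the similarity scores are assumed to be precomputed (for example, from one FAISS search per source index).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # which index produced the hit, e.g. "manuals"
    doc_id: str
    score: float  # similarity score, higher is better


def aggregate(hits_by_source, weights, thresholds, top_k=3):
    """Drop weak hits per source, scale the rest by source weight, then rank."""
    merged = []
    for source, hits in hits_by_source.items():
        weight = weights.get(source, 1.0)        # default: neutral weight
        min_score = thresholds.get(source, 0.0)  # default: keep everything
        for hit in hits:
            if hit.score < min_score:
                continue  # threshold filtering happens before weighting
            merged.append((hit.score * weight, hit))
    merged.sort(key=lambda pair: pair[0], reverse=True)
    return merged[:top_k]


# Hypothetical scores for a query that mentions a product SKU.
hits = {
    "manuals": [Hit("manuals", "m1", 0.70), Hit("manuals", "m2", 0.40)],
    "faqs": [Hit("faqs", "f1", 0.80), Hit("faqs", "f2", 0.55)],
}
weights = {"manuals": 1.5, "faqs": 1.0}     # assumed boost for manuals
thresholds = {"manuals": 0.5, "faqs": 0.6}  # assumed per-index cutoffs

ranked = aggregate(hits, weights, thresholds)
for weighted_score, hit in ranked:
    print(f"{hit.source}/{hit.doc_id}: {weighted_score:.2f}")
```

In this toy case the manual hit `m1` outranks the FAQ hit `f1` despite a lower raw score, because the manual weight is boosted; `m2` and `f2` fall below their thresholds and never reach the generator.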
Key Findings
Weighted RAG plus self-evaluation achieves higher troubleshooting accuracy than baselines
LLaMA-based self-evaluator improves correctness over standard RAG
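One way to picture the self-evaluation step is as a gate between generation and the final answer. The summary does not describe the evaluator's prompt, scoring scale, or retry policy, so everything in this sketch is an assumption: `answer_with_self_check`, the `min_score` cutoff, the retry count, and the stub generator and evaluator that stand in for LLaMA calls.

```python
from typing import Callable, Optional


def answer_with_self_check(
    question: str,
    context: str,
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str, str], float],
    min_score: float = 0.7,   # assumed acceptance cutoff
    max_attempts: int = 2,    # assumed retry budget
) -> Optional[str]:
    """Return the first draft the evaluator accepts, or None if none passes."""
    for _ in range(max_attempts):
        draft = generate(question, context)
        if evaluate(question, context, draft) >= min_score:
            return draft
    return None  # caller can escalate to a human or fall back to search


CONTEXT = "To fix error E42, reset the router."


def make_stub_generator():
    """Stand-in for an LLM call: yields a wrong draft, then a grounded one."""
    drafts = iter(["Replace the fuse.", "Reset the router."])
    return lambda question, context: next(drafts)


def stub_evaluate(question: str, context: str, draft: str) -> float:
    """Stand-in for the evaluator: 1.0 if the draft is supported by context."""
    return 1.0 if draft.lower().rstrip(".") in context.lower() else 0.0


result = answer_with_self_check(
    "How do I fix E42?", CONTEXT, make_stub_generator(), stub_evaluate
)
print(result)
```

Passing the generator and evaluator as callables keeps the gate independent of any particular model, so the expensive evaluator can be swapped or disabled when latency matters.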
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | 90.8% | 85.2% (Standard RAG) | +5.6 pp | Author enterprise troubleshooting dataset | Table 1; Sec 5.1 | Table 1 |
| Relevance Score | 0.89 | 0.75 (Standard RAG) | +0.14 | Author enterprise troubleshooting dataset | Table 1; Sec 5.1 | Table 1 |
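The deltas in the table are absolute differences: percentage points for accuracy and raw score units for relevance. The short calculation below shows both the absolute and the relative improvement over standard RAG, using only the figures in the table; the `deltas` helper is illustrative.

```python
def deltas(value, baseline):
    """Return (absolute difference, relative improvement in percent)."""
    absolute = value - baseline
    relative_pct = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative_pct, 1)


accuracy = deltas(90.8, 85.2)   # accuracy in percent
relevance = deltas(0.89, 0.75)  # relevance score

print(accuracy)   # (5.6, 6.6): +5.6 points, about 6.6% relative
print(relevance)  # (0.14, 18.7): +0.14, about 18.7% relative
```

The relative gain in relevance is noticeably larger than the relative gain in accuracy, which is worth keeping in mind when comparing the two rows.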
What To Try In 7 Days
Index your manuals, FAQs, and KBs into separate FAISS indices.
Prototype rule-based source weights (e.g., boost manuals for SKU queries).
Add per-index threshold filtering to drop weak matches before generation. Tune the thresholds empirically on a labeled sample.
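The second and third steps can be prototyped without any model in the loop. The sketch below is one possible starting point, not the paper's method: the SKU regex, the 1.5 boost, the source names, the labeled sample, and the choice of F1 as the tuning criterion are all assumptions made for illustration.

```python
import re

# Hypothetical SKU format such as "AB-1234"; adjust to your own catalog.
SKU_PATTERN = re.compile(r"\b[A-Z]{2,4}-\d{3,6}\b")


def source_weights(query):
    """Rule-based weights: boost manuals when the query names a SKU."""
    weights = {"manuals": 1.0, "faqs": 1.0, "kb": 1.0}
    if SKU_PATTERN.search(query):
        weights["manuals"] = 1.5  # assumed boost value
    return weights


def pick_threshold(labeled_scores, candidates):
    """Pick the candidate cutoff with the best F1 on (score, is_relevant) pairs."""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        tp = sum(1 for s, rel in labeled_scores if s >= t and rel)
        fp = sum(1 for s, rel in labeled_scores if s >= t and not rel)
        fn = sum(1 for s, rel in labeled_scores if s < t and rel)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:  # ties keep the lower (more permissive) cutoff
            best_t, best_f1 = t, f1
    return best_t


# Hypothetical labeled sample for one index: (similarity, is_relevant).
sample = [
    (0.90, True), (0.80, True), (0.65, True),
    (0.60, False), (0.50, False), (0.30, False),
]
threshold = pick_threshold(sample, [0.4, 0.5, 0.6, 0.7])

print(source_weights("Error light on AB-1234 after update"))
print(threshold)
```

Running the sweep once per index gives each source its own cutoff, which matters because similarity scores from manuals and FAQs are rarely distributed the same way.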
Agent Features
Is Agentic: Yes
Risks & Boundaries
Limitations
Dataset appears proprietary; results may not generalize to other enterprises.
Self-evaluation and generation use a 70B LLaMA model; cost and latency are high for many deployments.
When Not To Use
When you lack the GPU capacity for large LLaMA inference and FAISS at scale.
When strict data locality or privacy rules forbid moving sensitive KBs into shared embeddings.
Failure Modes
Over-weighting one source can bias answers toward that source even if it's outdated.
Poor threshold settings can filter out the correct document or allow weak matches, harming accuracy.

