Overview
The paper provides a concrete method, datasets, and Azure-based evaluations. Results are consistent across uni- and cross-domain tests, but code and public data release are pending, and exact numeric improvements are shown only in figures.
Citations: 0
Evidence Strength: 0.70
Confidence: 0.80
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 2/3
Findings with evidence refs: 3/3
Results with explicit delta: 0/4
Reproducibility
Status: No open assets linked
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
If product support queries span multiple products, probabilistic federated retrieval can retrieve the correct document more often and improve answer quality without per-product LLM fine-tuning, according to the paper's evaluations.
Summary TLDR
The paper introduces MKP-QA, a multi-product RAG system that combines a learned domain router, stochastic gating, and a dense bi-encoder retriever to federate search across product domains. The authors also build Adobe-focused uni- and cross-product datasets (AEP, Target, CJA). MKP-QA consistently outperforms baselines in top-1 retrieval accuracy and in LLM-judged relevancy and faithfulness on these datasets, with larger gains for cross-domain queries. Datasets and deployment notes are provided; code and public data release are pending Adobe approval.
Problem Statement
Enterprise product questions often span multiple products and require cross-product knowledge. Existing RAG pipelines either search every domain (slow, more hallucination) or pick one domain (can miss cross-product info). There is also no suitable public benchmark for multi-product product QA.
Main Contribution
MKP-QA: a probabilistic federated RAG pipeline that combines a learned query-domain router, stochastic gating for exploration-exploitation, and a dense bi-encoder retriever to rank documents across product domains.
A stochastic gating mechanism that samples domains based on router likelihoods and adaptive entropy-based thresholds to reduce selection errors and enable exploration.
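The gating idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not give the exact threshold formula, so the rule `base_tau * (1 - normalized entropy)`, the Bernoulli sampling of below-threshold domains, the fallback to the top domain, and all function and parameter names are hypothetical choices made for the example.

```python
import math
import random


def normalized_entropy(probs):
    """Shannon entropy of probs divided by log(K), so the result is roughly in [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def adaptive_threshold(probs, base_tau=0.5):
    # Assumed rule (not from the paper): the less certain the router is,
    # the lower the threshold, so more domains are admitted.
    return base_tau * (1.0 - normalized_entropy(probs))


def gate_domains(router_probs, base_tau=0.5, rng=None):
    """Return the list of domains to search for one query.

    router_probs maps domain name -> router likelihood (values sum to 1).
    """
    rng = rng or random.Random()
    names = list(router_probs)
    probs = [router_probs[n] for n in names]
    tau = adaptive_threshold(probs, base_tau)

    active = []
    for name, p in zip(names, probs):
        if p >= tau:
            # Exploitation: the router is confident enough about this domain.
            active.append(name)
        elif rng.random() < p:
            # Exploration: sample a below-threshold domain with probability p.
            active.append(name)

    if not active:
        # Fallback so that at least one domain is always searched.
        active.append(max(names, key=lambda n: router_probs[n]))
    return active


# A confident router keeps AEP and only rarely samples the other domains.
confident = gate_domains({"AEP": 0.98, "Target": 0.01, "CJA": 0.01},
                         rng=random.Random(0))

# A maximally uncertain router drives the threshold to about zero,
# so every domain is searched.
uncertain = gate_domains({"AEP": 1 / 3, "Target": 1 / 3, "CJA": 1 / 3})
```

Lowering `base_tau` admits more domains per query, which trades latency and context noise for a lower chance of missing a relevant domain.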
Key Findings
MKP-QA outperforms baselines in top-1 retrieval accuracy and in LLM-judged relevancy and faithfulness; exact deltas are shown only in figures.
A large synthetic dataset was created for each product with GPT-4 assistance.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| SLA dataset size per product | AEP 28,860; CJA 27,820; Target 29,610 query-doc pairs | — | — | SLA uni-domain | Table 1 (Section 4.4) | Table 1 |
| % positive pairs (SLA) | AEP 17.53%; CJA 18.28%; Target 20.26% | — | — | SLA uni-domain | Table 1 (Section 4.4) | Table 1 |
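As a quick sanity check on the table, the two rows can be combined to estimate how many positive query-doc pairs each product has. The counts below are approximations derived from the rounded percentages, not figures reported by the paper; the variable and function names are illustrative.

```python
# Figures copied from the Results table (SLA uni-domain split):
# product -> (total query-doc pairs, percent positive pairs).
SLA_STATS = {
    "AEP": (28_860, 17.53),
    "CJA": (27_820, 18.28),
    "Target": (29_610, 20.26),
}


def approx_positive_pairs(total_pairs, positive_pct):
    """Estimate the positive-pair count from a total and a rounded percentage."""
    return round(total_pairs * positive_pct / 100.0)


approx = {
    product: approx_positive_pairs(total, pct)
    for product, (total, pct) in SLA_STATS.items()
}
# approx -> {"AEP": 5059, "CJA": 5085, "Target": 5999}
```

Under this estimate, each product has roughly 5,000 to 6,000 positive pairs, with the remainder being negatives.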
What To Try In 7 Days
Run a small federated retrieval prototype: train a domain router and a Sentence-BERT retriever on existing product docs, compare top-1 retrieval against unified search.
Implement entropy-based adaptive gating to allow low-confidence domains to be sampled and measure cross-product recall lift.
Use GPT-4 (or internal judge) to cheaply evaluate relevancy and faithfulness on a held-out sample before full deployment.
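For the first step above, the comparison between federated and unified search reduces to a top-1 accuracy computation over a held-out query set. The sketch below assumes each system's output is available as a ranked list of document IDs per query; the function name, the data layout, and the toy IDs are made up for illustration and do not come from the paper.

```python
def top1_accuracy(rankings, gold):
    """Fraction of queries whose top-ranked document is a gold document.

    rankings: dict of query id -> list of doc ids, best first.
    gold:     dict of query id -> set of relevant doc ids.
    """
    if not rankings:
        raise ValueError("no queries to evaluate")
    hits = 0
    for query_id, ranked_docs in rankings.items():
        if ranked_docs and ranked_docs[0] in gold.get(query_id, set()):
            hits += 1
    return hits / len(rankings)


# Toy data with invented ids, only to show the call pattern.
gold = {"q1": {"d1"}, "q2": {"d5"}}
federated = {"q1": ["d1", "d2"], "q2": ["d5"]}
unified = {"q1": ["d2", "d1"], "q2": ["d5"]}

federated_acc = top1_accuracy(federated, gold)  # 1.0 on this toy data
unified_acc = top1_accuracy(unified, gold)      # 0.5 on this toy data
```

Reporting the metric separately for single-product and cross-product queries makes it easier to see where federation helps.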
Risks & Boundaries
Limitations
Dataset and code release are pending Adobe approval, so exact replication is currently limited.
Performance depends on the quality of the domain router; misclassification can still remove needed domains despite stochastic gating.
When Not To Use
If you cannot afford a vector DB or offline embedding infrastructure for retrieval at scale.
If queries are strictly single-domain and a simple index yields sufficient accuracy.
Failure Modes
The router assigns near-zero probability to a relevant domain and gating fails to sample it, causing missed evidence.
Too many active domains (low threshold) increase latency and may introduce irrelevant context that hurts LLM faithfulness.

