Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.5
Citation Count
0
Why It Matters For Business
When some data sources are unreliable, selecting only diverse items can backfire. ProbDPP improves downstream QA and prompt quality by preferring items that are both diverse and likely to be available, reducing wasted context budget under noisy links or flaky tools.
Summary TLDR
Selecting diverse inputs for LLM prompts or fine-tuning fails when some sources randomly drop. The paper proves naive expected log-det diversity collapses under Bernoulli dropouts, proposes ProbDPP (a minimally regularized k‑DPP) that adds a per-item reliability reward, and gives a KL‑UCB semi-bandit algorithm to learn unknown source reliabilities online. Theory gives matching regret bounds and simulations (MeetingBank, HotpotQA) show consistent gains under stochastic unavailability.
Problem Statement
Diversity-based subset selection (e.g., k‑DPP) assumes selected items are always available. In realistic pipelines sources can drop or be corrupted randomly. The straightforward expected log-det under Bernoulli dropouts is mathematically ill-posed (diverges to -∞) and cannot guide selection. We need a diversity objective that stays finite under random dropouts and a practical method to select when reliabilities are unknown.
Main Contribution
Proof that expected log-det under independent Bernoulli dropouts is ill-posed (diverges to -∞) when any chosen item can fail
ProbDPP: a regularized k‑DPP objective that decomposes into geometric diversity (log-det) plus an additive per-item reliability reward
An online KL‑UCB-style combinatorial semi-bandit algorithm to learn unknown source success probabilities and select reliable, diverse subsets
Theoretical regret upper bound for the algorithm and an information-theoretic lower bound showing logarithmic regret is unavoidable
Key Findings
Naive expected log-det collapses under independent Bernoulli dropouts.
Regularizing the masked kernel by ε>0 yields a finite expected objective that splits into log-det diversity plus per-item reliability rewards.
An online KL‑UCB combinatorial semi-bandit algorithm achieves logarithmic regret in T (order-optimal up to gaps and constants).
On MeetingBank QA with dropout, ProbDPP outperforms compression baselines across metrics.
On HotpotQA (N=10, K=3), ProbDPP improves token overlap and exact match vs diversity-only and reliability-only baselines.
Results
Token-F1
ROUGE-L
BERTScore
Exact Match (EM)
Token-F1
ROUGE-L
BERTScore
Exact Match (EM)
Who Should Care
What To Try In 7 Days
Add a small ridge ε to your masked similarity kernel and re-evaluate existing DPP-based selection
Measure per-source availability (success/failure) and plug empirical rates into the reliability term r_i(α_i,ε)
If reliabilities are unknown, run the ProbDPP KL‑UCB loop to learn reliabilities while selecting under budget
Optimization Features
Token Efficiency
- Prompt compression (context pruning)
Training Optimization
- Data-efficient Training
Inference Optimization
- Context Selection
- Token Budgeting
Reproducibility
Data Available
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Assumes independent Bernoulli dropouts; does not handle correlated or adversarial failures
- Uses a fixed similarity kernel; real systems may need query-dependent kernels
- Binary per-item feedback is assumed; many pipelines only give aggregate quality signals
- Stability parameter ε controls a sensitivity trade-off; adaptive tuning is not provided
When Not To Use
- If failures are highly correlated or adversarial (violates independence assumption)
- If you only get aggregate/episodic feedback (no per-item semibandit signals)
- When all sources are reliable (α≈1) and diversity-only methods suffice
- If you cannot compute or trust per-item embeddings and a fixed Gram matrix
Failure Modes
- Objective collapse (-∞ log-det) if regularization ε is omitted and items can drop
- Wrong reliability estimates cause persistent suboptimal selection during learning
- Adversarial or correlated outages can break the independence-based guarantees
- Poor choice of ε may under- or over-penalize unreliable items
Core Entities
Models
- llama3
- k-DPP
- ProbDPP
Metrics
- Token-F1
- ROUGE-L
- BERTScore
- Exact Match (EM)
Datasets
- MeetingBank
- HotpotQA (distractor)
Benchmarks
- HotpotQA distractor
- MeetingBank long-context QA

