ProbDPP: pick diverse data that’s also likely to arrive — and learn reliabilities online

January 31, 20267 min

Overview

Production Readiness

0.6

Novelty Score

0.6

Cost Impact Score

0.5

Citation Count

0

Authors

Ahmad Sarlak, Abolfazl Razi

Links

Abstract / PDF

Why It Matters For Business

When some data sources are unreliable, selecting only diverse items can backfire. ProbDPP improves downstream QA and prompt quality by preferring items that are both diverse and likely to be available, reducing wasted context budget under noisy links or flaky tools.

Summary TLDR

Selecting diverse inputs for LLM prompts or fine-tuning fails when some sources randomly drop. The paper proves naive expected log-det diversity collapses under Bernoulli dropouts, proposes ProbDPP (a minimally regularized k‑DPP) that adds a per-item reliability reward, and gives a KL‑UCB semi-bandit algorithm to learn unknown source reliabilities online. Theory gives matching regret bounds and simulations (MeetingBank, HotpotQA) show consistent gains under stochastic unavailability.

Problem Statement

Diversity-based subset selection (e.g., k‑DPP) assumes selected items are always available. In realistic pipelines sources can drop or be corrupted randomly. The straightforward expected log-det under Bernoulli dropouts is mathematically ill-posed (diverges to -∞) and cannot guide selection. We need a diversity objective that stays finite under random dropouts and a practical method to select when reliabilities are unknown.

Main Contribution

Proof that expected log-det under independent Bernoulli dropouts is ill-posed (diverges to -∞) when any chosen item can fail

ProbDPP: a regularized k‑DPP objective that decomposes into geometric diversity (log-det) plus an additive per-item reliability reward

An online KL‑UCB-style combinatorial semi-bandit algorithm to learn unknown source success probabilities and select reliable, diverse subsets

Theoretical regret upper bound for the algorithm and an information-theoretic lower bound showing logarithmic regret is unavoidable

Key Findings

Naive expected log-det collapses under independent Bernoulli dropouts.

Regularizing the masked kernel by ε>0 yields a finite expected objective that splits into log-det diversity plus per-item reliability rewards.

An online KL‑UCB combinatorial semi-bandit algorithm achieves logarithmic regret in T (order-optimal up to gaps and constants).

On MeetingBank QA with dropout, ProbDPP outperforms compression baselines across metrics.

NumbersBERTScore 36.9 vs 32.5 (13.5% rel); Token-F1 31.3 vs 28.8

On HotpotQA (N=10, K=3), ProbDPP improves token overlap and exact match vs diversity-only and reliability-only baselines.

NumbersToken-F1 34.14 vs k‑DPP 31.20 (+9.4% rel); EM 25 vs k‑DPP 24

Results

Token-F1

Value31.3 (ProbDPP)

BaselineLLMLingua2 28.8

ROUGE-L

Value31.2 (ProbDPP)

BaselineLLMLingua2 28.8

BERTScore

Value36.9 (ProbDPP)

BaselineLLMLingua2 32.5

Exact Match (EM)

Value19.4 (ProbDPP)

BaselineLLMLingua2 17.3

Token-F1

Value34.14 (ProbDPP)

Baselinek-DPP 31.20

ROUGE-L

Value34.14 (ProbDPP)

Baselinek-DPP 31.34

BERTScore

Value11.27 (ProbDPP)

Baselinek-DPP 7.67

Exact Match (EM)

Value25 (ProbDPP)

Baselinek-DPP 24

Who Should Care

What To Try In 7 Days

Add a small ridge ε to your masked similarity kernel and re-evaluate existing DPP-based selection

Measure per-source availability (success/failure) and plug empirical rates into the reliability term r_i(α_i,ε)

If reliabilities are unknown, run the ProbDPP KL‑UCB loop to learn reliabilities while selecting under budget

Optimization Features

Token Efficiency

  • Prompt compression (context pruning)

Training Optimization

  • Data-efficient Training

Inference Optimization

  • Context Selection
  • Token Budgeting

Reproducibility

Data Available

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • Assumes independent Bernoulli dropouts; does not handle correlated or adversarial failures
  • Uses a fixed similarity kernel; real systems may need query-dependent kernels
  • Binary per-item feedback is assumed; many pipelines only give aggregate quality signals
  • Stability parameter ε controls a sensitivity trade-off; adaptive tuning is not provided

When Not To Use

  • If failures are highly correlated or adversarial (violates independence assumption)
  • If you only get aggregate/episodic feedback (no per-item semibandit signals)
  • When all sources are reliable (α≈1) and diversity-only methods suffice
  • If you cannot compute or trust per-item embeddings and a fixed Gram matrix

Failure Modes

  • Objective collapse (-∞ log-det) if regularization ε is omitted and items can drop
  • Wrong reliability estimates cause persistent suboptimal selection during learning
  • Adversarial or correlated outages can break the independence-based guarantees
  • Poor choice of ε may under- or over-penalize unreliable items

Core Entities

Models

  • llama3
  • k-DPP
  • ProbDPP

Metrics

  • Token-F1
  • ROUGE-L
  • BERTScore
  • Exact Match (EM)

Datasets

  • MeetingBank
  • HotpotQA (distractor)

Benchmarks

  • HotpotQA distractor
  • MeetingBank long-context QA