ProbDPP: pick diverse data that’s also likely to arrive — and learn reliabilities online

January 31, 2026 · 7 min read

Overview

Decision Snapshot: Needs Validation

Theoretical proofs and simulation experiments support the claims, but evaluations are limited to simulated dropouts, two datasets, and a single LLM; more real-world tests are needed before large-scale deployment.

Citations: 0

Evidence Strength: 0.70

Confidence: 0.85

Risk Signals: 12

Trust Signals

Findings with numeric evidence: 2/5

Findings with evidence refs: 5/5

Results with explicit delta: 8/8

Reproducibility

Status: Partial assets available

Open source: Unknown

At A Glance

Cost impact: 50%

Production readiness: 60%

Novelty: 60%

Authors

Ahmad Sarlak, Abolfazl Razi

Why It Matters For Business

When some data sources are unreliable, selecting purely for diversity can backfire: context budget is spent on items that may never arrive. ProbDPP prefers items that are both diverse and likely to be available, which improved downstream QA and prompt quality in the paper's simulated-dropout experiments and reduces wasted context budget under noisy links or flaky tools.

Summary (TL;DR)

Selecting diverse inputs for LLM prompts or fine-tuning breaks down when some sources randomly drop out. The paper proves that the naive expected log-det diversity objective collapses (diverges to -∞) under independent Bernoulli dropouts, proposes ProbDPP (a minimally regularized k‑DPP) that adds a per-item reliability reward, and gives a KL‑UCB semi-bandit algorithm that learns unknown source reliabilities online. The theory gives matching regret bounds, and simulations on MeetingBank and HotpotQA show consistent gains under stochastic unavailability.

Problem Statement

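Diversity-based subset selection (e.g., k‑DPP) assumes that every selected item is available. In realistic pipelines, sources can drop out or be corrupted at random. The straightforward expected log-det under Bernoulli dropouts is mathematically ill-posed: whenever a selected item can fail, some dropout pattern with nonzero probability zeroes that item's row and column in the masked kernel, so the determinant is 0 and its log is -∞. The expectation therefore diverges to -∞ and cannot guide selection. What is needed is a diversity objective that stays finite under random dropouts, plus a practical way to select items when reliabilities are unknown.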
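The collapse can be reproduced on a toy kernel by enumerating every dropout pattern exactly. This is a minimal sketch under assumptions: `expected_masked_logdet` is a hypothetical helper, a dropped item is masked by zeroing its row and column (D = diag(z)), and the optional ε is applied inside the mask as diag(z + ε) rather than as the ridge the summary mentions, so the ε > 0 call only illustrates that the value becomes finite, not the paper's exact objective. Enumeration costs 2^k determinants, so it only suits small k.

```python
import itertools
import math

import numpy as np


def expected_masked_logdet(L, alpha, eps=0.0):
    """Exact E[log det(D L D)] over all 2^k dropout patterns (hypothetical helper).

    Item i survives independently with probability alpha[i]. The mask is
    D = diag(z + eps) with z in {0, 1}^k. With eps = 0, a dropped item zeroes
    a row and a column of D L D, so its determinant is 0 and log det is -inf.
    """
    k = len(alpha)
    total = 0.0
    for z in itertools.product([0, 1], repeat=k):
        # Probability of this survival pattern under independent Bernoulli dropouts.
        prob = math.prod(a if zi else 1.0 - a for zi, a in zip(z, alpha))
        if prob == 0.0:
            continue  # unreachable pattern contributes nothing
        D = np.diag(np.asarray(z, dtype=float) + eps)
        sign, logdet = np.linalg.slogdet(D @ L @ D)
        if sign <= 0:
            return -math.inf  # one reachable singular pattern drags the mean to -inf
        total += prob * logdet
    return total


L = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
alpha = [0.9, 0.8, 0.95]

print(expected_masked_logdet(L, alpha))           # -inf: the naive objective collapses
print(expected_masked_logdet(L, alpha, eps=0.1))  # finite once eps > 0
```

Exact enumeration is used instead of Monte Carlo sampling on purpose: a sampler can miss the rare all-dropped patterns and report a misleadingly finite average.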

Main Contribution

Proof that expected log-det under independent Bernoulli dropouts is ill-posed (diverges to -∞) when any chosen item can fail

ProbDPP: a regularized k‑DPP objective that decomposes into geometric diversity (log-det) plus an additive per-item reliability reward
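To make the decomposition concrete, here is one construction under which it holds exactly. It is an illustrative assumption, not the paper's stated regularizer: the summary describes ε as a ridge, whereas this derivation places ε inside a multiplicative mask, and the resulting form of r_i(α_i, ε) may differ from the paper's.

```latex
% Assumptions (illustrative): z_i ~ Bernoulli(\alpha_i) independently,
% D_\varepsilon = \mathrm{diag}(z_i + \varepsilon)_{i \in S}, L_S = kernel restricted to S, 0 < \varepsilon < 1.
\det(D_\varepsilon L_S D_\varepsilon) = \det(L_S)\prod_{i\in S}(z_i+\varepsilon)^2
\;\Longrightarrow\;
\mathbb{E}\!\left[\log\det(D_\varepsilon L_S D_\varepsilon)\right]
  = \underbrace{\log\det L_S}_{\text{geometric diversity}}
  + \sum_{i\in S} r_i(\alpha_i,\varepsilon),
\qquad
r_i(\alpha_i,\varepsilon) = 2\bigl[\alpha_i\log(1+\varepsilon) + (1-\alpha_i)\log\varepsilon\bigr].
```

With ε = 0 the term (1 - α_i) log ε is -∞ for any α_i < 1, which reproduces the collapse; for 0 < ε < 1, log ε < 0 < log(1 + ε), so r_i rises with α_i and rewards dependable items.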

Key Findings

Naive expected log-det collapses under independent Bernoulli dropouts.

Practical Use: Do not optimize plain expected log-det if selected items can fail; every subset containing such an item scores -∞, so the objective cannot rank candidates.

Evidence Ref: Lemma 3.1; Appendix A.1

Regularizing the masked kernel by ε>0 yields a finite expected objective that splits into log-det diversity plus per-item reliability rewards.

Practical Use: Add a small ridge ε to the masked kernel to retain geometric diversity while favoring dependable sources.

Evidence Ref: Lemma 3.2; Section 3.1
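The selection rule implied by this split can be sketched with plain greedy search over log det(L_S) + Σ r_i. This is a hypothetical sketch, not the paper's optimizer: `greedy_probdpp` and `reliability_reward` are illustrative names, r_i uses the assumed form 2[α_i log(1 + ε) + (1 - α_i) log ε] (one construction that yields an additive reward), and greedy is simply a common heuristic for log-det maximization.

```python
import math

import numpy as np


def reliability_reward(alpha_i, eps):
    """ASSUMED per-item reward r_i(alpha_i, eps); increasing in alpha_i for 0 < eps < 1.

    The paper's exact r_i may differ.
    """
    return 2.0 * (alpha_i * math.log1p(eps) + (1.0 - alpha_i) * math.log(eps))


def greedy_probdpp(L, alpha, k, eps=0.1):
    """Hypothetical greedy selector: repeatedly add the item that maximizes
    log det(L_S) + sum_{i in S} r_i(alpha_i, eps) until k items are chosen.

    L must be symmetric positive definite; alpha holds per-item reliabilities.
    """
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best_item, best_score = None, -math.inf
        for j in range(n):
            if j in selected:
                continue
            cand = selected + [j]
            sign, logdet = np.linalg.slogdet(L[np.ix_(cand, cand)])
            if sign <= 0:
                continue  # numerically singular submatrix: no diversity gain
            score = logdet + sum(reliability_reward(alpha[i], eps) for i in cand)
            if score > best_score:
                best_item, best_score = j, score
        if best_item is None:
            break  # no remaining candidate keeps L_S nonsingular
        selected.append(best_item)
    return selected


# Items 0 and 1 are near-duplicates; item 2 is the most diverse but rarely
# arrives; item 3 is mildly correlated with 0 and 1 but dependable.
L = np.array([[1.0, 0.95, 0.0, 0.3],
              [0.95, 1.0, 0.0, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.3, 0.3, 0.0, 1.0]])
alpha = np.array([0.95, 0.95, 0.2, 0.9])

print(greedy_probdpp(L, alpha, k=2))       # [0, 3]: reliability outweighs a small diversity loss
print(greedy_probdpp(L, np.ones(4), k=2))  # [0, 2]: all-reliable case reduces to plain log-det greedy
```

When every α_i = 1, each item gets the same constant reward, so for a fixed budget k the ranking is driven by log det alone; the toy example shows the reliability term flipping the choice from the diverse-but-flaky item 2 to the dependable item 3.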

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Token-F1 | 31.3 (ProbDPP) | 28.8 (LLMLingua2) | +2.5 abs (+8.7% rel) | MeetingBank (max 30 chunks) | Table 1 reports Token-F1 on MeetingBank | Table 1
ROUGE-L | 31.2 (ProbDPP) | 28.8 (LLMLingua2) | +2.4 abs (+8.3% rel) | MeetingBank (max 30 chunks) | Table 1 reports ROUGE-L on MeetingBank | Table 1

What To Try In 7 Days

Add a small ridge ε to your masked similarity kernel and re-evaluate existing DPP-based selection

Log per-source availability (success/failure) and plug the empirical success rates into the reliability term r_i(α_i, ε)

If reliabilities are unknown, run the ProbDPP KL‑UCB loop to learn reliabilities while selecting under budget
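The online step in the last item can be sketched as a standard KL-UCB loop for Bernoulli reliabilities with per-item (semi-bandit) feedback. This is a minimal sketch under stated assumptions: the log t exploration level, the bisection-based index, and the names `run_kl_ucb`, `kl_ucb_index`, and `top_k_oracle` are illustrative, not taken from the paper. To keep it short, the selection oracle assumes an identity similarity kernel, for which log det is 0 and the ProbDPP objective reduces to the reliability rewards; assuming the reward grows with α_i (as "favoring dependable sources" implies), that means picking the k most optimistic sources. With a real kernel, a log-det-aware selector fed with the optimistic reliabilities would replace it.

```python
import math
import random


def bernoulli_kl(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)), clipped so log() never sees 0."""
    tiny = 1e-12
    p = min(max(p, tiny), 1.0 - tiny)
    q = min(max(q, tiny), 1.0 - tiny)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ucb_index(mean, pulls, t, iters=30):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t), found by bisection."""
    if pulls == 0:
        return 1.0  # never observed: be fully optimistic
    budget = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(mean, mid) > budget:
            hi = mid
        else:
            lo = mid
    return lo


def run_kl_ucb(true_alpha, k, rounds, select, seed=0):
    """Hypothetical semi-bandit loop: pick k sources using optimistic (KL-UCB)
    reliabilities, observe whether each picked source arrived, update its counts."""
    rng = random.Random(seed)
    n = len(true_alpha)
    successes, pulls = [0] * n, [0] * n
    for t in range(1, rounds + 1):
        ucb = [kl_ucb_index(successes[i] / pulls[i] if pulls[i] else 0.0, pulls[i], t)
               for i in range(n)]
        for i in select(ucb, k):
            arrived = rng.random() < true_alpha[i]  # simulated Bernoulli availability
            successes[i] += int(arrived)            # per-item (semi-bandit) feedback
            pulls[i] += 1
    estimates = [s / p if p else 0.0 for s, p in zip(successes, pulls)]
    return pulls, estimates


def top_k_oracle(optimistic_alpha, k):
    """Stand-in ProbDPP oracle for an identity kernel: return the k sources
    with the highest optimistic reliability."""
    order = sorted(range(len(optimistic_alpha)),
                   key=lambda i: optimistic_alpha[i], reverse=True)
    return order[:k]


pulls, estimates = run_kl_ucb([0.9, 0.2, 0.8, 0.3, 0.6], k=2, rounds=2000,
                              select=top_k_oracle)
print(pulls)                              # most pulls go to sources 0 and 2
print([round(e, 2) for e in estimates])  # empirical reliability per source
```

Because feedback arrives per item, every selected source updates its own estimate each round; with only aggregate feedback this per-source update would not be possible.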

Optimization Features

Token Efficiency: Prompt compression (context pruning)
Training Optimization: Data-efficient Training
Inference Optimization: Context Selection, Token Budgeting

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

Assumes independent Bernoulli dropouts; does not handle correlated or adversarial failures

Uses a fixed similarity kernel; real systems may need query-dependent kernels

When Not To Use

If failures are highly correlated or adversarial (this violates the independence assumption)

If you only get aggregate/episodic feedback (no per-item semi-bandit signals)

Failure Modes

Objective collapse (-∞ log-det) if regularization ε is omitted and items can drop

Wrong reliability estimates cause persistent suboptimal selection during learning

Core Entities

Models

llama3, k-DPP, ProbDPP

Metrics

Token-F1, ROUGE-L, BERTScore, Exact Match (EM)

Datasets

MeetingBank, HotpotQA (distractor)

Benchmarks

HotpotQA distractor, MeetingBank long-context QA