Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.65
Citation Count
0
Why It Matters For Business
DUAL cuts labeling waste by choosing documents that are both representative of the domain and informative for the model, which improves robustness and lowers selection compute relative to full uncertainty-based methods.
Summary TLDR
DUAL is a simple active-learning method for abstractive summarization that first picks a small diverse set of candidate documents (via embeddings), then ranks those by model uncertainty (BLEU variance with MC dropout), discards extreme-noise candidates, and mixes in random samples. Across 3 summarization models and 4 datasets, DUAL usually matches or improves over pure uncertainty, pure diversity, and random sampling while selecting fewer outliers and lowering sample-selection time versus full uncertainty-based selection. Code and datasets are public.
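The selection loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `dual_select`, the dictionary of precomputed diversity scores, the `uncertainty` callback, and the random fallback when every candidate is filtered out are all assumptions made for the example.

```python
import random
from typing import Callable, Dict, List, Optional, Set


def dual_select(
    pool: List[str],
    diversity: Dict[str, float],
    uncertainty: Callable[[str], float],
    excluded: Set[str],
    s: int,
    k: int,
    tau: float,
    p: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to `s` document ids from `pool` (hypothetical sketch).

    diversity:   precomputed diversity score per document id (higher = better).
    uncertainty: callback returning an uncertainty score (e.g. BLEUVar).
    excluded:    exclusion set, updated in place with every scored candidate.
    """
    rng = rng or random.Random()
    remaining = list(pool)
    picked: List[str] = []
    while len(picked) < s and remaining:
        choice = None
        if rng.random() >= p:
            # Diversity step: top-k unexcluded documents by diversity score.
            candidates = [d for d in remaining if d not in excluded]
            top_k = sorted(candidates, key=diversity.__getitem__, reverse=True)[:k]
            # Uncertainty step: score only the top-k, drop those at or above tau.
            scored = [(uncertainty(d), d) for d in top_k]
            kept = [(u, d) for u, d in scored if u < tau]
            excluded.update(top_k)
            if kept:
                choice = max(kept, key=lambda pair: pair[0])[1]
        if choice is None:
            # Random step, also used as a fallback (an assumption of this sketch).
            choice = rng.choice(remaining)
        picked.append(choice)
        remaining.remove(choice)
        excluded.add(choice)
    return picked
```

Because the uncertainty callback only ever sees the top-k candidates, the expensive model passes stay bounded by `k` per pick regardless of pool size.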
Problem Statement
Modern summarization models can reach strong performance with small labeled sets, so choosing which documents to label matters. Existing active-learning methods focus on either uncertainty (risking noisy samples) or diversity (risking limited exploration). For summarization, these approaches give inconsistent results and are often beaten by random sampling.
Main Contribution
DUAL algorithm: combines in-domain diversity (IDDS) with uncertainty (BLEUVar via MC dropout), plus random sampling and an exclusion set to avoid oversampling the same embedding regions.
Large empirical study: 3 models (BART, PEGASUS, FLAN-T5) on 4 datasets (AESLC, Reddit TIFU, WikiHow, BillSum) with repeated runs (6 seeds) and ROUGE evaluation.
Analysis and visualizations: show why IDDS can get stuck, why uncertainty alone picks outliers, and how DUAL balances diversity and robustness.
Public code and reproducible setup (embeddings, TSDAE domain adaptation, hyperparameters) shared on GitHub.
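To make the BLEUVar idea concrete, here is a small sketch of a BLEU-variance style score over several stochastic summaries of the same document (for example, one per MC-dropout pass). It is an illustration under assumptions: `unigram_overlap` is a toy stand-in for a real BLEU implementation, and normalizing by the number of ordered pairs is a choice made for this example.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, List


def unigram_overlap(hyp: str, ref: str) -> float:
    """Toy stand-in for BLEU: clipped unigram precision of hyp against ref."""
    hyp_tokens = hyp.split()
    if not hyp_tokens:
        return 0.0
    ref_counts = Counter(ref.split())
    matched = sum(min(c, ref_counts[t]) for t, c in Counter(hyp_tokens).items())
    return matched / len(hyp_tokens)


def bleu_var(
    summaries: List[str],
    sim: Callable[[str, str], float] = unigram_overlap,
) -> float:
    """Mean squared dissimilarity over all ordered pairs of summaries.

    Identical summaries give 0.0 (the model is consistent); summaries with
    no overlap give 1.0 (the model disagrees with itself).
    """
    n = len(summaries)
    if n < 2:
        return 0.0
    total = sum((1.0 - sim(a, b)) ** 2 for a, b in permutations(summaries, 2))
    return total / (n * (n - 1))
```

In practice `sim` would be replaced by a proper sentence-level BLEU, and `summaries` would come from generating with dropout kept active at inference time.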
Key Findings
DUAL frequently matches or yields the best ROUGE-1 among compared strategies on evaluated benchmarks.
DUAL reduces selection of outliers while keeping diversity compared to random or uncertainty-only selection.
DUAL reduces sample-selection time compared to full uncertainty (BAS) by applying MC dropout only on IDDS top-k candidates.
Pure diversity (IDDS) sometimes gets stuck in one embedding region and can hurt learning later.
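The diversity side can be illustrated with an IDDS-style score that rewards similarity to the unlabeled pool and penalizes similarity to documents that are already labeled. This is a rough sketch under assumptions: the cosine similarity, the weighting parameter `lam`, and the function names are chosen for the example and are not taken from the source.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_similarity(x: Vector, others: List[Vector]) -> float:
    """Average cosine similarity of x to a list of vectors (0.0 if empty)."""
    return sum(cosine(x, o) for o in others) / len(others) if others else 0.0


def idds_style_scores(
    unlabeled: List[Vector], labeled: List[Vector], lam: float
) -> List[float]:
    """Score each unlabeled embedding: high when it is close to the rest of
    the unlabeled pool and far from the labeled set (assumed formulation)."""
    scores = []
    for i, x in enumerate(unlabeled):
        rest = unlabeled[:i] + unlabeled[i + 1:]
        scores.append(
            lam * mean_similarity(x, rest) - (1.0 - lam) * mean_similarity(x, labeled)
        )
    return scores
```

With a score of this shape, a dense cluster in embedding space can keep producing the highest-scoring candidates for many iterations, which is one way to picture the "stuck in one region" behavior noted above.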
Results
ROUGE-1 (example)
ROUGE-1 (counterexample)
Selection time
Who Should Care
What To Try In 7 Days
Reproduce DUAL on your summarization task with B=150 and s=10 to check whether labeling fewer, better examples helps.
Compute domain-adapted embeddings (TSDAE) once and use IDDS top-k to limit expensive uncertainty passes.
Tune the uncertainty cap τ to filter out noisy candidates, and pick a fraction p≈0.1–0.3 of each iteration's samples at random for exploration.
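As a quick sanity check on what these settings imply, the arithmetic can be sketched as below. The helper `plan_budget` is hypothetical, and it assumes p is the per-pick probability of a random sample, so the random counts are expectations rather than fixed numbers.

```python
from typing import Dict


def plan_budget(budget: int, step: int, p: float) -> Dict[str, float]:
    """Summarize an active-learning schedule (hypothetical helper).

    budget: total number of documents to label (B).
    step:   documents selected per iteration (s).
    p:      assumed probability that a given pick is a random sample.
    """
    if step <= 0 or not 0.0 <= p <= 1.0:
        raise ValueError("step must be positive and p must lie in [0, 1]")
    iterations = budget // step
    return {
        "iterations": iterations,
        "expected_random_per_iteration": p * step,
        "expected_random_total": p * step * iterations,
    }


# B=150 and s=10 give 15 labeling rounds; p is one value from the suggested range.
schedule = plan_budget(budget=150, step=10, p=0.2)
```

With B=150, s=10 and p=0.2, this works out to 15 rounds with about 2 random picks each, so roughly 30 of the 150 labels go to exploration.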
Optimization Features
Training Optimization
- Data-efficient selection via active learning
Inference Optimization
- Limit MC-dropout to IDDS top-k to cut selection cost
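The saving from restricting MC dropout to the top-k candidates can be estimated by counting model forward passes. The sketch below is back-of-the-envelope only: the function names are hypothetical, and it assumes one set of k candidates is scored per pick.

```python
def passes_full_uncertainty(pool_size: int, mc_samples: int) -> int:
    """Forward passes per iteration when every unlabeled document is scored."""
    return pool_size * mc_samples


def passes_top_k(picks: int, k: int, mc_samples: int) -> int:
    """Forward passes per iteration when only k candidates are scored per pick
    (assumes one top-k set per pick; random picks would cost even less)."""
    return picks * k * mc_samples


# Hypothetical numbers, chosen only to show the scale of the difference.
full = passes_full_uncertainty(pool_size=10_000, mc_samples=10)
limited = passes_top_k(picks=10, k=50, mc_samples=10)
ratio = full / limited
```

The full-pool cost grows with the size of the unlabeled pool, while the top-k cost depends only on s, k and the number of MC samples, so the gap widens as the pool gets larger.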
Reproducibility
Code Urls
Data Urls
Code Available
Data Available
Open Source Status
- yes
Risks & Boundaries
Limitations
- Performance depends on the quality of embeddings and TSDAE domain adaptation.
- BLEUVar is task-agnostic and may not capture factual or content-preservation uncertainty.
- Experiments use budgets up to B=150; behavior at much larger scales is untested.
- DUAL still needs upfront embedding computation and MC-dropout passes on candidates.
When Not To Use
- When you cannot compute domain embeddings or lack compute for any MC-dropout passes.
- When labeling budgets are extremely large and random sampling is already adequate.
- When uncertainty must be measured with specialized or human-judged criteria (e.g., factuality) rather than BLEUVar.
Failure Modes
- If IDDS embeddings are poor, DUAL may still focus on unrepresentative regions despite random sampling.
- If τ (the uncertainty cap) is set incorrectly, the algorithm may either include noisy samples or discard all candidates.
- Overspecialization: excluding top-k neighbors permanently (E set) can remove useful nearby samples in some domains.
Core Entities
Models
- BART
- PEGASUS
- FLAN-T5
- BERT (for embeddings)
- MPNet (evaluated but not used)
- TSDAE (embedding domain adaptation)
Metrics
- ROUGE-1
- ROUGE-2
- ROUGE-L
- BLEU variance (BLEUVar)
- Diversity score (avg Euclidean dist.)
- Outlier score (KNN density)
- Sample selection time (s)
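The two selection-quality metrics in this list can be approximated as follows. This is a sketch under assumptions: the diversity score is taken as the mean pairwise Euclidean distance within the selected set, and the outlier score as the mean distance to the k nearest neighbors in the pool; the exact definitions used in the paper may differ.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def diversity_score(selected: List[Vector]) -> float:
    """Mean pairwise Euclidean distance among selected embeddings."""
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def knn_outlier_score(x: Vector, pool: List[Vector], k: int) -> float:
    """Mean distance from x to its k nearest pool embeddings.

    A larger value means x sits in a sparser region (more outlier-like).
    `pool` should not contain x itself, otherwise a zero distance is counted.
    """
    nearest = sorted(math.dist(x, v) for v in pool)[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0
```

A selection strategy that balances the two would show a high `diversity_score` for the chosen set while keeping the average `knn_outlier_score` of its members low.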
Datasets
- AESLC
- Reddit TIFU (long)
- WikiHow
- BillSum
Benchmarks
- ROUGE
- BLEUVar

