Overview
IDEAL is a practical selection method: it is unsupervised, easy to implement with embeddings, has provable greedy guarantees, and shows consistent empirical gains and large selection-time reductions versus prior baselines.
Citations: 0
Evidence Strength: 0.70
Confidence: 0.80
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 4/4
Reproducibility
Status: Code + data available
Open source: Partial
At A Glance
Cost impact: 70%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
Label fewer examples and get nearly the same or better in-context performance while cutting selection time and inference cost; this lowers annotation bills and speeds up prompt curation.
Who Should Care
Summary TLDR
IDEAL is an unsupervised method to choose which unlabeled examples to annotate so that those labeled examples serve as strong in-context prompts for large language models. It builds a directed similarity graph from embeddings, measures a candidate subset's reach via a diffusion (influence) model, and greedily picks examples with the largest marginal influence. IDEAL matches or beats prior selective-annotation baselines on 9 datasets (17/18 cases) while using roughly 13% of the subset-selection time of the prior method (≈7.8× speedup). The paper includes a provable greedy approximation bound and shows Auto-IDEAL (automatic label propagation) can further expand prompts cheaply.
Problem Statement
In-context learning benefits from a large pool of annotated examples to draw prompts from, but manual annotation is costly. How can we choose a small subset to label that yields good prompts for many test inputs while keeping both annotation and selection costs low?
Main Contribution
An unsupervised, end-to-end selective annotation method (IDEAL) that picks unlabeled examples to annotate by maximizing a graph-based influence metric.
A practical algorithm: build a directed k-NN graph on Sentence-BERT embeddings, quantify a subset's influence with an independent-cascade diffusion model, and greedily add the example with the largest marginal gain.
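The selection loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: random vectors stand in for Sentence-BERT embeddings (with the `sentence-transformers` package you would pass the output of `SentenceTransformer(...).encode(texts)` instead), every edge uses one constant activation probability `p`, and influence is estimated by plain Monte Carlo simulation. The helper names `build_knn_graph`, `influence`, and `greedy_select` are hypothetical.

```python
import numpy as np


def build_knn_graph(emb: np.ndarray, k: int = 10) -> list[list[int]]:
    """Directed k-NN graph: node i points to its k most cosine-similar neighbours."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    return [row.tolist() for row in nbrs]


def influence(graph: list[list[int]], seeds: list[int], p: float = 0.1,
              n_sims: int = 100, seed: int = 0) -> float:
    """Monte Carlo estimate of expected reach under an independent cascade.

    Assumption: one constant activation probability p on every edge.
    """
    rng = np.random.default_rng(seed)  # same seed per call reduces comparison noise
    total = 0
    for _ in range(n_sims):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph[u]:
                    # each newly active node gets one chance to activate each neighbour
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / n_sims


def greedy_select(graph: list[list[int]], m: int, p: float = 0.1,
                  n_sims: int = 100) -> list[int]:
    """Pick m nodes, each time adding the candidate with the largest marginal gain."""
    selected: list[int] = []
    for _ in range(m):
        base = influence(graph, selected, p, n_sims)
        best, best_gain = -1, float("-inf")
        for c in range(len(graph)):
            if c in selected:
                continue
            gain = influence(graph, selected + [c], p, n_sims) - base
            if gain > best_gain:
                best, best_gain = c, gain
        selected.append(best)
    return selected


if __name__ == "__main__":
    emb = np.random.default_rng(1).normal(size=(60, 16))  # stand-in embeddings
    graph = build_knn_graph(emb, k=5)
    print(greedy_select(graph, m=3, n_sims=50))
```

Plain greedy re-simulates the cascade for every candidate at every step, so on a 3k-point pool you would likely want fewer simulations or a lazy-evaluation variant of the greedy loop.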
Key Findings
IDEAL outperforms Vote-k and random selection in most evaluations, matching or beating prior baselines in 17 of 18 cases.
Subset selection takes roughly 13% of Vote-k's time (≈7.8× faster).
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | IDEAL 66.4% | Vote-k 64.6% | +1.8 pp | MRPC | Table 1 (budget=100) | Table 1 |
| Accuracy | IDEAL 51.4% | Vote-k 46.6% | +4.8 pp | SST-5 | Table 1 (budget=100) | Table 1 |
What To Try In 7 Days
Compute Sentence-BERT embeddings for 3k unlabeled points and build a directed k-NN graph (k=10).
Run IDEAL's greedy influence selection to pick m examples and label them; at inference time, retrieve the most similar labeled examples for each test input as its prompt.
Compare in-context accuracy against random selection and your current pipeline, and record selection time and token cost for each.
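The retrieval and comparison steps above might look like the following sketch. Assumptions: embeddings are NumPy arrays, the `Input:`/`Label:` prompt template is hypothetical, and whitespace word counts stand in for a real tokenizer when estimating token cost. Random selection is shown as the baseline; any selector with the same call signature (for example an IDEAL implementation) could be timed the same way. All function names are hypothetical.

```python
import time

import numpy as np


def retrieve_prompts(test_emb, pool_emb, pool_ids, n_shots=4):
    """For each test input, return ids of the n_shots most cosine-similar labeled examples."""
    t = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    p = pool_emb / np.linalg.norm(pool_emb, axis=1, keepdims=True)
    top = np.argsort(-(t @ p.T), axis=1)[:, :n_shots]  # most similar first
    return [[pool_ids[j] for j in row] for row in top]


def build_prompt(shots, texts, labels, query):
    """Concatenate labeled demonstrations, then the unlabeled query (hypothetical template)."""
    demos = "\n".join(f"Input: {texts[i]}\nLabel: {labels[i]}" for i in shots)
    return f"{demos}\nInput: {query}\nLabel:"


def approx_tokens(prompt: str) -> int:
    """Crude token-cost proxy: whitespace-separated words, not a real tokenizer."""
    return len(prompt.split())


def time_selector(select_fn, *args):
    """Run a subset selector and return (selected ids, wall-clock seconds)."""
    start = time.perf_counter()
    chosen = select_fn(*args)
    return chosen, time.perf_counter() - start


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = rng.normal(size=(3000, 32))  # stand-in for unlabeled-pool embeddings
    tests = rng.normal(size=(5, 32))

    def random_select(n, m):
        return rng.choice(n, size=m, replace=False).tolist()

    chosen, secs = time_selector(random_select, len(pool), 100)
    shots = retrieve_prompts(tests, pool[chosen], chosen, n_shots=4)
    print(f"random selection: {secs:.4f}s, shots for first test input: {shots[0]}")
```

Running the same harness with each selector on identical embeddings keeps the timing and token-cost comparison like-for-like.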
Optimization Features
Token Efficiency
Inference Optimization
Reduces selection-stage inference calls by avoiding LLM predictions over the unlabeled pool; reporte
Risks & Boundaries
Limitations
Requires good embeddings: poor embedding quality distorts the similarity graph and, in turn, the selection.
LLM inference still needs substantial memory: loading a 6B-parameter model takes ≈23 GB of GPU memory.
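As a back-of-envelope check, the ≈23 GB figure is close to what 6B parameters occupy at 4 bytes each. The sketch below assumes the weights are held in 32-bit floats (an assumption; the source does not state the precision) and counts weights only, ignoring activations, cache, and framework overhead. The helper `weight_memory_gb` is hypothetical.

```python
# Bytes per parameter for common storage formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str = "float32") -> float:
    """Memory for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(dtype, weight_memory_gb(6e9, dtype))  # float32 24.0, float16 12.0, int8 6.0
```

24 GB in decimal units is about 22.4 GiB, consistent with the ≈23 GB reported; loading the same weights in half precision would roughly halve the footprint.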
When Not To Use
When you lack reliable sentence embeddings for your domain.
When you need an expanded labeled set but cannot afford any of the model predictions that automatic annotation (Auto-IDEAL) relies on.
Failure Modes
Embedding bias selects semantically similar but label-skewed examples, reducing downstream accuracy on some classes.
Graph connectivity issues (isolated nodes) limit diffusion, causing poor influence estimates.
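Both failure modes can be checked cheaply before and after annotation. A minimal sketch, assuming the graph is stored as adjacency lists where edge i → j means j is one of i's k nearest neighbours and diffusion follows edge direction: since every node then has k outgoing edges, "isolated" shows up as zero in-degree, meaning the cascade can reach that node only if it is selected itself. For label skew, the selected subset's label mix can be compared with a small random labeled sample. Function names are hypothetical.

```python
from collections import Counter


def unreachable_nodes(graph: list[list[int]]) -> list[int]:
    """Nodes with no incoming edge: diffusion covers them only if they are selected."""
    has_in = [False] * len(graph)
    for nbrs in graph:
        for v in nbrs:
            has_in[v] = True
    return [i for i, flag in enumerate(has_in) if not flag]


def label_skew(selected: list[int], labels: list[str]) -> dict[str, float]:
    """Share of each label in the selected subset, to compare with a reference sample."""
    counts = Counter(labels[i] for i in selected)
    return {lab: n / len(selected) for lab, n in counts.items()}


if __name__ == "__main__":
    graph = [[1], [2], [1]]  # node 0 has no incoming edge
    print(unreachable_nodes(graph))
    print(label_skew([0, 1, 2, 3], ["a", "a", "a", "b"]))
```

A large unreachable set suggests raising k or checking the embeddings; a selected label mix far from the reference sample points to the embedding-bias failure above.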

