Overview
Production Readiness: 0.6
Novelty Score: 0.6
Cost Impact Score: 0.5
Citation Count: 0
Why It Matters For Business
Testing a model with fixed questions can hide or undercount what it knows and lead to poor model choices. Searching for optimized prompts that preserve each question's meaning gives a fuller picture of which questions a model can actually answer, so teams can pick models that cover the domain facts they need.
Summary TLDR
This paper argues that fixed prompts give an unreliable view of what a language model "knows." The authors define a model's "knowledge boundary", which separates knowledge the model can answer under some phrasing from knowledge it cannot answer under any phrasing, and introduce PGDC, a prompt-optimization algorithm that searches the semantic neighborhood of a question for an optimal prompt while preserving its meaning. PGDC outperforms common baselines on multiple knowledge benchmarks, preserves semantics according to human checks, and induces few fake (counterfactual) answers. The method needs access to model embeddings and generation probabilities, and it focuses on exposing unanswerable knowledge rather than measuring prompt-sensitive gray areas.
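The search described above can be pictured as projected gradient descent in embedding space. The sketch below is a minimal, hypothetical illustration under stated assumptions, not the authors' PGDC implementation: it uses NumPy, a random toy vocabulary embedding matrix `E`, and a toy quadratic `answer_loss` standing in for the model's negative log-probability of the correct answer. A penalty weighted by `lam` and an L2 ball of radius `radius` around the original embeddings stand in for the semantic constraint, and `nearest_tokens` maps continuous embeddings back to vocabulary tokens. All names and hyperparameters are illustrative.

```python
import numpy as np

def nearest_tokens(X, E):
    """Map each row of X (L x d) to the id of its nearest row in E (V x d), by L2 distance."""
    d2 = ((X[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1)  # (L, V) squared distances
    return d2.argmin(axis=1)

def project_to_ball(X, X0, radius):
    """Euclidean projection of each row X_i onto the L2 ball of `radius` around X0_i."""
    delta = X - X0
    norms = np.linalg.norm(delta, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return X0 + delta * scale

def pgd_prompt_search(E, prompt_ids, answer_loss, answer_grad,
                      lam=0.5, radius=1.0, lr=0.1, steps=100):
    """Hypothetical projected-gradient prompt search; returns (best_ids, best_loss, final_X)."""
    X0 = E[prompt_ids]                     # embeddings of the original question
    X = X0.copy()
    best_ids = np.asarray(prompt_ids)
    best_loss = answer_loss(E[best_ids])   # the original prompt is the baseline candidate
    for _ in range(steps):
        # Gradient of the answer loss plus the semantic penalty lam * ||X - X0||^2.
        grad = answer_grad(X) + 2.0 * lam * (X - X0)
        # Gradient step, then project back into the allowed neighborhood.
        X = project_to_ball(X - lr * grad, X0, radius)
        # Discretize: snap each position to its nearest vocabulary token.
        ids = nearest_tokens(X, E)
        loss = answer_loss(E[ids])
        if loss < best_loss:
            best_ids, best_loss = ids, loss
    return best_ids, best_loss, X

# Toy usage: random vocabulary and a made-up "answer-favoring" target T.
rng = np.random.default_rng(0)
E = rng.normal(size=(50, 8))               # 50 tokens, 8-dim embeddings
prompt_ids = [3, 17, 8, 42, 5]
T = E[prompt_ids] + 0.3 * rng.normal(size=(5, 8))

def answer_loss(X):
    return float(((X - T) ** 2).sum())

def answer_grad(X):
    return 2.0 * (X - T)

ids, loss, X = pgd_prompt_search(E, prompt_ids, answer_loss, answer_grad)
print(ids.tolist(), round(loss, 3))
```

Keeping the best discrete prompt seen, rather than the last one, guarantees the returned prompt never scores worse than the original under the toy loss.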
Problem Statement
Current model evaluations feed LLMs fixed questions or a handful of paraphrases. Because LLMs are sensitive to wording, this yields unreliable and unstable estimates of what a model knows. The paper aims to reduce this randomness by searching, for each question, for an "optimal" prompt that keeps the question's meaning, and using the results to map which knowledge lies inside versus outside the model's knowledge boundary.
Main Contribution
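Define the "knowledge boundary" through three knowledge classes: prompt-agnostic (answered correctly regardless of phrasing), prompt-sensitive (answered correctly only under some phrasings), and unanswerable (not answered correctly under any semantically equivalent phrasing).
Propose PGDC, a projected gradient descent algorithm with semantic and projection constraints that searches for optimal prompts.
Show that PGDC uncovers broader knowledge boundaries than standard zero-shot and few-shot baselines, supported by human semantic checks and counterfactual robustness tests.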
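The three knowledge classes can be expressed as a simple decision rule over a question's results under several semantically equivalent prompts. This is a minimal sketch assuming you already have one correct/incorrect flag per prompt; the `KnowledgeClass` enum and `classify` function are hypothetical names. With a finite set of prompts it only estimates the class: a question labeled unanswerable here may still be answerable under a prompt that was not tried, which is the gap a prompt search like PGDC tries to close.

```python
from enum import Enum
from typing import Sequence

class KnowledgeClass(Enum):
    PROMPT_AGNOSTIC = "prompt-agnostic"
    PROMPT_SENSITIVE = "prompt-sensitive"
    UNANSWERABLE = "unanswerable"

def classify(correct_under_prompts: Sequence[bool]) -> KnowledgeClass:
    """Assign a knowledge class from per-prompt correctness flags for one question."""
    if not correct_under_prompts:
        raise ValueError("need at least one prompt result")
    if all(correct_under_prompts):
        return KnowledgeClass.PROMPT_AGNOSTIC   # correct under every tried phrasing
    if any(correct_under_prompts):
        return KnowledgeClass.PROMPT_SENSITIVE  # correct under some phrasings only
    return KnowledgeClass.UNANSWERABLE          # correct under none of the tried phrasings

# Toy usage with made-up correctness flags for three paraphrases per question.
results = {
    "q1": [True, True, True],
    "q2": [False, True, False],
    "q3": [False, False, False],
}
for qid, flags in results.items():
    print(qid, classify(flags).value)
# q1 prompt-agnostic
# q2 prompt-sensitive
# q3 unanswerable
```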
Key Findings
PGDC finds more answerable items than standard prompting on common-knowledge benchmarks.
PGDC preserves original meaning in most cases according to human judges.
PGDC is far less likely to induce fake answers on counterfactual data than an adversarial prompt method.
Results
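success rate (share of questions for which a prompt eliciting the correct answer is constructed)
semantic preservation rate (share of optimized prompts that human annotators judge to keep the original meaning)
counterfactual induction rate (share of CFACT items on which the optimized prompt induces the counterfactual answer)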
MMLU cloze-style scores (domain coverage)
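The three rates above can be computed from per-question search logs roughly as follows. This sketch assumes a hypothetical `SearchRecord` log format and simple fraction-based definitions; the paper's exact metric definitions and annotation protocol may differ. MMLU cloze-style scoring is omitted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class SearchRecord:
    found_correct: bool                   # optimized prompt elicited the gold answer
    judged_same_meaning: Optional[bool]   # human label; None if not annotated
    is_counterfactual: bool = False       # item comes from a counterfactual set
    output_matches_counterfactual: bool = False  # model gave the false target answer

def rate(flags: Iterable[bool]) -> float:
    """Fraction of True flags; NaN if there are none."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else float("nan")

def summarize(records):
    factual = [r for r in records if not r.is_counterfactual]
    annotated = [r for r in records if r.judged_same_meaning is not None]
    counterfactual = [r for r in records if r.is_counterfactual]
    return {
        "success_rate": rate(r.found_correct for r in factual),
        "semantic_preservation_rate": rate(r.judged_same_meaning for r in annotated),
        "counterfactual_induction_rate": rate(
            r.output_matches_counterfactual for r in counterfactual),
    }

# Toy usage with made-up records.
records = [
    SearchRecord(True, True),
    SearchRecord(False, None),
    SearchRecord(True, False),
    SearchRecord(False, None, is_counterfactual=True, output_matches_counterfactual=True),
    SearchRecord(False, None, is_counterfactual=True, output_matches_counterfactual=False),
]
print(summarize(records))
```

Computing the counterfactual rate on a separate subset keeps "the search found the true answer" and "the search forced a false answer" from being mixed into one number.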
Who Should Care
What To Try In 7 Days
Run PGDC-style prompt search on 100 mission-critical queries to map your model's knowledge boundary.
Compare PGDC results to zero/few-shot to see which queries are prompt-sensitive.
Validate 50 optimized prompts with quick human checks to confirm semantics preserved (80%+ target).
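A small harness for these steps might look like the following. It assumes you have already logged, per query id, whether the zero/few-shot baseline and the optimized prompt produced a correct answer; the bucket names, the fixed random seed, and the helper functions are hypothetical. Queries that only the optimized prompt answers are the prompt-sensitive ones worth inspecting.

```python
import random
from typing import Dict, Iterable, List

def compare(baseline: Dict[str, bool], optimized: Dict[str, bool]) -> Dict[str, List[str]]:
    """Bucket queries by whether the baseline and/or the optimized prompt answered them."""
    buckets = {"answered_by_baseline": [], "recovered_by_search": [], "still_unanswered": []}
    for qid, base_ok in baseline.items():
        if base_ok:
            buckets["answered_by_baseline"].append(qid)
        elif optimized.get(qid, False):
            buckets["recovered_by_search"].append(qid)   # prompt-sensitive candidates
        else:
            buckets["still_unanswered"].append(qid)
    return buckets

def sample_for_review(qids: Iterable[str], k: int = 50, seed: int = 0) -> List[str]:
    """Reproducible random sample of query ids for human semantic review."""
    pool = sorted(qids)
    return random.Random(seed).sample(pool, min(k, len(pool)))

def meets_target(labels: List[bool], target: float = 0.8) -> bool:
    """True if the share of prompts judged meaning-preserving reaches the target."""
    return len(labels) > 0 and sum(labels) / len(labels) >= target

# Toy usage with made-up outcomes for 100 queries.
baseline = {f"q{i}": i % 3 == 0 for i in range(100)}
optimized = {f"q{i}": i % 3 != 2 for i in range(100)}
buckets = compare(baseline, optimized)
print({name: len(ids) for name, ids in buckets.items()})
# {'answered_by_baseline': 34, 'recovered_by_search': 33, 'still_unanswered': 33}
review_ids = sample_for_review(optimized.keys(), k=50)
labels = [True] * 41 + [False] * 9   # placeholder human judgments for the 50 samples
print(meets_target(labels))          # True (41/50 = 0.82)
```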
Reproducibility
Code Available
Data Available
Open Source Status
- yes
Risks & Boundaries
Limitations
- PGDC searches for optimal prompts near the original question and reports only whether knowledge is answerable or unanswerable; it does not quantify prompt-sensitive gray areas.
- The method requires access to model embeddings and generation probabilities, so it is hard or impossible to apply through black-box APIs.
- Optimized prompts can still change semantics occasionally, especially for weaker models like GPT-2.
When Not To Use
- When you only have a black-box API without embeddings or logits.
- When you cannot afford iterative optimization costs for many queries.
- If you need a quantitative measure of prompt sensitivity rather than a binary boundary.
Failure Modes
- Projecting optimized embeddings back to tokens can select tokens that subtly change the question's meaning.
- Optimization overfits to model idiosyncrasies or dataset artifacts.
- Malicious actors could abuse prompt-search to force false outputs.
Core Entities
Models
- LLaMA2
- Vicuna
- GPT-J
- GPT-2
- Mistral
Metrics
- success rate
- semantic preservation rate
- counterfactual induction rate
Datasets
- KAssess
- PARAREL
- COUNTERFACT
- ALCUNA
- MMLU
Benchmarks
- PARAREL
- KAssess
- CFACT
- ALCUNA
- MMLU (cloze)

