Find a model's true knowledge boundary by optimizing prompts that preserve meaning

Overview

Decision Snapshot: Needs Validation

PGDC is a practical, tested method that consistently expands measured knowledge in the paper's experiments, but it needs white-box access (embeddings/logits) and moderate compute to run iterative projection and gradient updates.

Citations: 0

Evidence Strength: 0.70

Confidence: 0.85

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 3/4

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 50%

Production readiness: 60%

Novelty: 60%

Authors

Xunjian Yin, Xu Zhang, Jie Ruan, Xiaojun Wan

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Testing with fixed questions can hide or undercount what a model knows and lead to poor model choices. Searching for optimized prompts that preserve each question's meaning gives a fuller estimate of what the model can answer, so teams can pick models that actually cover the domain facts they need.

Who Should Care

CTO, Product Manager, ML Engineer, Data Scientist, Founder

Summary TLDR

This paper argues that fixed prompts give an unreliable view of what a language model truly "knows." The authors define a model's "knowledge boundary" (knowledge it can answer under at least one meaning-preserving phrasing vs. knowledge it cannot answer under any) and introduce PGDC, a prompt-optimization algorithm that searches the semantic neighborhood of a question for an optimal prompt while keeping its meaning. PGDC outperforms common baselines on multiple knowledge benchmarks, preserves semantics according to human checks, and does not induce many counterfactual (false) answers. The method needs access to model embeddings and generation probabilities, and it separates answerable from unanswerable knowledge rather than measuring the prompt-sensitive gray area.
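As a rough illustration of the three knowledge classes the paper defines (prompt-agnostic, prompt-sensitive, unanswerable), the sketch below labels one knowledge item from its correctness under a finite set of meaning-preserving prompts. The function names are hypothetical, and treating both prompt-agnostic and prompt-sensitive items as "inside the boundary" is an assumption based on the wording above. With finitely many prompts the labels are only estimates.

```python
from enum import Enum
from typing import Sequence


class KnowledgeClass(Enum):
    PROMPT_AGNOSTIC = "prompt-agnostic"    # correct under every tested phrasing
    PROMPT_SENSITIVE = "prompt-sensitive"  # correct under some tested phrasings only
    UNANSWERABLE = "unanswerable"          # correct under none of the tested phrasings


def classify(correct_per_prompt: Sequence[bool]) -> KnowledgeClass:
    """Label one item from per-prompt correctness (hypothetical helper).

    An item labelled UNANSWERABLE may still be answerable under a phrasing
    that was never tried, which is why PGDC searches for better prompts.
    """
    if not correct_per_prompt:
        raise ValueError("need at least one prompt result")
    if all(correct_per_prompt):
        return KnowledgeClass.PROMPT_AGNOSTIC
    if any(correct_per_prompt):
        return KnowledgeClass.PROMPT_SENSITIVE
    return KnowledgeClass.UNANSWERABLE


def inside_boundary(correct_per_prompt: Sequence[bool]) -> bool:
    # Assumption: the boundary holds both prompt-agnostic and prompt-sensitive items.
    return classify(correct_per_prompt) is not KnowledgeClass.UNANSWERABLE


print(classify([True, False, True]).value)  # prompt-sensitive
```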

Problem Statement

Current model evaluations feed fixed questions, or a few paraphrases of them, to LLMs. Because LLMs are sensitive to wording, this yields unreliable and unstable estimates of what a model knows. The paper aims to reduce this randomness by searching for an "optimal" prompt that keeps the question's meaning, and using it to map which knowledge lies inside vs. outside the model's capability boundary.

Main Contribution

Define the "knowledge boundary" through three knowledge classes: prompt-agnostic, prompt-sensitive, and unanswerable.

Propose PGDC, a projected gradient descent algorithm with semantic and projection constraints to search optimal prompts.
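To make the idea of projected prompt search concrete, here is a toy sketch in pure Python, not the authors' implementation. It keeps a continuous copy of the prompt embeddings, projects it onto the nearest real token embeddings at every step, takes the gradient at that projected point (a PEZ-style variant of projected gradient descent, swapped in here as an assumption), and keeps the best discrete prompt seen. The "answer" term is a hypothetical stand-in for the model's answer loss, and the semantic constraint is modeled as a quadratic penalty toward the original embeddings; `lam`, `lr`, `steps`, and every function name are illustrative.

```python
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def project(vec: Sequence[float], vocab_emb: Sequence[Vector]) -> int:
    """Projection: id of the token embedding closest to `vec` (Euclidean)."""
    return min(range(len(vocab_emb)), key=lambda i: sq_dist(vec, vocab_emb[i]))


def toy_loss(emb, orig, target, lam: float) -> float:
    """Hypothetical objective: pull each position toward `target` (proxy for the
    answer loss) plus a semantic penalty toward the original prompt embeddings."""
    return sum(0.5 * sq_dist(e, target) + 0.5 * lam * sq_dist(e, o)
               for e, o in zip(emb, orig))


def pgd_prompt_search(orig_ids: Sequence[int], vocab_emb: Sequence[Vector],
                      target: Sequence[float], lam: float = 0.5,
                      lr: float = 0.1, steps: int = 50) -> List[int]:
    orig = [list(vocab_emb[i]) for i in orig_ids]
    x = [list(v) for v in orig]                  # continuous iterate, starts at the original prompt
    best_ids = list(orig_ids)
    best_loss = toy_loss(orig, orig, target, lam)
    for _ in range(steps):
        ids = [project(v, vocab_emb) for v in x]  # project onto real tokens
        proj = [vocab_emb[i] for i in ids]
        cur = toy_loss(proj, orig, target, lam)
        if cur < best_loss:                       # keep the best discrete prompt seen
            best_ids, best_loss = ids, cur
        for p, vec in enumerate(x):
            e, o = proj[p], orig[p]
            for d in range(len(vec)):
                # gradient of toy_loss for position p, evaluated at the projected point
                grad = (e[d] - target[d]) + lam * (e[d] - o[d])
                vec[d] -= lr * grad
    return best_ids


vocab = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
print(pgd_prompt_search([0], vocab, [3.0, 0.0], lam=0.5))  # [2]
print(pgd_prompt_search([0], vocab, [3.0, 0.0], lam=2.0))  # [1]
```

A larger `lam` keeps the optimized prompt closer to the original, which mirrors the trade-off between finding an answer-eliciting prompt and preserving meaning.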

Key Findings

PGDC finds more answerable items than standard prompting on common-knowledge benchmarks.

Numbers: LLaMA2 success rate: PGDC 71.36% vs. P-few 66.95% vs. zero-shot 34.43%

Practical Use: If you want a less conservative estimate of a model's knowledge, run PGDC-style prompt search rather than a single fixed prompt.

Evidence Ref: Table 1 (success rates)

PGDC preserves original meaning in most cases according to human judges.

Numbers: Semantic preservation: GPT-2 80.5%, GPT-J 85.1%, LLaMA2 83.3%, Vicuna 86.2%

Practical Use: Optimized prompts are usually safe to treat as paraphrases; still sample-check outputs for weaker models.

Evidence Ref: Section 4.5 (manual evaluation results)
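The summary reports a semantic preservation rate from 200 PARAREL samples and 3 annotators but does not say how the annotators' votes are combined. The sketch below assumes a simple majority vote per prompt; `preservation_rate` is a hypothetical helper you could reuse for your own spot checks.

```python
from typing import Sequence


def preservation_rate(judgments: Sequence[Sequence[bool]]) -> float:
    """Share of optimized prompts that a strict majority of annotators judged
    meaning-preserving. judgments[i][j] is annotator j's verdict on prompt i.
    Majority voting is an assumption, not the paper's stated protocol."""
    if not judgments:
        raise ValueError("no judgments")
    kept = sum(1 for votes in judgments if sum(votes) * 2 > len(votes))
    return kept / len(judgments)


votes = [[True, True, False], [False, False, True], [True, True, True], [True, False, False]]
rate = preservation_rate(votes)
print(rate)          # 0.5
print(rate >= 0.8)   # False: below an 80% target
```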

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
success rate (constructing prompts that elicit correct answers)	71.36%	P-few 66.95%	+4.41pp	Table 1 (first block, LLaMA2)	PGDC outperforms P-few and zero-shot on common-knowledge sets	Table 1
semantic preservation rate (human annotators)	83.3% (LLaMA2 average)	—	—	PARAREL (200 samples, 3 annotators)	Majority of optimized prompts judged semantically consistent	Section 4.5
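The Delta column is an absolute difference in percentage points (pp), not a relative change. A small check of the LLaMA2 numbers, with hypothetical helper names:

```python
def success_rate(correct: int, total: int) -> float:
    """Percent of items for which a prompt elicited the correct answer."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def delta_pp(method_pct: float, baseline_pct: float) -> float:
    """Absolute difference in percentage points, rounded to two decimals."""
    return round(method_pct - baseline_pct, 2)


print(delta_pp(71.36, 66.95))                          # 4.41 (vs. P-few)
print(delta_pp(71.36, 34.43))                          # 36.93 (vs. zero-shot)
print(round(100 * delta_pp(71.36, 66.95) / 66.95, 1))  # 6.6, relative gain over P-few in %
```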

What To Try In 7 Days

Run PGDC-style prompt search on 100 mission-critical queries to map your model's knowledge boundary.

Compare PGDC results to zero/few-shot to see which queries are prompt-sensitive.

Validate 50 optimized prompts with quick human checks to confirm that semantics are preserved (80%+ target).
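For the zero/few-shot comparison, one simple approach is to bucket each query by which run answered it correctly. In this sketch, queries answered only after optimization are candidates for prompt-sensitive knowledge, and queries neither run answers are candidates for unanswerable knowledge. With only two prompt sets, "both" does not prove a query is prompt-agnostic. The function and bucket names are hypothetical.

```python
from typing import Dict, List


def compare_runs(zero_shot: Dict[str, bool], optimized: Dict[str, bool]) -> Dict[str, List[str]]:
    """Bucket query ids by which prompting strategy answered them correctly.

    Both dicts map query id -> whether the model's answer was correct.
    Only ids present in both runs are compared.
    """
    buckets: Dict[str, List[str]] = {
        "both": [], "optimized_only": [], "zero_shot_only": [], "neither": []}
    for qid in sorted(zero_shot.keys() & optimized.keys()):
        z, o = zero_shot[qid], optimized[qid]
        if z and o:
            buckets["both"].append(qid)
        elif o:
            buckets["optimized_only"].append(qid)  # prompt-sensitive candidates
        elif z:
            buckets["zero_shot_only"].append(qid)  # search missed a working prompt
        else:
            buckets["neither"].append(qid)         # unanswerable candidates
    return buckets


print(compare_runs({"q1": True, "q2": False, "q3": False},
                   {"q1": True, "q2": True, "q3": False}))
```

A non-empty `zero_shot_only` bucket is worth inspecting: it suggests the search or its semantic constraint is discarding prompts that already worked.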

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Yes

License: Unknown

Code URLs

https://github.com/pkulcwmzx/knowledge_boundary

Data URLs

https://github.com/pkulcwmzx/knowledge_boundary

Risks & Boundaries

Limitations

PGDC searches for optimal prompts near the original question and only distinguishes answerable from unanswerable knowledge; it does not quantify the prompt-sensitive gray area.

The method requires access to model embeddings and generation probabilities, so it is hard to apply through black-box APIs.

When Not To Use

When you only have a black-box API without embeddings or logits.

When you cannot afford iterative optimization costs for many queries.
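To judge whether the optimization cost is affordable, a back-of-envelope estimate in forward-pass equivalents can help. This sketch assumes each optimization step needs one forward and one backward pass, with the backward pass costing about twice a forward pass (a common rule of thumb); the 50-step budget in the example is hypothetical, not the paper's setting.

```python
def forward_pass_equivalents(n_queries: int, steps: int, backward_factor: float = 2.0) -> float:
    """Rough compute for gradient-based prompt search, in forward-pass units.

    Assumes one forward + one backward pass per step, with the backward pass
    costing `backward_factor` forward passes (rule-of-thumb assumption).
    """
    return n_queries * steps * (1.0 + backward_factor)


# Hypothetical budget: 100 queries, 50 optimization steps each.
print(forward_pass_equivalents(100, 50))  # 15000.0
```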

Failure Modes

Projection maps embeddings to tokens that subtly change meaning.

Optimization overfits to model idiosyncrasies or dataset artifacts.
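One cheap guard against projection drift is to route suspicious optimized prompts to human review. The sketch below flags a prompt when too many tokens changed or when the cosine similarity of mean-pooled token embeddings drops below a threshold. This is a crude, hypothetical proxy for meaning preservation (not the paper's check), it assumes token-by-token substitution so both prompts have the same length, and the thresholds are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(ids: Sequence[int], vocab_emb: Sequence[Vector]) -> Vector:
    dim = len(vocab_emb[0])
    return [sum(vocab_emb[i][d] for i in ids) / len(ids) for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def flag_drift(orig_ids: Sequence[int], new_ids: Sequence[int], vocab_emb: Sequence[Vector],
               min_cos: float = 0.9, max_changed: float = 0.3) -> bool:
    """Return True if an optimized prompt should go to human review."""
    if not orig_ids or len(orig_ids) != len(new_ids):
        raise ValueError("expects non-empty prompts of equal length")
    changed = sum(a != b for a, b in zip(orig_ids, new_ids)) / len(orig_ids)
    sim = cosine(mean_pool(orig_ids, vocab_emb), mean_pool(new_ids, vocab_emb))
    return sim < min_cos or changed > max_changed


vocab = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(flag_drift([0, 0, 0, 0], [0, 0, 0, 1], vocab))  # False: one near-synonym swap
print(flag_drift([0, 0, 0, 0], [2, 2, 0, 0], vocab))  # True: half the tokens replaced
```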

Core Entities

Models

LLaMA2, Vicuna, GPT-J, GPT-2, Mistral

Metrics

success rate, semantic preservation rate, counterfactual induction rate

Datasets

KAssess, PARAREL, COUNTERFACT, ALCUNA, MMLU

Benchmarks

PARAREL, KAssess, CFACT, ALCUNA, MMLU (cloze)
