Overview
Production Readiness: 0.6
Novelty Score: 0.7
Cost Impact Score: 0.4
Citation Count: 10
Why It Matters For Business
SELF-FAMILIARITY can reduce incorrect or fabricated outputs by blocking low-familiarity prompts before generation, improving customer trust and reducing downstream fact-checking costs.
Summary TLDR
This paper introduces SELF-FAMILIARITY, a zero-resource, pre-generation guard that checks whether an LLM is familiar with the concepts in an instruction and withholds answers if familiarity is low. It extracts concepts with NER, asks the model to explain each concept, masks that explanation, and uses constrained beam search to try to regenerate the concept. Per-concept probability scores are weighted and aggregated into an instruction-level familiarity score. Evaluated on four open models and a new Concept-7 dataset, SELF-FAMILIARITY yields substantially higher AUC/accuracy than baselines and flags unfamiliar instructions before the model produces possibly hallucinated text.
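The final aggregation step described above can be sketched as a weighted average of per-concept scores. This is a minimal illustration, not the paper's implementation: the names `ConceptScore`, `instruction_familiarity`, and `should_answer` are hypothetical, the weights are assumed to be supplied by the caller (for example, derived from word frequency), and letting instructions with no extracted concepts pass is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConceptScore:
    concept: str
    familiarity: float  # probability-like score in [0, 1] from concept guessing
    weight: float       # importance weight (assumed precomputed by the caller)


def instruction_familiarity(scores: List[ConceptScore]) -> float:
    """Weighted average of per-concept familiarity scores."""
    total_weight = sum(s.weight for s in scores)
    if total_weight <= 0:
        raise ValueError("at least one concept with positive weight is required")
    return sum(s.familiarity * s.weight for s in scores) / total_weight


def should_answer(scores: List[ConceptScore], threshold: float) -> bool:
    """Return True if the instruction clears the familiarity threshold."""
    if not scores:
        # Assumption: with no extracted concepts there is nothing to check.
        return True
    return instruction_familiarity(scores) >= threshold
```

With this weighting, one heavily weighted unfamiliar concept can pull the instruction below the threshold even when the other concepts score well.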
Problem Statement
Large language models can confidently produce fabricated facts (hallucinations). Existing detectors work after generation or rely on external knowledge, making them reactive, brittle to prompt style, or unavailable in zero-resource settings. We need a proactive, zero-resource way to stop the model from answering on topics it likely does not know.
Main Contribution
SELF-FAMILIARITY: a zero-resource, pre-generation self-evaluation that flags instructions with unfamiliar concepts to prevent hallucinations.
A three-step pipeline: concept extraction (NER plus grouping and filtering), concept guessing (the model explains the concept, the concept is masked in that explanation, and constrained beam search tries to recover it), and weighted aggregation by word-frequency importance.
A new evaluation dataset, Concept-7 (192 basic concepts; 515 test instructions), and experiments on four open LMs showing consistent gains over parameter- and prompt-based baselines.
Human and GPT-4 annotations are used to build familiarity labels and thresholds; ablations show that each processing step helps.
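The masking part of the concept-guessing step in the pipeline above can be sketched as follows. This is an illustration under assumptions: the `[MASK]` placeholder, the function names, and the prompt wording are hypothetical and not taken from the paper. The constrained beam search that scores how likely the model is to regenerate the concept is left out because it depends on the model and decoding library in use.

```python
import re

MASK = "[MASK]"  # placeholder token; an assumption, not the paper's choice


def mask_concept(explanation: str, concept: str, mask: str = MASK) -> str:
    """Replace every case-insensitive occurrence of the concept with a mask."""
    pattern = re.compile(re.escape(concept), flags=re.IGNORECASE)
    # A function replacement avoids any escape handling in the mask string.
    return pattern.sub(lambda _match: mask, explanation)


def build_guess_prompt(masked_explanation: str) -> str:
    """Build a prompt asking the model to name the masked concept.

    The wording is hypothetical; the model's answer would then be scored
    with constrained decoding toward the original concept.
    """
    return f'"{masked_explanation}" is related to what?'
```

Masking matters because an explanation that still contains the concept would let the model copy it back, which would inflate the familiarity score.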
Key Findings
SELF-FAMILIARITY outperforms baselines on hallucinatory-instruction classification.
Performance is consistent across model styles.
Concept Guessing alone is very strong in controlled settings.
Results
SELF-FAMILIARITY on hallucinatory-instruction classification (Vicuna-13b-v1.3)
SELF-FAMILIARITY across models (AUC range)
Concept-only evaluation (Vicuna-13b-v1.3)
Who Should Care
What To Try In 7 Days
Run the three-step pipeline (NER → explain & mask → constrained beam search) on one model and flag low-familiarity prompts.
Set a threshold using a small set of known concepts and bootstrap intervals as in the paper.
If flagged, either withhold the automated reply or trigger a retrieval step to gather background knowledge before answering.
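The thresholding and routing steps above can be sketched as follows. This is one possible reading, not the paper's exact procedure: it takes the lower end of a percentile-bootstrap interval for the mean familiarity score of concepts known to be familiar and uses it as the threshold. The function names, the choice of the mean as the statistic, and the default parameters are all assumptions.

```python
import random
import statistics
from typing import Sequence


def bootstrap_lower_bound(
    scores: Sequence[float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> float:
    """Lower end of a percentile-bootstrap interval for the mean score."""
    if not scores:
        raise ValueError("need at least one known-concept score")
    rng = random.Random(seed)  # fixed seed so the threshold is reproducible
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    return means[int(alpha / 2 * n_resamples)]


def route(familiarity: float, threshold: float) -> str:
    """Decide what to do with an instruction given its familiarity score."""
    return "answer" if familiarity >= threshold else "withhold_or_retrieve"


# Usage: scores from concepts the model is known to handle well (made-up values).
known_scores = [0.80, 0.85, 0.90, 0.75, 0.95]
threshold = bootstrap_lower_bound(known_scores)
decision = route(0.20, threshold)
```

A stricter or looser guard can be obtained by changing `alpha` or by calibrating on a different set of known concepts.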
Agent Features
Tool Use
- Constrained beam search for controlled generation
Optimization Features
Infra Optimization
- Requires beam search and max-prob decoding; needs GPUs for speed
Inference Optimization
- Uses constrained beam search; higher compute at inference
Reproducibility
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Requires constrained beam search and access to generation probabilities; does not work with API-only models that hide decoding control.
- Constrained search with beam size 30 and masking increases inference cost and latency.
- NER + Wiktionary heuristics can miss or mis-group concepts in noisy instructions.
- Concept-7 is limited in size and uses fabricated concepts to balance unfamiliar examples.
When Not To Use
- Low-latency production paths where beam search is too slow.
- Black-box API models without constrained decoding controls.
- Use-cases that require full external knowledge access rather than internal familiarity checks.
Failure Modes
- False negatives when the model is familiar with a concept but phrases it differently from the form the constrained search targets.
- False positives when NER splits or filters a concept improperly, changing the evaluated concept.
- Flags unfamiliar but relevant domain concepts that could be answered after a quick retrieval step; the guard needs integration with retrieval to recover.
Core Entities
Models
- Vicuna-13b-v1.3
- Falcon-7b-instruct
- mpt-7b-instruct
- Alpaca-7b
Metrics
- AUC
- ACC
- F1
- Pearson
Datasets
- Concept-7

