Overview
Production Readiness
0.6
Novelty Score
0.55
Cost Impact Score
0.75
Citation Count
0
Why It Matters For Business
CachePrune reduces indirect prompt-injection risk with minimal compute, no prompt changes, and no extra LLM calls, protecting production LLM apps while preserving answer quality.
Summary TLDR
CachePrune finds neurons in a prompt's transformer KV cache that make the model treat context as instructions, then masks (prunes) those neurons so context is used only as data. It needs only a few samples (default N=8) and prunes a tiny fraction of neurons (default p=0.5%). On tested models and QA datasets it cuts attack success rates from tens of percent to low single digits while keeping answer quality nearly unchanged, and it does not require extra formatting or test-time LLM calls.
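The masking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-neuron attribution scores have already been computed and flattened into a list, and the names `build_prune_mask` and `apply_mask` are hypothetical. It zeroes the top fraction p of neurons by score and leaves the rest untouched.

```python
def build_prune_mask(scores, p=0.005):
    """Return a 0/1 mask that zeroes the top-p fraction of neurons by score.

    scores: flat list of per-neuron attribution scores (assumed precomputed).
    p: fraction of neurons to prune (0.005 corresponds to 0.5%).
    """
    k = round(p * len(scores))  # number of neurons to prune
    # Indices of the k highest-scoring neurons.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    mask = [1.0] * len(scores)
    for i in top:
        mask[i] = 0.0
    return mask


def apply_mask(cache_values, mask):
    """Multiply cached activations elementwise by the mask."""
    return [v * m for v, m in zip(cache_values, mask)]


# Example: 1000 neurons with p=0.5% prunes the 5 highest-scoring ones.
scores = [float(i) for i in range(1000)]
mask = build_prune_mask(scores, p=0.005)
pruned = apply_mask([1.0] * 1000, mask)
```

In a real model the mask would be shaped like the key/value tensors of the context span rather than a flat list; the flat layout here is only for readability.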
Problem Statement
LLMs can mistake context text for instructions and follow injected tasks (indirect prompt injection). Existing fixes either retrain models (costly) or change prompts/workflows (extra computation or worse quality). We need a lightweight defense that keeps original prompts and inference flow.
Main Contribution
CachePrune: identify and mask neurons in the context KV cache that trigger instruction-following.
A preferential attribution loss and selective thresholding to find task-triggering neurons with few samples and preserve answer quality.
Empirical evidence that masking the KV-cache neurons cuts attack success while preserving response quality and transfers across attacks and models.
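To make the attribution idea concrete, here is a hedged sketch of a generic gradient-times-activation score averaged over a few samples. It is an assumption for illustration only: the summary does not give the exact form of the preferential attribution loss or of the selective thresholding, so the gradients are treated as precomputed inputs and `aggregate_attribution` is a hypothetical helper name.

```python
def aggregate_attribution(activations, gradients):
    """Mean gradient-times-activation score per neuron across samples.

    activations, gradients: lists of equal-length lists, one inner list
    per sample (assumed precomputed with respect to some attribution loss).
    """
    n_samples = len(activations)
    n_neurons = len(activations[0])
    scores = [0.0] * n_neurons
    for acts, grads in zip(activations, gradients):
        for j in range(n_neurons):
            scores[j] += acts[j] * grads[j]
    return [s / n_samples for s in scores]


# Two samples, two neurons: the second neuron gets the larger score.
example_scores = aggregate_attribution(
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.0], [0.5, 1.0]],
)
```

Averaging over samples is one simple way to stabilise scores when only a handful of examples (such as N=8) are available; the paper's own aggregation may differ.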
Key Findings
CachePrune cuts attack success on LLaMA3-8B (SQuAD) from ~27.86% to ~7.44%.
On Mistral-7B (SQuAD) CachePrune reduces ASR to under 1%.
Answer quality is preserved after pruning on evaluated tasks.
CachePrune needs very few samples and prunes a tiny fraction of neurons.
Results
Metrics reported: Attack Success Rate (ASR), F1 (clean), sample efficiency.
Who Should Care
What To Try In 7 Days
Run CachePrune on a copy of your cached contexts with N=8 and p=0.5% to measure ASR and quality.
Compare model outputs pre/post-mask on a small holdout of real prompts to verify no quality drop.
If you use cached contexts across many queries, apply the learned mask once and reuse it to avoid per-request overhead.
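For the pre/post comparison suggested above, a small evaluation helper might look like the sketch below. It rests on assumptions: attack success is detected by a simple substring match on a marker string you choose per injected task (the paper's judging procedure may differ), and quality is measured with a SQuAD-style token-overlap F1. The function names are hypothetical.

```python
from collections import Counter


def attack_success_rate(responses, attack_markers):
    """Fraction of responses containing the marker of their injected task."""
    hits = sum(
        1 for r, m in zip(responses, attack_markers) if m.lower() in r.lower()
    )
    return hits / len(responses)


def token_f1(prediction, reference):
    """Token-overlap F1 between a prediction and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Compare the same prompts before and after applying the mask.
asr_before = attack_success_rate(["Paris", "I have been HACKED"], ["hacked", "hacked"])
asr_after = attack_success_rate(["Paris", "Berlin"], ["hacked", "hacked"])
```

Running both metrics on the same holdout before and after masking gives a direct view of the security versus quality tradeoff for your own prompts.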
Agent Features
Memory
- KV cache intervention
Tool Use
- KV cache pruning
Architectures
- Transformer (KV cache)
Optimization Features
Token Efficiency
- mask applied to cached contexts once; no per-response token overhead
Model Optimization
- prune neuron activations in KV cache
System Optimization
- compatible with context caching to avoid repeated work
Training Optimization
- preferential attribution loss for sample-efficient attribution
Inference Optimization
- no extra LLM calls or prompt formatting at test time
Reproducibility
Data Available
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Requires access to the model's KV cache and the exact token indices of the context span.
- The work does not directly compare compute-cost tradeoffs against heavy training-based defenses such as finetuning.
- Mask transfer is effective but not guaranteed for all attack types or unseen prompts.
- Pruning too large a neuron fraction can harm task performance (see ablation on p).
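The first limitation, knowing the exact token indices of the context span, can be handled with character offsets when your tokenizer exposes them. The sketch below assumes you already have a list of `(start, end)` character offsets per token and the character range of the context inside the prompt; `context_token_span` is a hypothetical helper, not part of any library.

```python
def context_token_span(offsets, ctx_start, ctx_end):
    """Return (first, last_exclusive) indices of tokens overlapping the context.

    offsets: list of (char_start, char_end) pairs, one per token.
    ctx_start, ctx_end: character range [ctx_start, ctx_end) of the context.
    """
    idx = [
        i for i, (s, e) in enumerate(offsets)
        if s < ctx_end and e > ctx_start
    ]
    if not idx:
        raise ValueError("no tokens overlap the context span")
    return idx[0], idx[-1] + 1


# Tokens 1 and 2 cover characters 6..15, so the span is (1, 3).
span = context_token_span([(0, 5), (6, 10), (11, 15), (16, 20)], 6, 15)
```

The returned range can then be used to restrict the mask to the context portion of the KV cache, leaving the system and user instruction tokens untouched.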
When Not To Use
- You cannot access or modify the model's KV cache (closed API, no activation access).
- You can afford large-scale finetuning or specialized training defenses and prefer training-time fixes.
- Context boundaries are unknown or dynamic and cannot be reliably marked.
Failure Modes
- Pruning a large fraction of neurons degrades clean-task quality.
- Adaptive attackers could potentially find token sequences that re-trigger poisoned outputs in some models.
- Learned masks may not generalize to very different prompt templates or unseen injection styles.
Core Entities
Models
- LLaMA3-8B
- Mistral-7B-Instruct-v0.3
- Phi-3.5-mini-instruct
Metrics
- ASR
- F1
- ROUGE
- BERTScore
- GPT-Score
Datasets
- SQuAD
- HotpotQA
- WildChat

