Overview
Production Readiness
0.6
Novelty Score
0.7
Cost Impact Score
0.6
Citation Count
2
Why It Matters For Business
SPML provides a lightweight, rule-like front door that blocks many prompt-injection attacks before they reach costly LLM calls, reducing risk and operating cost for deployed chatbots.
Summary TLDR
The paper introduces SPML, a domain-specific language for writing chatbot system prompts in a structured way, plus a compiler that emits a natural-language prompt and an intermediate form (SPML-IR) used to detect prompt-injection attacks before user text reaches the LLM. The authors release a dataset of 1,871 system prompts and ~20k labeled user inputs, and report that SPML’s IR-based monitoring reduces attacker-miss rates compared with baseline LLM checks (e.g., lower error on many malicious classes versus GPT-3.5/GPT-4 on evaluated benchmarks).
Problem Statement
Deployed chatbots rely on a fixed system prompt to constrain behavior but are vulnerable to prompt-injection attacks. Existing studies measure vulnerability but do not provide a practical, deployable way to write robust system prompts and block malicious user inputs before they reach the LLM.
Main Contribution
SPML: a domain-specific language for writing system prompts with types, single-assignment, and small PL features to reduce ambiguity.
SPML-IR: a deterministic intermediate form used to fill a prompt skeleton from user input and detect conflicting assignments as injection.
A dataset of 1,871 system prompts plus ≈20k labeled user prompts (safe, unsafe, malicious) for evaluating chatbot prompt defenses.
An evaluation showing SPML often reduces attacker-miss/error rates compared to baseline LLM-based detectors (GPT-3.5, GPT-4, LLaMA variants).
Key Findings
SPML yields lower attacker-miss error on jailbreak attacks than GPT-4 on the paper's benchmark.
SPML reduces unsafe-interaction error compared to GPT-4 on the paper's system-prompt dataset.
The authors provide a sizable, multi-source benchmark for prompt-injection research.
Results
Error Rate (safe interactions)
Error Rate (unsafe interactions)
Error Rate (malicious: Jailbreak)
Who Should Care
What To Try In 7 Days
Run SPML compiler on one critical chatbot’s system prompt and test with the provided SPML dataset to find immediate vulnerabilities.
Add SPML-IR based pre-filtering in front of an LLM API to reject high-risk inputs and measure reduction in attacker slip-through and API calls.
Convert a handful of system-prompt rules (tone, name, scope) to SPML to see how many ambiguous/misleading NL instructions get resolved.
Optimization Features
Token Efficiency
- SPML rejects some malicious inputs before LLM call, saving API tokens
System Optimization
- Offline type-checking minimizes runtime cost
Reproducibility
Code Available
Data Available
Open Source Status
- yes
Risks & Boundaries
Limitations
- Developers must write system prompts in SPML; existing natural-language prompts need manual conversion.
- Security analyzer uses GPT-3.5 in experiments; this introduces non-zero false positives and depends on the analyzer model.
- SPML is not foolproof: complex or novel injection techniques may still bypass checks.
When Not To Use
- If you cannot modify existing system prompts or pipeline to insert SPML compilation.
- When attackers exploit non-text channels (images/audio) not covered by text-only monitoring.
- If you require zero false positives for safety-critical flows without human review.
Failure Modes
- High false positives from analyzer block legitimate user requests, hurting UX.
- An adaptive attacker crafts inputs that fill the IR skeleton with plausible but malicious values.
- Dependence on the analyzer LLM: model drift or version changes can change detection behavior.
Core Entities
Models
- GPT-4
- GPT-3.5
- LLAMA-7B
- LLAMA-13B
Metrics
- Error Rate (ER)
Datasets
- SPML dataset (1871 SPs, ~20k user prompts)
- Tensor-Trust
- Gandalf
Benchmarks
- SPML prompt-injection benchmark

