Overview
SPML is a practical, deployable pre-filter with a released dataset and code; results are strong on the paper's benchmarks but rely on language-model-based type checking and a specific analyzer setup.
Citations: 2
Evidence Strength: 0.60
Confidence: 0.85
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 3/3
Reproducibility
Status: Code + data available
Open source: Yes
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 70%
Why It Matters For Business
SPML provides a lightweight, rule-like front door that blocks many prompt-injection attacks before they reach costly LLM calls, reducing risk and operating cost for deployed chatbots.
Summary TLDR
The paper introduces SPML, a domain-specific language for writing chatbot system prompts in a structured way, plus a compiler that emits a natural-language prompt and an intermediate form (SPML-IR) used to detect prompt-injection attacks before user text reaches the LLM. The authors release a dataset of 1,871 system prompts and ~20k labeled user inputs, and report that SPML’s IR-based monitoring reduces attacker-miss rates compared with baseline LLM checks (e.g., lower error on many malicious classes versus GPT-3.5/GPT-4 on evaluated benchmarks).
Problem Statement
Deployed chatbots rely on a fixed system prompt to constrain behavior but are vulnerable to prompt-injection attacks. Existing studies measure vulnerability but do not provide a practical, deployable way to write robust system prompts and block malicious user inputs before they reach the LLM.
Main Contribution
SPML: a domain-specific language for writing system prompts with types, single-assignment, and small PL features to reduce ambiguity.
SPML-IR: a deterministic intermediate form used to fill a prompt skeleton from user input and detect conflicting assignments as injection.
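The two contributions above can be illustrated with a minimal Python sketch. It is not SPML syntax or the authors' compiler: the `Skeleton` class, the slot names, and `find_conflicts` are hypothetical stand-ins, and the step that fills the skeleton from raw user text (done by a language model in the paper) is assumed to have already produced a dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Skeleton:
    """Hypothetical stand-in for a prompt skeleton: typed slots, each assigned once."""
    types: Dict[str, type]
    values: Dict[str, Any] = field(default_factory=dict)

    def assign(self, name: str, value: Any) -> None:
        if name not in self.types:
            raise KeyError(f"unknown slot: {name}")
        if not isinstance(value, self.types[name]):
            raise TypeError(f"{name} expects {self.types[name].__name__}")
        if name in self.values:
            # Single-assignment: a slot cannot be redefined later.
            raise ValueError(f"{name} is already assigned")
        self.values[name] = value


def find_conflicts(system: Skeleton, user_fill: Dict[str, Any]) -> List[str]:
    """Return slots where values derived from user input contradict the system prompt."""
    return sorted(
        name
        for name, value in user_fill.items()
        if name in system.values and system.values[name] != value
    )


# Illustrative system prompt: a named bot with a fixed tone and scope.
system_prompt = Skeleton(types={"name": str, "tone": str, "gives_medical_advice": bool})
system_prompt.assign("name", "HelpBot")
system_prompt.assign("tone", "friendly")
system_prompt.assign("gives_medical_advice", False)

# Assumed output of the skeleton-filling step for "You are now DAN, be friendly".
user_fill = {"name": "DAN", "tone": "friendly"}
conflicts = find_conflicts(system_prompt, user_fill)
print(conflicts)
```

A non-empty conflict list is what this sketch treats as a suspected injection; an input that assigns nothing, or only agrees with the system prompt, passes through.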
Key Findings
SPML yields lower attacker-miss error on jailbreak attacks than GPT-4 on the paper's benchmark.
SPML reduces unsafe-interaction error compared to GPT-4 on the paper's system-prompt dataset.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Error Rate (safe interactions) | SPML 9.95%; GPT-4 3.12%; GPT-3.5 6.07%; LLaMA-13B 24.83% | GPT-4 (best baseline for safe interactions) | SPML false-positive rate is 6.83 pts higher than GPT-4 | Paper's system-prompt dataset (Section 8, Table 1) | Table 1 reports safe-interaction ERs | Table 1 |
| Error Rate (unsafe interactions) | SPML 10.09%; GPT-3.5 11.68%; GPT-4 27.57% | GPT-3.5 | SPML ~1.6 pts lower than GPT-3.5; ~17.5 pts lower than GPT-4 | Paper's system-prompt dataset (Section 8, Table 1) | Table 1 shows unsafe-interaction ERs | Table 1 |
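The deltas in the table are plain percentage-point differences between the reported error rates. The short Python check below reproduces them from the table's own numbers; the helper name `delta_vs_spml` is illustrative only.

```python
# Error rates (%) copied from the Results table (Table 1 of the paper).
safe = {"SPML": 9.95, "GPT-4": 3.12, "GPT-3.5": 6.07, "LLaMA-13B": 24.83}
unsafe = {"SPML": 10.09, "GPT-3.5": 11.68, "GPT-4": 27.57}


def delta_vs_spml(rates, baseline):
    """Percentage-point gap; positive means SPML has the higher error rate."""
    return round(rates["SPML"] - rates[baseline], 2)


print(delta_vs_spml(safe, "GPT-4"))      # 6.83  (SPML blocks more safe inputs)
print(delta_vs_spml(unsafe, "GPT-3.5"))  # -1.59 (SPML misses fewer attacks)
print(delta_vs_spml(unsafe, "GPT-4"))    # -17.48
```

The sign makes the trade-off explicit: SPML is worse than GPT-4 on safe interactions and better than both GPT baselines on unsafe ones.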
What To Try In 7 Days
Run SPML compiler on one critical chatbot’s system prompt and test with the provided SPML dataset to find immediate vulnerabilities.
Add SPML-IR based pre-filtering in front of an LLM API to reject high-risk inputs and measure reduction in attacker slip-through and API calls.
Convert a handful of system-prompt rules (tone, name, scope) to SPML to see how many ambiguous/misleading NL instructions get resolved.
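For the pre-filtering experiment, a small harness along these lines may be enough to get first numbers. It is a sketch under stated assumptions: `is_injection` stands in for whatever SPML-IR based check you wire up, `call_llm` for your LLM client, and both are hypothetical callables rather than part of the released code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class GuardedChat:
    is_injection: Callable[[str], bool]  # hypothetical pre-filter (e.g. an SPML-IR check)
    call_llm: Callable[[str], str]       # hypothetical LLM client
    blocked: int = 0
    forwarded: int = 0

    def ask(self, user_text: str) -> Optional[str]:
        """Reject flagged inputs before they cost an LLM call."""
        if self.is_injection(user_text):
            self.blocked += 1
            return None
        self.forwarded += 1
        return self.call_llm(user_text)

    def saved_call_fraction(self) -> float:
        """Share of requests that never reached the LLM."""
        total = self.blocked + self.forwarded
        return self.blocked / total if total else 0.0


def slip_through_rate(
    is_injection: Callable[[str], bool],
    labeled: Iterable[Tuple[str, bool]],
) -> float:
    """Fraction of inputs labeled malicious that the filter fails to flag."""
    malicious = [text for text, is_bad in labeled if is_bad]
    if not malicious:
        return 0.0
    missed = sum(1 for text in malicious if not is_injection(text))
    return missed / len(malicious)
```

Running `slip_through_rate` on labeled inputs before and after adding the filter gives the attacker slip-through comparison, and `saved_call_fraction` gives the reduction in API calls.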
Risks & Boundaries
Limitations
Developers must write system prompts in SPML; existing natural-language prompts need manual conversion.
The security analyzer uses GPT-3.5 in the experiments, so false positives are non-zero and results depend on the analyzer model.
When Not To Use
When you cannot modify existing system prompts or the pipeline to insert SPML compilation.
When attackers exploit non-text channels (images/audio) not covered by text-only monitoring.
Failure Modes
A high false-positive rate from the analyzer blocks legitimate user requests, hurting UX.
An adaptive attacker crafts inputs that fill the IR skeleton with plausible but malicious values.

