A small domain-specific language (SPML) that compiles strict chatbot specs and blocks prompt-injection attacks before they hit the LLM

Overview

Decision SnapshotNeeds Validation

SPML is a practical, deployable pre-filter with a released dataset and code; results are strong on the paper's benchmarks but rely on language-model-based type checking and a specific analyzer setup.

Citations2

Evidence Strength0.60

Confidence0.85

Risk Signals9

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 3/3

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 70%

Authors

Reshabh K Sharma, Vinayak Gupta, Dan Grossman

Links

Abstract / PDF / Code / Data

Why It Matters For Business

SPML provides a lightweight, rule-like front door that blocks many prompt-injection attacks before they reach costly LLM calls, reducing risk and operating cost for deployed chatbots.

Who Should Care

Product Manager CTO ML Engineer Engineering Lead Founder

Summary TLDR

The paper introduces SPML, a domain-specific language for writing chatbot system prompts in a structured way, plus a compiler that emits a natural-language prompt and an intermediate form (SPML-IR) used to detect prompt-injection attacks before user text reaches the LLM. The authors release a dataset of 1,871 system prompts and ~20k labeled user inputs, and report that SPML’s IR-based monitoring reduces attacker-miss rates compared with baseline LLM checks (e.g., lower error on many malicious classes versus GPT-3.5/GPT-4 on evaluated benchmarks).

Problem Statement

Deployed chatbots rely on a fixed system prompt to constrain behavior but are vulnerable to prompt-injection attacks. Existing studies measure vulnerability but do not provide a practical, deployable way to write robust system prompts and block malicious user inputs before they reach the LLM.

Main Contribution

SPML: a domain-specific language for writing system prompts with types, single-assignment, and small PL features to reduce ambiguity.

SPML-IR: a deterministic intermediate form used to fill a prompt skeleton from user input and detect conflicting assignments as injection.

Key Findings

SPML yields lower attacker-miss error on jailbreak attacks than GPT-4 on the paper's benchmark.

NumbersJailbreak ER: SPML 1.29% vs GPT-4 4.31% (Table 2)

Practical UseUsing SPML's IR-based check can cut missed jailbreaks by ~3x on evaluated examples; consider adding SPML-style pre-filters before an LLM in production.

Evidence RefTable 2

SPML reduces unsafe-interaction error compared to GPT-4 on the paper's system-prompt dataset.

NumbersUnsafe interactions ER: SPML 10.09% vs GPT-4 27.57% (Table 1)

Practical UseFor broad prompt-change attacks, SPML flags far fewer attacks that slip past GPT-4 on the tested dataset — useful when protecting many vertical bots.

Evidence RefTable 1

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Error Rate (safe interactions)	SPML 9.95% \| GPT-4 3.12% \| GPT-3.5 6.07% \| LLaMA-13B 24.83%	GPT-4 (best baseline for safe interactions)	SPML higher false positive than GPT-4 by 6.83pts	Paper's system-prompt dataset (Section 8, Table 1)	Table 1 reports safe-interaction ERs	Table 1
Error Rate (unsafe interactions)	SPML 10.09% \| GPT-3.5 11.68% \| GPT-4 27.57%	GPT-3.5	SPML ~1.6 pts lower than GPT-3.5; ~17.5 pts lower than GPT-4	Paper's system-prompt dataset (Section 8, Table 1)	Table 1 shows unsafe-interaction ERs	Table 1

What To Try In 7 Days

Run SPML compiler on one critical chatbot’s system prompt and test with the provided SPML dataset to find immediate vulnerabilities.

Add SPML-IR based pre-filtering in front of an LLM API to reject high-risk inputs and measure reduction in attacker slip-through and API calls.

Convert a handful of system-prompt rules (tone, name, scope) to SPML to see how many ambiguous/misleading NL instructions get resolved.

Optimization Features

Token Efficiency

SPML rejects some malicious inputs before LLM call, saving API tokens

System Optimization

Offline type-checking minimizes runtime cost

Reproducibility

Code AvailableYes

Data AvailableYes

Open Source StatusYes

LicenseUnknown

Code URLs

https://prompt-compiler.github.io/SPML/

Data URLs

https://prompt-compiler.github.io/SPML/

Risks & Boundaries

Limitations

Developers must write system prompts in SPML; existing natural-language prompts need manual conversion.

Security analyzer uses GPT-3.5 in experiments; this introduces non-zero false positives and depends on the analyzer model.

When Not To Use

If you cannot modify existing system prompts or pipeline to insert SPML compilation.

When attackers exploit non-text channels (images/audio) not covered by text-only monitoring.

Failure Modes

High false positives from analyzer block legitimate user requests, hurting UX.

An adaptive attacker crafts inputs that fill the IR skeleton with plausible but malicious values.

Core Entities

Models

GPT-4GPT-3.5LLAMA-7BLLAMA-13B

Metrics

Error Rate (ER)

Datasets

SPML dataset (1871 SPs, ~20k user prompts)Tensor-TrustGandalf

Benchmarks

SPML prompt-injection benchmark

Overview

Trust Signals

Reproducibility

At A Glance

Authors

Links

Why It Matters For Business

Who Should Care

Summary TLDR

Problem Statement

Main Contribution

Key Findings

SPML yields lower attacker-miss error on jailbreak attacks than GPT-4 on the paper's benchmark.

SPML reduces unsafe-interaction error compared to GPT-4 on the paper's system-prompt dataset.

Results

What To Try In 7 Days

Optimization Features

Reproducibility

Code URLs

Data URLs

Risks & Boundaries

Limitations

When Not To Use

Failure Modes

Core Entities

Models

Metrics

Datasets

Benchmarks

You May Also Want to Read

AdversaRiskQA: adversarial factuality benchmark for health, finance, and law

Key finding

Short, natural-looking token sequences can flip LLM judges to say 'Yes' on wrong answers; discovery and a small LoRA defense

Key finding

FACT-BENCH: a 20K-question benchmark that reveals when LLMs forget facts and how exemplars can make them lie

Key finding

RWKU: a stress test for forgetting real-world facts in LLMs using 200 real-person targets and adversarial probes

Key finding

Short adversarial suffixes can flip LLM-as-a-Judge decisions; CUA >30% success

Key finding