A small domain-specific language (SPML) that compiles strict chatbot specs and blocks prompt-injection attacks before they hit the LLM

February 19, 20247 min

Overview

Decision SnapshotNeeds Validation

SPML is a practical, deployable pre-filter with a released dataset and code; results are strong on the paper's benchmarks but rely on language-model-based type checking and a specific analyzer setup.

Citations2

Evidence Strength0.60

Confidence0.85

Risk Signals9

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 3/3

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 70%

Authors

Reshabh K Sharma, Vinayak Gupta, Dan Grossman

Links

Abstract / PDF / Code / Data

Why It Matters For Business

SPML provides a lightweight, rule-like front door that blocks many prompt-injection attacks before they reach costly LLM calls, reducing risk and operating cost for deployed chatbots.

Who Should Care

Summary TLDR

The paper introduces SPML, a domain-specific language for writing chatbot system prompts in a structured way, plus a compiler that emits a natural-language prompt and an intermediate form (SPML-IR) used to detect prompt-injection attacks before user text reaches the LLM. The authors release a dataset of 1,871 system prompts and ~20k labeled user inputs, and report that SPML’s IR-based monitoring reduces attacker-miss rates compared with baseline LLM checks (e.g., lower error on many malicious classes versus GPT-3.5/GPT-4 on evaluated benchmarks).

Problem Statement

Deployed chatbots rely on a fixed system prompt to constrain behavior but are vulnerable to prompt-injection attacks. Existing studies measure vulnerability but do not provide a practical, deployable way to write robust system prompts and block malicious user inputs before they reach the LLM.

Main Contribution

SPML: a domain-specific language for writing system prompts with types, single-assignment, and small PL features to reduce ambiguity.

SPML-IR: a deterministic intermediate form used to fill a prompt skeleton from user input and detect conflicting assignments as injection.

Key Findings

SPML yields lower attacker-miss error on jailbreak attacks than GPT-4 on the paper's benchmark.

NumbersJailbreak ER: SPML 1.29% vs GPT-4 4.31% (Table 2)

Practical UseUsing SPML's IR-based check can cut missed jailbreaks by ~3x on evaluated examples; consider adding SPML-style pre-filters before an LLM in production.

Evidence RefTable 2

SPML reduces unsafe-interaction error compared to GPT-4 on the paper's system-prompt dataset.

NumbersUnsafe interactions ER: SPML 10.09% vs GPT-4 27.57% (Table 1)

Practical UseFor broad prompt-change attacks, SPML flags far fewer attacks that slip past GPT-4 on the tested dataset — useful when protecting many vertical bots.

Evidence RefTable 1

Results

MetricValueBaselineDeltaSplit / DatasetEvidenceEvidence Ref
Error Rate (safe interactions)SPML 9.95% | GPT-4 3.12% | GPT-3.5 6.07% | LLaMA-13B 24.83%GPT-4 (best baseline for safe interactions)SPML higher false positive than GPT-4 by 6.83ptsPaper's system-prompt dataset (Section 8, Table 1)Table 1 reports safe-interaction ERsTable 1
Error Rate (unsafe interactions)SPML 10.09% | GPT-3.5 11.68% | GPT-4 27.57%GPT-3.5SPML ~1.6 pts lower than GPT-3.5; ~17.5 pts lower than GPT-4Paper's system-prompt dataset (Section 8, Table 1)Table 1 shows unsafe-interaction ERsTable 1

What To Try In 7 Days

Run SPML compiler on one critical chatbot’s system prompt and test with the provided SPML dataset to find immediate vulnerabilities.

Add SPML-IR based pre-filtering in front of an LLM API to reject high-risk inputs and measure reduction in attacker slip-through and API calls.

Convert a handful of system-prompt rules (tone, name, scope) to SPML to see how many ambiguous/misleading NL instructions get resolved.

Optimization Features

Token Efficiency
SPML rejects some malicious inputs before LLM call, saving API tokens
System Optimization
Offline type-checking minimizes runtime cost

Reproducibility

Code AvailableYes
Data AvailableYes
Open Source StatusYes
LicenseUnknown

Risks & Boundaries

Limitations

Developers must write system prompts in SPML; existing natural-language prompts need manual conversion.

Security analyzer uses GPT-3.5 in experiments; this introduces non-zero false positives and depends on the analyzer model.

When Not To Use

If you cannot modify existing system prompts or pipeline to insert SPML compilation.

When attackers exploit non-text channels (images/audio) not covered by text-only monitoring.

Failure Modes

High false positives from analyzer block legitimate user requests, hurting UX.

An adaptive attacker crafts inputs that fill the IR skeleton with plausible but malicious values.

Core Entities

Models

GPT-4GPT-3.5LLAMA-7BLLAMA-13B

Metrics

Error Rate (ER)

Datasets

SPML dataset (1871 SPs, ~20k user prompts)Tensor-TrustGandalf

Benchmarks

SPML prompt-injection benchmark