A small domain-specific language (SPML) that compiles strict chatbot specs and blocks prompt-injection attacks before they hit the LLM

February 19, 20247 min

Overview

Production Readiness

0.6

Novelty Score

0.7

Cost Impact Score

0.6

Citation Count

2

Authors

Reshabh K Sharma, Vinayak Gupta, Dan Grossman

Links

Abstract / PDF

Why It Matters For Business

SPML provides a lightweight, rule-like front door that blocks many prompt-injection attacks before they reach costly LLM calls, reducing risk and operating cost for deployed chatbots.

Summary TLDR

The paper introduces SPML, a domain-specific language for writing chatbot system prompts in a structured way, plus a compiler that emits a natural-language prompt and an intermediate form (SPML-IR) used to detect prompt-injection attacks before user text reaches the LLM. The authors release a dataset of 1,871 system prompts and ~20k labeled user inputs, and report that SPML’s IR-based monitoring reduces attacker-miss rates compared with baseline LLM checks (e.g., lower error on many malicious classes versus GPT-3.5/GPT-4 on evaluated benchmarks).

Problem Statement

Deployed chatbots rely on a fixed system prompt to constrain behavior but are vulnerable to prompt-injection attacks. Existing studies measure vulnerability but do not provide a practical, deployable way to write robust system prompts and block malicious user inputs before they reach the LLM.

Main Contribution

SPML: a domain-specific language for writing system prompts with types, single-assignment, and small PL features to reduce ambiguity.

SPML-IR: a deterministic intermediate form used to fill a prompt skeleton from user input and detect conflicting assignments as injection.

A dataset of 1,871 system prompts plus ≈20k labeled user prompts (safe, unsafe, malicious) for evaluating chatbot prompt defenses.

An evaluation showing SPML often reduces attacker-miss/error rates compared to baseline LLM-based detectors (GPT-3.5, GPT-4, LLaMA variants).

Key Findings

SPML yields lower attacker-miss error on jailbreak attacks than GPT-4 on the paper's benchmark.

NumbersJailbreak ER: SPML 1.29% vs GPT-4 4.31% (Table 2)

SPML reduces unsafe-interaction error compared to GPT-4 on the paper's system-prompt dataset.

NumbersUnsafe interactions ER: SPML 10.09% vs GPT-4 27.57% (Table 1)

The authors provide a sizable, multi-source benchmark for prompt-injection research.

NumbersDataset: 1,871 system prompts and ~20k user inputs (Section 6)

Results

Error Rate (safe interactions)

ValueSPML 9.95% | GPT-4 3.12% | GPT-3.5 6.07% | LLaMA-13B 24.83%

BaselineGPT-4 (best baseline for safe interactions)

Error Rate (unsafe interactions)

ValueSPML 10.09% | GPT-3.5 11.68% | GPT-4 27.57%

BaselineGPT-3.5

Error Rate (malicious: Jailbreak)

ValueSPML 1.29% | GPT-4 4.31% | GPT-3.5 28.32%

BaselineGPT-4

Who Should Care

What To Try In 7 Days

Run SPML compiler on one critical chatbot’s system prompt and test with the provided SPML dataset to find immediate vulnerabilities.

Add SPML-IR based pre-filtering in front of an LLM API to reject high-risk inputs and measure reduction in attacker slip-through and API calls.

Convert a handful of system-prompt rules (tone, name, scope) to SPML to see how many ambiguous/misleading NL instructions get resolved.

Optimization Features

Token Efficiency

  • SPML rejects some malicious inputs before LLM call, saving API tokens

System Optimization

  • Offline type-checking minimizes runtime cost

Reproducibility

Code Available

Data Available

Open Source Status

  • yes

Risks & Boundaries

Limitations

  • Developers must write system prompts in SPML; existing natural-language prompts need manual conversion.
  • Security analyzer uses GPT-3.5 in experiments; this introduces non-zero false positives and depends on the analyzer model.
  • SPML is not foolproof: complex or novel injection techniques may still bypass checks.

When Not To Use

  • If you cannot modify existing system prompts or pipeline to insert SPML compilation.
  • When attackers exploit non-text channels (images/audio) not covered by text-only monitoring.
  • If you require zero false positives for safety-critical flows without human review.

Failure Modes

  • High false positives from analyzer block legitimate user requests, hurting UX.
  • An adaptive attacker crafts inputs that fill the IR skeleton with plausible but malicious values.
  • Dependence on the analyzer LLM: model drift or version changes can change detection behavior.

Core Entities

Models

  • GPT-4
  • GPT-3.5
  • LLAMA-7B
  • LLAMA-13B

Metrics

  • Error Rate (ER)

Datasets

  • SPML dataset (1871 SPs, ~20k user prompts)
  • Tensor-Trust
  • Gandalf

Benchmarks

  • SPML prompt-injection benchmark