Overview
Production Readiness
0.3
Novelty Score
0.6
Cost Impact Score
0.5
Citation Count
0
Why It Matters For Business
Deterministic guardrails reduce unacceptable risks in enterprise agents (data leaks, unauthorized writes) and let teams choose autonomy levels with verifiable safety.
Summary TLDR
LLM agents can compose tool calls in ways that leak data or perform unsafe actions. The paper proposes a practical process: use STPA (a systems safety method) to find hazards, convert requirements into formal, enforceable specifications for data flows and tool sequences, and extend the Model Context Protocol (MCP) so tools and data carry structured labels (confidentiality, capabilities, trust). A four-tier enforcement model (blocklist/allowlist/mustlist/confirmation) is applied at tool boundaries. An Alloy model shows these label-based policies can deterministically block unsafe traces while keeping safe behaviors. The work is a concrete blueprint and a proof-of-concept; an engine to enforce
Problem Statement
LLM agents that call external APIs and tools can accidentally or adversarially leak sensitive data or perform harmful actions. Existing model-based guards improve detection but not guarantees. We need a practical way to (1) identify interaction hazards and (2) enforce deterministic constraints on tool ordering and data flows at run time, without excessive manual labeling or constant user prompts.
Main Contribution
Adapt STPA safety analysis to identify interaction hazards and derive agent safety requirements.
Turn safety requirements into symbolic specifications enforceable at tool boundaries (data-flow and temporal constraints).
Propose capability-enhanced MCP: require structured key-value labels (confidentiality, capabilities, trust) on data and tools.
Define a four-tier enforcement model: blocklist, allowlist, mustlist, confirmation, applied externally to the agent.
Demonstrate feasibility with an Alloy formal model that exhaustively checks bounded traces and shows policies can block unsafe flows while preserving safe traces.
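As a rough illustration of how the four tiers could combine at a tool boundary, here is a minimal Python sketch. The tier semantics used here are an assumption made for illustration, not the paper's definitions: blocklist means never permitted, mustlist means prerequisite tools must already have run, confirmation means ask the user, and allowlist means permitted silently. The `Policy` and `decide` names and the default-deny fallback are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    BLOCK = "block"
    ALLOW = "allow"
    CONFIRM = "confirm"


@dataclass
class Policy:
    # All four fields are illustrative; the paper's concrete schema may differ.
    blocklist: set = field(default_factory=set)      # tools never permitted
    allowlist: set = field(default_factory=set)      # tools permitted without asking
    mustlist: dict = field(default_factory=dict)     # tool -> tools that must run first
    confirmation: set = field(default_factory=set)   # tools needing user approval


def decide(policy: Policy, tool: str, history: list) -> Decision:
    """Return a deterministic decision for calling `tool` after `history`."""
    if tool in policy.blocklist:
        return Decision.BLOCK
    # Assumed mustlist reading: every prerequisite must appear in the history.
    missing = set(policy.mustlist.get(tool, ())) - set(history)
    if missing:
        return Decision.BLOCK
    if tool in policy.confirmation:
        return Decision.CONFIRM
    if tool in policy.allowlist:
        return Decision.ALLOW
    return Decision.BLOCK  # assumed default-deny for unlisted tools
```

Because the check runs outside the agent and depends only on the policy and the call history, the same inputs always yield the same decision, which is the property the external enforcement layer relies on.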
Key Findings
Within its bounds, an Alloy model shows that label-based policies eliminate unsafe flows that otherwise occur.
The existing MCP provides only minimal, optional annotations, which are insufficient for deterministic enforcement.
Three runtime labels suffice as a starting enforcement surface: data confidentiality, tool capabilities, and trust level.
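To make the three labels concrete, the sketch below models them as small Python types and checks a single data flow against them. The level names (`PUBLIC`, `INTERNAL`, `PRIVATE`, `UNTRUSTED`, `CERTIFIED`), the `external_write` capability string, and the specific rule in `flow_permitted` are all assumptions chosen for illustration; the paper's label vocabulary may differ.

```python
from dataclasses import dataclass
from enum import IntEnum


class Confidentiality(IntEnum):  # hypothetical levels
    PUBLIC = 0
    INTERNAL = 1
    PRIVATE = 2


class Trust(IntEnum):  # hypothetical levels
    UNTRUSTED = 0
    CERTIFIED = 1


@dataclass(frozen=True)
class DataLabel:
    confidentiality: Confidentiality


@dataclass(frozen=True)
class ToolLabel:
    capabilities: frozenset  # e.g. frozenset({"read", "external_write"})
    trust: Trust


def flow_permitted(data: DataLabel, tool: ToolLabel) -> bool:
    """Deterministic check: may `data` be passed as input to `tool`?"""
    if data.confidentiality == Confidentiality.PUBLIC:
        return True
    # Assumed rule: non-public data only flows to certified tools.
    if tool.trust < Trust.CERTIFIED:
        return False
    # Assumed rule: private data never reaches a tool that writes externally.
    if (data.confidentiality == Confidentiality.PRIVATE
            and "external_write" in tool.capabilities):
        return False
    return True
```

The check uses only label values, never the data content or model output, so it can be evaluated without invoking an LLM.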
Results
safety violation elimination
What To Try In 7 Days
Run a quick STPA-style hazard brainstorm on one task-specific agent (e.g., calendar or CRM flow).
Add three minimal MCP tags (confidentiality, capabilities, trust) to your tool registry for that agent.
Prototype an interceptor that blocks send_email when data is labeled private and target is external.
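A prototype of the interceptor in the last step might look like the following minimal sketch. It assumes that "external" means the recipient's domain is not in a configured internal set, and that the label arrives as a plain string; `INTERNAL_DOMAINS`, `PolicyViolation`, and `guarded_send_email` are hypothetical names, and the real `send_email` tool is passed in as a callable.

```python
class PolicyViolation(Exception):
    """Raised when a tool call is blocked by the guardrail."""


# Assumption: the organisation's own mail domains; replace with real values.
INTERNAL_DOMAINS = {"example.com"}


def is_external(address: str) -> bool:
    """Treat any recipient outside INTERNAL_DOMAINS as external."""
    domain = address.rsplit("@", 1)[-1].lower()
    return domain not in INTERNAL_DOMAINS


def guarded_send_email(send_email, to: str, body: str, body_label: str):
    """Call `send_email(to, body)` unless private data would leave the org."""
    if body_label == "private" and is_external(to):
        raise PolicyViolation(f"private data may not be sent to {to}")
    return send_email(to, body)
```

Raising an exception instead of silently dropping the call lets the agent loop surface the refusal to the planner or the user.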
Agent Features
Planning
- LLM plans tool calls each loop
Tool Use
- intercepted tool calls at runtime
- tool sequencing constraints (temporal rules)
- data-label based blocking for external writes
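One simple way to picture a tool sequencing constraint is a stateful monitor that forbids certain tools once a trigger tool has run. The sketch below is an assumption about how such a temporal rule could be checked at run time, not the paper's enforcement engine; `SequenceMonitor` and the tool names in the usage example are hypothetical.

```python
class SequenceMonitor:
    """Tracks executed tools and rejects calls forbidden by earlier ones."""

    def __init__(self, forbidden_after: dict):
        # Maps a trigger tool to the set of tools forbidden once it has run.
        self.forbidden_after = forbidden_after
        self.blocked = set()

    def check(self, tool: str) -> bool:
        """Return True and record the call if allowed, else return False."""
        if tool in self.blocked:
            return False
        # The call is allowed; any tools it forbids are blocked from now on.
        self.blocked |= set(self.forbidden_after.get(tool, ()))
        return True
```

For example, `SequenceMonitor({"read_medical_record": {"post_to_web"}})` permits `post_to_web` before any record is read and rejects it afterwards, while unrelated tools stay available throughout.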
Frameworks
- Model Context Protocol (MCP)
Is Agentic
true
Architectures
- LLM-based agent
Reproducibility
License
- Creative Commons Attribution 4.0 (paper)
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Requires reliable labels on data and tools; labeling can be costly or spoofable.
- MCP metadata may be untrusted in open tool marketplaces; deployments must restrict agents to certified tools or add metadata verification.
- Alloy analysis is bounded; real systems need scalable enforcement engines and runtime checks.
- Tradeoff: stronger guarantees reduce agent autonomy and may lower utility.
When Not To Use
- For general-purpose agents with unknown tools and broad scope (hard to specify safety rules).
- In low-latency, high-throughput contexts where interception adds unacceptable delay.
- When tool metadata cannot be trusted or certified.
Failure Modes
- Incorrect or forged labels let unsafe flows bypass checks.
- Overly strict policies block needed functionality and hurt adoption.
- Attackers manipulate trusted tools or metadata to escalate privileges.
- Confirmation fatigue if confirmations are overused.

