Overview
Prototype shows strong detection and leakage gains in the authors' testbed. The approach is practical and modular but evaluation details and public datasets are not provided, so production integration will need engineering and further validation.
Citations0
Evidence Strength0.70
Confidence0.75
Risk Signals10
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 3/3
Reproducibility
Status: No open assets linked
Open source: Partial
At A Glance
Cost impact: 50%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
Agentic systems that chain LLMs and tools can be hijacked by hidden instructions in text or images. Adding per-message sanitization, provenance tracking, and output validation reduces attack surface without harming legitimate task accuracy—important for customer-facing automation, finance, and security-sensitive tools.
Who Should Care
Summary TLDR
The paper proposes a practical defense that wraps multi-agent pipelines (LangChain/GraphChain) with per-message text and image sanitizers, a provenance ledger, pre-LLM trust masks, and an output validator. In a prototype, detection of multimodal prompt injections rose to 94%, cross-agent trust leakage fell 70%, and benign-task accuracy stayed high (96%). The design is modular and intended to be added to existing agent stacks with moderate runtime cost.
Problem Statement
Agent-based AI pipelines that route text and images through multiple LLM/VLM calls are vulnerable to hidden or paraphrased instructions (prompt injection). Existing defenses (keyword filters, fine-tuning, output filters) miss visual or cross-agent attacks and do not track provenance across agent hops.
Main Contribution
Cross-Agent Multimodal Provenance-Aware Defense: a layered pipeline that sanitizes every incoming text/image, reapplies sanitization before LLM calls, and validates outputs before passing them on.
Four cooperating agents: Text Sanitizer (span-level detection and rewrite), Visual Sanitizer (OCR, metadata, CLIP patch checks), Main Task Model (ML/VLM inference with trust masks), and Output Validator (policy checks and influence attribution).
Key Findings
Multimodal prompt-injection detection rate improved to 94%.
Cross-agent trust leakage was reduced from 0.24 to 0.07 (about 70% reduction).
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | 94% | keyword filtering 52%; post-hoc 61%; safety fine-tune 66% | ↑28–42 pp vs baselines | prototype testbed (multi-modal injection scenarios) | Section V.A, Figure 3 | Section V.A |
| Cross-modal trust leakage | 0.07 | 0.24 | 70% reduction | prototype testbed influence measure | Section V.B, Figure 3 | Section V.B |
What To Try In 7 Days
Add an input interceptor that routes all external text/images through a sanitizer before agents see them.
Record lightweight provenance metadata (source, modality, trust tag) alongside messages in a simple key-value store.
Wrap any LLM call with a pre-LLM filter that masks low-trust spans and an output validator that checks for policy violations before tool execution.
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
Infra Optimization
System Optimization
Inference Optimization
Reproducibility
Risks & Boundaries
Limitations
No public dataset or full evaluation details are shared; exact test scenarios and dataset construction are unspecified.
Prototype relies on heuristic detectors (pattern rules, RoBERTa detector, stego detector) which may miss novel attacks.
When Not To Use
Ultra-low-latency deployments where any extra SAN/validation roundtrip is unacceptable.
Systems where all inputs are fully trusted and controlled (internal-only closed pipelines).
Failure Modes
Adversary crafts new visual steganography or paraphrases that bypass the sanitizer detectors.
Provenance ledger poisoning or incorrect trust assignment leading to false approval.

