A provenance-aware, multi-agent pipeline that sanitizes text and images and validates LLM outputs to stop prompt-injection across LangChain/

Overview

Decision SnapshotNeeds Validation

Prototype shows strong detection and leakage gains in the authors' testbed. The approach is practical and modular but evaluation details and public datasets are not provided, so production integration will need engineering and further validation.

Citations0

Evidence Strength0.70

Confidence0.75

Risk Signals10

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 3/3

Reproducibility

Status: No open assets linked

Open source: Partial

At A Glance

Cost impact: 50%

Production readiness: 60%

Novelty: 60%

Authors

Toqeer Ali Syed, Mishal Ateeq Almutairi, Mahmoud Abdel Moaty

Links

Abstract / PDF

Why It Matters For Business

Agentic systems that chain LLMs and tools can be hijacked by hidden instructions in text or images. Adding per-message sanitization, provenance tracking, and output validation reduces attack surface without harming legitimate task accuracy—important for customer-facing automation, finance, and security-sensitive tools.

Who Should Care

CTO Product Manager ML Engineer Engineering Lead Data Scientist

Summary TLDR

The paper proposes a practical defense that wraps multi-agent pipelines (LangChain/GraphChain) with per-message text and image sanitizers, a provenance ledger, pre-LLM trust masks, and an output validator. In a prototype, detection of multimodal prompt injections rose to 94%, cross-agent trust leakage fell 70%, and benign-task accuracy stayed high (96%). The design is modular and intended to be added to existing agent stacks with moderate runtime cost.

Problem Statement

Agent-based AI pipelines that route text and images through multiple LLM/VLM calls are vulnerable to hidden or paraphrased instructions (prompt injection). Existing defenses (keyword filters, fine-tuning, output filters) miss visual or cross-agent attacks and do not track provenance across agent hops.

Main Contribution

Cross-Agent Multimodal Provenance-Aware Defense: a layered pipeline that sanitizes every incoming text/image, reapplies sanitization before LLM calls, and validates outputs before passing them on.

Four cooperating agents: Text Sanitizer (span-level detection and rewrite), Visual Sanitizer (OCR, metadata, CLIP patch checks), Main Task Model (ML/VLM inference with trust masks), and Output Validator (policy checks and influence attribution).

Key Findings

Multimodal prompt-injection detection rate improved to 94%.

Numbers94% detection (paper, Section V.A)

Practical UseDeploy layered text+visual sanitizers and pre-LLM trust masks to detect most injection attempts in agentic pipelines.

Evidence RefSection V.A, Figure 3

Cross-agent trust leakage was reduced from 0.24 to 0.07 (about 70% reduction).

Numbers0.24 → 0.07 (70% reduction) (Section V.B)

Practical UseUse provenance-ledger propagation and trust-aware attention masking to prevent low-trust content from influencing final outputs.

Evidence RefSection V.B, Figure 3

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Accuracy	94%	keyword filtering 52%; post-hoc 61%; safety fine-tune 66%	↑28–42 pp vs baselines	prototype testbed (multi-modal injection scenarios)	Section V.A, Figure 3	Section V.A
Cross-modal trust leakage	0.07	0.24	70% reduction	prototype testbed influence measure	Section V.B, Figure 3	Section V.B

What To Try In 7 Days

Add an input interceptor that routes all external text/images through a sanitizer before agents see them.

Record lightweight provenance metadata (source, modality, trust tag) alongside messages in a simple key-value store.

Wrap any LLM call with a pre-LLM filter that masks low-trust spans and an output validator that checks for policy violations before tool execution.

Agent Features

Memory

provenance ledger stores token/patch-level trust and provenance idsmemory modules store only sanitized content with trust metadata

Planning

trust-aware routing and masking before LLM inference

Tool Use

block or permit tool calls based on validator output and provenancevalidator can request regeneration with tighter masks

Frameworks

LangChainGraphChain

Is Agentic

Yes

Architectures

multi-agent pipeline (LangChain/GraphChain integration)modular agent services (Text Sanitizer, Visual Sanitizer, Validator, Main Model)

Collaboration

provenance propagation across agent hopscross-agent enforcement of trust boundaries

Optimization Features

Token Efficiency

selective masking removes or attenuates only low-trust spans

Infra Optimization

in-memory Redis ledger for low-latency provenance lookups

System Optimization

modular Python services for agents to allow incremental integration

Inference Optimization

trust-aware attention masking to reduce influence of low-trust tokens

Reproducibility

Code AvailableNo

Data AvailableNo

Open Source StatusPartial

LicenseUnknown

Risks & Boundaries

Limitations

No public dataset or full evaluation details are shared; exact test scenarios and dataset construction are unspecified.

Prototype relies on heuristic detectors (pattern rules, RoBERTa detector, stego detector) which may miss novel attacks.

When Not To Use

Ultra-low-latency deployments where any extra SAN/validation roundtrip is unacceptable.

Systems where all inputs are fully trusted and controlled (internal-only closed pipelines).

Failure Modes

Adversary crafts new visual steganography or paraphrases that bypass the sanitizer detectors.

Provenance ledger poisoning or incorrect trust assignment leading to false approval.

Core Entities

Models

RoBERTa (pattern detector)CLIP (image patch embeddings)PaddleOCR (OCR)GPT-4o-mini (OpenAI) as exemplar LLMLLaVA / BLIP-2 (open VLM options)

Metrics

Accuracytrust leakage (unit reported as ×0.01)

Benchmarks

keyword filtering baselinesafety fine-tuning baselinepost-hoc output filtering baselinesingle-VLM baseline

Context Entities

Datasets

MM-SafetyBench (cited but not explicitly used)

Overview

Trust Signals

Reproducibility

At A Glance

Authors

Links

Why It Matters For Business

Who Should Care

Summary TLDR

Problem Statement

Main Contribution

Key Findings

Multimodal prompt-injection detection rate improved to 94%.

Cross-agent trust leakage was reduced from 0.24 to 0.07 (about 70% reduction).

Results

What To Try In 7 Days

Agent Features

Optimization Features

Reproducibility

Risks & Boundaries

Limitations

When Not To Use

Failure Modes

Core Entities

Models

Metrics

Benchmarks

Context Entities

Datasets

You May Also Want to Read

Short adversarial suffixes can flip LLM-as-a-Judge decisions; CUA >30% success

Key finding

BackdoorAgent: a stage-aware framework and benchmark showing memory backdoors persist across multi-step LLM agents

Key finding

JudgeDeceiver: automatically craft prompts that reliably trick LLM-as-a-Judge to pick an attacker’s response

Key finding

Make tool-using LLM agents provably safe by combining safety engineering, info-flow labels, and MCP extensions

Key finding

A systematic, practitioner-focused map of 193 multi-agent security threats and how 16 frameworks cover them

Key finding