A provenance-aware, multi-agent pipeline that sanitizes text and images and validates LLM outputs to stop prompt-injection across LangChain/

December 29, 20258 min

Overview

Decision SnapshotNeeds Validation

Prototype shows strong detection and leakage gains in the authors' testbed. The approach is practical and modular but evaluation details and public datasets are not provided, so production integration will need engineering and further validation.

Citations0

Evidence Strength0.70

Confidence0.75

Risk Signals10

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 3/3

Reproducibility

Status: No open assets linked

Open source: Partial

At A Glance

Cost impact: 50%

Production readiness: 60%

Novelty: 60%

Authors

Toqeer Ali Syed, Mishal Ateeq Almutairi, Mahmoud Abdel Moaty

Links

Abstract / PDF

Why It Matters For Business

Agentic systems that chain LLMs and tools can be hijacked by hidden instructions in text or images. Adding per-message sanitization, provenance tracking, and output validation reduces attack surface without harming legitimate task accuracy—important for customer-facing automation, finance, and security-sensitive tools.

Who Should Care

Summary TLDR

The paper proposes a practical defense that wraps multi-agent pipelines (LangChain/GraphChain) with per-message text and image sanitizers, a provenance ledger, pre-LLM trust masks, and an output validator. In a prototype, detection of multimodal prompt injections rose to 94%, cross-agent trust leakage fell 70%, and benign-task accuracy stayed high (96%). The design is modular and intended to be added to existing agent stacks with moderate runtime cost.

Problem Statement

Agent-based AI pipelines that route text and images through multiple LLM/VLM calls are vulnerable to hidden or paraphrased instructions (prompt injection). Existing defenses (keyword filters, fine-tuning, output filters) miss visual or cross-agent attacks and do not track provenance across agent hops.

Main Contribution

Cross-Agent Multimodal Provenance-Aware Defense: a layered pipeline that sanitizes every incoming text/image, reapplies sanitization before LLM calls, and validates outputs before passing them on.

Four cooperating agents: Text Sanitizer (span-level detection and rewrite), Visual Sanitizer (OCR, metadata, CLIP patch checks), Main Task Model (ML/VLM inference with trust masks), and Output Validator (policy checks and influence attribution).

Key Findings

Multimodal prompt-injection detection rate improved to 94%.

Numbers94% detection (paper, Section V.A)

Practical UseDeploy layered text+visual sanitizers and pre-LLM trust masks to detect most injection attempts in agentic pipelines.

Evidence RefSection V.A, Figure 3

Cross-agent trust leakage was reduced from 0.24 to 0.07 (about 70% reduction).

Numbers0.240.07 (70% reduction) (Section V.B)

Practical UseUse provenance-ledger propagation and trust-aware attention masking to prevent low-trust content from influencing final outputs.

Evidence RefSection V.B, Figure 3

Results

MetricValueBaselineDeltaSplit / DatasetEvidenceEvidence Ref
Accuracy94%keyword filtering 52%; post-hoc 61%; safety fine-tune 66%2842 pp vs baselinesprototype testbed (multi-modal injection scenarios)Section V.A, Figure 3Section V.A
Cross-modal trust leakage0.070.2470% reductionprototype testbed influence measureSection V.B, Figure 3Section V.B

What To Try In 7 Days

Add an input interceptor that routes all external text/images through a sanitizer before agents see them.

Record lightweight provenance metadata (source, modality, trust tag) alongside messages in a simple key-value store.

Wrap any LLM call with a pre-LLM filter that masks low-trust spans and an output validator that checks for policy violations before tool execution.

Agent Features

Memory
provenance ledger stores token/patch-level trust and provenance idsmemory modules store only sanitized content with trust metadata
Planning
trust-aware routing and masking before LLM inference
Tool Use
block or permit tool calls based on validator output and provenancevalidator can request regeneration with tighter masks
Frameworks
LangChainGraphChain
Is Agentic

Yes

Architectures
multi-agent pipeline (LangChain/GraphChain integration)modular agent services (Text Sanitizer, Visual Sanitizer, Validator, Main Model)
Collaboration
provenance propagation across agent hopscross-agent enforcement of trust boundaries

Optimization Features

Token Efficiency
selective masking removes or attenuates only low-trust spans
Infra Optimization
in-memory Redis ledger for low-latency provenance lookups
System Optimization
modular Python services for agents to allow incremental integration
Inference Optimization
trust-aware attention masking to reduce influence of low-trust tokens

Reproducibility

Code AvailableNo
Data AvailableNo
Open Source StatusPartial
LicenseUnknown

Risks & Boundaries

Limitations

No public dataset or full evaluation details are shared; exact test scenarios and dataset construction are unspecified.

Prototype relies on heuristic detectors (pattern rules, RoBERTa detector, stego detector) which may miss novel attacks.

When Not To Use

Ultra-low-latency deployments where any extra SAN/validation roundtrip is unacceptable.

Systems where all inputs are fully trusted and controlled (internal-only closed pipelines).

Failure Modes

Adversary crafts new visual steganography or paraphrases that bypass the sanitizer detectors.

Provenance ledger poisoning or incorrect trust assignment leading to false approval.

Core Entities

Models

RoBERTa (pattern detector)CLIP (image patch embeddings)PaddleOCR (OCR)GPT-4o-mini (OpenAI) as exemplar LLMLLaVA / BLIP-2 (open VLM options)

Metrics

Accuracytrust leakage (unit reported as ×0.01)

Benchmarks

keyword filtering baselinesafety fine-tuning baselinepost-hoc output filtering baselinesingle-VLM baseline

Context Entities

Datasets

MM-SafetyBench (cited but not explicitly used)