A provenance-aware, multi-agent pipeline that sanitizes text and images and validates LLM outputs to stop prompt-injection across LangChain/

December 29, 20258 min

Overview

Production Readiness

0.6

Novelty Score

0.6

Cost Impact Score

0.5

Citation Count

0

Authors

Toqeer Ali Syed, Mishal Ateeq Almutairi, Mahmoud Abdel Moaty

Links

Abstract / PDF

Why It Matters For Business

Agentic systems that chain LLMs and tools can be hijacked by hidden instructions in text or images. Adding per-message sanitization, provenance tracking, and output validation reduces attack surface without harming legitimate task accuracy—important for customer-facing automation, finance, and security-sensitive tools.

Summary TLDR

The paper proposes a practical defense that wraps multi-agent pipelines (LangChain/GraphChain) with per-message text and image sanitizers, a provenance ledger, pre-LLM trust masks, and an output validator. In a prototype, detection of multimodal prompt injections rose to 94%, cross-agent trust leakage fell 70%, and benign-task accuracy stayed high (96%). The design is modular and intended to be added to existing agent stacks with moderate runtime cost.

Problem Statement

Agent-based AI pipelines that route text and images through multiple LLM/VLM calls are vulnerable to hidden or paraphrased instructions (prompt injection). Existing defenses (keyword filters, fine-tuning, output filters) miss visual or cross-agent attacks and do not track provenance across agent hops.

Main Contribution

Cross-Agent Multimodal Provenance-Aware Defense: a layered pipeline that sanitizes every incoming text/image, reapplies sanitization before LLM calls, and validates outputs before passing them on.

Four cooperating agents: Text Sanitizer (span-level detection and rewrite), Visual Sanitizer (OCR, metadata, CLIP patch checks), Main Task Model (ML/VLM inference with trust masks), and Output Validator (policy checks and influence attribution).

A provenance ledger that records modality, source, span/patch index, and trust scores per interaction; ledger entries are used to build trust-aware attention masks.

Prototype implementation integrated with LangChain/GraphChain using RoBERTa for text detection, PaddleOCR + CLIP for images, Redis for ledger storage, and compatibility with GPT-4o-mini or open VLMs.

Empirical evaluation comparing the framework to keyword filtering, safety fine-tuning, post-hoc output filtering, and a single-VLM baseline on detection, trust leakage, and benign-task retention.

Key Findings

Multimodal prompt-injection detection rate improved to 94%.

Numbers94% detection (paper, Section V.A)

Cross-agent trust leakage was reduced from 0.24 to 0.07 (about 70% reduction).

Numbers0.24 → 0.07 (70% reduction) (Section V.B)

Benign task performance was preserved at 96% accuracy.

Numbers96% task accuracy retention (Section V.C)

Results

Accuracy

Value94%

Baselinekeyword filtering 52%; post-hoc 61%; safety fine-tune 66%

Cross-modal trust leakage

Value0.07

Baseline0.24

Accuracy

Value96%

Baselinesingle-VLM 94%

Who Should Care

What To Try In 7 Days

Add an input interceptor that routes all external text/images through a sanitizer before agents see them.

Record lightweight provenance metadata (source, modality, trust tag) alongside messages in a simple key-value store.

Wrap any LLM call with a pre-LLM filter that masks low-trust spans and an output validator that checks for policy violations before tool execution.

Agent Features

Memory

  • provenance ledger stores token/patch-level trust and provenance ids
  • memory modules store only sanitized content with trust metadata

Planning

  • trust-aware routing and masking before LLM inference

Tool Use

  • block or permit tool calls based on validator output and provenance
  • validator can request regeneration with tighter masks

Frameworks

  • LangChain
  • GraphChain

Is Agentic

true

Architectures

  • multi-agent pipeline (LangChain/GraphChain integration)
  • modular agent services (Text Sanitizer, Visual Sanitizer, Validator, Main Model)

Collaboration

  • provenance propagation across agent hops
  • cross-agent enforcement of trust boundaries

Optimization Features

Token Efficiency

  • selective masking removes or attenuates only low-trust spans

Infra Optimization

  • in-memory Redis ledger for low-latency provenance lookups

System Optimization

  • modular Python services for agents to allow incremental integration

Inference Optimization

  • trust-aware attention masking to reduce influence of low-trust tokens

Reproducibility

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • No public dataset or full evaluation details are shared; exact test scenarios and dataset construction are unspecified.
  • Prototype relies on heuristic detectors (pattern rules, RoBERTa detector, stego detector) which may miss novel attacks.
  • Provenance ledger itself could be a target (poisoning or tampering) and requires secure deployment.
  • Performance overhead described as moderate but no latency numbers are reported.

When Not To Use

  • Ultra-low-latency deployments where any extra SAN/validation roundtrip is unacceptable.
  • Systems where all inputs are fully trusted and controlled (internal-only closed pipelines).

Failure Modes

  • Adversary crafts new visual steganography or paraphrases that bypass the sanitizer detectors.
  • Provenance ledger poisoning or incorrect trust assignment leading to false approval.
  • False positives redact needed content and degrade downstream task utility.
  • Over-reliance on wrapper rules without model-level hardening causes blind spots.

Core Entities

Models

  • RoBERTa (pattern detector)
  • CLIP (image patch embeddings)
  • PaddleOCR (OCR)
  • GPT-4o-mini (OpenAI) as exemplar LLM
  • LLaVA / BLIP-2 (open VLM options)

Metrics

  • Accuracy
  • trust leakage (unit reported as ×0.01)

Benchmarks

  • keyword filtering baseline
  • safety fine-tuning baseline
  • post-hoc output filtering baseline
  • single-VLM baseline

Context Entities

Datasets

  • MM-SafetyBench (cited but not explicitly used)