PlotEdit: five LLM agents edit chart images from plain English, improving fidelity and accessibility

Overview

Decision Snapshot: Ready For Pilot

PlotEdit is a practical pipeline that improves edit fidelity on ChartCraft, but it depends on gated multimodal LLMs and a Python replotter; production use requires integration work and safety checks.

Citations: 1

Evidence Strength: 0.75

Confidence: 0.78

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 4/4

Reproducibility

Status: Partial assets available

Open source: Unknown

At A Glance

Cost impact: 45%

Production readiness: 70%

Novelty: 60%

Authors

Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt

Why It Matters For Business

PlotEdit turns static chart images in PDFs into editable, high-fidelity charts using natural language, speeding up content updates and improving accessibility for visually impaired users.

Who Should Care

Product Manager, ML Engineer, Data Scientist, Engineering Lead, Founder

Summary TLDR

PlotEdit is a multi-agent system that edits chart images (from PDFs or scans) using plain-language instructions. Five LLM agents extract the chart's data, style, and code, decompose the user's edit into steps, and apply multimodal feedback (numeric, visual, and code) to iteratively fix errors. On the ChartCraft test set it improves structural and style fidelity over prior methods (e.g., overall SSIM 89.0 vs 82.4 for the ChartReformer baseline). The system is most useful when you need faithful, editable reconstructions of chart images and Python replotting is acceptable.

Problem Statement

Charts in PDFs and scans are often images with no source data or style metadata, so editing them (changing data, layout, or style) is hard and largely manual. Existing single-shot vision-language models and plain LLM prompting struggle: they hallucinate or fail to recover accurate tables, styles, or executable code.

Main Contribution

A five-agent pipeline (Chart2Table, Chart2Vision, Chart2Code, Instruction Decomposition, Multimodal Editing) that de-renders and edits chart images from natural language.

Three linked feedback modes (code checks, visual comparison, and numeric checks), applied iteratively for self-reflection and error correction.
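A minimal sketch of how the five agents could be chained, assuming each agent is a function that reads and updates a shared state object. All names here (ChartState, chart2table, feedback_loop, and so on) are hypothetical: the stubs return canned values where PlotEdit would call a multimodal LLM, decomposition is reduced to splitting on semicolons, and the editing agent is reduced to its feedback loop with two of the three checks (an AST parse as the code check, a column-length test as the numeric check).

```python
import ast
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChartState:
    """Shared state handed from agent to agent (hypothetical schema)."""
    table: dict = field(default_factory=dict)    # Chart2Table: column -> values
    style: dict = field(default_factory=dict)    # Chart2Vision: visual attributes
    code: str = ""                               # Chart2Code: replotting script
    steps: List[str] = field(default_factory=list)     # decomposed edit steps
    history: List[list] = field(default_factory=list)  # issues found per round


# Stub agents: each returns canned output where the real system calls an LLM.
def chart2table(image_path: str, state: ChartState) -> ChartState:
    state.table = {"year": [2021, 2022], "sales": [10, 12]}
    return state


def chart2vision(image_path: str, state: ChartState) -> ChartState:
    state.style = {"kind": "bar", "color": "#1f77b4"}
    return state


def chart2code(image_path: str, state: ChartState) -> ChartState:
    # Deliberately unbalanced parenthesis so the feedback loop has work to do.
    state.code = "plot(year, sales"
    return state


def decompose(instruction: str, state: ChartState) -> ChartState:
    state.steps = [s.strip() for s in instruction.split(";") if s.strip()]
    return state


def code_check(state: ChartState) -> List[str]:
    """Code feedback: the replotting script must parse as Python (AST check)."""
    try:
        ast.parse(state.code)
        return []
    except SyntaxError as err:
        return [f"code: {err.msg}"]


def numeric_check(state: ChartState) -> List[str]:
    """Numeric feedback: all extracted columns must have the same length."""
    lengths = {len(values) for values in state.table.values()}
    return [] if len(lengths) <= 1 else ["numeric: columns have different lengths"]


def stub_refine(state: ChartState, issues: List[str]) -> ChartState:
    """Stands in for an LLM that rewrites the code given feedback messages."""
    if any(msg.startswith("code:") for msg in issues):
        state.code = "plot(year, sales)"
    return state


def feedback_loop(state, checks, refine, max_rounds=3):
    """Run every check, feed issues back to `refine`, stop once clean."""
    for _ in range(max_rounds):
        issues = [msg for check in checks for msg in check(state)]
        state.history.append(issues)
        if not issues:
            return state, True
        state = refine(state, issues)
    return state, False


def run_pipeline(image_path: str, instruction: str):
    state = ChartState()
    for agent in (chart2table, chart2vision, chart2code):  # sequential de-rendering
        state = agent(image_path, state)
    state = decompose(instruction, state)
    return feedback_loop(state, [code_check, numeric_check], stub_refine)


final_state, ok = run_pipeline("chart.png", "make bars red; add 2023 = 15")
print(ok, final_state.steps, len(final_state.history))
# True ['make bars red', 'add 2023 = 15'] 2
```

The loop stops as soon as every check returns no issues, or after max_rounds. A visual check comparing the rendered edit with the input image would plug in as one more function in the checks list.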

Key Findings

PlotEdit produces more faithful edited charts than prior methods on ChartCraft.

Numbers: Overall SSIM 89.0 vs ChartReformer 82.4 (Table 1)

Practical Use: Use PlotEdit when editing chart images from PDFs or scans requires high structural fidelity.

Evidence Ref: Table 1

Multimodal feedback improves edit quality.

Numbers: Overall SSIM drops 89.0 → 87.2 without feedback agents (PlotEdit w/o MFA) (Table 1)

Practical Use: Include visual, numeric, and code feedback loops to reduce de-rendering errors and get more accurate edits.

Evidence Ref: Table 1

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Overall SSIM	89.0	ChartReformer 82.4	+6.6	ChartCraft (overall)	Table 1 shows overall SSIM for PlotEdit 89.0 vs ChartReformer 82.4	Table 1
Layout SSIM	91.3	ChartReformer 82.6	+8.7	ChartCraft (layout edits)	Table 1 layout SSIM: PlotEdit 91.3 vs ChartReformer 82.6	Table 1
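The deltas above are plain differences in SSIM points (the paper appears to report SSIM scaled to 0-100). A quick check using only the numbers quoted in this summary also gives the relative gains over the baseline and the size of the feedback ablation drop:

```python
# SSIM scores quoted from Table 1 (0-100 scale).
SCORES = {
    "overall": {"PlotEdit": 89.0, "ChartReformer": 82.4, "PlotEdit w/o MFA": 87.2},
    "layout": {"PlotEdit": 91.3, "ChartReformer": 82.6},
}


def delta(new: float, old: float) -> float:
    """Absolute difference in SSIM points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain(new: float, old: float) -> float:
    """Improvement as a percentage of the baseline score."""
    return round(100 * (new - old) / old, 1)


for split, s in SCORES.items():
    print(split, delta(s["PlotEdit"], s["ChartReformer"]),
          relative_gain(s["PlotEdit"], s["ChartReformer"]))
# overall 6.6 8.0
# layout 8.7 10.5

print("feedback ablation drop:", delta(SCORES["overall"]["PlotEdit"],
                                       SCORES["overall"]["PlotEdit w/o MFA"]))
# feedback ablation drop: 1.8
```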

What To Try In 7 Days

Run PlotEdit on 20 representative PDF charts to measure replot fidelity versus your current process.

Prototype a pipeline: use Chart2Table + Chart2Code to extract tables and generate Python plots, then apply visual feedback checks.

Test accessibility use cases: adjust color/contrast and re-evaluate readability for low-vision settings.
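The summary does not say how readability should be re-evaluated. One common, objective check is the WCAG 2.x contrast ratio between a chart's marks or text and its background; the sketch below uses it as a stand-in (an assumption, not part of PlotEdit). WCAG asks for at least 4.5:1 for normal text (SC 1.4.3) and 3:1 for graphical objects such as bars and lines (SC 1.4.11). The function names are hypothetical and colors are assumed to be 6-digit hex strings.

```python
def _linearize(channel: int) -> float:
    """8-bit sRGB channel -> linear-light value, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); ranges from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_wcag_aa(fg: str, bg: str, graphical: bool = False) -> bool:
    """3:1 for graphical objects (SC 1.4.11), 4.5:1 for normal text (SC 1.4.3)."""
    return contrast_ratio(fg, bg) >= (3.0 if graphical else 4.5)


print(round(contrast_ratio("#000000", "#ffffff"), 1))        # 21.0
print(meets_wcag_aa("#ffff00", "#ffffff", graphical=True))   # False
```

Running the same check on a chart's series colors before and after an accessibility edit shows whether the edit actually raised contrast against the background.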

Agent Features

Memory

short-term iterative feedback state

Planning

instruction decomposition, self-reflection loops
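PlotEdit performs instruction decomposition with an LLM. As a rough, hypothetical stand-in, the sketch below splits a compound instruction on "and"/"then" and tags each step with one of the ChartCraft edit categories (style, layout, format, data) using keyword rules, so downstream code can route each step to the right kind of edit. The keyword lists and names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical keyword rules per ChartCraft edit category; checked in this order.
CATEGORY_KEYWORDS = {
    "data": ("add ", "remove", "delete", "update", "change the value"),
    "style": ("color", "colour", "font", "marker", "line width"),
    "layout": ("legend", "axis", "subplot", "move", "position"),
    "format": ("label", "percent", "decimal", "tick", "date format"),
}


@dataclass
class EditStep:
    text: str
    category: str  # one of CATEGORY_KEYWORDS, or "unknown"


def classify(step: str) -> str:
    lowered = step.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "unknown"


def decompose(instruction: str) -> List[EditStep]:
    """Split a compound instruction on 'then'/'and' and tag each step."""
    normalized = instruction.replace(" then ", ";").replace(" and ", ";")
    parts = [part.strip() for part in normalized.split(";")]
    return [EditStep(part, classify(part)) for part in parts if part]


for step in decompose("Change the bar color to red and move the legend to the top "
                      "then add a 2023 value of 15"):
    print(step.category, "|", step.text)
# style | Change the bar color to red
# layout | move the legend to the top
# data | add a 2023 value of 15
```

The keyword rules are brittle (for example, "make it black and white" would be split in two); the point is the output shape, an ordered list of typed steps, which an LLM-based decomposer could produce instead.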

Tool Use

code execution and dynamic checks, AST static analysis, image similarity (MS-SSIM), pandas for data edits
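A dependency-free sketch of image similarity as a visual feedback check: compute SSIM once over two equal-size grayscale images given as flat lists of pixel values, and report an issue when the score falls below a threshold. This single-window, single-scale version is a simplification; standard SSIM averages over local windows, and MS-SSIM additionally combines contrast and structure terms across downsampled scales. The 0.9 threshold and the function names are hypothetical, and loading the rendered chart into pixels is omitted.

```python
from statistics import fmean


def global_ssim(x, y, data_range=255.0):
    """SSIM computed once over whole images x and y (flat grayscale lists)."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mx, my = fmean(x), fmean(y)
    var_x = fmean((a - mx) ** 2 for a in x)
    var_y = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (var_x + var_y + c2)
    )


def visual_feedback(rendered, reference, threshold=0.9):
    """Return a list of issues (empty if the rendered chart is close enough)."""
    score = global_ssim(rendered, reference)
    if score >= threshold:
        return []
    return [f"visual: SSIM {score:.3f} below threshold {threshold}"]


reference = [0, 0, 255, 255, 0, 0, 255, 255]
inverted = [255 - p for p in reference]
print(visual_feedback(reference, reference))  # []
print(visual_feedback(inverted, reference))
```

Returning a list of issue strings, rather than a bare score, lets a visual check be combined with code and numeric checks in the same self-reflection loop.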

Frameworks

chain-of-thought prompting, few-shot in-context learning

Is Agentic

Yes

Architectures

multi-agent LLM orchestration

Collaboration

sequential agent orchestration

Reproducibility

Code Available: No

Data Available: Yes

Open Source Status: Unknown

License: Unknown

Data URLs

ChartCraft dataset (used for evaluation; see Table 1)

Risks & Boundaries

Limitations

Relies on large multimodal LLMs (GPT-4V/GPT-4o); access and cost can be barriers.

Evaluation is on ChartCraft; real-world charts may be more diverse.

When Not To Use

If you cannot run or pay for multimodal LLMs.

When charts are created with non-Python or proprietary rendering pipelines.

Failure Modes

Poor de-rendering when input images are extremely low-resolution or heavily occluded.

Hallucinated or incorrect data tables leading to wrong edits.

Core Entities

Models

GPT-4V, GPT-4o, ChartReformer, ChartLLaMA, In-context Learning (LLM prompts), PlotEdit (this work)

Metrics

SSIM, MS-SSIM, RMS (Relative Mapping Similarity), VAES (Visual Attribute Edit Score)
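For reference, the standard definitions of the two image-similarity metrics: μ, σ², and σ_xy are the (local) means, variances, and covariance of images x and y, L is the pixel value range, and l_j, c_j, s_j are the luminance, contrast, and structure terms at scale j of M. The summary does not state the window size, number of scales, or exponents PlotEdit uses.

```latex
\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad C_1 = (0.01L)^2,\; C_2 = (0.03L)^2

\mathrm{MS\text{-}SSIM}(x, y) = \left[l_M(x, y)\right]^{\alpha_M} \prod_{j=1}^{M} \left[c_j(x, y)\right]^{\beta_j} \left[s_j(x, y)\right]^{\gamma_j}
```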

Datasets

ChartCraft

Benchmarks

ChartCraft evaluation (style/layout/format/data edits)
