PlotEdit: five LLM agents edit chart images from plain English, improving fidelity and accessibility

January 20, 2025 · 6 min

Overview

Decision Snapshot: Ready For Pilot

The method is a practical pipeline that improves fidelity on ChartCraft, but it depends on gated multimodal LLMs and a Python replotter; production use still requires integration work and safety checks.

Citations: 1

Evidence Strength: 0.75

Confidence: 0.78

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 3/3

Findings with evidence refs: 3/3

Results with explicit delta: 4/4

Reproducibility

Status: Partial assets available

Open source: Unknown

At A Glance

Cost impact: 45%

Production readiness: 70%

Novelty: 60%

Authors

Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt

Links

Abstract / PDF / Data

Why It Matters For Business

PlotEdit turns static chart images in PDFs into editable, high-fidelity charts using natural language, speeding up content updates and improving accessibility for visually impaired users.

Who Should Care

Summary TLDR

PlotEdit is a multi-agent system that edits chart images (PDFs/scans) from plain-language instructions. It uses five LLM agents to extract data, style, and code, decomposes user edits into steps, and applies multimodal feedback (numeric, visual, code) to iteratively fix errors. On the ChartCraft test set it improves structural and style fidelity versus prior methods (e.g., overall SSIM 89.0 vs 82.4 for ChartReformer, a strong baseline). The system is most useful when you need faithful, editable reconstructions from chart images and when Python replotting is acceptable.

Problem Statement

Charts in PDFs and scans are often images with no source data or style metadata, so edits that change data, layout, or style are hard or must be done manually. Existing single-shot vision-language models and plain LLM prompting struggle because they hallucinate or fail to recover accurate tables, styles, or executable code.

Main Contribution

A five-agent pipeline (Chart2Table, Chart2Vision, Chart2Code, Instruction Decomposition, Multimodal Editing) that de-renders and edits chart images from natural language.

Three linked feedback modes — code checks, visual comparison, numeric checks — used iteratively for self-reflection and error correction.
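The iterative feedback idea can be illustrated with a small loop: apply an edit step, run the checks, and hand any error messages back to the next attempt. This is only a sketch under stated assumptions, not the authors' implementation. `ChartState`, `run_with_feedback`, `stub_edit`, and `numeric_check` are hypothetical names, and the stub stands in for an LLM agent call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ChartState:
    """Working state passed between agents (hypothetical structure)."""
    table: dict = field(default_factory=dict)
    code: str = ""
    history: List[str] = field(default_factory=list)


# A check returns an error message, or None when it passes.
Check = Callable[[ChartState], Optional[str]]


def run_with_feedback(state, edit_step, checks, max_rounds=3):
    """Apply an edit step, then retry while any check reports an error."""
    errors: List[str] = []
    for round_no in range(1, max_rounds + 1):
        # The edit step receives the errors from the previous round.
        state = edit_step(state, errors)
        errors = [msg for msg in (check(state) for check in checks) if msg]
        state.history.append(f"round {round_no}: {len(errors)} error(s)")
        if not errors:
            break
    return state, errors


def stub_edit(state, errors):
    # Stand-in for an LLM call: the first attempt is wrong,
    # the retry (which sees feedback) produces the right value.
    state.table["Q1"] = 12.0 if errors else 21.0
    return state


def numeric_check(state):
    return None if state.table.get("Q1") == 12.0 else "Q1 should be 12.0"


final, remaining = run_with_feedback(ChartState(), stub_edit, [numeric_check])
print(final.history)
```

In a real pipeline the list of checks would hold one callable per feedback mode (code, visual, numeric), and `max_rounds` caps LLM spend when an error cannot be fixed.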

Key Findings

PlotEdit produces more faithful edited charts than prior methods on ChartCraft.

Numbers: Overall SSIM 89.0 vs ChartReformer 82.4 (Table 1)

Practical Use: Use PlotEdit when editing chart images from PDFs or scans calls for higher structural fidelity.

Evidence Ref: Table 1

Multimodal feedback improves edit quality.

Numbers: Overall SSIM drops from 89.0 to 87.2 without feedback agents (PlotEdit w/o MFA) (Table 1)

Practical Use: Include visual, numeric, and code feedback loops to reduce de-rendering errors and get more accurate edits.

Evidence Ref: Table 1
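To make the visual comparison signal concrete, here is a minimal sketch of the standard SSIM formula computed over a whole grayscale image as a single window. It is a simplification for illustration: it is not the windowed SSIM or MS-SSIM that image libraries compute, and it is not the paper's evaluation code. The function name `global_ssim` and the list-of-rows image format are assumptions.

```python
from statistics import fmean


def global_ssim(img_a, img_b, data_range=255.0):
    """Single-window SSIM for two equal-sized grayscale images (lists of rows)."""
    xs = [float(p) for row in img_a for p in row]
    ys = [float(p) for row in img_b for p in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("images must be non-empty and the same size")

    mu_x, mu_y = fmean(xs), fmean(ys)
    var_x = fmean((x - mu_x) ** 2 for x in xs)
    var_y = fmean((y - mu_y) ** 2 for y in ys)
    cov = fmean((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))

    # Stabilising constants from the usual SSIM definition (K1=0.01, K2=0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


checker = [[0, 255], [255, 0]]
inverted = [[255, 0], [0, 255]]
print(global_ssim(checker, checker))   # identical images score about 1.0
print(global_ssim(checker, inverted))  # inverted pattern scores far lower
```

A feedback loop could compare the re-rendered chart against the input image with a score like this and trigger another correction round when it falls below a chosen threshold.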

Results

| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
| --- | --- | --- | --- | --- | --- | --- |
| Overall SSIM | 89.0 | ChartReformer 82.4 | +6.6 | ChartCraft (overall) | Table 1 shows overall SSIM for PlotEdit 89.0 vs ChartReformer 82.4 | Table 1 |
| Layout SSIM | 91.3 | ChartReformer 82.6 | +8.7 | ChartCraft (layout edits) | Table 1 layout SSIM: PlotEdit 91.3 vs ChartReformer 82.6 | Table 1 |

What To Try In 7 Days

Run PlotEdit on 20 representative PDF charts to measure replot fidelity versus your current process.

Prototype a pipeline: use Chart2Table + Chart2Code to extract tables and generate Python plots, then apply visual feedback checks.

Test accessibility use cases: adjust color/contrast and re-evaluate readability for low-vision settings.
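For the accessibility test, one possible way to score a color change is the WCAG contrast ratio between a chart element and its background. The summary does not say which readability measure PlotEdit uses, so treating WCAG as the yardstick is an assumption, and the helper names below are hypothetical.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes(fg, bg, minimum=3.0):
    # 3.0 is the WCAG 2.1 minimum for graphical objects; body text needs 4.5.
    return contrast_ratio(fg, bg) >= minimum


light_gray_line = (200, 200, 200)
dark_blue_line = (0, 60, 130)
white_background = (255, 255, 255)
print(passes(light_gray_line, white_background))
print(passes(dark_blue_line, white_background))
```

Running this before and after a color edit gives a simple pass/fail signal to record alongside the fidelity measurements.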

Agent Features

Memory
short-term iterative feedback state
Planning
instruction decomposition, self-reflection loops
Tool Use
code execution and dynamic checks, AST static analysis, image similarity (MS-SSIM), pandas for data edits
Frameworks
chain-of-thought prompting, few-shot in-context learning
Is Agentic

Yes

Architectures
multi-agent LLM orchestration
Collaboration
sequential agent orchestration
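The "AST static analysis" item under Tool Use can be illustrated with Python's standard `ast` module: parse the generated plotting code and flag problems before anything is executed. The allow-list and the `static_check` function are hypothetical, and the summary does not describe which rules PlotEdit actually applies.

```python
import ast

# Hypothetical allow-list of top-level packages a replotting script may import.
ALLOWED_IMPORTS = {"matplotlib", "numpy", "pandas"}


def static_check(source):
    """Return a list of problems found without executing the code."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            roots = [(node.module or "").split(".")[0]]
        else:
            continue
        for root in roots:
            if root not in ALLOWED_IMPORTS:
                problems.append(
                    f"line {node.lineno}: import of '{root}' is not allowed"
                )
    return problems


good = "import matplotlib.pyplot as plt\nplt.plot([1, 2], [3, 4])\n"
bad = "import os\nos.remove('chart.png')\n"
print(static_check(good))
print(static_check(bad))
```

Messages returned by a check like this can be passed back to the code-generating agent as the code-level feedback signal, with execution attempted only once the list is empty.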

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: Unknown
License: Unknown

Data URLs

ChartCraft dataset (used for evaluation; see Table 1)

Risks & Boundaries

Limitations

Relies on large multimodal LLMs (GPT-4V/GPT-4o); access and cost can be barriers.

Evaluation is on ChartCraft; real-world charts may be more diverse.

When Not To Use

If you cannot run or pay for multimodal LLMs.

When charts are created with non-Python or proprietary rendering pipelines.

Failure Modes

Poor de-rendering when input images are extremely low-resolution or heavily occluded.

Hallucinated or incorrect data tables leading to wrong edits.
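A simple numeric check can catch some of these table errors before they reach the final chart. The sketch below compares an extracted table against a reference (for example, a hand-verified sample or the values an edit was supposed to produce) and reports cells outside a relative tolerance. The function name, the dict-based table format, and the 1% tolerance are all assumptions for illustration.

```python
import math


def table_diff(expected, actual, rel_tol=0.01):
    """List cells that are missing, unexpected, or differ by more than rel_tol."""
    issues = []
    for key, want in expected.items():
        if key not in actual:
            issues.append(f"{key}: missing")
        elif not math.isclose(want, actual[key], rel_tol=rel_tol):
            issues.append(f"{key}: expected {want}, got {actual[key]}")
    # Keys the extractor produced that the reference does not contain.
    for key in sorted(set(actual) - set(expected)):
        issues.append(f"{key}: unexpected")
    return issues


reference = {"2021": 10.0, "2022": 12.5}
extracted = {"2021": 10.05, "2022": 15.0, "2023": 3.0}
print(table_diff(reference, extracted))
```

Small reading errors from axis interpolation pass the tolerance, while a wrong bar height or an invented category is reported and can be sent back for another extraction attempt.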

Core Entities

Models

GPT-4V, GPT-4o, ChartReformer, ChartLLaMA, In-context Learning (LLM prompts), PlotEdit (this work)

Metrics

SSIM, MS-SSIM, RMS (Relative Mapping Similarity), VAES (Visual Attribute Edit Score)

Datasets

ChartCraft

Benchmarks

ChartCraft evaluation (style/layout/format/data edits)