PlotEdit: five LLM agents edit chart images from plain English, improving fidelity and accessibility

January 20, 2025 · 6 min read

Overview

Production Readiness

0.7

Novelty Score

0.6

Cost Impact Score

0.45

Citation Count

1

Authors

Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt

Links

Abstract / PDF

Why It Matters For Business

PlotEdit turns static chart images in PDFs into editable, high-fidelity charts using natural language, speeding up content updates and improving accessibility for visually impaired users.

Summary TLDR

PlotEdit is a multi-agent system that edits chart images (from PDFs or scans) based on plain-language instructions. It uses five LLM agents to extract data, style, and code, decomposes user edits into steps, and applies multimodal feedback (numeric, visual, code) to fix errors iteratively. On the ChartCraft test set it improves structural and style fidelity over prior methods (e.g., overall SSIM 89.0 vs 82.4 for ChartReformer). The system is most useful when you need faithful, editable reconstructions from chart images and when replotting in Python is acceptable.

Problem Statement

Charts in PDFs and scans are often plain images with no source data or style metadata. That makes edits to data, layout, or style difficult and largely manual. Existing single-shot vision-language models and plain LLM prompting struggle because they hallucinate values or fail to recover accurate tables, styles, or executable code.

Main Contribution

A five-agent pipeline (Chart2Table, Chart2Vision, Chart2Code, Instruction Decomposition, Multimodal Editing) that de-renders and edits chart images from natural language.

Three linked feedback modes — code checks, visual comparison, numeric checks — used iteratively for self-reflection and error correction.

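Empirical gains on ChartCraft: higher SSIM, RMS (Relative Mapping Similarity), and style scores versus ChartLLaMA, ChartReformer, and in-context LLM baselines; the ablation shows that removing the multimodal feedback agents lowers overall SSIM from 89.0 to 87.2.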
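The iterative feedback idea can be illustrated with a minimal sketch. This is not the authors' implementation: `ChartState`, `edit_with_feedback`, the `apply_step` callable, and the `max_rounds` limit are all hypothetical names chosen here, and the real agents are LLM calls rather than plain functions. The sketch only shows the control flow: apply one decomposed edit step, run every check, and retry with the collected error messages until the checks pass or the retry budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChartState:
    """Hypothetical de-rendered chart: data table, style attributes, plotting code."""
    table: dict
    style: dict
    code: str


# A check returns an error message, or None when it passes.
Check = Callable[[ChartState], Optional[str]]


def edit_with_feedback(state, steps, apply_step, checks: List[Check], max_rounds=3):
    """Apply each edit step, retrying with feedback until all checks pass.

    apply_step(state, step, feedback) stands in for the editing agent;
    checks stand in for the code, visual, and numeric feedback modes.
    """
    for step in steps:
        feedback: List[str] = []
        candidate = state
        for _ in range(max_rounds):
            candidate = apply_step(state, step, feedback)
            feedback = [msg for msg in (check(candidate) for check in checks) if msg]
            if not feedback:
                break
        # If the retry budget is exhausted, the last candidate is kept as-is.
        state = candidate
    return state
```

In practice each check would wrap something heavier (executing the generated code, comparing rendered images, or comparing table values), but the loop structure stays the same.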

Key Findings

PlotEdit produces more faithful edited charts than prior methods on ChartCraft.

Numbers: Overall SSIM 89.0 vs ChartReformer 82.4 (Table 1)

Multimodal feedback improves edit quality.

Numbers: Overall SSIM drops 89.0 → 87.2 without feedback agents (PlotEdit w/o MFA) (Table 1)

Agentic orchestration beats single-shot in-context LLM prompting.

Numbers: Overall SSIM: PlotEdit 89.0 vs In-context 86.8 (Table 1)

Results

Overall SSIM

Value: 89.0

Baseline: ChartReformer 82.4

Layout SSIM

Value: 91.3

Baseline: ChartReformer 82.6

Data-centric SSIM

Value: 87.5

Baseline: ChartReformer 81.4

Ablation: overall SSIM without multimodal feedback

Value: 87.2

Baseline: PlotEdit full 89.0
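The gaps between PlotEdit and the ChartReformer baseline follow directly from the figures above. The short sketch below just does the subtraction; the `gains` helper is a name introduced here for illustration, and the numbers are copied from this section.

```python
# (PlotEdit value, ChartReformer baseline) as reported above.
results = {
    "Overall SSIM": (89.0, 82.4),
    "Layout SSIM": (91.3, 82.6),
    "Data-centric SSIM": (87.5, 81.4),
}


def gains(rows):
    """Absolute improvement over the baseline, rounded to one decimal place."""
    return {name: round(value - baseline, 1) for name, (value, baseline) in rows.items()}


for name, delta in gains(results).items():
    print(f"{name}: +{delta}")
```

The layout edits show the largest gap (8.7 points), while removing multimodal feedback costs 1.8 points of overall SSIM (89.0 minus 87.2).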

Who Should Care

What To Try In 7 Days

Run PlotEdit on 20 representative PDF charts to measure replot fidelity versus your current process.

Prototype a pipeline: use Chart2Table + Chart2Code to extract tables and generate Python plots, then apply visual feedback checks.

Test accessibility use cases: adjust color/contrast and re-evaluate readability for low-vision settings.
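For the accessibility test, one possible yardstick is the WCAG contrast ratio between two colors, for example a series color against the plot background. This summary does not say how the paper measures readability, so treating WCAG as the criterion is an assumption made here. The sketch below uses the relative-luminance formula from WCAG 2.x; the function names are illustrative.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) tuple with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)
```

WCAG level AA asks for at least 4.5:1 for normal text and 3:1 for graphical objects, so a check such as `contrast_ratio(series_rgb, background_rgb) >= 3.0` could be run before and after a color edit.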

Agent Features

Memory

  • short-term iterative feedback state

Planning

  • instruction decomposition
  • self-reflection loops

Tool Use

  • code execution and dynamic checks
  • AST static analysis
  • image similarity (MS-SSIM)
  • pandas for data edits
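As an illustration of what AST static analysis of generated plotting code might involve, the sketch below parses a code string with Python's standard `ast` module and reports syntax errors and imports outside an allow-list, all without executing anything. The allow-list contents and the `static_check` name are assumptions made here; the summary does not describe which checks PlotEdit actually performs.

```python
import ast

# Assumed allow-list for generated plotting code; adjust to your environment.
ALLOWED_MODULES = {"matplotlib", "numpy", "pandas"}


def static_check(source: str, allowed=ALLOWED_MODULES):
    """Return a list of problems found in the code without running it."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have module=None and are reported as ''.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root not in allowed:
                problems.append(f"line {node.lineno}: import of '{root}' is not allowed")
    return problems
```

An empty list means the code parsed and used only allowed imports; any messages could be fed back to the code-generating agent as the code-level feedback signal.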

Frameworks

  • chain-of-thought prompting
  • few-shot in-context learning

Is Agentic

true

Architectures

  • multi-agent LLM orchestration

Collaboration

  • sequential agent orchestration

Reproducibility

Data URLs

  • ChartCraft dataset (used for evaluation; see Table 1)

Data Available

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • Relies on large multimodal LLMs (GPT-4V/GPT-4o); access and cost can be barriers.
  • Evaluation is on ChartCraft; real-world charts may be more diverse.
  • Workflow assumes Python plotting and may not support all charting libraries or bespoke visuals.

When Not To Use

  • If you cannot run or pay for multimodal LLMs.
  • When charts are created with non-Python or proprietary rendering pipelines.
  • For legally sensitive charts where sending images to external models is not allowed.

Failure Modes

  • Poor de-rendering when input images are extremely low-resolution or heavily occluded.
  • Hallucinated or incorrect data tables leading to wrong edits.
  • Runtime errors in generated code if the execution environment lacks the libraries the code assumes.
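The last failure mode can be partly guarded against by checking that the required top-level modules are importable before running generated code. The sketch below uses the standard library's `importlib.util.find_spec`; the `missing_modules` helper and the example module list are hypothetical, not part of PlotEdit.

```python
import importlib.util


def missing_modules(required):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in required if importlib.util.find_spec(name) is None]


# Example: modules a generated plotting script might assume.
missing = missing_modules(["matplotlib", "numpy", "pandas"])
if missing:
    print("Install before running generated code:", ", ".join(missing))
```

This only confirms that a module exists, not that its version matches what the generated code expects, so version-specific API errors can still occur at run time.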

Core Entities

Models

  • GPT-4V
  • GPT-4o
  • ChartReformer
  • ChartLLaMA
  • In-context Learning (LLM prompts)
  • PlotEdit (this work)

Metrics

  • SSIM
  • MS-SSIM
  • RMS (Relative Mapping Similarity)
  • VAES (Visual Attribute Edit Score)
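For reference, the standard definition of SSIM between two image patches x and y is shown below. This is the general formula from the image-quality literature; the window size and constants used in the PlotEdit evaluation are not stated in this summary, and the reported scores (e.g., 89.0) appear to be on a 0 to 100 scale rather than the formula's native range, which tops out at 1.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}
```

Here \mu_x and \mu_y are the patch means, \sigma_x^2 and \sigma_y^2 the variances, \sigma_{xy} the covariance, and C_1, C_2 small constants that stabilize the division. MS-SSIM combines the contrast and structure terms of this comparison across several image scales.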

Datasets

  • ChartCraft

Benchmarks

  • ChartCraft evaluation (style/layout/format/data edits)