Overview
Production Readiness: 0.6
Novelty Score: 0.6
Cost Impact Score: 0.45
Citation Count: 4
Why It Matters For Business
Automate chart reading and chart QA by fine-tuning multimodal LLMs on domain-specific chart instructions; expect better chart classification and reasoning, but not reliable extraction of numeric tables.
Summary TLDR
This paper releases MMC-Instruction, a 600k-instance instruction-tuning dataset for chart understanding, and MMC-Benchmark, a human-annotated benchmark of about 2k items covering nine chart tasks. The authors fine-tune a large multimodal model (LMM), MMCA, with a two-stage recipe (chart-text alignment, then LoRA-based instruction tuning) and show that MMCA outperforms existing open-source LMMs on chart QA and related tasks. Large gaps remain: GPT-4V still struggles with precise chart-to-table and chart-to-JSON extraction, and many models fail at OCR, layout reasoning, and instruction following.
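The two-stage recipe can be pictured as a schedule of which modules are updated in each stage. The sketch below is hypothetical: the module names and the exact trainable/frozen split follow the usual mPLUG-Owl convention (visual side trained during alignment, LLM adapted only through LoRA during instruction tuning) and are assumptions, not the authors' code or confirmed settings.

```python
from dataclasses import dataclass

# Hypothetical module names for an mPLUG-Owl-style LMM (assumption).
MODULES = ("vision_encoder", "visual_abstractor", "llm", "llm_lora")


@dataclass(frozen=True)
class Stage:
    name: str
    data: str
    trainable: frozenset

    def frozen(self):
        """Modules whose weights stay fixed (or unused) during this stage."""
        return [m for m in MODULES if m not in self.trainable]


# Assumed split: stage 1 aligns chart images with text while the LLM is
# frozen; stage 2 instruction-tunes the LLM only through LoRA adapters.
PLAN = [
    Stage("chart-text alignment", "chart-caption pairs",
          frozenset({"vision_encoder", "visual_abstractor"})),
    Stage("instruction tuning", "chart instruction examples",
          frozenset({"llm_lora"})),
]

for stage in PLAN:
    print(f"{stage.name}: train {sorted(stage.trainable)}, freeze {stage.frozen()}")
```

Keeping the schedule as data makes ablations such as "fine-tune the vision encoder or not" a one-line change to a stage's trainable set.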
Problem Statement
Current large multimodal models lack chart-specific skills such as reading text layout, extracting numeric values, and reasoning over chart content. The paper aims to supply large, diverse training data and an evaluation benchmark to teach and measure chart understanding in LMMs.
Main Contribution
MMC-Instruction: a 600k-instance chart instruction-tuning corpus combining 210k scientific chart-caption pairs, ~190k filtered chart-text pairs from public datasets, and 200k GPT-4-generated instruction examples.
MMC-Benchmark: a human-annotated benchmark of about 2k questions over chart images, covering nine chart-focused tasks with two evaluation protocols: free-form generation scored by GPT-4, and MQA (multiple-choice question answering).
MMCA: an instruction-tuned multimodal assistant built on mPLUG-Owl with LoRA and trained with a two-stage regimen; it outperforms prior open-source LMMs on chart tasks.
Key Findings
Instruction tuning on the large chart corpus improves open-source LMMs on chart tasks.
MMCA raises multiple-choice (MQA) accuracy over baselines.
State-of-the-art GPT-4V still fails on precise numeric extraction tasks.
MMCA outperforms prior methods on public chart/document benchmarks.
Vision encoder fine-tuning helps chart performance.
Common error modes: perception, language bias, and instruction-following failures.
Results
MMC-Benchmark overall (free-form, GPT-4 judged)
MMC-Benchmark overall (MQA multiple-choice)
Chart to Datatable (free-form)
Chart to Json (free-form)
ChartQA (public benchmark)
Vision encoder fine-tuning ablation
Who Should Care
What To Try In 7 Days
Run MMCA (or fine-tune an LMM with MMC-Instruction) on a small set of your company charts to measure gains on classification and reasoning.
Add an OCR verification stage for numeric extraction before trusting model outputs in BI dashboards.
Use the MMC-Benchmark tasks and MQA protocol to baseline current tools on your chart types.
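The OCR verification step can be sketched as a cross-check between the values a model extracts and the values an independent OCR pass reads from printed data labels. Everything here is hypothetical: the label-to-number dictionaries, the `verify_extraction` name, and the 1% relative tolerance are assumptions, and the check only works when the chart prints its values.

```python
import math


def verify_extraction(model_values, ocr_values, rel_tol=0.01):
    """Compare model-extracted values with an independent OCR read.

    Both inputs map a series/category label to a number. Returns a list of
    (label, problem) pairs; an empty list means every value agreed.
    """
    issues = []
    for label, ocr_val in ocr_values.items():
        model_val = model_values.get(label)
        if model_val is None:
            issues.append((label, "missing from model output"))
        elif not math.isclose(model_val, ocr_val, rel_tol=rel_tol):
            issues.append((label, f"model {model_val} vs OCR {ocr_val}"))
    # Labels the model reported but OCR never saw may be hallucinated.
    for label in sorted(model_values.keys() - ocr_values.keys()):
        issues.append((label, "not found by OCR"))
    return issues


model = {"2021": 4.2, "2022": 5.0, "2024": 6.1}
ocr = {"2021": 4.2, "2022": 5.5, "2023": 5.8}
for issue in verify_extraction(model, ocr):
    print(issue)
```

Any non-empty result would route the chart to human review instead of the dashboard.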
Optimization Features
Training Optimization
- LoRA
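LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so a layer computes y = W x + (alpha / r) B A x. The pure-Python sketch below shows that standard formulation on toy matrices; the sizes, `alpha`, and the 4096/rank-8 example are illustrative assumptions, not MMCA's actual hyperparameters.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha):
    """LoRA-adapted linear layer: y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only the low-rank factors
    A (r x d_in) and B (d_out x r) are trained.
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def trainable_params(d_in, d_out, r):
    """Trainable weights: full fine-tuning vs. a rank-r LoRA adapter."""
    return d_in * d_out, r * (d_in + d_out)


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weight (toy 2x2)
A = [[1.0, 1.0]]               # rank r = 1
B_init = [[0.0], [0.0]]        # B starts at zero, so the adapter is a no-op
print(lora_forward(W, A, B_init, [2.0, 3.0], alpha=1.0))  # [2.0, 3.0]
print(trainable_params(4096, 4096, 8))  # (16777216, 65536)
```

For a hypothetical 4096 x 4096 projection with r = 8, the adapter trains 65,536 weights instead of 16,777,216 (1/256, about 0.4%), which is why LoRA makes instruction tuning a 7B backbone affordable.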
Reproducibility
Code Urls
Data Urls
Code Available
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Instruction data is partly generated by GPT-4 and can contain errors or hallucinations (the authors report that about 85% of outputs are acceptable).
- Chart-to-datatable and chart-to-json extraction remain low-accuracy tasks even for top models.
- Experiments use a 7B-parameter backbone; results may differ with larger models or other compute budgets.
When Not To Use
- When you need exact, lossless extraction of all numeric values from charts.
- When legal or privacy rules forbid sharing chart images with third-party models.
- When you need a turnkey, production-grade OCR solution that works without verification.
Failure Modes
- Vision perception error: misreading plot elements or values.
- Language bias: the model answers from prior knowledge rather than chart evidence.
- Not following instructions: open-source LMMs sometimes ignore prompts.
- OCR/missing-value failure: a single missing numeric value makes the extracted table incorrect.
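The missing-value failure is easy to see when a table is judged all-or-nothing. This sketch contrasts a strict whole-table check with per-cell accuracy; the "label | value" output format and both scoring rules are assumptions for illustration, not the benchmark's GPT-4-judged protocol.

```python
def parse_table(text):
    """Parse 'label | value' lines into a dict (hypothetical output format)."""
    table = {}
    for line in text.strip().splitlines():
        label, _, value = line.partition("|")
        table[label.strip()] = value.strip()
    return table


def score_table(pred_text, gold_text):
    """Return (whole_table_correct, cell_accuracy) for a predicted table."""
    pred, gold = parse_table(pred_text), parse_table(gold_text)
    if not gold:
        raise ValueError("gold table is empty")
    hits = sum(pred.get(k) == v for k, v in gold.items())
    # Strict rule: every gold cell matches and no extra rows were produced.
    exact = hits == len(gold) and pred.keys() == gold.keys()
    return exact, hits / len(gold)


gold = "2021 | 4.2\n2022 | 5.5\n2023 | 5.8"
pred = "2021 | 4.2\n2022 | 5.5\n2023 |"   # one value missing
print(score_table(pred, gold))
```

Two of three cells are right, yet the table as a whole is wrong, which is why strict extraction scores stay low even for strong models.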
Core Entities
Models
- MMCA
- mPLUG-Owl
- GPT-4V
- LLaVA-1.5
- MiniGPT-v2
- LRV-Instruction
- Pix2Struct
- Donut
- BLIP-2
- InstructBLIP
- Shikra
- Vicuna
Metrics
- free-form correctness (GPT-4 scoring)
- Accuracy
Datasets
- MMC-Instruction
- MMC-Benchmark
- ChartQA
- PlotQA
- DVQA
- FigureQA
- SciGraphQA
- Statista
- VisText
- ChartInfo
- Unichart
- DocVQA
- TextVQA
Benchmarks
- MMC-Benchmark
- ChartQA
- DocVQA
- TextVQA
Context Entities
Models
- GPT-4 (text-only, used for data generation)
- gpt-4-32k-0314 (GPT-4 version used for evaluation prompts)
Datasets
- arXiv Scientific Chart-Caption corpus
- Public chart datasets used for augmentation (Statista, PlotQA, VisText, ChartInfo, Unichart)

