A two-stage agent pipeline that turns raw tables into vetted charts and a publication-ready narrative report.

December 26, 2025 · 7 min

Overview

Decision Snapshot: Needs Validation

The design is practical and modular, but the paper reports no quantitative user studies or benchmarks. Its claims are architectural and demonstrative rather than experimentally validated.

Citations: 0

Evidence Strength: 0.35

Confidence: 0.70

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 2/3

Findings with evidence refs: 3/3

Results with explicit delta: 0/3

Reproducibility

Status: Partial assets available

Open source: Unknown

At A Glance

Cost impact: 40%

Production readiness: 50%

Novelty: 50%

Authors

Shuyu Gan, Renxiang Wang, James Mooney, Dongyeop Kang

Why It Matters For Business

Automates the end-to-end path from raw tables to a polished report, saving analyst time on repetitive chart creation, basic QA, and first-draft narrative. The system produces multiple scored insight candidates so teams can select defensible findings instead of relying on a single model output.

Who Should Care

Summary (TL;DR)

A2P-Vis is a two-part multi-agent system: a Data Analyzer that profiles data, generates and executes plotting code, filters poor figures, and scores candidate insights; and a Presenter that orders topics and writes a coherent, chart-grounded report. The pipeline emphasizes automated quality checks (schema profiling, code rectification, chart legibility) and a rubric-based insight scorer to produce ready-to-publish narratives from raw tables.

Problem Statement

Current LLM-based data pipelines often (1) fail to produce diverse, evidence-rich visual insights, and (2) do not assemble those charts and findings into a coherent, professional report without manual work.

Main Contribution

Design of the Data Analyzer: profiles data, proposes visualization directions, generates and executes plotting code, rejects low-quality charts, and scores candidate insights.

Design of the Presenter: ranks topics, composes chart-grounded narratives with transitions, summarizes takeaways, and revises the draft into a polished Markdown report.
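To make the Presenter's role more concrete, here is a minimal sketch of ordering topics and assembling a Markdown report. The `Topic` structure, the ranking criterion (best insight score first), and the report layout are all assumptions for illustration; the source does not specify how the Presenter ranks topics, and the real system uses LLM-written narrative rather than bullet lists.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical handoff record from the Data Analyzer to the Presenter."""
    title: str
    chart_path: str
    insights: list = field(default_factory=list)  # (text, score) pairs


def rank_topics(topics):
    """Order topics by their best insight score, highest first (assumed criterion)."""
    return sorted(
        topics,
        key=lambda t: max((score for _, score in t.insights), default=0.0),
        reverse=True,
    )


def render_report(title, topics):
    """Assemble a plain Markdown report: one section per topic, chart then insights."""
    lines = [f"# {title}", ""]
    for topic in rank_topics(topics):
        lines += [f"## {topic.title}", "", f"![{topic.title}]({topic.chart_path})", ""]
        lines += [f"- {text}" for text, _ in topic.insights]
        lines.append("")
    return "\n".join(lines)


# Illustrative data only; titles, paths, and scores are made up.
topics = [
    Topic("Regional mix", "charts/region.png", [("North leads unit sales.", 3.25)]),
    Topic("Seasonality", "charts/season.png",
          [("Sales peak in Q4.", 4.5), ("Q1 is the weakest quarter.", 4.0)]),
]
report = render_report("Sales Overview", topics)
print(report)
```

In this sketch the "Seasonality" section is printed before "Regional mix" because its best insight has the higher score.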

Key Findings

The Insight Generator creates multiple alternatives and delivers a small set of vetted insights.

Numbers: Produces 5–7 candidate insights per chart; returns the top 3 per chart after scoring.

Practical Use: Generate multiple explanation drafts and score them automatically; pick the top-ranked ones to avoid committing to a single, possibly bad interpretation.

Evidence Ref: Section 2.1 (Insight Generator & Evaluator)
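The generate-then-select step can be sketched as follows. The four rubric dimensions come from the summary's rubric (Correctness, Specificity, Depth, So-what quality); the 1 to 5 scale, the equal weighting, and the names `Candidate`, `total_score`, and `top_k` are assumptions made for this example. In the actual system the scores would come from an LLM evaluator, whereas here they are entered by hand.

```python
from dataclasses import dataclass

# Rubric dimensions named in the source; scale and equal weights are assumptions.
RUBRIC = ("correctness", "specificity", "depth", "so_what")


@dataclass
class Candidate:
    text: str
    scores: dict  # dimension -> score on an assumed 1-5 scale


def total_score(candidate):
    """Unweighted mean over the rubric; a missing dimension counts as 0."""
    return sum(candidate.scores.get(dim, 0) for dim in RUBRIC) / len(RUBRIC)


def top_k(candidates, k=3):
    """Return the k highest-scoring candidates, best first."""
    return sorted(candidates, key=total_score, reverse=True)[:k]


def make(text, c, s, d, w):
    return Candidate(text, dict(zip(RUBRIC, (c, s, d, w))))


# Five hand-scored candidates for one chart (illustrative values only).
candidates = [
    make("A: Sales rise over the year.", 5, 3, 2, 3),
    make("B: Q4 sales are roughly double Q1.", 4, 5, 4, 4),
    make("C: The chart has four bars.", 2, 2, 1, 1),
    make("D: Q4 drives most annual growth; plan stock early.", 5, 4, 4, 5),
    make("E: Sales vary by quarter.", 3, 3, 3, 3),
]

selected = top_k(candidates)
for cand in selected:
    print(f"{total_score(cand):.2f}  {cand.text}")
```

Keeping the per-dimension scores, rather than a single number, makes it easier to see why a candidate was rejected.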

Lightweight dataset profiling (Sniffer) enforces a schema contract to avoid common failures.

Practical Use: Use a compact metadata profile instead of streaming full records to the model. That reduces hallucinated column use and prevents empty or degenerate plots.

Evidence Ref: Section 2.1 (Sniffer)
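A compact profile of this kind could look like the sketch below. It is not the paper's Sniffer: the function names (`infer_type`, `profile_table`, `check_columns`), the profile fields, and the type-inference rules are assumptions chosen to show the idea with the standard library only.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty string values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"


def profile_table(csv_text, max_examples=3):
    """Build a small per-column profile instead of passing full records to a model."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    profile = {}
    for column in reader.fieldnames or []:
        values = [row[column] or "" for row in rows]
        non_empty = [v for v in values if v.strip()]
        profile[column] = {
            "type": infer_type(values),
            "missing": len(values) - len(non_empty),
            "unique": len(set(non_empty)),
            "examples": sorted(set(non_empty))[:max_examples],
        }
    return profile


def check_columns(profile, requested):
    """Return requested column names that are absent from the profile."""
    return [name for name in requested if name not in profile]


SAMPLE = "region,units,price\nNorth,10,2.5\nSouth,,3.0\nEast,7,n/a\n"
profile = profile_table(SAMPLE)
for name, info in profile.items():
    print(name, info)
print("unknown columns:", check_columns(profile, ["region", "revenue"]))
```

In the sample, `price` is reported as a string column because of the `n/a` entry, which is the kind of mis-typed column worth catching before any plotting code is generated. `check_columns` illustrates the schema-contract idea: a plot request that names a column outside the profile can be rejected up front.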

Results

| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
| --- | --- | --- | --- | --- | --- | --- |
| Candidate insights generated per chart | 5–7 candidates | - | - | per-chart (described in pipeline) | Section 2.1 Insight Generator | Section 2.1 |
| Final insights returned per chart after scoring | Top 3 insights | - | - | per-chart | Section 2.1 Insight Evaluator | Section 2.1 |

What To Try In 7 Days

Run the Sniffer on a representative table to extract a schema profile and check for mis-typed columns.

Use the Visualizer flow on one dashboard: generate directions, auto-generate code, execute, and inspect the rectified plots.

Generate 5–7 candidate insights per chart and apply a simple rubric to pick the top 3. Compare human picks to the scorer.
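For the last step, one simple way to compare human picks with the scorer is the share of the scorer's top 3 that a human also selected. The metric choice, the function name `overlap_at_k`, and the insight IDs are assumptions for illustration; the source does not prescribe a comparison method.

```python
def overlap_at_k(human_picks, scorer_ranking, k=3):
    """Fraction of the scorer's top-k insights that the human also picked."""
    top = scorer_ranking[:k]
    if not top:
        return 0.0
    return len(set(top) & set(human_picks)) / len(top)


# Hypothetical insight IDs for one chart.
human_picks = {"i2", "i4", "i5"}
scorer_ranking = ["i4", "i1", "i2", "i5", "i3"]  # best first

agreement = overlap_at_k(human_picks, scorer_ranking)
print(f"agreement@3 = {agreement:.2f}")
```

Averaging this value over several charts gives a rough sense of whether the rubric matches analyst judgment before the scorer is trusted on its own.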

Agent Features

Memory
Short-term metadata profile (schema contract); per-run topic and chart outputs
Planning
Task decomposition; topic sequencing; visualization direction planning
Tool Use
Code generation for plots; automated code execution; error-rectification callbacks; chart legibility checking
Frameworks
Chain-of-thought-style revision
Is Agentic

Yes

Collaboration
Multi-agent pipeline (Analyzer ↔ Presenter); module handoffs via structured metadata
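The Tool Use entries above pair automated code execution with error-rectification callbacks. A minimal sketch of such a loop is shown below, assuming a hypothetical `rectify(code, error)` callback that stands in for an LLM repair call; the retry limit and function names are illustrative and not taken from the paper.

```python
import traceback


def run_with_rectification(code, rectify, max_attempts=3):
    """Execute generated code; on failure, hand code and traceback to `rectify` and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        namespace = {}
        try:
            # Generated code is untrusted: in practice, run it in a sandbox.
            exec(code, namespace)
            return namespace, attempt
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            code = rectify(code, traceback.format_exc())


def toy_rectify(code, error):
    """Stand-in for an LLM repair step: fixes one known typo."""
    return code.replace("totl", "total")


broken = "total = sum([1, 2, 3])\nresult = totl * 2\n"
namespace, attempts = run_with_rectification(broken, toy_rectify)
print(namespace["result"], "after", attempts, "attempts")
```

A loop like this only guarantees that the code runs, not that the resulting plot is correct, which is why a separate legibility check and human review still matter.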

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

No quantitative evaluation or user study reported in the paper.

No public code repository provided; reproducibility is limited.

When Not To Use

When you need statistically rigorous inference or peer-reviewed analysis (not just automated reporting).

When data is highly sensitive and cannot be profiled or passed to external services.

Failure Modes

LLM-generated code may still produce incorrect plots despite rectification.

Insight scorer can surface plausible but incorrect explanations (model hallucination).

Context Entities

Models

Google Data Science Agent (referenced)

Metrics

insight scoring rubric (Correctness, Specificity, Depth, So-what quality)