CEDAR: a three-agent system that produces interleaved plan-and-code notebooks to run data science locally

January 10, 2026 · 7 min read

Overview

Decision Snapshot: Needs Validation

The system is practical and has been demonstrated on Kaggle tasks with concrete configurations, but the evidence is limited to demos, and there is no public code or broad benchmark result yet.

Citations: 0

Evidence Strength: 0.50

Confidence: 0.78

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 0/4

Reproducibility

Status: No open assets linked

Open source: Partial

At A Glance

Cost impact: 50%

Production readiness: 60%

Novelty: 60%

Authors

Rishiraj Saha Roy, Chris Hinze, Luzian Hahn, Fabian Kuech

Why It Matters For Business

CEDAR reduces repetitive scripting by automating stepwise data science (DS) workflows while keeping data local, which speeds up prototyping and improves privacy controls for enterprise projects.

Summary (TL;DR)

CEDAR is a small multi-agent application that structures data science work for LLMs. A main orchestrator routes each request to one of two sub-agents (Text and Code). The system produces numbered plan-and-code steps that read like a notebook, runs code locally in Docker, keeps only a compact history (configurable truncation, default 10k characters), and exposes configuration options such as the maximum number of steps and retries. The authors demonstrate CEDAR on Kaggle-style data science tasks and emphasize privacy (data stays local) and fault recovery through iterative code re-generation.
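A minimal sketch of the orchestrator loop described above, in Python. The names `Step`, `RunConfig`, and `run_orchestrator` are hypothetical (CEDAR's code is not public), and the sub-agents are plain callables standing in for LLM calls; only the 30-step default comes from the paper (Section 3.3).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sketch of an orchestrator loop; not CEDAR's actual implementation.


@dataclass
class Step:
    number: int
    kind: str      # "text" or "code"
    content: str


@dataclass
class RunConfig:
    max_steps: int = 30  # default maximum solution steps reported in Section 3.3


def run_orchestrator(
    decide: Callable[[list[Step]], str],
    text_agent: Callable[[list[Step]], str],
    code_agent: Callable[[list[Step]], str],
    config: Optional[RunConfig] = None,
) -> list[Step]:
    """Ask the orchestrator for the next action and route it to a sub-agent."""
    config = config or RunConfig()
    agents = {"text": text_agent, "code": code_agent}
    steps: list[Step] = []
    while len(steps) < config.max_steps:
        action = decide(steps)  # expected: "text", "code", or "done"
        if action == "done":
            break
        if action not in agents:
            raise ValueError(f"unknown action: {action!r}")
        # Each sub-agent sees the steps so far and returns the new step's content.
        steps.append(Step(len(steps) + 1, action, agents[action](steps)))
    return steps
```

Capping the loop at `max_steps` bounds cost even if the orchestrator never signals completion.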

Problem Statement

Large LLMs can simplify data science, but single-shot prompts break down on multi-step workflows, large or private datasets, math-heavy tasks, and inputs that exceed the context length. Practitioners need a structured, transparent way to separate planning, code, and outputs, keep data local, and recover when generated code fails.

Main Contribution

A practical three-agent architecture (orchestrator, text agent, and code agent) that produces interleaved plan and executable code cells.

Structured prompts and JSON-based function calls that guard against hallucinated routing and make tool calls more robust.
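The JSON-based routing can be illustrated with a small validator that rejects malformed replies and action names outside a fixed set. This is a sketch under assumptions: the action names and required fields are hypothetical, since CEDAR's schema is not published, and a full JSON Schema validator (for example the `jsonschema` package) could replace the hand-written checks.

```python
import json

# Hypothetical schema: CEDAR's real action names and fields are not published.
ALLOWED_ACTIONS = {"call_text_agent", "call_code_agent", "finish"}
REQUIRED_FIELDS = {"action": str, "instruction": str}


def parse_tool_call(raw: str) -> dict:
    """Parse an orchestrator reply and reject malformed or unknown actions."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("reply must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(call.get(name), expected_type):
            raise ValueError(f"field {name!r} missing or not a {expected_type.__name__}")
    # An action name outside the allowed set is treated as hallucinated.
    if call["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"hallucinated action: {call['action']!r}")
    return call
```

Rejecting the reply outright, rather than guessing the nearest valid action, lets the caller re-prompt the model with the error message.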

Key Findings

CEDAR uses three LLM roles: an orchestrator plus separate text and code agents to produce a readable stepwise notebook.

Numbers: 3 agents (orchestrator, text agent, code agent)

Practical Use: Split planning and code generation in your pipeline. Route natural-language explanations to a Text agent and executable code to a Code agent to reduce misinterpreted outputs.

Evidence Ref: Sections 2.3, 3.1
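To make the interleaved format concrete, the sketch below renders numbered steps, each pairing a text-agent explanation with a code-agent cell and its output, as plain text. `StepCells` and `render_notebook` are hypothetical names; CEDAR's actual rendering in its Streamlit frontend is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical data structure for one interleaved plan-and-code step.


@dataclass
class StepCells:
    number: int
    plan: str          # natural-language explanation from the text agent
    code: str          # executable cell from the code agent
    output: str = ""   # captured execution output, if any


def render_notebook(steps: list) -> str:
    """Render numbered plan/code/output blocks as indented plain text."""
    parts = []
    for s in steps:
        parts.append(f"Step {s.number}: {s.plan}")
        # Indent every line of the code cell under its plan.
        parts.append("    " + s.code.replace("\n", "\n    "))
        if s.output:
            parts.append(f"    Output: {s.output}")
    return "\n".join(parts)
```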

Code runs locally in Docker; only aggregate snapshots and outputs are passed to LLMs, so raw data need not leave the host.

Numbers: Docker execution; on-prem LLM support described

Practical Use: If privacy matters, run code cells locally and send only small summaries to remote LLMs, or use on-prem models.

Evidence Ref: Sections 2.5, 3.4
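One way to produce the kind of aggregate snapshot described above is to send the LLM only column names, a row count, and a few sample rows, while the full file stays on the host. The function below is a hypothetical sketch using only the standard library, not CEDAR's snapshot format. Sample rows still expose some raw values; `head_rows=0` sends only the schema and size.

```python
import csv
import io

# Hypothetical snapshot helper; only this summary string would be sent to the LLM.


def snapshot_csv(text: str, head_rows: int = 3) -> str:
    """Summarize CSV text as column names, row count, and the first few rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    rows = list(reader)
    lines = [f"columns: {', '.join(header)}", f"rows: {len(rows)}"]
    for row in rows[:head_rows]:
        lines.append(" | ".join(row))
    return "\n".join(lines)
```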

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Autorun runtime for demo | ≈3 minutes for 10–20 steps | n/a | n/a | Demo Kaggle task | Section 3.1: autorun ≈3 minutes for 10–20 steps | Section 3.1
Default maximum solution steps | 30 steps | n/a | n/a | Configuration | Section 3.3: default 30 | Section 3.3
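A back-of-envelope reading of the demo runtime, assuming every step takes roughly the same time (the paper reports only the aggregate), gives the per-step cost and a rough extrapolation to the 30-step default cap:

```latex
\frac{180\ \text{s}}{20\ \text{steps}} = 9\ \text{s/step}
\quad\text{to}\quad
\frac{180\ \text{s}}{10\ \text{steps}} = 18\ \text{s/step},
\qquad
30 \times (9 \text{ to } 18)\ \text{s} = 270 \text{ to } 540\ \text{s} \approx 4.5 \text{ to } 9\ \text{min}.
```

The 30-step figure is an estimate under the uniform-step assumption, not a reported result; actual step cost depends on model latency and code execution time.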

What To Try In 7 Days

Run CEDAR on a small Kaggle task to compare autorun vs manual scripting.

Set up local Docker execution and confirm that raw data never leaves your host.

Try structured prompts (project summary form) and measure time saved assembling steps versus free-form prompting.
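For the structured-prompt experiment, a project summary form can be kept as a small dataclass that renders into labelled prompt lines. The field names below are assumptions for illustration; the paper's actual form may use different fields.

```python
from dataclasses import dataclass, fields

# Hypothetical project summary form; field names are illustrative only.


@dataclass
class ProjectSummary:
    goal: str
    data_description: str
    evaluation_metric: str
    constraints: str = "none stated"

    def to_prompt(self) -> str:
        """Render each field as 'Label: value' on its own line."""
        return "\n".join(
            f"{f.name.replace('_', ' ').capitalize()}: {getattr(self, f.name)}"
            for f in fields(self)
        )
```

Keeping the form fixed makes it easier to compare runs, since only the field values change between the structured and free-form conditions.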

Agent Features

Memory
history rendering (compact summary); configurable truncation (default 10k characters)
Planning
interleaved plan-and-code steps; numbered stepwise notebook workflow
Tool Use
structured function calls (JSON); code execution tool (Docker container); schema-driven routing
Frameworks
Streamlit frontend
Is Agentic

Yes

Architectures
orchestrator + text agent + code agent
Collaboration
human-in-the-loop review and edits
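The configurable history truncation can be sketched as keeping the most recent entries that fit within a character budget. This assumes the oldest entries are dropped first and that entries are joined by newlines; the source specifies only that the limit is configurable with a 10k-character default. `render_history` is a hypothetical name.

```python
# Hypothetical history truncation: newest entries are kept, oldest dropped first.


def render_history(entries: list, max_chars: int = 10_000) -> str:
    """Keep the most recent entries whose newline-joined length fits max_chars."""
    kept = []
    total = 0
    for entry in reversed(entries):
        cost = len(entry) + (1 if kept else 0)  # +1 for the joining newline
        if total + cost > max_chars:
            break
        kept.append(entry)
        total += cost
    return "\n".join(reversed(kept))
```

With this policy, a single entry longer than `max_chars` is dropped entirely; truncating that entry itself would be an alternative design.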

Optimization Features

Token Efficiency
history truncation and selective output heads/tails
Infra Optimization
on-prem LLM support to avoid large uploads; Docker-based isolated execution
System Optimization
separate agents to reduce retry and parsing errors; JSON schema to avoid hallucinated tool calls
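Selective output heads and tails can be sketched as keeping the first and last few lines of a long cell output and replacing the middle with a count of omitted lines. `head_tail` and its default sizes are hypothetical.

```python
# Hypothetical output trimming before a cell's output is added to the LLM context.


def head_tail(output: str, head: int = 5, tail: int = 5) -> str:
    """Keep the first `head` and last `tail` lines of a long output."""
    lines = output.splitlines()
    if len(lines) <= head + tail:
        return output
    omitted = len(lines) - head - tail
    marker = f"[{omitted} lines omitted]"
    # Slice from len(lines) - tail so that tail=0 keeps no trailing lines.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])
```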

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

Demo scope limited to beginner-to-intermediate Kaggle tasks, not large real-world pipelines.

The agent design is basic; no independent faithfulness-checking or critic agent is implemented yet.

When Not To Use

For very large datasets that do not fit in RAM.

When you need provable correctness for critical math-heavy pipelines.

Failure Modes

Hallucinated action names (mitigated by the JSON schema, but still possible if the schema changes).

Code execution errors that need multiple retries in complex environments.
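The retry behavior for failing code can be sketched as a generate, execute, and feed-back loop. In this hypothetical sketch, `generate` stands in for the code agent and receives the previous error text (or `None` on the first attempt), a subprocess running the current Python interpreter stands in for the Docker container, and the default of 3 retries is an assumption rather than CEDAR's setting. A real deployment would run the cell in an isolated container instead, for example with `docker run --rm --network none`.

```python
import subprocess
import sys
from typing import Callable, Optional

# Hypothetical retry loop; a local subprocess stands in for Docker execution.


def run_with_retries(
    generate: Callable[[Optional[str]], str],
    max_retries: int = 3,
    timeout: float = 60.0,
) -> tuple:
    """Generate code, execute it, and feed stderr back to the generator on failure.

    Returns (succeeded, stdout_or_last_error). Raises subprocess.TimeoutExpired
    if a single attempt runs longer than `timeout` seconds.
    """
    error: Optional[str] = None
    for _ in range(max_retries + 1):
        code = generate(error)
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        if result.returncode == 0:
            return True, result.stdout
        error = result.stderr
    return False, error or ""
```

Feeding stderr back to the generator is what lets a later attempt fix the earlier bug; the retry cap keeps a persistently failing cell from looping forever.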

Core Entities

Models

GPT-4o; Qwen3-Coder 30B

Metrics

runtime; notebook completion; error recovery

Datasets

Kaggle LLM fine-tuning competition (demo)