CEDAR: a three-agent system that produces interleaved plan-and-code notebooks to run data science locally

Overview

Decision Snapshot: Needs Validation

The system is practical and has been demonstrated on Kaggle tasks with concrete configurations, but the evidence is limited to demos: no public code or broad benchmark results are available yet.

Citations: 0

Evidence Strength: 0.50

Confidence: 0.78

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 0/4

Reproducibility

Status: No open assets linked

Open source: Partial

At A Glance

Cost impact: 50%

Production readiness: 60%

Novelty: 60%

Authors

Rishiraj Saha Roy, Chris Hinze, Luzian Hahn, Fabian Kuech

Why It Matters For Business

CEDAR reduces repetitive scripting by automating stepwise DS workflows while keeping data local, speeding prototyping and improving privacy controls for enterprise projects.

Who Should Care

Product Manager, ML Engineer, Data Scientist, CTO, Founder

Summary TLDR

CEDAR is a small multi-agent app that structures data‑science work for LLMs. A main orchestrator routes requests to two sub-agents (Text and Code). The system produces numbered plan-and-code steps like a readable notebook, runs code locally in Docker, keeps only compact history (configurable truncation, default 10k chars), and exposes configs (max steps, retries). The authors demonstrate CEDAR on Kaggle-style DS tasks and emphasize privacy (data stays local) and fault recovery via iterative code re-generation.

Problem Statement

LLMs can simplify data science, but single-shot prompts fail on multi-step workflows, large or private data, and math-heavy tasks, and they run into context-length limits. Practitioners need a structured, transparent way to separate planning, code, and outputs while keeping data local and recovering when code errors occur.

Main Contribution

A practical three-agent architecture: orchestrator, text agent, and code agent to produce interleaved plan and executable code cells.

Structured prompts and JSON-based function calls to prevent hallucinated routing and make tool calls robust.
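The JSON-based routing idea can be sketched as a strict parser that rejects any action outside a fixed set. This is a minimal illustration, not the authors' code: the action names `text_agent` and `code_agent` and the `action`/`instruction` fields are hypothetical, chosen only to mirror the two sub-agents described above.

```python
import json

# Hypothetical action names (assumption); the source only states that the
# orchestrator routes requests to a Text agent and a Code agent.
ALLOWED_ACTIONS = {"text_agent", "code_agent"}


def parse_route(raw_reply: str) -> dict:
    """Parse an orchestrator reply and reject anything off-schema."""
    try:
        call = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("reply must be a JSON object")
    action = call.get("action")
    # A hallucinated or malformed action name is rejected here.
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    instruction = call.get("instruction")
    if not isinstance(instruction, str) or not instruction.strip():
        raise ValueError("missing or empty 'instruction'")
    return {"action": action, "instruction": instruction}
```

Rejecting unknown actions with an error, rather than guessing the nearest valid one, lets the caller re-prompt the orchestrator instead of silently misrouting a request.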

Key Findings

CEDAR uses three LLM roles: an orchestrator plus separate text and code agents to produce a readable stepwise notebook.

Numbers: 3 agents (orchestrator, text agent, code agent)

Practical Use: Split planning and code generation in your pipeline: route natural-language explanation to a Text agent and executable code to a Code agent to reduce misinterpreted outputs.

Evidence Ref: Sections 2.3, 3.1
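One way to picture the interleaved plan-and-code notebook is as a list of numbered steps, each holding a plan and an optional code cell. The sketch below is illustrative only: the class and field names (`Step`, `Notebook`, `plan`, `code`) are assumptions, while the 30-step default is the figure the summary reports.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    number: int
    plan: str                   # natural-language part (Text agent)
    code: Optional[str] = None  # executable part (Code agent), if any


@dataclass
class Notebook:
    steps: List[Step] = field(default_factory=list)
    max_steps: int = 30  # default maximum reported in the summary

    def add(self, plan: str, code: Optional[str] = None) -> Step:
        """Append the next numbered step, enforcing the step limit."""
        if len(self.steps) >= self.max_steps:
            raise RuntimeError("step limit reached")
        step = Step(len(self.steps) + 1, plan, code)
        self.steps.append(step)
        return step

    def render(self) -> str:
        """Render plan and code cells in order, one after another."""
        parts = []
        for s in self.steps:
            parts.append(f"Step {s.number}: {s.plan}")
            if s.code:
                parts.append(s.code)
        return "\n".join(parts)
```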

Code runs locally in Docker; only aggregate snapshots and outputs are passed to LLMs, so raw data need not leave the host.

Numbers: Docker execution; on-prem LLM support described

Practical Use: If privacy matters, run code cells locally and send only small summaries to remote LLMs or use on‑prem models.

Evidence Ref: Sections 2.5, 3.4
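The "aggregate snapshot" idea can be illustrated with a function that reduces a CSV to per-column statistics, so that no raw rows appear in what is sent to the model. This is a sketch under assumptions: the summary does not say which aggregates CEDAR computes, and `summarize_csv` and its output fields are hypothetical.

```python
import csv
import io
import statistics


def summarize_csv(text: str, max_columns: int = 50) -> dict:
    """Return per-column aggregates only; no raw rows are included."""
    rows = list(csv.DictReader(io.StringIO(text)))
    summary = {"n_rows": len(rows), "columns": {}}
    if not rows:
        return summary
    for name in list(rows[0].keys())[:max_columns]:
        values = [r.get(name) for r in rows]
        values = [v for v in values if v not in (None, "")]
        col = {"n_missing": len(rows) - len(values)}
        try:
            numbers = [float(v) for v in values]
        except (TypeError, ValueError):
            # Non-numeric column: report only the count of distinct values.
            col["type"] = "text"
            col["n_unique"] = len(set(map(str, values)))
        else:
            col["type"] = "numeric"
            if numbers:
                col["min"] = min(numbers)
                col["max"] = max(numbers)
                col["mean"] = statistics.fmean(numbers)
        summary["columns"][name] = col
    return summary
```

Note that minimum and maximum values can still reveal individual records in small datasets, so which aggregates are safe to share is a judgment call for each project.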

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Autorun runtime for demo	≈3 minutes for 10–20 steps	—	—	Demo Kaggle task	Section 3.1: autorun ≃ 3 minutes for 10–20 steps	Section 3.1
Default maximum solution steps	30 steps	—	—	Configuration	Section 3.3: default 30	Section 3.3

What To Try In 7 Days

Run CEDAR on a small Kaggle task to compare autorun vs manual scripting.

Set up local Docker execution and confirm that raw data never leaves your host.

Try structured prompts (project summary form) and measure time saved assembling steps versus free-form prompting.
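For the Docker check above, one possible starting point is to run each generated script in a container with networking disabled, which makes "raw data never leaves the host" easy to verify. The sketch below only builds the command; the image name, mount point, and helper name are assumptions, and disabling the network is stricter than anything the summary states (it also blocks package installs inside the container).

```python
from pathlib import Path
from typing import List


def docker_run_command(workdir: str, script: str,
                       image: str = "python:3.12-slim") -> List[str]:
    """Build a `docker run` command for one generated script.

    Assumptions: the image name and the /work mount point are illustrative.
    """
    host_dir = Path(workdir).resolve()
    return [
        "docker", "run", "--rm",   # remove the container when it exits
        "--network", "none",       # no network access from inside
        "-v", f"{host_dir}:/work", # mount the project directory
        "-w", "/work",             # run from the mounted directory
        image,
        "python", script,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` to collect stdout and stderr for the next step.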

Agent Features

Memory

history rendering (compact summary); configurable truncation (default 10k chars)

Planning

interleaved plan-and-code steps; numbered stepwise notebook workflow

Tool Use

structured function calls (JSON); code execution tool (Docker container); schema-driven routing

Frameworks

Streamlit frontend

Is Agentic

Yes

Architectures

orchestrator + text agent + code agent

Collaboration

human-in-the-loop review and edits

Optimization Features

Token Efficiency

history truncation and selective output heads/tails
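Head/tail truncation can be sketched as keeping the start and end of a long output and dropping the middle. The 10,000-character default matches the figure in the summary; the marker text, the even head/tail split, and whether the limit applies per output or to the whole history are assumptions.

```python
def truncate_output(text: str, limit: int = 10_000,
                    marker: str = "\n...[truncated]...\n") -> str:
    """Keep the head and tail of `text` so the result fits in `limit` chars."""
    if len(text) <= limit:
        return text
    keep = limit - len(marker)
    if keep <= 0:
        # Limit too small to fit the marker: fall back to a plain head cut.
        return text[:limit]
    head = (keep + 1) // 2
    tail = keep // 2
    # Guard tail == 0, because text[-0:] would return the whole string.
    return text[:head] + marker + (text[-tail:] if tail else "")
```

Keeping the tail matters for code output in particular, since tracebacks and final results usually appear at the end.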

Infra Optimization

on-prem LLM support to avoid large uploads; Docker-based isolated execution

System Optimization

separate agents to reduce retry and parsing errors; JSON schema to avoid hallucinated tool calls

Reproducibility

Code Available: No

Data Available: No

Open Source Status: Partial

License: Unknown

Risks & Boundaries

Limitations

Demo scope limited to beginner-to-intermediate Kaggle tasks, not large real-world pipelines.

Agent complexity is basic; no independent faithfulness or critic agent implemented yet.

When Not To Use

For very large datasets that do not fit in RAM.

When you need provable correctness for critical math-heavy pipelines.

Failure Modes

Hallucinated action names (mitigated by JSON schema but still possible if schema changes).

Code execution errors needing multiple retries for complex environments.
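The retry failure mode is easier to reason about with the loop written out. The sketch below shows fault recovery by iterative code re-generation: on failure, the error text is fed back to the generator. `generate` and `execute` are hypothetical callables standing in for the Code agent and the Docker runner, and the default of 3 retries is an assumed value, not one given in the summary.

```python
from typing import Callable, Optional, Tuple


def run_with_regeneration(
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], Tuple[bool, str]],
    task: str,
    max_retries: int = 3,  # assumed default; the summary gives no number
) -> Tuple[bool, str, int]:
    """Generate code, run it, and regenerate with the error on failure.

    Returns (succeeded, last_output, attempts_used).
    """
    error: Optional[str] = None
    output = ""
    # One initial attempt plus up to `max_retries` regenerations.
    for attempt in range(1, max_retries + 2):
        code = generate(task, error)
        ok, output = execute(code)
        if ok:
            return True, output, attempt
        error = output  # feed the failure back into the next generation
    return False, output, max_retries + 1
```

Capping the number of retries keeps a persistently failing step from looping indefinitely and leaves room for human review of the last error.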

Core Entities

Models

GPT-4o, Qwen3-Coder 30B

Metrics

runtime, notebook completion, error recovery

Datasets

Kaggle LLM fine-tuning competition (demo)
