CEDAR: a three-agent system that produces interleaved plan-and-code notebooks to run data science locally

January 10, 2026 · 7 min

Overview

Production Readiness

0.6

Novelty Score

0.6

Cost Impact Score

0.5

Citation Count

0

Authors

Rishiraj Saha Roy, Chris Hinze, Luzian Hahn, Fabian Kuech

Why It Matters For Business

CEDAR reduces repetitive scripting by automating stepwise DS workflows while keeping data local, speeding prototyping and improving privacy controls for enterprise projects.

Summary TLDR

CEDAR is a small multi-agent app that structures data‑science work for LLMs. A main orchestrator routes requests to two sub-agents (Text and Code). The system produces numbered plan-and-code steps that read like a notebook, runs code locally in Docker, keeps only a compact history (configurable truncation, default 10k characters), and exposes configuration options such as maximum steps and code retries. The authors demonstrate CEDAR on Kaggle-style DS tasks and emphasize privacy (data stays local) and fault recovery via iterative code re-generation.

Problem Statement

LLMs can simplify data science, but single-shot prompts fail on multi-step workflows, large or private data, and math-heavy tasks, and they run into context-length limits. Practitioners need a structured, transparent way to separate planning, code, and outputs while keeping data local and recovering when code errors occur.

Main Contribution

A practical three-agent architecture (orchestrator, text agent, and code agent) that produces interleaved plan cells and executable code cells.

Structured prompts and JSON-based function calls to prevent hallucinated routing and make tool calls robust.

History rendering that compacts past instructions, successful code heads, and error tails to stay within context limits (default 10k chars).

Local code execution inside Docker with optional on-prem LLM support to keep data local and improve privacy.

An app UI (Streamlit) that exports solutions as JSON, Markdown, or Jupyter notebooks and supports autorun and stepwise inspection.
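The notebook export can be pictured with a minimal sketch. The summary does not describe CEDAR's actual export code, so the function names `steps_to_notebook` and `save_notebook` and the "Step N" heading layout are assumptions for illustration. The sketch writes the standard Jupyter notebook JSON (nbformat 4) directly with the standard library.

```python
import json


def steps_to_notebook(steps):
    """Convert (plan_text, code_text) pairs into an nbformat-4 notebook dict.

    Hypothetical helper: each step becomes one markdown cell (the plan)
    followed by one code cell (the code), mirroring the interleaved layout.
    """
    cells = []
    for number, (plan, code) in enumerate(steps, start=1):
        cells.append({
            "cell_type": "markdown",
            "metadata": {},
            "source": f"### Step {number}\n\n{plan}",
        })
        cells.append({
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": code,
        })
    # nbformat_minor 4 is used so that cells do not need an "id" field.
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def save_notebook(steps, path):
    """Write the notebook dict to a .ipynb file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(steps_to_notebook(steps), handle, indent=1)
```

Calling `save_notebook([("Load the data", "import pandas as pd")], "solution.ipynb")` would produce a file that Jupyter can open, with one plan cell above one code cell.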

Key Findings

CEDAR uses three LLM roles: an orchestrator plus separate text and code agents to produce a readable stepwise notebook.

Numbers: 3 agents (orchestrator, text agent, code agent)

Code runs locally in Docker; only aggregate snapshots and outputs are passed to LLMs, so raw data need not leave the host.

Numbers: Docker execution; on-prem LLM support described

History rendering keeps only the most recent 10k characters by default and includes the heads of successful outputs and the tails of error traces.

Numbers: 10k character truncation (configurable)

In the demo, autorun completes a 10–20 step run in about 3 minutes.

Numbers: ≈3 minutes for 10–20 steps (demo)

Default operational knobs: maximum steps = 30 and maximum code retries = 3.

Numbers: max steps 30; retries 3 (defaults)
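How the two default knobs might interact can be sketched as a bounded loop. This is not CEDAR's implementation: `run_step`, `autorun`, and the callables they accept are hypothetical names, and the sketch assumes that "3 retries" means one initial attempt plus up to three re-generations, which the summary does not specify.

```python
MAX_STEPS = 30    # default maximum solution steps, from the summary
MAX_RETRIES = 3   # default maximum code retries, from the summary


def run_step(generate_code, execute, max_retries=MAX_RETRIES):
    """Generate and run code for one step, feeding the last error back on retry.

    generate_code(error) -> str and execute(code) -> (ok, output) are
    hypothetical callables standing in for the code agent and the sandbox.
    Returns (ok, code, output_or_error, retries_used).
    """
    error = None
    for attempt in range(1 + max_retries):   # assumption: 1 try + N retries
        code = generate_code(error)
        ok, output = execute(code)
        if ok:
            return True, code, output, attempt
        error = output                        # pass the error to the next attempt
    return False, code, error, max_retries


def autorun(next_step, max_steps=MAX_STEPS):
    """Collect steps until next_step returns None or the step budget runs out."""
    done = []
    while len(done) < max_steps:
        step = next_step(done)
        if step is None:
            break
        done.append(step)
    return done
```

Bounding both loops keeps a failing run from consuming unlimited LLM calls: with these defaults, a run makes at most 30 × 4 code-generation attempts.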

Results

Autorun runtime for demo

Value: ≈3 minutes for 10–20 steps

Default maximum solution steps

Value: 30 steps

Default code retries

Value: 3 retries

History truncation window

Value: 10k characters (configurable)

What To Try In 7 Days

Run CEDAR on a small Kaggle task to compare autorun vs manual scripting.

Set up local Docker execution and confirm that raw data never leaves your host.

Try structured prompts (project summary form) and measure time saved assembling steps versus free-form prompting.

Agent Features

Memory

  • history rendering (compact summary)
  • configurable truncation (default 10k chars)
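A minimal sketch of this kind of history rendering follows. The entry fields (`step`, `instruction`, `output`, `error`), the function names, and the 500-character snippet length are assumptions chosen for illustration; only the 10k-character default and the head/tail idea come from the summary.

```python
HISTORY_LIMIT = 10_000  # default character budget, from the summary
SNIPPET = 500           # hypothetical per-step snippet length


def render_entry(entry, snippet=SNIPPET):
    """Render one past step as its instruction plus a short output excerpt.

    Successful outputs contribute their head (first characters);
    errors contribute their tail, where the exception message usually sits.
    """
    lines = [f"Step {entry['step']}: {entry['instruction']}"]
    if entry.get("error"):
        lines.append("Error (tail): " + entry["error"][-snippet:])
    elif entry.get("output"):
        lines.append("Output (head): " + entry["output"][:snippet])
    return "\n".join(lines)


def render_history(entries, limit=HISTORY_LIMIT, snippet=SNIPPET):
    """Join all rendered entries and keep only the most recent `limit` characters."""
    text = "\n\n".join(render_entry(entry, snippet) for entry in entries)
    return text[-limit:] if limit > 0 else ""
```

Taking the tail of an error and the head of a successful output reflects where the useful information tends to be: Python tracebacks end with the exception type and message, while printed tables and summaries usually start with their header.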

Planning

  • interleaved plan-and-code steps
  • numbered stepwise notebook workflow

Tool Use

  • structured function calls (JSON)
  • code execution tool (Docker container)
  • schema-driven routing
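The idea behind schema-driven routing can be illustrated with a small validator that rejects any call the schema does not name. The action names, the `action`/`arguments` keys, and `parse_action` itself are hypothetical; CEDAR's actual schema is not given in this summary.

```python
import json

# Hypothetical action names and their required argument keys.
ALLOWED_ACTIONS = {
    "call_text_agent": {"instruction"},
    "call_code_agent": {"instruction"},
    "finish": set(),
}


def parse_action(raw):
    """Parse an orchestrator reply and reject anything outside the schema.

    Returns (action_name, arguments) or raises ValueError, so a hallucinated
    action name never reaches the dispatch code.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("reply must be a JSON object")
    name = call.get("action")
    if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {name!r}")
    args = call.get("arguments", {})
    expected = ALLOWED_ACTIONS[name]
    if not isinstance(args, dict) or set(args) != expected:
        raise ValueError(f"arguments for {name} must be exactly {sorted(expected)}")
    return name, args
```

A rejected reply can be returned to the model as an error message, so the same retry mechanism used for failing code can also cover malformed routing.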

Frameworks

  • Streamlit frontend

Is Agentic

true

Architectures

  • orchestrator + text agent + code agent

Collaboration

  • human-in-the-loop review and edits

Optimization Features

Token Efficiency

  • history truncation and selective output heads/tails

Infra Optimization

  • on-prem LLM support to avoid large uploads
  • Docker-based isolated execution
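Isolated execution of one generated script could look roughly like the sketch below. The image name, the `/work` mount point, the `step.py` file name, and the choice to disable networking are all assumptions; the summary only states that code runs locally inside Docker.

```python
import subprocess
from pathlib import Path


def build_docker_command(workdir, script="step.py", image="python:3.11-slim"):
    """Build a `docker run` argument list for a throwaway container."""
    host_dir = Path(workdir).resolve()
    return [
        "docker", "run",
        "--rm",                      # delete the container when it exits
        "--network", "none",         # assumption: no network inside the sandbox
        "-v", f"{host_dir}:/work",   # bind-mount the host working directory
        "-w", "/work",               # run from the mounted directory
        image,
        "python", script,
    ]


def run_in_docker(workdir, script="step.py", timeout=300):
    """Run the script and return (succeeded, stdout, stderr). Requires Docker."""
    result = subprocess.run(
        build_docker_command(workdir, script),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0, result.stdout, result.stderr
```

Because the data directory is mounted rather than uploaded, the dataset stays on the host; only the captured stdout and stderr would be candidates for inclusion in an LLM prompt. Disabling the network also prevents package installs at run time, so a real setup would need an image with the required libraries preinstalled.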

System Optimization

  • separate agents to reduce retry and parsing errors
  • JSON schema to avoid hallucinated tool calls

Reproducibility

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Demo scope limited to beginner-to-intermediate Kaggle tasks, not large real-world pipelines.
  • The agent design is basic; no independent faithfulness-checking or critic agent is implemented yet.
  • Requires data to fit in host memory and access to local compute for on‑prem LLMs.
  • No public code release at time of writing; reproducibility is partial.

When Not To Use

  • For very large datasets that do not fit in RAM.
  • When you need provable correctness for critical math-heavy pipelines.
  • If you rely exclusively on cloud LLMs and cannot run local execution or bind sensitive data.

Failure Modes

  • Hallucinated action names (mitigated by JSON schema but still possible if schema changes).
  • Code execution errors needing multiple retries for complex environments.
  • Context truncation may drop earlier rationale needed for late-stage decisions.
  • Dependency, auth, or network issues when accessing on-prem LLMs or tools.

Core Entities

Models

  • GPT-4o
  • Qwen3-Coder 30B

Metrics

  • runtime
  • notebook completion
  • error recovery

Datasets

  • Kaggle LLM fine-tuning competition (demo)