Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.5
Citation Count
0
Why It Matters For Business
CEDAR reduces repetitive scripting by automating stepwise data-science workflows while keeping data local, which speeds up prototyping and improves privacy controls for enterprise projects.
Summary TLDR
CEDAR is a small multi-agent app that structures data‑science work for LLMs. A main orchestrator routes requests to two sub-agents (Text and Code). The system produces numbered plan-and-code steps like a readable notebook, runs code locally in Docker, keeps only compact history (configurable truncation, default 10k chars), and exposes configs (max steps, retries). The authors demonstrate CEDAR on Kaggle-style DS tasks and emphasize privacy (data stays local) and fault recovery via iterative code re-generation.
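The fault-recovery loop mentioned above (iterative code re-generation with a bounded number of retries) might look roughly like the sketch below. It is an illustration under stated assumptions, not CEDAR's implementation: `generate_code` and `run_code` are hypothetical stand-ins for the code agent and the Docker executor, and the sketch assumes "3 retries" means up to three re-generations after the first attempt. Only the default value of 3 comes from the summary.

```python
import contextlib
import io
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MAX_CODE_RETRIES = 3  # default reported in the summary


@dataclass
class StepResult:
    code: str
    output: str
    ok: bool
    attempts: int


def run_step_with_retries(
    instruction: str,
    generate_code: Callable[[str, Optional[str]], str],  # hypothetical code agent
    run_code: Callable[[str], Tuple[bool, str]],          # hypothetical executor
    max_retries: int = MAX_CODE_RETRIES,
) -> StepResult:
    """Generate code for one step; on failure, re-generate with the error text."""
    last_error: Optional[str] = None
    code, output = "", ""
    total_attempts = max_retries + 1  # assumption: first attempt plus retries
    for attempt in range(1, total_attempts + 1):
        code = generate_code(instruction, last_error)
        ok, output = run_code(code)
        if ok:
            return StepResult(code, output, True, attempt)
        last_error = output  # feed the error back into the next generation
    return StepResult(code, output, False, total_attempts)


def fake_generate(instruction: str, error: Optional[str]) -> str:
    """Toy code agent: fails first, then 'fixes' the code once it sees an error."""
    return "print(1 / 0)" if error is None else "print(1 + 1)"


def fake_run(code: str) -> Tuple[bool, str]:
    """Toy executor: runs code in-process and captures stdout (no sandboxing)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, buffer.getvalue()


result = run_step_with_retries("add two numbers", fake_generate, fake_run)
print(result.ok, result.attempts, result.output.strip())
```

The toy executor uses `exec` only to keep the example self-contained; a real setup would run the code in an isolated container, as the summary describes.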
Problem Statement
LLMs can simplify data science, but single-shot prompts fail on multi-step workflows, large or private datasets, math-heavy tasks, and inputs that exceed context-length limits. Practitioners need a structured, transparent way to separate planning, code, and outputs while keeping data local and recovering when code fails.
Main Contribution
A practical three-agent architecture (orchestrator, text agent, and code agent) that produces interleaved plan cells and executable code cells.
Structured prompts and JSON-based function calls to prevent hallucinated routing and make tool calls robust.
History rendering that compacts past instructions, successful code heads, and error tails to stay within context limits (default 10k chars).
Local code execution inside Docker with optional on-prem LLM support to keep data local and improve privacy.
An app UI (Streamlit) that exports solutions as JSON, Markdown, or Jupyter notebooks and supports autorun and stepwise inspection.
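As an illustration of the notebook export listed above, the sketch below converts a list of plan-and-code steps into a Jupyter notebook document (nbformat 4) using only the standard library. The step structure (dicts with `plan` and `code` keys) and the function name are hypothetical; the summary does not describe CEDAR's internal representation or its export code.

```python
import json
from typing import Dict, List


def steps_to_notebook(steps: List[Dict[str, str]]) -> dict:
    """Build an nbformat-4 notebook dict from hypothetical plan/code steps."""
    cells = []
    for number, step in enumerate(steps, start=1):
        # Each step becomes a numbered markdown cell describing the plan...
        cells.append({
            "cell_type": "markdown",
            "metadata": {},
            "source": f"### Step {number}\n{step['plan']}",
        })
        # ...followed by a code cell when the step carries code.
        if step.get("code"):
            cells.append({
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": step["code"],
            })
    # nbformat_minor 4 is used so that per-cell "id" fields are not required.
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


example_steps = [
    {"plan": "Load the training data.", "code": "import pandas as pd\ndf = pd.read_csv('train.csv')"},
    {"plan": "Summarize what was found so far.", "code": ""},
]
notebook = steps_to_notebook(example_steps)
notebook_json = json.dumps(notebook, indent=1)
```

Writing `notebook_json` to a file with an `.ipynb` extension should give a notebook that Jupyter can open; text-only steps simply omit the code cell.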
Key Findings
CEDAR uses three LLM roles: an orchestrator plus separate text and code agents to produce a readable stepwise notebook.
Code runs locally in Docker; only aggregate snapshots and outputs are passed to LLMs, so raw data need not leave the host.
History rendering keeps only the most recent 10k characters by default and includes heads of successful outputs and tails of error traces.
Autorun completes a 10–20 step run in about 3 minutes in the demo.
Default operational knobs: maximum steps = 30 and maximum code retries = 3.
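The history-rendering behavior described in these findings can be sketched as follows. This is a minimal approximation, assuming a simple text layout per step: the `Step` class, the entry format, and the 500-character snippet size are hypothetical, while the 10k-character default window and the head/tail rule come from the summary.

```python
from dataclasses import dataclass
from typing import List

MAX_HISTORY_CHARS = 10_000  # default truncation window from the summary
SNIPPET_CHARS = 500         # hypothetical per-step snippet size


@dataclass
class Step:
    instruction: str
    output: str
    failed: bool = False


def render_history(
    steps: List[Step],
    max_chars: int = MAX_HISTORY_CHARS,
    snippet_chars: int = SNIPPET_CHARS,
) -> str:
    """Render past steps compactly and keep only the most recent characters."""
    if max_chars <= 0:
        return ""
    parts = []
    for number, step in enumerate(steps, start=1):
        if step.failed:
            # For errors, the end of the trace usually names the exception.
            label, snippet = "error (tail)", step.output[-snippet_chars:]
        else:
            # For successes, the start of the output is usually enough.
            label, snippet = "output (head)", step.output[:snippet_chars]
        parts.append(f"[{number}] {step.instruction}\n{label}: {snippet}")
    rendered = "\n\n".join(parts)
    return rendered[-max_chars:]  # drop the oldest text first


history = render_history(
    [
        Step("Load data", "rows=1000 " + "x" * 5000),
        Step("Fit model", "Traceback ...\n" + "y" * 5000 + "\nValueError: bad shape", failed=True),
    ]
)
```

Truncating from the front keeps the latest steps intact, at the cost of possibly dropping early rationale once the window fills.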
Results
Autorun runtime for demo: about 3 minutes for 10–20 steps
Default maximum solution steps: 30
Default code retries: 3
History truncation window: 10k characters (configurable)
Who Should Care
What To Try In 7 Days
Run CEDAR on a small Kaggle task to compare autorun vs manual scripting.
Set up local Docker execution and confirm that raw data never leaves your host.
Try structured prompts (project summary form) and measure time saved assembling steps versus free-form prompting.
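For the local Docker check suggested above, one possible starting point is to run generated scripts in a container with networking disabled and the data mounted read-only. The sketch below only builds the `docker run` argument list; the image name, mount points, and function name are assumptions for illustration, not CEDAR's actual configuration. Note that disabling the container's network says nothing about what the host application itself sends to an LLM.

```python
from pathlib import Path
from typing import List


def build_docker_command(
    script: Path,
    data_dir: Path,
    image: str = "python:3.11-slim",  # assumed image; substitute your own
) -> List[str]:
    """Return a `docker run` argv that executes `script` without network access."""
    return [
        "docker", "run",
        "--rm",                                      # remove the container afterwards
        "--network", "none",                         # no network inside the container
        "-v", f"{data_dir.resolve()}:/data:ro",      # data mounted read-only
        "-v", f"{script.resolve().parent}:/work:ro", # script directory, read-only
        "-w", "/work",                               # working directory in the container
        image,
        "python", script.name,
    ]


command = build_docker_command(Path("steps/step_01.py"), Path("data"))
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, capture_output=True, text=True)` on a host where Docker is installed, with the captured output then summarized before anything is shown to a model.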
Agent Features
Memory
- history rendering (compact summary)
- configurable truncation (default 10k chars)
Planning
- interleaved plan-and-code steps
- numbered stepwise notebook workflow
Tool Use
- structured function calls (JSON)
- code execution tool (Docker container)
- schema-driven routing
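The structured function calls and schema-driven routing listed above can be illustrated with a small validator that rejects any action the orchestrator was not given. The action names, argument names, and JSON layout here are hypothetical; the summary only states that routing uses JSON-based function calls to avoid hallucinated tool calls.

```python
import json
from typing import Any, Dict

# Hypothetical action table: action name -> required argument names.
ALLOWED_ACTIONS = {
    "call_text_agent": {"instruction"},
    "call_code_agent": {"instruction"},
    "finish": {"summary"},
}


class RoutingError(ValueError):
    """Raised when the orchestrator's reply is not a valid routing call."""


def parse_action(raw: str) -> Dict[str, Any]:
    """Parse and validate one JSON routing call produced by the orchestrator."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RoutingError(f"not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise RoutingError("expected a JSON object")
    name = payload.get("action")
    if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
        raise RoutingError(f"unknown action: {name!r}")
    arguments = payload.get("arguments", {})
    if not isinstance(arguments, dict):
        raise RoutingError("'arguments' must be a JSON object")
    missing = ALLOWED_ACTIONS[name] - set(arguments)
    if missing:
        raise RoutingError(f"missing arguments for {name}: {sorted(missing)}")
    return {"action": name, "arguments": arguments}


valid = parse_action('{"action": "call_code_agent", "arguments": {"instruction": "plot the target"}}')

try:
    parse_action('{"action": "call_sql_agent", "arguments": {}}')
    rejected = False
except RoutingError:
    rejected = True  # a hallucinated action name is refused instead of dispatched
```

A rejected call can be reported back to the orchestrator as an error message so that it can retry with a valid action.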
Frameworks
- Streamlit frontend
Is Agentic
true
Architectures
- orchestrator + text agent + code agent
Collaboration
- human-in-the-loop review and edits
Optimization Features
Token Efficiency
- history truncation and selective output heads/tails
Infra Optimization
- on-prem LLM support to avoid large uploads
- Docker-based isolated execution
System Optimization
- separate agents to reduce retry and parsing errors
- JSON schema to avoid hallucinated tool calls
Reproducibility
Open Source Status
- partial
Risks & Boundaries
Limitations
- Demo scope limited to beginner-to-intermediate Kaggle tasks, not large real-world pipelines.
- Agent complexity is basic; no independent faithfulness or critic agent implemented yet.
- Requires data to fit in host memory and access to local compute for on‑prem LLMs.
- No public code release at time of writing; reproducibility is partial.
When Not To Use
- For very large datasets that do not fit in RAM.
- When you need provable correctness for critical math-heavy pipelines.
- If you rely exclusively on cloud LLMs and cannot run code locally or keep sensitive data on your own host.
Failure Modes
- Hallucinated action names (mitigated by the JSON schema, but still possible if the schema changes).
- Code execution errors needing multiple retries for complex environments.
- Context truncation may drop earlier rationale needed for late-stage decisions.
- Dependency, auth, or network issues when accessing on-prem LLMs or tools.
Core Entities
Models
- GPT-4o
- Qwen3-Coder 30B
Metrics
- runtime
- notebook completion
- error recovery
Datasets
- Kaggle LLM fine-tuning competition (demo)

