Teach coding agents from past runs: extract and reuse 'shortcuts' to speed multi-agent software development

December 28, 2023 · 6 min

Overview

Decision Snapshot: Ready For Pilot

The method gives a clear, practical recipe (task graph, shortcut extraction, retrieval) and shows strong empirical gains on SRDD, but production readiness is limited by the narrow evaluation scope and the reliance on compile checks.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.85

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 5/5

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 60%

Production readiness: 50%

Novelty: 60%

Authors

Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Reusing vetted past fixes reduces developer iteration time and increases the chance generated prototypes are runnable, cutting manual triage and speeding prototyping.

Summary TLDR

The paper introduces Experiential Co-Learning: a two-role (instructor, assistant) multi-agent framework that records multi-step agent interactions as task graphs, extracts high-value non-adjacent transitions called "shortcuts" using compile and similarity signals, and retrieves those experiences as few-shot examples during future reasoning. On the SRDD software-requirement dataset, this approach raises a holistic quality metric from 0.4267 to 0.7304 and shortens development time versus strong multi-agent baselines. Code and data are available at the project's GitHub.

Problem Statement

Multi-agent coding systems treat each new task independently, causing repeated mistakes and wasted iterations because past cross-task experience is not captured or reused. The paper tackles how to design, collect and apply reusable experiences to make agent collaboration faster and more reliable.

Main Contribution

Proposes Experiential Co-Learning, built from three modules (co-tracking, co-memorizing, co-reasoning) that collect and reuse agent experiences.

Introduces task-execution graphs and heuristically extracts non-adjacent 'shortcuts', filtered by compile success and similarity, as the key experiences.
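The shortcut idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` fields, the `score` heuristic, and the `min_gain` threshold are assumptions chosen for illustration. A shortcut is taken here to be a pair of non-adjacent snapshots where the later one compiles and scores noticeably higher.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Node:
    """One code snapshot on an execution chain (fields are illustrative)."""
    node_id: str
    compiles: bool      # result of an external compile check
    similarity: float   # similarity to the requirement, in [0, 1]


def score(node: Node) -> float:
    # Hypothetical heuristic: a snapshot that fails to compile scores zero.
    return node.similarity if node.compiles else 0.0


def extract_shortcuts(chain: list[Node], min_gain: float = 0.2) -> list[tuple[str, str, float]]:
    """Return (source, target, gain) for non-adjacent pairs with a large score gain."""
    shortcuts = []
    for i, j in combinations(range(len(chain)), 2):
        if j - i < 2:  # adjacent pairs are ordinary edges, not shortcuts
            continue
        gain = score(chain[j]) - score(chain[i])
        if chain[j].compiles and gain >= min_gain:
            shortcuts.append((chain[i].node_id, chain[j].node_id, round(gain, 4)))
    return shortcuts


chain = [
    Node("n0", compiles=False, similarity=0.30),
    Node("n1", compiles=True, similarity=0.45),
    Node("n2", compiles=False, similarity=0.50),
    Node("n3", compiles=True, similarity=0.80),
]
print(extract_shortcuts(chain))
```

With this toy chain, only the jumps that end at the compiling snapshot `n3` survive the filter; the jump into the non-compiling `n2` is discarded.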

Key Findings

Experience reuse raises the holistic software quality metric by about 71% relative to a strong multi-agent baseline (ChatDev).

Numbers: Quality 0.4267 -> 0.7304 (test set)

Practical Use: Add a small experience store and retrieval step to multi-agent pipelines to raise end-to-end software quality and reduce manual fixes.

Evidence Ref: Table 1

Completeness and executability improve substantially when agents reuse shortcuts.

Numbers: Completeness 0.6131 -> 0.9497; Executability 0.88 -> 0.965

Practical Use: Reusing past validated code fragments increases the chance generated projects are complete and compile immediately.

Evidence Ref: Table 1

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Quality | 0.7304 | ChatDev 0.4267 | +0.3037 | SRDD test set | Co-Learning quality 0.7304 vs ChatDev 0.4267 | Table 1
Completeness | 0.9497 | ChatDev 0.6131 | +0.3366 | SRDD test set | Higher percentage of code without TODOs | Table 1
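
The deltas in these rows can be checked, and converted to relative gains, with a few lines of arithmetic. The numbers are the ones reported above; the helper name `deltas` is just an illustrative choice.

```python
# (baseline, co-learning) pairs as reported in the results rows above.
reported = {
    "Quality": (0.4267, 0.7304),
    "Completeness": (0.6131, 0.9497),
}


def deltas(baseline: float, value: float) -> tuple[float, float]:
    """Return (absolute delta, relative gain), both rounded to 4 decimals."""
    absolute = value - baseline
    return round(absolute, 4), round(absolute / baseline, 4)


for metric, (baseline, value) in reported.items():
    absolute, relative = deltas(baseline, value)
    print(f"{metric}: +{absolute:.4f} absolute, {relative:.1%} relative")
```

Quality therefore improves by roughly 71% and completeness by roughly 55% in relative terms.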

What To Try In 7 Days

Log agent instruction/solution pairs during multi-turn runs.

Build a simple deduplicated task graph using a hash of code snapshots.

Keep shortcuts that compile and match requirements; store as key-value experiences (instruction->solution and solution->instruction).
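
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `TaskGraph`, `add_step`, and `build_pools` are hypothetical names, and the compile and requirement filtering of shortcuts is left out (the `kept` list stands in for its output).

```python
import hashlib


def snapshot_id(code: str) -> str:
    """Identify a code snapshot by the MD5 hash of its text (deduplication only)."""
    return hashlib.md5(code.encode("utf-8")).hexdigest()


class TaskGraph:
    """Directed graph: nodes are unique code snapshots, edges carry instructions."""

    def __init__(self) -> None:
        self.nodes: dict[str, str] = {}               # hash -> code
        self.edges: dict[tuple[str, str], str] = {}   # (src hash, dst hash) -> instruction

    def add_step(self, before: str, instruction: str, after: str) -> None:
        src, dst = snapshot_id(before), snapshot_id(after)
        self.nodes.setdefault(src, before)
        self.nodes.setdefault(dst, after)
        if src != dst:  # an instruction that changed nothing adds no edge
            self.edges[(src, dst)] = instruction


def build_pools(triples):
    """Turn kept (source code, instruction, target code) triples into two lookup tables."""
    instruction_to_solution, solution_to_instruction = {}, {}
    for source, instruction, target in triples:
        instruction_to_solution[instruction] = target
        solution_to_instruction[source] = instruction
    return instruction_to_solution, solution_to_instruction


v0 = ""
v1 = "def add(a, b):\n    return a + b\n"
v2 = v1 + "print(add(1, 2))\n"

graph = TaskGraph()
graph.add_step(v0, "Write an add function", v1)
graph.add_step(v1, "Print a sample call", v2)
graph.add_step(v2, "Remove the print", v1)  # returns to an existing snapshot: no new node

# A hypothetical shortcut that survived filtering: jump straight from v0 to v2.
kept = [(v0, "Write an add function and print a sample call", v2)]
i2s, s2i = build_pools(kept)
print(len(graph.nodes), len(graph.edges))
```

Hashing the snapshot text means a revisited solution maps back to its existing node, so backtracking shows up as a cycle in the graph rather than as a duplicate node.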

Agent Features

Memory
experience pools (key-value shortcut memories)
Planning
multi-turn planning via iterative instruction-solution cycles
Tool Use
external compiler and code checker; embedding-based retrieval
Frameworks
co-tracking; co-memorizing; co-reasoning
Is Agentic

Yes

Architectures
two-role instructor-assistant multi-agent
Collaboration
role-based multi-turn communication; few-shot example exchange
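
The two-role, multi-turn pattern listed above can be sketched with plain callables standing in for the language-model calls. Everything here is an assumption made for illustration: the `run_dialogue` function, the `"DONE"` stop signal, and the stub roles are not taken from the paper's code.

```python
from typing import Callable

Instructor = Callable[[str, str], str]  # (task, current solution) -> instruction
Assistant = Callable[[str, str], str]   # (instruction, current solution) -> new solution


def run_dialogue(task: str, instructor: Instructor, assistant: Assistant,
                 max_turns: int = 5) -> list[tuple[str, str]]:
    """Alternate instructor and assistant turns, logging each (instruction, solution) pair."""
    solution = ""
    trace = []
    for _ in range(max_turns):
        instruction = instructor(task, solution)
        if instruction == "DONE":  # hypothetical stop signal
            break
        solution = assistant(instruction, solution)
        trace.append((instruction, solution))
    return trace


def stub_instructor(task: str, solution: str) -> str:
    # Stand-in for an LLM call: decide the next instruction from the current code.
    if "def add" not in solution:
        return "Write an add function"
    if "print" not in solution:
        return "Print a sample call"
    return "DONE"


def stub_assistant(instruction: str, solution: str) -> str:
    # Stand-in for an LLM call: produce the next code snapshot.
    if instruction == "Write an add function":
        return "def add(a, b):\n    return a + b\n"
    return solution + "print(add(1, 2))\n"


trace = run_dialogue("Build a tiny adder script", stub_instructor, stub_assistant)
for instruction, solution in trace:
    print(instruction, "->", len(solution), "chars")
```

The returned trace is exactly the kind of instruction/solution log that a task graph and experience pool could later be built from.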

Optimization Features

Token Efficiency
use of single best code example (k_code=1) reduces context size
Training Optimization
heuristic shortcut selection to focus useful examples
Inference Optimization
retrieve top-k experiences to build in-context examples
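
Top-k retrieval over an experience pool can be sketched as a cosine-similarity ranking. This is an illustrative sketch only: the toy 3-dimensional vectors stand in for real embeddings (the summary lists text-embedding-ada-002 as the embedding model), and `retrieve_top_k` and `build_prompt` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: list[float], pool: list[tuple[list[float], str]], k: int = 1) -> list[str]:
    """Return the values whose key vectors are most similar to the query."""
    ranked = sorted(pool, key=lambda item: cosine(query, item[0]), reverse=True)
    return [value for _, value in ranked[:k]]


def build_prompt(instruction: str, examples: list[str]) -> str:
    """Prepend retrieved experiences as few-shot examples."""
    shots = "\n\n".join(f"Example solution:\n{example}" for example in examples)
    return f"{shots}\n\nInstruction:\n{instruction}"


# Toy pool: (key embedding, stored solution). Real keys would come from an embedding model.
pool = [
    ([1.0, 0.0, 0.0], "solution A"),
    ([0.0, 1.0, 0.0], "solution B"),
    ([0.7, 0.7, 0.0], "solution C"),
]
query = [0.9, 0.1, 0.0]

print(build_prompt("Add a save button", retrieve_top_k(query, pool, k=1)))
```

Keeping `k` small, as with the single code example noted under Token Efficiency, limits how much retrieved text is added to the context.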

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Yes
License: Unknown

Risks & Boundaries

Limitations

Agents tend to implement simple logic, so the approach suits prototypes rather than full production systems.

Evaluation uses SRDD and compile-based checks; lacks broad real-world validation.

When Not To Use

For safety-critical or production systems without human review.

When requirements are vague or require complex domain reasoning.

Failure Modes

Solution backtracking and correct-to-failure degeneration if shortcuts are noisy.

Over-reliance on past experiences can repeat past mistakes on novel tasks.

Core Entities

Models

GPT-3.5-Turbo; text-embedding-ada-002; GPT-4 (evaluator)

Metrics

Completeness; Executability; Consistency; Quality (product of three metrics); Duration (s)
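
Because Quality is described as the product of the other three metrics, a weak score on any one of them caps the overall result. A minimal sketch, assuming each metric lies in [0, 1]; the input values below are made up for illustration and are not results from the paper.

```python
def quality(completeness: float, executability: float, consistency: float) -> float:
    """Holistic quality as the product of three metrics, each assumed to be in [0, 1]."""
    for name, value in (("completeness", completeness),
                        ("executability", executability),
                        ("consistency", consistency)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return completeness * executability * consistency


# Illustrative inputs only.
print(f"{quality(0.9, 0.8, 0.5):.2f}")
print(f"{quality(1.0, 0.0, 1.0):.2f}")  # code that never runs scores zero overall
```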

Datasets

SRDD (1,200 software requirements)

Context Entities

Models

MD5 (hashing for deduplication)