Overview
Production Readiness
0.45
Novelty Score
0.6
Cost Impact Score
0.45
Citation Count
2
Why It Matters For Business
Iteratively refining and pruning agent experiences cuts noisy guidance, raises code quality by ~10% on the tested benchmark, and shrinks the stored experience set to ~11.5% of its original size, which lowers storage and retrieval costs.
Summary TLDR
This paper introduces Iterative Experience Refinement (IER), a framework that lets multi-agent, LLM-based software developers iteratively collect, reuse, and prune "shortcut" experiences (solution→instruction and instruction→solution pairs). It studies two propagation patterns: successive (each batch inherits experiences from the previous batch only) and cumulative (each batch inherits experiences from all earlier batches). A heuristic elimination step keeps high-information and frequently used experiences, shrinking the pool to 11.54% of its original size while improving or maintaining software quality on the SRDD benchmark with ChatGPT-3.5.
Problem Statement
Current experience-enabled LLM agents use a fixed, heuristically collected set of past experiences. That static pool cannot be refined over time, which limits adaptability and lets low-quality or rarely used experiences accumulate and dilute useful guidance.
Main Contribution
Propose Iterative Experience Refinement (IER) to acquire, propagate, and refine agent experiences across task batches.
Define two propagation patterns: successive (inherit from previous batch) and cumulative (inherit from all past batches).
Introduce a heuristic elimination combining information gain and retrieval frequency to keep high-quality experiences and reduce pool size.
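As a rough illustration of the elimination idea, the sketch below filters an experience pool with two thresholds. The `Experience` class, its field names, the precomputed `info_gain` score, and the choice to require both criteria at once are assumptions made for this example; the summary does not say how the paper computes information gain or combines the two signals.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    """Hypothetical record for one shortcut experience."""
    key: str            # text used to look the experience up
    value: str          # the instruction or solution it maps to
    info_gain: float    # assumed to be computed elsewhere
    retrievals: int = 0  # how often this experience has been retrieved


def eliminate(pool, epsilon, theta):
    """Keep experiences with info_gain >= epsilon AND retrievals >= theta.

    Combining the two criteria with AND is an assumption of this sketch.
    """
    return [e for e in pool if e.info_gain >= epsilon and e.retrievals >= theta]


pool = [
    Experience("add login form", "code A", info_gain=0.9, retrievals=5),
    Experience("fix typo", "code B", info_gain=0.1, retrievals=7),
    Experience("add search", "code C", info_gain=0.8, retrievals=0),
]
kept = eliminate(pool, epsilon=0.5, theta=1)
print([e.key for e in kept])
```

Only the first experience survives here: the second has low information gain and the third was never retrieved.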
Key Findings
IER improves end-to-end software quality compared to prior experience-based methods on SRDD.
Heuristic elimination concentrates useful experiences and drastically reduces pool size.
Successive pattern reaches higher peaks but is less stable; cumulative pattern is more stable over batches.
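The difference between the two propagation patterns can be sketched as a choice of which past batches feed the next one. The function name `inherited_pool` and the list-of-lists representation of history are hypothetical and only illustrate the idea.

```python
def inherited_pool(history, pattern):
    """Return the experiences the next batch starts from.

    history: list of per-batch experience lists, oldest first.
    pattern: "successive" (last batch only) or "cumulative" (all batches).
    """
    if not history:
        return []
    if pattern == "successive":
        return list(history[-1])
    if pattern == "cumulative":
        return [exp for batch in history for exp in batch]
    raise ValueError(f"unknown pattern: {pattern!r}")


history = [["a"], ["b", "c"], ["d"]]
successive = inherited_pool(history, "successive")
cumulative = inherited_pool(history, "cumulative")
print(successive, cumulative)
```

Under the successive pattern a single poor batch replaces everything that came before it, which is consistent with the instability noted above. The cumulative pattern keeps older experiences but lets the pool grow with every batch.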
Results
Quality (completeness×executability×consistency)
Executability
Completeness
Duration (avg seconds or rounds)
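The Quality metric is described above as the product of completeness, executability, and consistency. A minimal sketch of that arithmetic follows; the function name and the sample values are illustrative assumptions, not results from the paper.

```python
def quality(completeness, executability, consistency):
    """Product of the three component scores, as described in the summary."""
    return completeness * executability * consistency


# Illustrative values only; these are not measurements from the paper.
example = quality(0.5, 0.5, 0.75)
broken = quality(0.9, 0.0, 0.9)
print(example, broken)
```

Because the score is a product, a zero in any one component (for example, software that does not execute) drives the overall quality to zero regardless of the other two.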
Who Should Care
What To Try In 7 Days
Run a small task-batch pipeline and log solution→instruction shortcuts during runs.
Implement vector-based retrieval (embeddings + cosine similarity) to reuse shortcuts as few-shot examples.
Experiment with two patterns: successive (only last batch) and cumulative (all history) and compare quality vs stability over 3–6 batches.
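For the retrieval step, a minimal sketch of top-k lookup by cosine similarity is shown below. The toy two-dimensional vectors stand in for real embeddings (the paper uses text-embedding-ada-002), and the helper names `cosine` and `retrieve` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, pool, k=2):
    """Return the payloads of the k pool entries most similar to query_vec.

    pool: list of (embedding, payload) pairs.
    """
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [payload for _, payload in ranked[:k]]


# Toy vectors standing in for real embeddings.
pool = [
    ([1.0, 0.0], "shortcut A"),
    ([0.0, 1.0], "shortcut B"),
    ([0.7, 0.7], "shortcut C"),
]
top = retrieve([1.0, 0.1], pool, k=2)
print(top)
```

The retrieved payloads can then be placed in the prompt as few-shot examples. A linear scan like this is adequate for a small pool; a vector index becomes worthwhile as the pool grows.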
Agent Features
Memory
- experience pool of shortcuts (solution→instruction and instruction→solution)
- iterative update (successive or cumulative)
Planning
- iterative refinement across batches
Tool Use
- vector-based retrieval
- external compiler for validation
Frameworks
- ChatDev
- ECL
- ExpeL
Is Agentic
true
Architectures
- multi-agent (instructive + responsive)
- batch-wise experience propagation
Collaboration
- role-based agent communication (instructor and responder)
Optimization Features
System Optimization
- experience elimination reduces pool size and retrieval load
Inference Optimization
- vector retrieval to reduce search latency
Reproducibility
Data Urls
- SRDD (referenced from Qian et al. 2023a)
Data Available
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Evaluation uses only ChatGPT-3.5; results may differ with other LLMs.
- Benchmark set is SRDD only; domain diversity is limited.
- Elimination thresholds (ϵ, θ) are heuristic and fixed in experiments.
- Successive pattern can amplify poor refinements and become unstable.
When Not To Use
- Tasks needing novel, non-repeated solutions where past shortcuts can mislead.
- Safety-critical or auditable code where automated reuse of prior shortcuts is risky.
- Environments without compute/storage for embeddings and a vector DB.
Failure Modes
- Experience pool growth dilutes high-quality experiences (cumulative pattern).
- Poor refinements in one batch can propagate and degrade future results (successive pattern).
- Reliance on embedding similarity may retrieve semantically wrong shortcuts.
Core Entities
Models
- ChatGPT-3.5
- text-embedding-ada-002
Metrics
- Completeness
- Executability
- Consistency
- Quality
- Duration
Datasets
- SRDD

