Overview
Production Readiness
0.6
Novelty Score
0.7
Cost Impact Score
0.6
Citation Count
0
Why It Matters For Business
CMA lets assistants keep facts up to date, recall what happened around an event, and answer multi-hop queries. This improves trust and utility in long-running workflows, at the cost of higher latency and added governance needs.
Summary TLDR
This paper defines Continuum Memory Architectures (CMA): a class of memory systems that keep state across sessions, let retrieval change memory, link items associatively, chain events by time, and consolidate repeated experience into abstractions. The paper describes a reference lifecycle and a working instantiation, then compares the instantiation to a RAG baseline across four behavioral probes. CMA strongly outperforms RAG on update, association, and disambiguation tasks, but it incurs roughly 2.4× the latency and raises drift, interpretability, and governance concerns.
Problem Statement
Current RAG setups treat memory as static read-only storage. That prevents agents from reliably updating facts, forming temporal chains, making multi-hop associations, or consolidating experience. The paper argues these behaviors are necessary for long-lived agents and proposes CMA as an architectural class that enforces them.
Main Contribution
Define CMA as a behavioral checklist: persistence, selective retention, retrieval-driven mutation, associative routing, temporal chaining, and consolidation.
Provide a reference lifecycle (ingest, activation, retrieval, mutation, consolidation) that can guide implementations and audits.
Implement a CMA instantiation and run four behavioral probes vs a strong RAG baseline, showing consistent advantages on memory-dynamic tasks.
Document failure modes, scaling trade-offs, and practical mitigations for latency, drift, interpretability, and governance.
Key Findings
Selective retention: CMA surfaces corrected facts instead of stale ones.
Temporal chaining: CMA retrieves events near a time anchor better than RAG.
Associative routing: CMA supports multi-hop recall through linked memory graphs.
These behavioral wins come with a runtime cost.
Results
Knowledge updates (wins)
Temporal association (wins)
Associative recall (wins)
Contextual disambiguation (wins)
Overall decisive wins
Latency (mean)
Temporal probe failure rate
Who Should Care
What To Try In 7 Days
Prototype a lightweight CMA layer: add timestamps, salience, and reinforcement counters to a vector store
Run a small 'knowledge update' probe: record a fact, issue a correction, and check whether retrieval returns the correction or the stale fact
Log provenance and reinforcement deltas for a week to detect drift early and tune suppression rules
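The first two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: `MemoryItem`, `TinyMemory`, the `half_life` parameter, the scoring formula, and the toy two-dimensional embeddings are all assumptions made for the example, and an in-memory list stands in for a real vector store.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemoryItem:
    """Hypothetical record: a vector-store row plus CMA-style metadata."""
    text: str
    embedding: List[float]
    created_at: float        # timestamp in seconds
    salience: float = 1.0    # importance weight (assumed scale)
    reinforcement: int = 0   # number of times this item was retrieved


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TinyMemory:
    """In-memory stand-in for a vector store (illustrative only)."""

    def __init__(self, half_life: float = 3600.0):
        self.items: List[MemoryItem] = []
        self.half_life = half_life  # assumed recency half-life in seconds

    def add(self, text, embedding, now, corrects: Optional[MemoryItem] = None):
        if corrects is not None:
            corrects.salience *= 0.1  # assumed suppression rule for stale facts
        item = MemoryItem(text, embedding, now)
        self.items.append(item)
        return item

    def score(self, item, query, now):
        # Assumed scoring: similarity x salience x recency decay x reinforcement boost.
        decay = 0.5 ** ((now - item.created_at) / self.half_life)
        boost = 1.0 + math.log1p(item.reinforcement)
        return cosine(item.embedding, query) * item.salience * decay * boost

    def retrieve(self, query, now, k=1):
        ranked = sorted(self.items, key=lambda it: self.score(it, query, now),
                        reverse=True)
        for it in ranked[:k]:
            it.reinforcement += 1  # retrieval changes memory state
        return ranked[:k]


def run_update_probe():
    """Record a fact, correct it, then compare plain similarity with the scored ranking."""
    mem = TinyMemory()
    old = mem.add("Office is in Berlin", [1.0, 0.0], now=0.0)
    mem.add("Correction: office moved to Munich", [0.9, 0.1], now=60.0, corrects=old)
    query = [1.0, 0.0]
    baseline = max(mem.items, key=lambda it: cosine(it.embedding, query))
    scored = mem.retrieve(query, now=120.0)[0]
    return baseline.text, scored.text
```

In this toy setup, similarity-only ranking returns the stale fact because its embedding matches the query exactly, while the scored ranking returns the correction because the stale item's salience was suppressed. Logging `reinforcement` before and after each retrieval gives the deltas mentioned in the third step.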
Agent Features
Memory
- persistence across sessions
- selective retention (decay, salience)
- retrieval-driven mutation
- associative routing
- temporal chaining
- consolidation/abstraction
Planning
- consolidation (background abstraction)
- retrieval-driven updates affecting future planning
Tool Use
- vector DB (pgvector) + graph memory
- LLM summarizers for consolidation
Frameworks
- Supabase pgvector
- text-embedding-3-small embeddings
Is Agentic
true
Architectures
- graph-structured memory
- activation-field (spreading activation)
- multi-resolution clusters
Collaboration
- provenance and audit logs for human oversight
Optimization Features
Token Efficiency
- summarize large fragments before storage to limit node growth
Infra Optimization
- hierarchical storage and cached activation maps
- possible hardware acceleration for graph traversal
System Optimization
- background consolidation jobs to amortize work
- instrumentation for activation and reinforcement traces
Inference Optimization
- multi-resolution graphs to reduce traversal
- cap activation fan-out to bound runtime
- cache activation maps for hot clusters
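The fan-out cap can be illustrated with a small spreading-activation routine. This is a sketch under stated assumptions, not the paper's algorithm: the adjacency-list graph format, the `decay`, `max_hops`, and `max_fanout` parameters, and the additive update rule are all hypothetical choices.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, max_hops=2, max_fanout=3):
    """Propagate activation from seed nodes through a weighted graph.

    graph: dict mapping node -> list of (neighbor, weight) pairs (assumed format)
    seeds: dict mapping node -> initial activation
    Each node forwards activation along at most max_fanout of its strongest
    edges, so work per hop is bounded by len(frontier) * max_fanout.
    """
    activation = defaultdict(float)
    for node, value in seeds.items():
        activation[node] = value

    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            # Keep only the strongest outgoing edges (the fan-out cap).
            edges = sorted(graph.get(node, []), key=lambda e: e[1],
                           reverse=True)[:max_fanout]
            for neighbor, weight in edges:
                next_frontier[neighbor] += value * weight * decay
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


# Toy graph with hypothetical node names.
toy_graph = {
    "alice": [("project_x", 1.0), ("bob", 0.2)],
    "project_x": [("deadline", 0.8)],
}
capped = spread_activation(toy_graph, {"alice": 1.0}, max_fanout=1)
uncapped = spread_activation(toy_graph, {"alice": 1.0}, max_fanout=3)
```

With `max_fanout=1` the weak `alice` to `bob` edge is never traversed, while the two-hop path to `deadline` is still reached. Lowering the cap trades recall on weak associations for a bounded traversal cost.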
Reproducibility
Open Source Status
- partial
Risks & Boundaries
Limitations
- Higher latency and compute from activation propagation and consolidation
- Memory drift from retrieval-driven reinforcement can reinforce errors
- Temporal segmentation and episode-boundary detection remain brittle
- Evolving graphs are harder to audit and require provenance tooling
- Persistent memories raise privacy and compliance obligations
When Not To Use
- When low-latency responses are critical and the roughly 2.4× latency overhead is unacceptable
- For short-lived sessions where long-horizon memory is unnecessary
- When strict data-deletion or zero-retention policies are mandated and the governance engineering to honor them is not in place
Failure Modes
- Reinforcement loops that amplify incorrect memories (drift)
- Scaling blowups as graph edges and activation fan-out grow
- Consolidation that produces misleading abstractions or loses factual detail
- Privacy leaks if persistent fragments are not properly gated
Core Entities
Models
- GPT-4o
- text-embedding-3-small
Metrics
- win counts
- Cohen's d
- Cohen's h
- latency (s)
- per-query rubric scores (0-1)
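For readers unfamiliar with the two effect sizes listed above, the standard textbook definitions can be computed as follows. The summary does not say which variant of Cohen's d the authors used, so the pooled-standard-deviation form here is an assumption, and the function names are illustrative.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation.

    Suited to continuous scores such as per-query rubric scores in [0, 1].
    Raises ZeroDivisionError if both samples have zero variance.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def cohens_h(p1, p2):
    """Effect size for the difference between two proportions (e.g. win rates)."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))
```

Cohen's d applies to the rubric scores, while Cohen's h applies to proportions such as win rates; both are unitless, which makes them comparable across probes.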
Datasets
- custom internal corpora (withheld)
Context Entities
Models
- GPT-4o (LLM judge)
Metrics
- per-study permutation tests (p < 0.01)
- McNemar's test (p < 0.01)
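McNemar's test compares two systems on the same queries by looking only at the discordant pairs, that is, queries where exactly one system succeeded. The summary does not state whether the authors used the exact binomial form or the chi-square approximation; the sketch below assumes the exact two-sided form, and the function name and counts are illustrative.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: number of queries where only system A succeeded
    c: number of queries where only system B succeeded
    Under the null hypothesis, each discordant pair favors either
    system with probability 0.5.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For instance, ten discordant queries that all favor one system give a p-value of 2/1024, which is below the 0.01 threshold listed above, while an even five-to-five split gives a p-value of 1.0.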
Datasets
- behavioral probe corpora (authors; redacted)

