Continuum Memory: make agent memory persistent, mutable, and associative

January 14, 2026 · 7 min

Overview

Production Readiness

0.6

Novelty Score

0.7

Cost Impact Score

0.6

Citation Count

0

Authors

Joe Logan

Links

Abstract / PDF

Why It Matters For Business

CMA lets assistants keep facts up to date, recall what happened around an event, and answer multi-hop queries. This improves trust and utility for long-running workflows, at the cost of higher latency and added governance needs.

Summary TLDR

This paper defines Continuum Memory Architectures (CMA): a class of memory systems that keep state across sessions, let retrieval change memory, link items associatively, chain events by time, and consolidate repeated experience into abstractions. The paper describes a reference lifecycle and a working instantiation, then compares the instantiation to a RAG baseline across four behavioral probes. CMA strongly outperforms RAG on update, association, and disambiguation tasks, but it runs at roughly 2.4× the latency and raises drift, interpretability, and governance concerns.

Problem Statement

Current RAG setups treat memory as static read-only storage. That prevents agents from reliably updating facts, forming temporal chains, making multi-hop associations, or consolidating experience. The paper argues these behaviors are necessary for long-lived agents and proposes CMA as an architectural class that enforces them.

Main Contribution

Define CMA as a behavioral checklist: persistence, selective retention, retrieval-driven mutation, associative routing, temporal chaining, and consolidation.

Provide a reference lifecycle (ingest, activation, retrieval, mutation, consolidation) that can guide implementations and audits.
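The five lifecycle stages can be read as five operations over one shared store. The sketch below is a toy in-memory illustration, not the paper's implementation. The `ToyCMA` class, the `Fragment` fields, the word-overlap activation, the 0.1 reinforcement step, and the join-based consolidation are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    text: str
    strength: float = 1.0                     # hypothetical reinforcement weight
    links: set = field(default_factory=set)   # ids of associated fragments


class ToyCMA:
    def __init__(self):
        self.store = {}      # fragment id -> Fragment
        self.next_id = 0

    def ingest(self, text):
        """Stage 1: write a new fragment and return its id."""
        fid = self.next_id
        self.store[fid] = Fragment(text)
        self.next_id += 1
        return fid

    def activate(self, query):
        """Stage 2: score every fragment (word overlap stands in for embeddings)."""
        words = set(query.lower().split())
        return {
            fid: len(words & set(f.text.lower().split())) * f.strength
            for fid, f in self.store.items()
        }

    def retrieve(self, query, k=2):
        """Stage 3: return the top-k activated fragments, then mutate them."""
        scores = self.activate(query)
        hits = [fid for fid, s in scores.items() if s > 0]
        top = sorted(hits, key=lambda fid: scores[fid], reverse=True)[:k]
        self.mutate(top)
        return [self.store[fid].text for fid in top]

    def mutate(self, fids):
        """Stage 4: reinforce retrieved fragments and link them to each other."""
        for fid in fids:
            frag = self.store[fid]
            frag.strength += 0.1
            frag.links.update(other for other in fids if other != fid)

    def consolidate(self, min_strength=1.05):
        """Stage 5: fold reinforced fragments into one abstraction.

        Joining the texts is a placeholder for an LLM summarizer.
        """
        strong = [f.text for f in self.store.values() if f.strength >= min_strength]
        if len(strong) < 2:
            return None
        return self.ingest(" / ".join(strong))
```

In this toy version, calling `retrieve("Berlin")` on a store that holds two Berlin-related fragments returns both, raises their `strength`, and links them, so a later `consolidate()` call has something to merge.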

Implement a CMA instantiation and run four behavioral probes vs a strong RAG baseline, showing consistent advantages on memory-dynamic tasks.

Document failure modes, scaling trade-offs, and practical mitigations for latency, drift, interpretability, and governance.

Key Findings

Selective retention: CMA surfaces corrected facts instead of stale ones.

Numbers: CMA won 38/40 queries; Cohen's d = 1.84

Temporal chaining: CMA retrieves events near a time anchor better than RAG.

Numbers: CMA retrieved temporally adjacent events in 13/14 decisive trials; Cohen's h = 2.06
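Retrieving events near a time anchor can be sketched with a sorted timeline and a binary search. This is an illustrative assumption about how temporal adjacency might be looked up, and the paper's instantiation may work differently. The `neighbors` helper and the sample timeline are hypothetical.

```python
from bisect import bisect_left


def neighbors(events, anchor_ts, window=1):
    """Return up to `window` events on each side of the anchor time.

    `events` is a list of (timestamp, text) tuples sorted by timestamp.
    """
    stamps = [ts for ts, _ in events]
    i = bisect_left(stamps, anchor_ts)
    before = events[max(0, i - window):i]
    # Skip the anchor event itself when it is present in the timeline.
    j = i + 1 if i < len(events) and stamps[i] == anchor_ts else i
    after = events[j:j + window]
    return before + after


# Hypothetical timeline used only for illustration.
timeline = [
    (100, "kickoff meeting"),
    (200, "budget approved"),
    (300, "vendor selected"),
    (400, "contract signed"),
]
adjacent = neighbors(timeline, 300)
```

Here `adjacent` holds the events at timestamps 200 and 400, the ones immediately before and after the anchor. A similarity-only retriever has no such ordering to exploit.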

Associative routing: CMA supports multi-hop recall through linked memory graphs.

Numbers: CMA won 14/19 decisive associative-recall trials; Cohen's h = 0.99
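The two Cohen's h values can be checked from the win counts. Cohen's h is the difference between arcsine-transformed proportions: h = 2·asin(√p1) − 2·asin(√p2). The sketch below assumes each system's proportion is its share of the decisive trials in that probe, which reproduces the reported values to two decimals.

```python
import math


def cohens_h(p1, p2):
    """Effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Temporal chaining: 13 CMA wins vs 1 RAG win out of 14 decisive trials.
temporal = cohens_h(13 / 14, 1 / 14)
# Associative routing: 14 CMA wins vs 5 RAG wins out of 19 decisive trials.
associative = cohens_h(14 / 19, 5 / 19)

print(round(temporal, 2), round(associative, 2))  # prints 2.06 0.99
```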

Overall behavioral wins come with runtime cost.

Numbers: CMA won 82/92 decisive trials; mean latency 1.48s vs 0.65s (≈2.4×)

Results

Knowledge updates (wins)

Value: CMA 38 / 40

Baseline: RAG 1 / 40

Temporal association (wins)

Value: CMA 13 decisive wins

Baseline: RAG 1 decisive win

Associative recall (wins)

Value: CMA 14 decisive wins

Baseline: RAG 5 decisive wins

Contextual disambiguation (wins)

Value: CMA 17 / 20 decisive wins

Baseline: RAG 3 / 20

Overall decisive wins

Value: CMA 82 / 92

Baseline: RAG 10 / 92

Latency (mean)

Value: 1.48s

Baseline: 0.65s

Temporal probe failure rate

Value: 47% (both systems wrong)

Who Should Care

What To Try In 7 Days

Prototype a lightweight CMA layer: add timestamps, salience, and reinforcement counters to a vector store

Run a small 'knowledge update' probe: record a fact, issue a correction, and compare retrieval

Log provenance and reinforcement deltas for a week to detect drift early and tune suppression rules
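The first two steps can be prototyped without any infrastructure. The sketch below is a minimal in-memory stand-in for a vector store, under several assumptions: the `Record` fields, the scoring formula (cosine similarity × salience × recency decay × reinforcement boost × suppression penalty), the one-day half-life, and the hand-written two-dimensional vectors are all hypothetical. A real prototype would use stored embeddings and tuned weights.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    embedding: list
    created_at: float          # seconds since some epoch
    salience: float = 1.0
    reinforcement: int = 0     # incremented each time the record is retrieved
    superseded: bool = False   # set when a correction replaces this fact


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score(rec, query_emb, now, half_life=86400.0):
    recency = 0.5 ** ((now - rec.created_at) / half_life)
    boost = 1.0 + 0.1 * rec.reinforcement
    penalty = 0.1 if rec.superseded else 1.0   # suppression rule for stale facts
    return cosine(rec.embedding, query_emb) * rec.salience * recency * boost * penalty


def retrieve(store, query_emb, now, k=1):
    ranked = sorted(store, key=lambda r: score(r, query_emb, now), reverse=True)[:k]
    for rec in ranked:
        rec.reinforcement += 1   # retrieval-driven mutation
    return ranked


def correct(store, old, new_text, new_emb, now):
    """Record a correction: suppress the old fact and append the new one."""
    old.superseded = True
    rec = Record(new_text, new_emb, created_at=now, salience=old.salience)
    store.append(rec)
    return rec


# Knowledge-update probe: record a fact, retrieve it, correct it, retrieve again.
store = [Record("Office is in Austin", [1.0, 0.0], created_at=0.0)]
first = retrieve(store, [1.0, 0.1], now=3600.0)[0]
correct(store, first, "Office moved to Denver", [0.9, 0.2], now=7200.0)
answer = retrieve(store, [1.0, 0.1], now=7200.0)[0]
print(answer.text)
```

With the suppression penalty in place, the second retrieval surfaces the corrected fact. Dropping the `penalty` factor makes the stale record win in this toy setup, because its earlier retrieval raised its `reinforcement` count. That is a small instance of the drift risk that the logging step is meant to catch.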

Agent Features

Memory

  • persistence across sessions
  • selective retention (decay, salience)
  • retrieval-driven mutation
  • associative routing
  • temporal chaining
  • consolidation/abstraction

Planning

  • consolidation (background abstraction)
  • retrieval-driven updates affecting future planning

Tool Use

  • vector DB (pgvector) + graph memory
  • LLM summarizers for consolidation

Frameworks

  • Supabase pgvector
  • text-embedding-3-small embeddings

Is Agentic

true

Architectures

  • graph-structured memory
  • activation-field (spreading activation)
  • multi-resolution clusters

Collaboration

  • provenance and audit logs for human oversight

Optimization Features

Token Efficiency

  • summarize large fragments before storage to limit node growth

Infra Optimization

  • hierarchical storage and cached activation maps
  • possible hardware acceleration for graph traversal

System Optimization

  • background consolidation jobs to amortize work
  • instrumentation for activation and reinforcement traces

Inference Optimization

  • multi-resolution graphs to reduce traversal
  • cap activation fan-out to bound runtime
  • cache activation maps for hot clusters
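Capping activation fan-out can be sketched as spreading activation that follows only the strongest few edges per node for a fixed number of hops. This is an assumption about one way to bound runtime, not the paper's algorithm. The `spread` function, its `decay`, `max_fanout`, and `max_hops` parameters, and the sample graph are hypothetical.

```python
def spread(graph, seeds, decay=0.5, max_fanout=2, max_hops=2):
    """Propagate activation from seed nodes through a weighted graph.

    graph: node -> list of (neighbor, weight) edges
    seeds: node -> initial activation
    Work per hop is bounded by len(frontier) * max_fanout.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        nxt = {}
        for node, energy in frontier.items():
            # Follow only the strongest edges to cap fan-out.
            edges = sorted(graph.get(node, []), key=lambda e: e[1], reverse=True)
            for neighbor, weight in edges[:max_fanout]:
                nxt[neighbor] = nxt.get(neighbor, 0.0) + energy * weight * decay
        for neighbor, energy in nxt.items():
            activation[neighbor] = activation.get(neighbor, 0.0) + energy
        frontier = nxt
    return activation


# Hypothetical memory graph used only for illustration.
graph = {
    "alice": [("project_x", 0.9), ("berlin", 0.6), ("lunch", 0.1)],
    "project_x": [("deadline", 0.8)],
}
result = spread(graph, {"alice": 1.0})
```

In this example the weak `lunch` edge is never followed, while `deadline` is still reached in two hops through `project_x`. The cap trades some recall on weak associations for a predictable traversal cost.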

Reproducibility

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Higher latency and compute from activation propagation and consolidation
  • Retrieval-driven reinforcement can entrench errors, causing memory drift
  • Temporal segmentation and episode-boundary detection remain brittle
  • Evolving graphs are harder to audit and require provenance tooling
  • Persistent memories raise privacy and compliance obligations

When Not To Use

  • When low-latency responses are critical and a roughly 2.4× increase in runtime is unacceptable
  • For short-lived sessions where long-horizon memory is unnecessary
  • When strict data-deletion or zero-retention policies apply and the governance engineering needed to honor them is not in place

Failure Modes

  • Reinforcement loops that amplify incorrect memories (drift)
  • Scaling blowups as graph edges and activation fan-out grow
  • Consolidation that produces misleading abstractions or loses factual detail
  • Privacy leaks if persistent fragments are not properly gated

Core Entities

Models

  • GPT-4o
  • text-embedding-3-small

Metrics

  • win counts
  • Cohen's d
  • Cohen's h
  • latency (s)
  • per-query rubric scores (0-1)

Datasets

  • custom internal corpora (withheld)

Context Entities

Models

  • GPT-4o (LLM judge)

Metrics

  • per-study permutation tests (p < 0.01)
  • McNemar's test (p < 0.01)
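For readers who want to sanity-check a paired comparison of this kind, an exact McNemar test needs only the two discordant counts. The summary does not say which counts the authors' test was applied to, so the sketch below assumes the overall decisive wins (82 for CMA, 10 for RAG) serve as the discordant pairs. The `mcnemar_exact` helper is hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts b and c.

    Under the null hypothesis each discordant pair is a fair coin flip,
    so the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Assumption: overall decisive wins are treated as the discordant pairs.
p_overall = mcnemar_exact(82, 10)
print(p_overall < 0.01)
```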

Datasets

  • behavioral probe corpora (authors; redacted)