D-MEM: dopamine-inspired memory router cuts token costs 80% and improves multi-hop reasoning

March 15, 2026 · 8 min

Overview

Production Readiness

0.7

Novelty Score

0.6

Cost Impact Score

0.8

Citation Count

0

Authors

Yuru Song, Qi Xin

Links

Abstract / PDF

Why It Matters For Business

If you run agents over long, noisy sessions, gating memory updates on surprise and utility can cut API costs dramatically and improve complex reasoning. The trade-off is weaker single-fact recall unless the routing thresholds are tuned.

Summary TLDR

D-MEM is a bio-inspired memory system for autonomous LLM agents that routes each user turn through a lightweight Critic Router. The router scores semantic "surprise" and long-term utility, then decides whether to skip the turn, cache it, or trigger a full knowledge-graph evolution. On a 75%-noise variant of LoCoMo (LoCoMo-Noise), D-MEM reduces API token use by ~80% and outperforms synchronous baselines on multi-hop and adversarial QA, but it loses single-hop recall unless thresholds are adjusted. The authors open-source the implementation.

Problem Statement

Existing evolving agent memories apply heavy update logic to every turn, causing O(N^2) write costs, massive API token use, context pollution, and slow runtime under real, noisy conversations. The goal is to keep the benefits of dynamic, evolving memory (conflict resolution, multi-hop reasoning) while avoiding the computational and token cost of evolving on every input.

Main Contribution

D-MEM architecture: a fast/slow Critic Router that gates memory updates using a Reward Prediction Error analogue.

Agentic RPE formulation: bounded multiplicative gate combining semantic surprise and long-term utility to avoid noisy false positives.

LoCoMo-Noise benchmark: a controlled noise-injection protocol (ρ = 0.75) for testing long-term memory under conversational noise.

Zero-cost retrieval augmentations: hybrid BM25 + dense retrieval with Reciprocal Rank Fusion and a Shadow Buffer fallback to protect against skipped-turn "amnesia".
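
The Agentic RPE gate listed above can be illustrated with a minimal sketch. This summary states only that the gate is a bounded multiplicative combination of semantic surprise and long-term utility. Everything else here is an assumption for illustration and not the authors' formulation: surprise is taken as one minus the maximum cosine similarity to stored embeddings, utility is taken as a score in [0, 1], and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def surprise(turn_vec, memory_vecs):
    """Assumed definition: 1 - max similarity to stored items, clipped to [0, 1]."""
    if not memory_vecs:
        return 1.0  # nothing stored yet, so everything is novel
    best = max(cosine(turn_vec, m) for m in memory_vecs)
    return min(1.0, max(0.0, 1.0 - best))

def rpe_gate(surprise_score, utility_score):
    """Bounded multiplicative gate: high only if a turn is both novel and useful."""
    s = min(1.0, max(0.0, surprise_score))
    u = min(1.0, max(0.0, utility_score))
    return s * u  # product of two values in [0, 1] stays in [0, 1]
```

Under these assumptions, a novel but useless turn (surprise 0.9, utility 0.1) scores 0.09, whereas an additive combination would score it much higher. That is one way to read the claim that a multiplicative gate avoids noisy false positives.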

Key Findings

D-MEM cuts API token consumption by about 80% compared to a synchronous evolving-memory baseline.

Numbers: Total tokens: A-MEM 1,648K → D-MEM 319K (−80%)

D-MEM substantially improves multi-hop reasoning under noisy dialogue.

Numbers: Multi-hop F1 on noisy LoCoMo: D-MEM 0.412 vs A-MEM 0.365 (+0.047)

Aggressive utility-based skipping reduces single-hop recall versus synchronous systems.

Numbers: Single-hop F1 (clean LoCoMo): D-MEM 21.6% vs A-MEM 44.7% (−23.1 pp)

Real turns were skipped more often than LLM-generated noise under the current configuration.

Numbers: Skip rates: real turns 53.9% vs injected noise 43.2%

Results

Total Tokens (LoCoMo-Noise, ρ=0.75)

Value: 319K

Baseline: A-MEM 1,648K

Overall F1 (LoCoMo-Noise, ρ=0.75)

Value: 0.369

Baseline: A-MEM 0.336

Multi-hop F1 (clean LoCoMo)

Value: 42.7%

Baseline: A-MEM 27.0%

Single-hop F1 (clean LoCoMo)

Value: 21.6%

Baseline: A-MEM 44.7%

Skip Rate (routing)

Value: 41.1% (final v4)

Baseline: varies across ablations
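
As a quick sanity check, the deltas implied by the reported figures can be recomputed directly. The snippet below uses only numbers quoted in this summary; the helper name `relative_change` is introduced here for illustration.

```python
def relative_change(baseline, value):
    """Fractional change of value relative to baseline (negative means a reduction)."""
    return (value - baseline) / baseline

# Figures as reported in this summary (token counts in thousands).
token_change_pct = round(100 * relative_change(1648, 319), 1)
overall_f1_delta = round(0.369 - 0.336, 3)      # LoCoMo-Noise, absolute F1
multi_hop_delta_pp = round(42.7 - 27.0, 1)      # clean LoCoMo, percentage points
single_hop_delta_pp = round(21.6 - 44.7, 1)     # clean LoCoMo, percentage points

print(token_change_pct, overall_f1_delta, multi_hop_delta_pp, single_hop_delta_pp)
```

The token reduction works out to roughly 80.6%, consistent with the "about 80%" headline, and the single-hop gap matches the −23.1 pp quoted under Key Findings.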

Who Should Care

What To Try In 7 Days

Add a lightweight utility classifier to tag turns as Transient/Short-Term/Persistent.

Implement a simple three-way router (SKIP / CONSTRUCT / FULL_EVOLUTION) with θ_low = 0.3 and θ_high = 0.7, and measure token use.

Run a BM25 sparse index in parallel with your vector store and fuse the two result lists via RRF to recover rare entities.
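
The three-way routing step above might look like the following sketch. The thresholds 0.3 and 0.7 come from the text. The boundary handling (a score exactly at a threshold goes to the higher tier), the `Route` enum, and the `routing_summary` helper are assumptions made for this example.

```python
from collections import Counter
from enum import Enum

class Route(Enum):
    SKIP = "SKIP"
    CONSTRUCT_ONLY = "CONSTRUCT_ONLY"
    FULL_EVOLUTION = "FULL_EVOLUTION"

def route_turn(rpe, theta_low=0.3, theta_high=0.7):
    """Map a gate score in [0, 1] to one of three tiers."""
    if rpe < theta_low:
        return Route.SKIP            # ignore the turn (cheapest path)
    if rpe < theta_high:
        return Route.CONSTRUCT_ONLY  # store the turn, defer linkage
    return Route.FULL_EVOLUTION      # trigger the expensive memory evolution

def routing_summary(rpe_scores, theta_low=0.3, theta_high=0.7):
    """Count routes and report the skip rate for a batch of scored turns."""
    counts = Counter(route_turn(r, theta_low, theta_high).value for r in rpe_scores)
    total = len(rpe_scores)
    skip_rate = counts["SKIP"] / total if total else 0.0
    return dict(counts), skip_rate
```

Logging the summary alongside token counts per tier gives a baseline for the "measure token use" step before any threshold tuning.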

Agent Features

Memory

  • O(1) Short-Term Memory buffer for routine facts
  • Sparse O(N) deep evolution for paradigm shifts
  • O(1) Shadow Buffer (FIFO) for skipped-turn fallbacks
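
A FIFO Shadow Buffer with O(1) writes can be sketched with a bounded deque. The capacity, the keyword-overlap lookup, and the class and method names below are hypothetical; this summary does not say how the authors search the buffer.

```python
from collections import deque

class ShadowBuffer:
    """Fixed-size FIFO of skipped turns; the oldest entry is evicted when full."""

    def __init__(self, capacity=32):
        self._turns = deque(maxlen=capacity)  # append is O(1), eviction is automatic

    def add(self, turn_text):
        self._turns.append(turn_text)

    def __len__(self):
        return len(self._turns)

    def search(self, query, limit=3):
        """Naive fallback lookup: most recent turns sharing any word with the query."""
        terms = set(query.lower().split())
        hits = [t for t in reversed(self._turns) if terms & set(t.lower().split())]
        return hits[:limit]
```

Writes are constant time, while this naive search scans the whole buffer, so its cost is bounded by the fixed capacity rather than by conversation length.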

Planning

  • Selective full memory evolution for high-RPE events
  • Deferred linkage in CONSTRUCT_ONLY tier

Tool Use

  • Lightweight LLM call for Utility classification (JSON schema)
  • BM25 + vector retrieval hybrid

Frameworks

  • BM25 sparse index
  • Reciprocal Rank Fusion
  • Vector embedding index
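
Reciprocal Rank Fusion combines ranked lists using only rank positions, so BM25 scores and embedding similarities never need to share a scale. The sketch below is a generic RRF implementation, not the authors' code; k = 60 is the constant commonly used in the RRF literature, and the document IDs in the usage note are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

bm25_hits = ["d3", "d1", "d2"]    # hypothetical sparse results
dense_hits = ["d1", "d2", "d4"]   # hypothetical dense results
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Here `fused` is `["d1", "d2", "d3", "d4"]`: documents found by both retrievers rise to the top, while a document that only BM25 surfaces (such as a rare entity name) still makes the list.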

Is Agentic

true

Architectures

  • Fast/Slow routing (Critic Router)
  • Evolving knowledge graph (long-term memory)
  • Short-term memory (STM) buffer and Shadow Buffer

Optimization Features

Token Efficiency

  • Selective routing reduces API tokens by ~80%
  • Shadow Buffer avoids expensive re-evolutions

System Optimization

  • Converts O(N^2) continuous evolution into rare O(N) events
  • Cold-start override to avoid early false positives

Inference Optimization

  • Per-turn compute gating via Critic Router
  • Avoids full evolution for low-utility turns

Reproducibility

Code Available

Open Source Status

  • yes

Risks & Boundaries

Limitations

  • LoCoMo-Noise uses synthetic LLM-generated noise with a fixed 40/30/30 mix, which may not match real user noise distributions.
  • Current utility classifier requires a per-turn LLM call, which adds overhead; the classifier must be distilled for zero-cost deployments.
  • Aggressive θ_low settings can over-prune real, low-complexity facts and hurt single-hop recall.

When Not To Use

  • When single-turn exact fact lookup is the dominant task and any single-hop miss is unacceptable.
  • When you cannot afford even the lightweight per-turn utility LLM call and have no plan to distill it.

Failure Modes

  • Calibration asymmetry: real turns skipped more than synthetic noise, causing lost facts.
  • Over-pruning during cold-start if warmup override is misconfigured.
  • Utility classifier false positives/negatives leading to unnecessary full evolutions or missed updates.
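
The calibration asymmetry above can be monitored by comparing skip rates per turn source. The sketch assumes each routed turn is logged as a (source, route) pair with source labels such as "real" and "noise"; that log format and the function name are assumptions for illustration.

```python
def skip_rates(decisions):
    """Return the fraction of SKIP decisions for each turn source."""
    totals, skipped = {}, {}
    for source, route in decisions:
        totals[source] = totals.get(source, 0) + 1
        if route == "SKIP":
            skipped[source] = skipped.get(source, 0) + 1
    return {source: skipped.get(source, 0) / totals[source] for source in totals}

# Hypothetical routing log, not data from the paper.
log = [
    ("real", "SKIP"), ("real", "CONSTRUCT_ONLY"),
    ("noise", "SKIP"), ("noise", "SKIP"),
    ("noise", "FULL_EVOLUTION"), ("noise", "SKIP"),
]
rates = skip_rates(log)
```

If the rate for real turns exceeds the rate for noise, as in the 53.9% vs 43.2% figures reported above, the thresholds or the utility classifier are likely discarding facts that should be kept.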

Core Entities

Models

  • D-MEM (this paper)
  • GPT-4o-mini (backbone used for LLM calls)

Metrics

  • F1
  • BLEU-1
  • Total Tokens
  • Skip Rate

Datasets

  • LoCoMo-Noise (constructed in this paper)

Benchmarks

  • LoCoMo-Noise

Context Entities

Models

  • A-MEM
  • MemGPT
  • MemoryBank
  • Full Context upper bound

Metrics

  • F1
  • BLEU-1

Datasets

  • LoCoMo (original dataset)

Benchmarks

  • LoCoMo