A-MEM: LLM agents that build and evolve a Zettelkasten-style linked memory

February 17, 2025 (8 min read)

Overview

Decision Snapshot: Needs Validation

A-MEM uses embeddings to shortlist candidate memories and an LLM to decide links and updates. The idea is simple to implement, but quality depends on the base LLM and prompt design; results are reported across multiple models and datasets.

Citations: 6

Evidence Strength: 0.60

Confidence: 0.85

Risk Signals: 11

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 5/5

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 80%

Production readiness: 70%

Novelty: 60%

Authors

Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, Yongfeng Zhang

Links

Abstract / PDF / Code / Data

Why It Matters For Business

A-MEM cuts token and inference cost by roughly 85–93% per memory operation while improving multi-session reasoning, making long-term conversational agents materially cheaper to run at scale and more capable.

Who Should Care

Summary (TL;DR)

A-MEM is a memory layer for LLM agents. For each interaction it creates a structured "note" (content, LLM-generated keywords, tags, a contextual description, and an embedding), uses embedding-based nearest-neighbor search to shortlist related notes, and then prompts an LLM to decide which notes to link and how to update existing ones. On long multi-session QA datasets (LoCoMo, DialSim), A-MEM improves multi-hop reasoning scores and, through selective top-k retrieval, cuts prompt size to roughly 1.2k tokens per memory operation. Code and a production repository are published.
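A minimal sketch of the note structure and the embedding shortlist step, in Python. It assumes embeddings are already computed (in A-MEM they come from a dense text encoder; here they are hand-written toy vectors) and that the LLM-generated keywords, tags, and context are passed in as plain values. The names `Note`, `cosine`, and `shortlist` are hypothetical and are not the project's API.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Note:
    """One atomic memory note; keywords/tags/context would be LLM-generated."""
    content: str
    timestamp: str
    keywords: list[str]
    tags: list[str]
    context: str
    embedding: list[float]
    links: list[int] = field(default_factory=list)  # filled by link generation

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def shortlist(new: Note, memory: list[Note], k: int = 10) -> list[int]:
    """Indices of the k stored notes whose embeddings are closest to `new`."""
    ranked = sorted(range(len(memory)),
                    key=lambda i: cosine(new.embedding, memory[i].embedding),
                    reverse=True)
    return ranked[:k]

# Toy usage: 2-d vectors stand in for real encoder output.
memory = [
    Note("Alice moved to Paris", "t1", [], [], "", [1.0, 0.0]),
    Note("Bob plays jazz", "t2", [], [], "", [0.0, 1.0]),
    Note("Alice started a new job", "t3", [], [], "", [0.7, 0.7]),
]
new = Note("Alice visited the Louvre", "t4", [], [], "", [0.9, 0.1])
print(shortlist(new, memory, k=2))  # [0, 2]
```

A linear scan is fine for illustration; at scale the same step would normally go through an approximate nearest-neighbor index.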

Problem Statement

Existing memory systems for LLM agents use fixed schemas and preset write/retrieve rules, so they struggle to form new organizational patterns or evolve knowledge over long, open-ended interactions. The paper proposes a dynamic, agent-driven memory that both links new items and updates old ones automatically.

Main Contribution

A-MEM: an agentic memory system that constructs structured notes (content, keywords, tags, contextual description, embedding) and autonomously links and evolves memories.

Two core modules: Link Generation (embedding similarity shortlists candidate notes, then an LLM decides which connections to make) and Memory Evolution (updates the contexts and tags of existing notes when new related memories arrive).
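The two modules can be sketched as a single insert routine. This is an illustration under stated assumptions, not the authors' code: the LLM calls are replaced by hypothetical callables `decide_links` and `evolve` (toy tag-overlap stand-ins below), links are stored in both directions, and evolution is applied only to the neighbors that end up linked, which may be narrower than the paper's procedure.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Note:
    content: str
    tags: set[str]
    context: str
    links: set[int] = field(default_factory=set)

# Hypothetical hooks standing in for prompted LLM calls.
LinkDecider = Callable[[Note, list[Note]], list[int]]
Evolver = Callable[[Note, Note], tuple[str, set[str]]]

def add_note(memory: list[Note], new: Note, candidate_ids: list[int],
             decide_links: LinkDecider, evolve: Evolver) -> int:
    """Store `new`, link it to approved candidates, then update those neighbors."""
    candidates = [memory[i] for i in candidate_ids]
    # Link generation: the decider returns positions within `candidates`.
    chosen = [candidate_ids[j] for j in decide_links(new, candidates)]
    new_id = len(memory)
    memory.append(new)
    for i in chosen:
        new.links.add(i)
        memory[i].links.add(new_id)  # assumption: bidirectional links
        # Memory evolution: rewrite the neighbor's context and tags given `new`.
        memory[i].context, memory[i].tags = evolve(memory[i], new)
    return new_id

def overlap_decider(new: Note, candidates: list[Note]) -> list[int]:
    """Toy stand-in for the LLM: link when tags overlap."""
    return [j for j, c in enumerate(candidates) if c.tags & new.tags]

def merge_evolver(old: Note, new: Note) -> tuple[str, set[str]]:
    """Toy stand-in for the LLM: keep context, absorb the new note's tags."""
    return old.context, old.tags | new.tags

memory = [Note("Alice moved to Paris", {"alice", "travel"}, "relocation"),
          Note("Bob plays jazz", {"bob", "music"}, "hobby")]
new = Note("Alice visited the Louvre", {"alice", "art"}, "sightseeing")
new_id = add_note(memory, new, [0, 1], overlap_decider, merge_evolver)
print(new_id, memory[0].links, sorted(memory[0].tags))  # 2 {2} ['alice', 'art', 'travel']
```

Keeping the LLM behind plain callables makes the graph logic testable offline and lets you swap prompts or models without touching the storage code.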

Key Findings

A-MEM improves DialSim QA F1 over baselines.

Numbers: DialSim F1: A-MEM 3.45 vs. LoCoMo 2.55 (+35%) and MemGPT 1.18 (+192%)

Practical Use: If you run long multi-party dialogue QA, replacing a static memory layer with A-MEM can materially raise answer F1, at least on the evaluated dataset (DialSim).

Evidence Ref: Table 2; Section 4.3

A-MEM greatly boosts multi-hop reasoning for GPT-based models.

Numbers: GPT-4o-mini Multi-Hop ROUGE-L: A-MEM 44.27 vs. LoCoMo 18.09 (>2×)

Practical Use: Use A-MEM when questions require synthesizing information across sessions; it assembles long-range evidence better than simple context-passing baselines.

Evidence Ref: A.3 comparison results; main text Section 4.3

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
DialSim F1 | A-MEM 3.45 | LoCoMo 2.55; MemGPT 1.18 | +35% vs LoCoMo; +192% vs MemGPT | DialSim | Table 2; Section 4.3 | Table 2
Multi-Hop ROUGE-L (GPT-4o-mini) | A-MEM 44.27 | LoCoMo 18.09 | >2× | LoCoMo (Multi-Hop) | A.3 and Section 4.3 | A.3
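The deltas in the table are relative changes, (A-MEM score - baseline score) / baseline score. A quick check in Python reproduces the reported figures from the listed scores; `relative_gain` is just an illustrative helper name.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100

print(f"{relative_gain(3.45, 2.55):.0f}%")  # DialSim F1 vs LoCoMo: 35%
print(f"{relative_gain(3.45, 1.18):.0f}%")  # DialSim F1 vs MemGPT: 192%
print(f"{44.27 / 18.09:.2f}x")              # Multi-hop ROUGE-L ratio: 2.45x
```

Because the DialSim F1 values are small in absolute terms, large relative gains correspond to modest absolute differences (0.90 and 2.27 points).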

What To Try In 7 Days

Add a structured note layer: store content + LLM-generated keywords, tags, context, and embeddings.

Implement top-k dense retrieval (start k=10) and prompt an LLM to decide which retrieved items to link.

Run an ablation: compare current memory layer vs A-MEM on a held-out multi-session QA set to measure multi-hop gains and token savings.
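For the ablation, a minimal harness might score each memory layer's answers with a token-overlap F1 and compare total prompt tokens. The SQuAD-style F1 below is an assumption, since the paper's exact answer normalization may differ; `normalize`, `token_f1`, and `compare` are hypothetical names, and the toy answers and token counts are made up for illustration.

```python
import string
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, split on whitespace."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer."""
    p, g = normalize(pred), normalize(gold)
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

def compare(gold: list[str], base_answers: list[str], new_answers: list[str],
            base_tokens: list[int], new_tokens: list[int]) -> dict[str, float]:
    """Mean F1 for each memory layer plus the fraction of prompt tokens saved."""
    n = len(gold)
    return {
        "f1_base": sum(map(token_f1, base_answers, gold)) / n,
        "f1_new": sum(map(token_f1, new_answers, gold)) / n,
        "token_savings": 1 - sum(new_tokens) / sum(base_tokens),
    }

# Toy held-out set with two questions (answers and token counts are made up).
report = compare(
    gold=["Paris", "jazz"],
    base_answers=["London", "jazz music"],
    new_answers=["Paris", "jazz"],
    base_tokens=[4000, 4000],
    new_tokens=[1000, 1000],
)
print(report)
```

Slicing the held-out set by question type (for example, multi-hop vs. single-hop) before averaging shows where any gain actually comes from.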

Agent Features

Memory
Note construction (content, timestamp, keywords, tags, context, embedding)
Link generation (LLM judgment over top-k neighbors)
Memory evolution (update neighbor contexts/tags)
Planning
Dynamic link generation to shape memory graph
Tool Use
LLMs to generate keywords/tags/context
Dense embedding encoder for similarity search
Frameworks
Zettelkasten method (atomic notes + flexible linking)
Is Agentic

Yes

Architectures
Zettelkasten-inspired note graph (atomic notes + boxes)
Embedding-based index + LLM decision layer
Collaboration
Memory boxes allow a note to belong to multiple linked groups
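One way to picture memory boxes, assuming (for illustration only) that a note's box is the note plus the notes it links to. Under that definition a note naturally appears in several boxes at once. The functions `boxes` and `memberships` are hypothetical and do not come from the A-MEM code.

```python
def boxes(links: dict[int, set[int]]) -> dict[int, set[int]]:
    """Assumed definition: note n's box holds n and every note it links to."""
    return {n: {n} | nbrs for n, nbrs in links.items()}

def memberships(links: dict[int, set[int]]) -> dict[int, set[int]]:
    """For each note, the ids of all boxes that contain it."""
    member: dict[int, set[int]] = {n: set() for n in links}
    for box_id, box in boxes(links).items():
        for n in box:
            member.setdefault(n, set()).add(box_id)
    return member

# Toy graph: 0 -- 1 -- 2 (links stored in both directions).
links = {0: {1}, 1: {0, 2}, 2: {1}}
print(memberships(links))  # {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}}
```

The hub note 1 sits in all three boxes, which is the overlapping-group property the Zettelkasten framing relies on.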

Optimization Features

Token Efficiency
Selective top-k retrieval reduces prompt size to ~1.2k tokens per memory operation
Tuned k per task balances context richness against noise
System Optimization
Local hosting options (Ollama + LiteLLM) for faster, cheaper runs
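Tuning k per task can be as simple as a sweep on a validation split that keeps the best-scoring k whose prompt size fits a token budget. This is a generic sketch, not the paper's tuning procedure: `pick_k` and the `evaluate` hook (which would run the QA set at a given k) are hypothetical, and the `toy_eval` function in the usage is synthetic, not a measured result.

```python
from typing import Callable, Optional

def pick_k(ks: list[int], evaluate: Callable[[int], tuple[float, int]],
           token_budget: int) -> int:
    """Best-scoring k whose average prompt tokens stay within `token_budget`.

    `evaluate(k)` is a hypothetical hook returning (score, avg_prompt_tokens).
    """
    best_k: Optional[int] = None
    best_score = float("-inf")
    for k in ks:
        score, tokens = evaluate(k)
        if tokens <= token_budget and score > best_score:
            best_k, best_score = k, score
    if best_k is None:
        raise ValueError("no candidate k fits the token budget")
    return best_k

# Synthetic example: score peaks at k=10, tokens grow linearly with k.
def toy_eval(k: int) -> tuple[float, int]:
    return 1 - abs(k - 10) / 40, 100 * k

print(pick_k([5, 10, 20, 40], toy_eval, token_budget=1500))  # 10
```

Filtering on the budget before comparing scores keeps the choice from drifting toward large k, where extra context adds cost and, per the summary above, noise.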

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Data URLs

LoCoMo (see arXiv:2402.17753)
DialSim (see arXiv:2406.13144)

Risks & Boundaries

Limitations

Performance depends on the underlying LLM quality; different LLMs produce different contexts/links.

Current implementation is text-only; multimodal memories (images/audio) are not supported yet.

When Not To Use

For one-off or very short interactions where long-term structure gives no benefit.

When strict privacy or compliance requirements rule out storing or enriching user interactions without additional safeguards.

Failure Modes

Incorrect links: LLM may propose spurious connections that mislead downstream reasoning.

Drift: evolving contexts may accumulate noise or conflate distinct facts over time.

Core Entities

Models

GPT-4o-mini, GPT-4o, DeepSeek-R1-32B, Claude 3.0 Haiku, Claude 3.5 Haiku, Qwen2.5 (1.5b, 3b), Llama 3.2 (1b, 3b)

Metrics

F1, BLEU-1, ROUGE-L, ROUGE-2, METEOR, SBERT Similarity, token usage, retrieval time

Datasets

LoCoMo, DialSim

Benchmarks

Long-term conversational QA (LoCoMo, DialSim)