Add a semantic timeline and durative summaries so agents recall events at the right time

January 12, 2026 · 7 min

Overview

Decision Snapshot: Ready For Pilot

The method is practical: it builds a temporal graph and monthly summaries, uses off-the-shelf parsers and embeddings, and shows consistent gains on two public benchmarks, though experiments used single-run settings.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.85

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 3/3

Reproducibility

Status: Partial assets available

Open source: Unknown

At A Glance

Cost impact: 40%

Production readiness: 70%

Novelty: 65%

Authors

Miao Su, Yucan Guo, Zhongni Hou, Long Bai, Zixuan Li, Yufei Zhang, Guojun Yin, Wei Lin, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

Why It Matters For Business

Temporal Semantic Memory (TSM) lets assistants recall events by when they actually occurred rather than by when they were mentioned in conversation. This improves time-sensitive answers and multi-session personalization, which can reduce wrong or stale recommendations in customer support and personal assistants.

Summary TLDR

This paper introduces Temporal Semantic Memory (TSM), a memory system that records when events actually happen (semantic time) and consolidates related events into durative summaries. TSM builds a Temporal Knowledge Graph (TKG) for event timestamps, clusters events by time into monthly topics/personas, and reranks retrieval results to match a query's time intent. On LONGMEMEVAL and LOCOMO, TSM improves accuracy over strong memory baselines (e.g., +12.2 percentage points vs A‑MEM on LONGMEMEVAL_S) and helps multi-session and temporal reasoning.
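The reranking step can be pictured as blending a semantic similarity score with how much of a candidate's event interval falls inside the query's time window. The following is a minimal sketch under stated assumptions: the names `Candidate`, `temporal_overlap`, and `rerank`, and the weight `alpha`, are hypothetical, and the paper's actual scoring function is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    text: str
    similarity: float  # dense-retrieval score in [0, 1], assumed precomputed
    start: date        # start of the semantic (event) time interval
    end: date          # end of the interval, inclusive


def temporal_overlap(c, q_start, q_end):
    """Fraction of the candidate's interval that lies inside the query window."""
    lo, hi = max(c.start, q_start), min(c.end, q_end)
    if lo > hi:
        return 0.0
    span_days = (c.end - c.start).days + 1
    return ((hi - lo).days + 1) / span_days


def rerank(cands, q_start, q_end, alpha=0.5):
    """Sort candidates by a blend of similarity and temporal overlap (alpha is illustrative)."""
    def score(c):
        return (1 - alpha) * c.similarity + alpha * temporal_overlap(c, q_start, q_end)
    return sorted(cands, key=score, reverse=True)


cands = [
    Candidate("Started a pottery class", 0.80, date(2024, 1, 5), date(2024, 1, 5)),
    Candidate("Trained for a marathon", 0.70, date(2024, 3, 1), date(2024, 3, 31)),
]
ranked = rerank(cands, date(2024, 3, 1), date(2024, 3, 31))
```

With a March 2024 query window, the marathon entry outranks the pottery entry despite its lower similarity, because its event interval matches the window.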

Problem Statement

Existing agent memories use dialogue timestamps or isolated event entries. That causes two problems: events get stored under the wrong time, and continuous experiences get split into point records. Agents then fail to retrieve temporally coherent, duration-aware context for time-sensitive or multi-session queries.
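To make the first problem concrete, consider a user who says "I moved to Lisbon last month" on May 10, 2024. The sketch below uses only the standard library; `resolve_last_month` is a hypothetical helper that handles this one expression, standing in for a real temporal parser.

```python
from datetime import date, timedelta


def resolve_last_month(dialogue_date):
    """Return the (start, end) interval for 'last month' relative to dialogue_date."""
    first_of_this_month = dialogue_date.replace(day=1)
    end = first_of_this_month - timedelta(days=1)  # last day of the previous month
    start = end.replace(day=1)
    return start, end


dialogue_date = date(2024, 5, 10)
utterance = "I moved to Lisbon last month."
semantic_start, semantic_end = resolve_last_month(dialogue_date)

# Indexing by dialogue time files the move under May 2024;
# indexing by semantic time files it under April 2024.
dialogue_key = (dialogue_date.year, dialogue_date.month)
semantic_key = (semantic_start.year, semantic_start.month)
```

A query such as "Where did I live in April 2024?" would miss this fact under `dialogue_key` and find it under `semantic_key`.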

Main Contribution

Temporal Semantic Memory (TSM): organizes memory by event time, not chat time, to ground retrieval in actual occurrence intervals.

Durative memory: clusters temporally contiguous and semantically related episodic facts into monthly topic and persona summaries.
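The durative-memory idea can be sketched as grouping episodic facts by calendar month and topic, then summarizing each group. This is an illustration, not the paper's implementation: the `Fact` type, the `topic` label (assumed to come from an upstream semantic clustering step), and the placeholder `summarize` function (a string join where a real system would likely call an LLM) are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Fact:
    text: str
    event_date: date  # semantic time of the event, not the chat timestamp
    topic: str        # assumed output of a separate semantic clustering step


def consolidate_monthly(facts, summarize=lambda texts: " ".join(texts)):
    """Group facts by (year, month, topic) and summarize each group."""
    buckets = defaultdict(list)
    for f in sorted(facts, key=lambda f: f.event_date):
        buckets[(f.event_date.year, f.event_date.month, f.topic)].append(f.text)
    return {key: summarize(texts) for key, texts in buckets.items()}


facts = [
    Fact("Ran 10 km.", date(2024, 3, 17), "running"),
    Fact("Ran 5 km.", date(2024, 3, 3), "running"),
    Fact("Booked a flight to Oslo.", date(2024, 4, 2), "travel"),
]
summaries = consolidate_monthly(facts)
```

Here the two March running facts collapse into one entry keyed by `(2024, 3, "running")`, while the April travel fact stays separate.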

Key Findings

TSM raises overall QA accuracy on LONGMEMEVAL_S to 74.80%

Numbers: TSM 74.80% vs A-MEM 62.60% (+12.20 pp)

Practical Use: Expect about a 12 pp accuracy gain on long-term memory QA tasks versus a strong graph memory baseline when queries require temporal grounding.

Evidence Ref: Table 1; Main Results

Large gains on time-sensitive questions

Numbers: Temporal category +22.56 pp (reported on LONGMEMEVAL_S)

Practical Use: If your application must answer when/over-which-period questions, adding semantic-time grounding substantially reduces time-misaligned retrieval errors.

Evidence Ref: Section 4.2; Table 1

Results

| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
| --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 74.80% | A-MEM 62.60% | +12.20 pp | LONGMEMEVAL_S (GPT-4o-mini) | Table 1 shows TSM 74.80% vs A-MEM 62.60% | Table 1 |
| Accuracy | 76.69% | Naive RAG 63.64% | +13.05 pp | LOCOMO (GPT-4o-mini) | Table 2 reports TSM 76.69% vs Naive RAG 63.64% | Table 2 |
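The Delta values are absolute differences in percentage points, not relative improvements. The short sketch below recomputes them from the reported accuracies and also derives the relative gain for comparison. The relative figures are derived here by arithmetic and are not reported by the paper; the function names are hypothetical.

```python
def pp_delta(value, baseline):
    """Absolute difference in percentage points."""
    return round(value - baseline, 2)


def relative_gain(value, baseline):
    """Relative improvement over the baseline, in percent."""
    return round(100 * (value - baseline) / baseline, 1)


# (TSM accuracy, baseline accuracy) as listed in the results above
rows = {
    "LONGMEMEVAL_S": (74.80, 62.60),  # baseline: A-MEM
    "LOCOMO": (76.69, 63.64),         # baseline: Naive RAG
}

deltas = {name: pp_delta(v, b) for name, (v, b) in rows.items()}
gains = {name: relative_gain(v, b) for name, (v, b) in rows.items()}
```

Keeping the two quantities separate avoids reading "+12.20" as a 12.20% relative improvement.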

What To Try In 7 Days

Parse user queries for time expressions with spaCy and test retrieval filtered by that window.

Index events with a small Temporal Knowledge Graph (valid_time/invalid_time fields).

Cluster recent events monthly and generate short topic/persona summaries for retrieval trials on a small user cohort.
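As a starting point for the first two steps, the sketch below stores facts as graph edges with `valid_time` and `invalid_time` fields and filters them by a query window. It assumes the window has already been extracted from the query (for example from date entities found by a parser such as spaCy); the `Edge` type and the `active_in_window` and `query` helpers are hypothetical names, not part of TSM or any library.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Edge:
    subject: str
    relation: str
    obj: str
    valid_time: date
    invalid_time: Optional[date] = None  # None means the fact still holds


def active_in_window(edge, start, end):
    """True if the edge's validity interval overlaps [start, end]."""
    ends_after_start = edge.invalid_time is None or edge.invalid_time >= start
    return edge.valid_time <= end and ends_after_start


def query(graph, relation, start, end):
    """Return objects of matching edges that were valid during the window."""
    return [e.obj for e in graph
            if e.relation == relation and active_in_window(e, start, end)]


graph = [
    Edge("user", "lives_in", "Berlin", date(2022, 1, 1), date(2024, 3, 31)),
    Edge("user", "lives_in", "Lisbon", date(2024, 4, 1)),
]

june_2023 = query(graph, "lives_in", date(2023, 6, 1), date(2023, 6, 30))
may_2024 = query(graph, "lives_in", date(2024, 5, 1), date(2024, 5, 31))
```

Closing an edge with `invalid_time` instead of deleting it keeps superseded facts retrievable for questions about the past.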

Agent Features

Memory
Episodic memory (time-grounded facts); Durative memory (consolidated, lasting summaries); Hierarchical update: online graph + sleep-time consolidation
Tool Use
spaCy for time parsing; embedding models for dense retrieval
Frameworks
TSM (Temporal Semantic Memory)
Is Agentic

Yes

Architectures
Temporal Knowledge Graph (TKG); Durative memory summaries (monthly topics/personas)

Optimization Features

System Optimization
Separate lightweight online graph updates from expensive summary consolidation
Inference Optimization
Top-K retrieval (Top-K=25); Sleep-time consolidation to reduce online cost
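The split between cheap online updates and expensive offline consolidation can be sketched as follows. The `Memory` class, its method names, and the `summarize` and `score` callables are assumptions for illustration; only the Top-K value of 25 comes from the text above.

```python
import heapq


class Memory:
    def __init__(self):
        self.graph = []      # cheap online store, appended on every turn
        self.pending = []    # facts not yet consolidated
        self.summaries = []  # expensive summaries, built offline

    def online_update(self, fact):
        """Lightweight write path: no summarization during the conversation."""
        self.graph.append(fact)
        self.pending.append(fact)

    def sleep_consolidate(self, summarize):
        """Offline pass: summarize pending facts in one batch, then clear the queue."""
        if self.pending:
            self.summaries.append(summarize(self.pending))
            self.pending = []

    def retrieve(self, score, k=25):
        """Return the k highest-scoring items across raw facts and summaries."""
        return heapq.nlargest(k, self.graph + self.summaries, key=score)


mem = Memory()
for i in range(30):
    mem.online_update(f"fact {i}")
mem.sleep_consolidate(lambda facts: f"{len(facts)} facts")
top = mem.retrieve(score=len)
```

Batching the summarization call keeps per-turn latency tied to the append, while the cost of consolidation is paid when the agent is idle.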

Reproducibility

Code Available: No
Data Available: Yes
Open Source Status: Unknown
License: Unknown

Data URLs

LONGMEMEVAL (public benchmark); LOCOMO (public benchmark)

Risks & Boundaries

Limitations

Uses fixed monthly granularity for durative summaries; may miss finer or coarser temporal patterns.

Focuses on personalization; not evaluated for procedural or shared multi-agent memory.

When Not To Use

When the whole conversation fits the model context window (full-text performs better).

When compute or maintenance budget cannot support periodic consolidation or embedding storage.
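The first condition can be checked before any memory system is involved. The sketch below routes short histories to full-text prompting and longer ones to memory retrieval. The four-characters-per-token estimate is a rough assumption rather than a real tokenizer, and `choose_strategy` is a hypothetical helper.

```python
def choose_strategy(turns, context_budget_tokens, chars_per_token=4):
    """Pick 'full_text' when the estimated history size fits the budget, else 'memory'."""
    est_tokens = sum(len(t) for t in turns) // chars_per_token  # crude estimate
    return "full_text" if est_tokens <= context_budget_tokens else "memory"


short_history = ["a" * 400]    # roughly 100 tokens under the assumed ratio
long_history = ["a" * 4000]    # roughly 1000 tokens under the assumed ratio

short_choice = choose_strategy(short_history, context_budget_tokens=200)
long_choice = choose_strategy(long_history, context_budget_tokens=200)
```

In practice the budget should leave room for the system prompt and the model's answer, and the estimate should come from the target model's tokenizer.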

Failure Modes

Incorrect time parsing leads to wrong temporal filters and missed evidence.

Over-consolidation can drop fine-grained facts needed for some queries.
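One possible guard against the first failure mode is to treat the temporal filter as optional: skip it when no window was parsed, and fall back to the unfiltered candidates when the window matches too few items. This is a mitigation sketch, not something the paper describes; `filter_with_fallback` and `min_hits` are hypothetical.

```python
from datetime import date


def filter_with_fallback(candidates, window, min_hits=1):
    """candidates: list of (text, event_date); window: (start, end) or None."""
    if window is None:
        return candidates, "unfiltered"  # no time intent was parsed
    start, end = window
    hits = [c for c in candidates if start <= c[1] <= end]
    if len(hits) < min_hits:
        return candidates, "fallback"    # window may be wrong or too strict
    return hits, "filtered"


candidates = [
    ("Moved to Lisbon", date(2024, 4, 2)),
    ("Adopted a cat", date(2023, 11, 20)),
]

april = filter_with_fallback(candidates, (date(2024, 4, 1), date(2024, 4, 30)))
empty = filter_with_fallback(candidates, (date(2020, 1, 1), date(2020, 1, 31)))
no_window = filter_with_fallback(candidates, None)
```

Returning the mode alongside the results makes it possible to log how often the filter is bypassed, which is a useful signal of parser quality.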

Core Entities

Models

GPT-4o-mini, Qwen3-30B-A3B-Instruct-2507

Metrics

Accuracy

Datasets

LONGMEMEVAL, LOCOMO

Benchmarks

LongMemEval, LOCOMO

Context Entities

Models

A-MEM, Mem0, Mem0^g, Zep, MemoryOS, LangMem, Naive RAG