Add a semantic timeline and durative summaries so agents recall events at the right time

Overview

Decision Snapshot: Ready For Pilot

The method is practical: it builds a temporal graph and monthly summaries, uses off-the-shelf parsers and embeddings, and shows consistent gains on two public benchmarks, though experiments used single-run settings.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.85

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 3/3

Reproducibility

Status: Partial assets available

Open source: Unknown

At A Glance

Cost impact: 40%

Production readiness: 70%

Novelty: 65%

Authors

Miao Su, Yucan Guo, Zhongni Hou, Long Bai, Zixuan Li, Yufei Zhang, Guojun Yin, Wei Lin, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

Links

Abstract / PDF / Data

Why It Matters For Business

TSM lets assistants recall events by the time they actually occurred rather than the time they were mentioned, which improves time-sensitive answers and multi-session personalization. This can reduce wrong or stale recommendations in customer support and personal assistants.

Who Should Care

CTO, Product Manager, ML Engineer, Data Scientist, Engineering Lead

Summary TLDR

This paper introduces Temporal Semantic Memory (TSM), a memory system that records when events actually happen (semantic time) and consolidates related events into durative summaries. TSM builds a Temporal Knowledge Graph (TKG) for event timestamps, clusters events by time into monthly topics/personas, and reranks retrieval results to match a query's time intent. On LONGMEMEVAL and LOCOMO, TSM improves accuracy over strong memory baselines (e.g., +12.2 percentage points vs A‑MEM on LONGMEMEVAL_S) and helps multi-session and temporal reasoning.

Problem Statement

Existing agent memories use dialogue timestamps or isolated event entries. That causes two problems: events get stored under the wrong time, and continuous experiences get split into point records. Agents then fail to retrieve temporally coherent, duration-aware context for time-sensitive or multi-session queries.
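The gap between dialogue time and event time can be illustrated with a minimal sketch. The `MemoryEntry` record, its field names, and the sample trip are hypothetical and only serve to show why indexing by chat timestamp misses an event that was mentioned months after it happened.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MemoryEntry:
    text: str
    said_on: date      # dialogue timestamp: when the user mentioned it
    event_start: date  # semantic time: when the event actually began
    event_end: date    # semantic time: when the event actually ended


def overlaps(start: date, end: date, lo: date, hi: date) -> bool:
    """True if the closed interval [start, end] overlaps [lo, hi]."""
    return start <= hi and end >= lo


# Hypothetical example: a trip taken in March but only mentioned in June.
entry = MemoryEntry(
    text="User spent two weeks hiking in Patagonia",
    said_on=date(2024, 6, 10),
    event_start=date(2024, 3, 4),
    event_end=date(2024, 3, 18),
)

# Query intent: "What did I do in March 2024?"
lo, hi = date(2024, 3, 1), date(2024, 3, 31)

by_chat_time = overlaps(entry.said_on, entry.said_on, lo, hi)
by_event_time = overlaps(entry.event_start, entry.event_end, lo, hi)
print(by_chat_time, by_event_time)  # False True
```

Storing the event as an interval rather than a single point also keeps a two-week experience together instead of splitting it into isolated records.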

Main Contribution

Temporal Semantic Memory (TSM): organizes memory by event time, not chat time, to ground retrieval in actual occurrence intervals.

Durative memory: clusters temporally contiguous and semantically related episodic facts into monthly topic and persona summaries.
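As a rough sketch of the durative-memory idea, the code below groups episodic facts by the month in which the event occurred and by a topic label, then produces one summary per group. It is not the paper's procedure: the pre-assigned `topic` label stands in for semantic clustering, and `summarize` is a placeholder where a real system would likely call an LLM. All names and sample facts are hypothetical.

```python
from collections import defaultdict
from datetime import date


def month_key(d: date) -> str:
    """Format a date as a YYYY-MM bucket key."""
    return f"{d.year:04d}-{d.month:02d}"


def group_by_event_month(facts):
    """Group (event_date, topic, text) tuples by (month, topic)."""
    buckets = defaultdict(list)
    for event_date, topic, text in facts:
        buckets[(month_key(event_date), topic)].append(text)
    return buckets


def summarize(texts):
    # Placeholder summary: simply joins the fact texts.
    return " | ".join(texts)


# Hypothetical episodic facts, keyed by when the event happened.
facts = [
    (date(2024, 3, 4), "travel", "Booked flights to Chile"),
    (date(2024, 3, 18), "travel", "Returned from Patagonia hike"),
    (date(2024, 3, 20), "fitness", "Started marathon training"),
    (date(2024, 4, 2), "fitness", "Ran first 10k"),
]

durative = {
    key: summarize(texts)
    for key, texts in group_by_event_month(facts).items()
}
for key in sorted(durative):
    print(key, "->", durative[key])
```

Each resulting entry covers a span of time rather than a single point, which is what allows retrieval to return duration-aware context.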

Key Findings

TSM raises overall QA accuracy on LONGMEMEVAL_S to 74.80%

Numbers: TSM 74.80% vs A-MEM 62.60% (+12.20 pp)

Practical Use: Expect about a 12 pp accuracy gain on long-term memory QA tasks versus a strong graph memory baseline when queries require temporal grounding.

Evidence Ref: Table 1; Main Results

Large gains on time-sensitive questions

Numbers: Temporal category +22.56 pp (reported on LONGMEMEVAL_S)

Practical Use: If your application must answer when/over-which-period questions, adding semantic-time grounding substantially reduces time-misaligned retrieval errors.

Evidence Ref: Section 4.2; Table 1

Results

Metric	Value	Baseline	Delta (pp)	Split / Dataset	Evidence	Evidence Ref
Accuracy	74.80%	A-MEM 62.60%	+12.20 pp	LONGMEMEVAL_S (GPT-4o-mini)	Table 1 shows TSM 74.80% vs A-MEM 62.60%	Table 1
Accuracy	76.69%	Naive RAG 63.64%	+13.05 pp	LOCOMO (GPT-4o-mini)	Table 2 reports TSM 76.69% vs Naive RAG 63.64%	Table 2
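The deltas in the table are differences between two accuracy percentages, so they are percentage points rather than relative gains. The short check below recomputes them from the reported values; the helper names are illustrative, and `Decimal` is used only to avoid floating-point noise.

```python
from decimal import Decimal


def pp_delta(system: str, baseline: str) -> Decimal:
    """Absolute difference between two accuracies, in percentage points."""
    return Decimal(system) - Decimal(baseline)


def relative_gain(system: str, baseline: str) -> Decimal:
    """Relative improvement over the baseline, in percent."""
    return pp_delta(system, baseline) / Decimal(baseline) * 100


longmemeval = pp_delta("74.80", "62.60")  # TSM vs A-MEM
locomo = pp_delta("76.69", "63.64")       # TSM vs Naive RAG
print(longmemeval, locomo)                # 12.20 13.05

# The relative gain is a different, larger number than the pp delta.
longmemeval_rel = relative_gain("74.80", "62.60")
print(round(longmemeval_rel, 1))
```

Keeping the two measures apart matters when comparing against other reports, which may quote relative improvement instead.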

What To Try In 7 Days

Parse user queries for time expressions with spaCy and test retrieval filtered by that window.

Index events with a small Temporal Knowledge Graph (valid_time/invalid_time fields).

Cluster recent events monthly and generate short topic/persona summaries for retrieval trials on a small user cohort.
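The first two steps above can be prototyped without any model dependency. The sketch below is an assumption-laden stand-in: a regular expression that only recognizes "Month YYYY" replaces the spaCy parser, and the `Fact` record is a hypothetical shape for a graph edge carrying the `valid_time`/`invalid_time` fields mentioned above.

```python
import re
from calendar import monthrange
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
PATTERN = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
                     re.IGNORECASE)


def parse_window(query: str) -> Optional[Tuple[date, date]]:
    """Return (first_day, last_day) for a 'Month YYYY' mention, else None."""
    match = PATTERN.search(query)
    if match is None:
        return None
    month = MONTHS.index(match.group(1).lower()) + 1
    year = int(match.group(2))
    last_day = monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_time: date              # when the fact became true
    invalid_time: Optional[date]  # when it stopped being true; None = still true


def filter_by_window(facts: List[Fact], window) -> List[Fact]:
    """Keep facts whose validity interval overlaps the query window."""
    if window is None:  # no time intent detected: do not filter
        return list(facts)
    lo, hi = window
    return [f for f in facts
            if f.valid_time <= hi
            and (f.invalid_time is None or f.invalid_time >= lo)]


# Hypothetical facts about where a user lived.
facts = [
    Fact("user", "lives_in", "Berlin", date(2023, 1, 1), date(2024, 2, 15)),
    Fact("user", "lives_in", "Lisbon", date(2024, 2, 16), None),
]
window = parse_window("Where was I living in March 2024?")
hits = filter_by_window(facts, window)
print([f.obj for f in hits])  # ['Lisbon']
```

Returning the unfiltered list when no time expression is found is a deliberate choice in this sketch, so that a parsing miss degrades to ordinary retrieval instead of dropping evidence.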

Agent Features

Memory

Episodic memory (time-grounded facts)

Durative memory (consolidated, lasting summaries)

Hierarchical update: online graph + sleep-time consolidation

Tool Use

spaCy for time parsing

Embedding models for dense retrieval

Frameworks

TSM (Temporal Semantic Memory)

Is Agentic

Yes

Architectures

Temporal Knowledge Graph (TKG)

Durative memory summaries (monthly topics/personas)

Optimization Features

System Optimization

Separate lightweight online graph updates from expensive summary consolidation
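One way to picture this separation is a store with a cheap write path and a batch job that rebuilds summaries only for months that changed. The `TwoTierMemory` class below is a hypothetical sketch, not the paper's implementation, and `_summarize` is a placeholder for whatever expensive summarizer is used.

```python
from collections import defaultdict


class TwoTierMemory:
    def __init__(self):
        self.facts_by_month = defaultdict(list)  # updated online
        self.summaries = {}                      # rebuilt during consolidation
        self.dirty_months = set()                # months changed since last run

    def add_fact(self, month: str, text: str) -> None:
        """Online path: append the fact and mark the month as changed."""
        self.facts_by_month[month].append(text)
        self.dirty_months.add(month)

    def consolidate(self) -> int:
        """Sleep-time path: rebuild summaries for changed months only."""
        rebuilt = 0
        for month in sorted(self.dirty_months):
            self.summaries[month] = self._summarize(self.facts_by_month[month])
            rebuilt += 1
        self.dirty_months.clear()
        return rebuilt

    @staticmethod
    def _summarize(texts):
        # Placeholder for an expensive summarization call.
        return f"{len(texts)} events: " + "; ".join(texts)


mem = TwoTierMemory()
mem.add_fact("2024-03", "Booked flights")
mem.add_fact("2024-03", "Returned from trip")
mem.add_fact("2024-04", "Ran first 10k")
first_run = mem.consolidate()   # both months are rebuilt

mem.add_fact("2024-04", "Signed up for a marathon")
second_run = mem.consolidate()  # only 2024-04 is rebuilt
print(first_run, second_run)    # 2 1
```

Tracking changed months keeps the offline cost proportional to new activity rather than to the full history.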

Inference Optimization

Top-K retrieval (Top-K=25)

Sleep-time consolidation to reduce online cost
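A time-aware Top-K step could look like the sketch below, which adds a temporal-overlap bonus to a similarity score before truncating the list. The scoring formula, the `time_weight` parameter, and the candidate tuples are assumptions made for illustration; only the Top-K value of 25 comes from the summary above.

```python
from datetime import date

TOP_K = 25  # value reported above


def overlap_days(a_start, a_end, b_start, b_end) -> int:
    """Number of days shared by two closed date intervals."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return max(0, (end - start).days + 1)


def rerank(candidates, window, top_k=TOP_K, time_weight=0.5):
    """candidates: list of (text, similarity, event_start, event_end)."""
    lo, hi = window
    window_len = (hi - lo).days + 1
    scored = []
    for text, similarity, start, end in candidates:
        # Fraction of the query window covered by the event interval.
        temporal = overlap_days(start, end, lo, hi) / window_len
        scored.append((similarity + time_weight * temporal, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Hypothetical candidates returned by dense retrieval.
candidates = [
    ("Planned a summer trip", 0.80, date(2024, 7, 1), date(2024, 7, 10)),
    ("Hiked in Patagonia", 0.70, date(2024, 3, 4), date(2024, 3, 18)),
    ("Bought hiking boots", 0.60, date(2024, 2, 25), date(2024, 3, 2)),
]
window = (date(2024, 3, 1), date(2024, 3, 31))  # query intent: March 2024
top_two = rerank(candidates, window, top_k=2)
print(top_two)  # ['Hiked in Patagonia', 'Planned a summer trip']
```

The event that falls inside the query window moves ahead of a candidate with higher raw similarity, which is the behavior time-intent reranking is meant to produce.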

Reproducibility

Code Available: No

Data Available: Yes

Open Source Status: Unknown

License: Unknown

Data URLs

LONGMEMEVAL (public benchmark)

LOCOMO (public benchmark)

Risks & Boundaries

Limitations

Uses fixed monthly granularity for durative summaries; may miss finer or coarser temporal patterns.

Focuses on personalization; not evaluated for procedural or shared multi-agent memory.

When Not To Use

When the whole conversation fits the model context window (full-text performs better).

When compute or maintenance budget cannot support periodic consolidation or embedding storage.

Failure Modes

Incorrect time parsing leads to wrong temporal filters and missed evidence.

Over-consolidation can drop fine-grained facts needed for some queries.

Core Entities

Models

GPT-4o-mini

Qwen3-30B-A3B-Instruct-2507

Metrics

Accuracy

Datasets

LONGMEMEVAL

LOCOMO

Benchmarks

LongMemEval

LOCOMO

Context Entities

Models

A-MEM

Mem0

Mem0g

Zep

MemoryOS

LangMem

Naive RAG

