Overview
The method is practical: it builds a temporal graph and monthly summaries, uses off-the-shelf parsers and embeddings, and shows consistent gains on two public benchmarks, though the reported results come from single runs.
Citations: 0
Evidence Strength: 0.80
Confidence: 0.85
Risk Signals: 9
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 3/3
Reproducibility
Status: Partial assets available
Open source: Unknown
At A Glance
Cost impact: 40%
Production readiness: 70%
Novelty: 65%
Why It Matters For Business
TSM lets assistants anchor facts to the time they actually occurred rather than the time they were mentioned, which improves time-sensitive answers and multi-session personalization. This can reduce wrong or stale recommendations in customer support and personal assistants.
Summary TLDR
This paper introduces Temporal Semantic Memory (TSM), a memory system that records when events actually happen (semantic time) and consolidates related events into durative summaries. TSM builds a Temporal Knowledge Graph (TKG) for event timestamps, clusters events by time into monthly topics/personas, and reranks retrieval results to match a query's time intent. On LONGMEMEVAL and LOCOMO, TSM improves accuracy over strong memory baselines (e.g., +12.2 percentage points vs A-MEM on LONGMEMEVAL_S) and helps multi-session and temporal reasoning.
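The reranking step described above can be illustrated with a minimal sketch. It assumes each retrieved candidate carries a similarity score and a validity interval, and that the query's time intent has already been resolved to a date window. The field names, the additive `boost`, and its value are hypothetical choices for illustration, not the paper's scoring function.

```python
from datetime import date

def overlaps(a_start, a_end, b_start, b_end):
    """True when the closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end

def rerank(candidates, q_start, q_end, boost=0.5):
    """Sort candidates by similarity plus a fixed bonus for temporal overlap.

    `boost` is an assumed, hand-picked weight, not a value from the paper.
    """
    def score(c):
        bonus = boost if overlaps(c["start"], c["end"], q_start, q_end) else 0.0
        return c["similarity"] + bonus
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"text": "Lives in Paris", "similarity": 0.82,
     "start": date(2022, 1, 1), "end": date(2023, 12, 31)},
    {"text": "Lives in Berlin", "similarity": 0.78,
     "start": date(2024, 1, 1), "end": date(2024, 12, 31)},
]

# Query time intent: June 2024. The Berlin fact overlaps the window, so it is
# ranked first even though its raw similarity is lower.
ranked = rerank(candidates, date(2024, 6, 1), date(2024, 6, 30))
print([c["text"] for c in ranked])
```

A fixed bonus is the simplest possible choice; a graded score based on the amount of overlap would be a natural refinement.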
Problem Statement
Existing agent memories use dialogue timestamps or isolated event entries. That causes two problems: events get stored under the wrong time, and continuous experiences get split into point records. Agents then fail to retrieve temporally coherent, duration-aware context for time-sensitive or multi-session queries.
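A small example may make the first problem concrete. The sketch below assumes a memory entry that stores both the dialogue date and a resolved event date. The class and field names are hypothetical, and the relative expression "last week" is resolved by hand rather than by a parser.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class MemoryEntry:
    text: str
    dialogue_date: date  # when the user said it
    event_date: date     # when it actually happened (hypothetical field name)

# The user says this on 2024-03-10 about something that happened a week earlier.
said_on = date(2024, 3, 10)
entry = MemoryEntry(
    text="I moved to Berlin last week",
    dialogue_date=said_on,
    event_date=said_on - timedelta(weeks=1),  # "last week" resolved against the dialogue date
)

def in_window(d, start, end):
    return start <= d <= end

# Query: "What happened in the first week of March 2024?"
window = (date(2024, 3, 1), date(2024, 3, 7))
hit_by_dialogue_time = in_window(entry.dialogue_date, *window)
hit_by_event_time = in_window(entry.event_date, *window)
print(hit_by_dialogue_time, hit_by_event_time)
```

Indexing by dialogue date misses the entry for this query, while indexing by event date retrieves it.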
Main Contribution
Temporal Semantic Memory (TSM): organizes memory by event time, not chat time, to ground retrieval in actual occurrence intervals.
Durative memory: clusters temporally contiguous and semantically related episodic facts into monthly topic and persona summaries.
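The durative-memory idea can be sketched roughly as follows. This is a simplified stand-in, not the paper's procedure: it assumes each event already has an event date and a topic label, groups events by calendar month and topic, and joins the texts where a real system would presumably call a summarization model. All names and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date

events = [
    {"text": "Started marathon training", "event_date": date(2024, 3, 2), "topic": "fitness"},
    {"text": "Ran a 10k", "event_date": date(2024, 3, 20), "topic": "fitness"},
    {"text": "Booked flights to Lisbon", "event_date": date(2024, 3, 25), "topic": "travel"},
    {"text": "Ran a half marathon", "event_date": date(2024, 4, 14), "topic": "fitness"},
]

def consolidate_monthly(events):
    """Group events by (YYYY-MM, topic) and join their texts in time order.

    Joining with "; " is a placeholder for a generated summary.
    """
    buckets = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["event_date"]):
        key = (ev["event_date"].strftime("%Y-%m"), ev["topic"])
        buckets[key].append(ev["text"])
    return {key: "; ".join(texts) for key, texts in buckets.items()}

summaries = consolidate_monthly(events)
for key, summary in summaries.items():
    print(key, "->", summary)
```

Grouping on a precomputed topic label sidesteps the semantic clustering step; an embedding-based clustering within each month would be closer to what the summary describes.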
Key Findings
TSM raises overall QA accuracy on LONGMEMEVAL_S to 74.80%, up from 62.60% for A-MEM (+12.20 percentage points, GPT-4o-mini, Table 1)
Large gains on time-sensitive questions
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Accuracy | 74.80% | A-MEM 62.60% | +12.20 pp | LONGMEMEVAL_S (GPT-4o-mini) | Table 1 shows TSM 74.80% vs A-MEM 62.60% | Table 1 |
| Accuracy | 76.69% | Naive RAG 63.64% | +13.05 pp | LOCOMO (GPT-4o-mini) | Table 2 reports TSM 76.69% vs Naive RAG 63.64% | Table 2 |
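The Delta column is an absolute difference in accuracy, that is, percentage points rather than a relative percentage. The short check below recomputes it from the two rows above and also derives the relative improvement over each baseline. The relative figures are simple arithmetic on the table values and are not reported in the source.

```python
# (TSM accuracy, baseline accuracy) in percent, copied from the table above.
results = {
    "LONGMEMEVAL_S vs A-MEM": (74.80, 62.60),
    "LOCOMO vs Naive RAG": (76.69, 63.64),
}

deltas = {}
for name, (tsm, baseline) in results.items():
    absolute_pp = round(tsm - baseline, 2)                 # percentage points
    relative_pct = round(100 * (tsm - baseline) / baseline, 2)  # relative to baseline
    deltas[name] = (absolute_pp, relative_pct)
    print(f"{name}: +{absolute_pp} pp, +{relative_pct}% relative")
```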
What To Try In 7 Days
Parse user queries for time expressions with spaCy and test retrieval filtered by that window.
Index events with a small Temporal Knowledge Graph (valid_time/invalid_time fields).
Cluster recent events monthly and generate short topic/persona summaries for retrieval trials on a small user cohort.
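The first two steps above could be prototyped along these lines. This is a minimal sketch under stated assumptions: a standard-library regex that only recognizes "<Month> <YYYY>" stands in for spaCy's date recognition so the example runs without a model download, and the `Edge` class, helper names, and sample facts are hypothetical. Only the `valid_time`/`invalid_time` field names come from the text.

```python
import calendar
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Month name -> number, e.g. "march" -> 3 (index 0 of month_name is empty).
MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}

@dataclass
class Edge:
    subject: str
    relation: str
    obj: str
    valid_time: date                     # when the fact became true
    invalid_time: Optional[date] = None  # None means still valid

def parse_month_window(query):
    """Return (start, end) for a '<Month> <YYYY>' mention, or None if absent."""
    pattern = r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b"
    m = re.search(pattern, query.lower())
    if not m:
        return None
    month, year = MONTHS[m.group(1)], int(m.group(2))
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)

def filter_edges(edges, window):
    """Keep edges whose validity interval intersects the window.

    With no parsed window, fall back to returning every edge instead of
    filtering on a guess.
    """
    if window is None:
        return list(edges)
    start, end = window
    return [
        e for e in edges
        if e.valid_time <= end and (e.invalid_time is None or e.invalid_time >= start)
    ]

edges = [
    Edge("user", "works_at", "Acme", date(2023, 1, 10), date(2024, 2, 29)),
    Edge("user", "works_at", "Globex", date(2024, 3, 1)),
]

window = parse_month_window("Where did I work in March 2024?")
matches = filter_edges(edges, window)
print([e.obj for e in matches])
```

The unfiltered fallback is one way to limit the damage from the time-parsing failure mode: a missed expression widens retrieval instead of silently excluding evidence.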
Agent Features
Memory
Tool Use
Frameworks
Is Agentic
Yes
Risks & Boundaries
Limitations
Uses fixed monthly granularity for durative summaries; may miss finer or coarser temporal patterns.
Focuses on personalization; not evaluated for procedural or shared multi-agent memory.
When Not To Use
When the whole conversation fits in the model's context window (passing the full text performs better).
When compute or maintenance budget cannot support periodic consolidation or embedding storage.
Failure Modes
Incorrect time parsing leads to wrong temporal filters and missed evidence.
Over-consolidation can drop fine-grained facts needed for some queries.

