Overview
Production Readiness
0.7
Novelty Score
0.65
Cost Impact Score
0.4
Citation Count
0
Why It Matters For Business
TSM lets assistants recall facts by when they actually happened rather than when they were mentioned, which improves time-sensitive answers and multi-session personalization. This can reduce wrong or stale recommendations in customer support and personal assistants.
Summary TLDR
This paper introduces Temporal Semantic Memory (TSM), a memory system that records when events actually happen (semantic time) and consolidates related events into durative summaries. TSM builds a Temporal Knowledge Graph (TKG) for event timestamps, clusters events by time into monthly topics/personas, and reranks retrieval results to match a query's time intent. On LONGMEMEVAL and LOCOMO, TSM improves accuracy over strong memory baselines (e.g., +12.2 percentage points vs A-MEM on LONGMEMEVAL_S) and helps multi-session and temporal reasoning.
Problem Statement
Existing agent memories use dialogue timestamps or isolated event entries. That causes two problems: events get stored under the wrong time, and continuous experiences get split into point records. Agents then fail to retrieve temporally coherent, duration-aware context for time-sensitive or multi-session queries.
Main Contribution
Temporal Semantic Memory (TSM): organizes memory by event time, not chat time, to ground retrieval in actual occurrence intervals.
Durative memory: clusters temporally contiguous and semantically related episodic facts into monthly topic and persona summaries.
Semantic-time retrieval: parses a query's intended time window, then filters and reranks candidates using the Temporal Knowledge Graph (TKG) to enforce time validity.
Efficient maintenance: lightweight online updates to the TKG plus periodic ‘sleep-time’ consolidation for summaries.
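The semantic-time retrieval step can be illustrated with a minimal sketch. It is not the paper's implementation: the `Candidate` fields, the `time_weight` blend, and the coverage formula are assumptions chosen for illustration, and the query window is taken as already parsed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    text: str
    score: float   # dense-retrieval similarity (hypothetical values)
    start: date    # when the event began (semantic time)
    end: date      # when the event ended


def overlap_days(a_start, a_end, b_start, b_end):
    """Number of days two closed date intervals share (0 if disjoint)."""
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(0, (earliest_end - latest_start).days + 1)


def rerank_by_time(candidates, win_start, win_end, time_weight=0.5):
    """Drop candidates outside the query window, then blend similarity
    with the fraction of the window each candidate covers."""
    window_len = (win_end - win_start).days + 1
    kept = []
    for c in candidates:
        shared = overlap_days(c.start, c.end, win_start, win_end)
        if shared == 0:
            continue  # event did not occur inside the queried window
        coverage = shared / window_len
        blended = (1 - time_weight) * c.score + time_weight * coverage
        kept.append((blended, c))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept]
```

In this sketch a candidate with high similarity but an event time outside the window is removed entirely, while an event spanning the whole window is favored over a single-day event with a similar score.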
Key Findings
TSM raises overall QA accuracy on LONGMEMEVAL_S to 74.80%
Large gains on time-sensitive questions
Durative summaries improve multi-session reasoning
Temporal modeling and summaries both matter (ablation)
Results
Accuracy
Who Should Care
What To Try In 7 Days
Parse user queries for time expressions with spaCy and test retrieval filtered by that window.
Index events with a small Temporal Knowledge Graph (valid_time/invalid_time fields).
Cluster recent events monthly and generate short topic/persona summaries for retrieval trials on a small user cohort.
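As a starting point for the indexing and monthly grouping steps above, the following sketch uses only the Python standard library. The `Fact` record, the half-open validity interval, and the helper names are assumptions for illustration. Grouping here is by calendar month only, which is a simplification of clustering by both time and semantic relatedness, and summary generation is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_time: date                      # when the fact became true
    invalid_time: Optional[date] = None   # when it stopped being true; None = still valid


def facts_valid_on(facts, day):
    """Return facts whose validity interval [valid_time, invalid_time) contains `day`."""
    return [
        f for f in facts
        if f.valid_time <= day and (f.invalid_time is None or day < f.invalid_time)
    ]


def group_by_month(facts):
    """Bucket facts by the (year, month) in which they became valid."""
    buckets = defaultdict(list)
    for f in facts:
        buckets[(f.valid_time.year, f.valid_time.month)].append(f)
    return dict(buckets)


# Hypothetical example data.
facts = [
    Fact("user", "lives_in", "Berlin", date(2023, 1, 5), date(2024, 2, 1)),
    Fact("user", "lives_in", "Lisbon", date(2024, 2, 1)),
    Fact("user", "adopted", "a dog", date(2024, 2, 14)),
]
```

With this data, `facts_valid_on(facts, date(2023, 6, 1))` returns only the Berlin fact, and `group_by_month(facts)` places the Lisbon and dog facts in the same February 2024 bucket, which is the unit a monthly summary would be written over.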
Agent Features
Memory
- Episodic memory (time-grounded facts)
- Durative memory (consolidated, lasting summaries)
- Hierarchical update: online graph + sleep-time consolidation
Tool Use
- spaCy for time parsing
- embedding models for dense retrieval
Frameworks
- TSM (Temporal Semantic Memory)
Is Agentic
true
Architectures
- Temporal Knowledge Graph (TKG)
- Durative memory summaries (monthly topics/personas)
Optimization Features
System Optimization
- Separate lightweight online graph updates from expensive summary consolidation
Inference Optimization
- Top-K retrieval (K=25)
- Sleep-time consolidation to reduce online cost
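The split between cheap online updates and deferred consolidation can be sketched as follows. The `TwoTierMemory` class, its method names, and the dirty-month tracking are hypothetical; the `summarize` callable stands in for whatever summarizer is actually used and is an assumption of this example.

```python
from collections import defaultdict


class TwoTierMemory:
    """Hypothetical store: cheap writes now, expensive summaries later."""

    def __init__(self, summarize):
        self.summarize = summarize        # assumed callable: list[str] -> str
        self.events = defaultdict(list)   # month key -> raw event texts
        self.summaries = {}               # month key -> consolidated summary
        self.dirty = set()                # months changed since last consolidation

    def add_event(self, month, text):
        """Online path: append the event and mark its month dirty; no summarization."""
        self.events[month].append(text)
        self.dirty.add(month)

    def consolidate(self):
        """Sleep-time path: rebuild summaries only for months that changed."""
        rebuilt = sorted(self.dirty)
        for month in rebuilt:
            self.summaries[month] = self.summarize(self.events[month])
        self.dirty.clear()
        return rebuilt
```

Calling `add_event` during a conversation never invokes the summarizer, so the online cost stays low. Running `consolidate` on a schedule touches only the months that received new events.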
Reproducibility
Data Urls
- LONGMEMEVAL (public benchmark)
- LOCOMO (public benchmark)
Data Available
Open Source Status
- unknown
Risks & Boundaries
Limitations
- Uses fixed monthly granularity for durative summaries; may miss finer or coarser temporal patterns.
- Focuses on personalization; not evaluated for procedural or shared multi-agent memory.
- Evaluation uses single-run experiments and two benchmarks, which may limit generality.
When Not To Use
- When the whole conversation fits in the model's context window (supplying the full text performs better).
- When compute or maintenance budget cannot support periodic consolidation or embedding storage.
- When you need procedural or skill memory rather than user factual/persona memory.
Failure Modes
- Incorrect time parsing leads to wrong temporal filters and missed evidence.
- Over-consolidation can drop fine-grained facts needed for some queries.
- Noisy clustering may mix unrelated events into durative summaries and mislead retrieval.
Core Entities
Models
- GPT-4o-mini
- Qwen3-30B-A3B-Instruct-2507
Metrics
- Accuracy
Datasets
- LONGMEMEVAL
- LOCOMO
Benchmarks
- LongMemEval
- LOCOMO
Context Entities
Models
- A-MEM
- Mem0
- Mem0^g
- Zep
- MemoryOS
- LangMem
- Naive RAG

