Overview
Production Readiness
0.7
Novelty Score
0.7
Cost Impact Score
0.8
Citation Count
0
Why It Matters For Business
Cut memory search latency into the low tens of milliseconds so memory-augmented agents respond in real time while lowering infrastructure and throughput costs.
Summary TLDR
SwiftMem replaces brute-force memory scans with a three-tier, query-aware index (temporal index, semantic DAG-tag index, and embedding index with co-consolidation). On LoCoMo and LongMemEval, SwiftMem reduces search latency to ~11 ms (up to 47× faster than some baselines), keeps semantic accuracy competitive (overall LLM score ~0.70 on LoCoMo), and raises lexical precision (BLEU-1 0.467). Each query is restricted to a small subset of memory using time and topic signals, which makes real-time memory-augmented agents practical.
Problem Statement
Existing agent memory systems scan the entire memory for every query (O(N_mem)). As conversation history grows, search latency balloons and real-time agent responses become impractical. SwiftMem targets this scalability and latency bottleneck by indexing memory by time and semantic tags so queries search only relevant subsets.
Main Contribution
Three-tier query-aware indexing: temporal index, semantic DAG-Tag index, and embedding index with co-consolidation.
Temporal index enabling binary-searchable user timelines for O(log N_mem) time-range queries.
Semantic DAG-Tag routing that maps queries to small tag sets and expands them hierarchically to avoid full scans.
Embedding-tag co-consolidation that reorganizes embeddings by semantic clusters to improve cache locality and speed.
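The binary-searchable timeline idea can be illustrated with a minimal Python sketch. This is not SwiftMem's implementation (the summary does not show it); the class name `TemporalIndex`, its methods, and the use of plain numeric timestamps are assumptions made for illustration. It keeps one sorted timestamp list per user so that a time-range lookup needs two binary searches instead of a full scan.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class TemporalIndex:
    """Hypothetical per-user timeline kept sorted by timestamp."""

    def __init__(self):
        # user_id -> (sorted timestamps, episode ids in the same order)
        self._timelines = defaultdict(lambda: ([], []))

    def add(self, user_id, timestamp, episode_id):
        timestamps, episode_ids = self._timelines[user_id]
        # Find the insertion point by binary search, then insert in both
        # parallel lists so they stay aligned.
        pos = bisect_right(timestamps, timestamp)
        timestamps.insert(pos, timestamp)
        episode_ids.insert(pos, episode_id)

    def range_query(self, user_id, start, end):
        """Return episode ids with start <= timestamp <= end."""
        if user_id not in self._timelines:
            return []
        timestamps, episode_ids = self._timelines[user_id]
        lo = bisect_left(timestamps, start)   # first timestamp >= start
        hi = bisect_right(timestamps, end)    # one past last timestamp <= end
        return episode_ids[lo:hi]


index = TemporalIndex()
index.add("u1", 30, "c")
index.add("u1", 10, "a")
index.add("u1", 20, "b")
print(index.range_query("u1", 10, 20))
```

Locating the range boundaries costs O(log N_mem); returning the matching episodes additionally costs time proportional to the number of matches. Insertion into a Python list is linear because of element shifting, so a production index would likely use a different container.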
Key Findings
Search latency reduced to ~11 ms per query on LoCoMo.
Search is up to 47× faster than state-of-the-art memory frameworks.
Maintains competitive semantic accuracy while improving lexical precision.
Scales stably: search latency stays below 15 ms as the dataset grows.
Temporal indexing reduces search latency by ~35% when temporal hints are present.
Tag-embedding co-consolidation preserves recall while improving judge score and latency.
Results
Search latency (SwiftMem)
Total end-to-end latency (SwiftMem)
Overall semantic quality (LLM Score)
Lexical precision (BLEU-1)
Temporal-index latency improvement
Co-consolidation effect (LLM judge & latency)
Who Should Care
What To Try In 7 Days
Add a lightweight temporal index for user timelines to speed time-based queries.
Tag episodes with a small set of topic tags and route queries to those tags first.
Batch a periodic embedding-tag consolidation to colocate semantically related vectors and measure latency change.
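For the tag-routing step, a small prototype might look like the sketch below. It is an illustration under stated assumptions, not the paper's method: the `TagIndex` class, the parent-to-child expansion direction, and the `max_depth` cutoff are all hypothetical, and the mapping from a query to its seed tags (done with an LLM in SwiftMem) is taken as given input.

```python
from collections import defaultdict, deque


class TagIndex:
    """Hypothetical tag hierarchy (a DAG) with an inverted tag -> episodes map."""

    def __init__(self):
        self._children = defaultdict(set)   # parent tag -> child tags
        self._episodes = defaultdict(set)   # tag -> episode ids

    def add_edge(self, parent, child):
        self._children[parent].add(child)

    def tag_episode(self, episode_id, tags):
        for tag in tags:
            self._episodes[tag].add(episode_id)

    def expand(self, seed_tags, max_depth=2):
        """Breadth-first expansion from the seed tags down the hierarchy."""
        seen = set(seed_tags)
        queue = deque((tag, 0) for tag in seen)
        while queue:
            tag, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not descend further than max_depth levels
            for child in self._children.get(tag, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
        return seen

    def candidates(self, seed_tags, max_depth=2):
        """Union of episodes under the expanded tag set; no full scan."""
        result = set()
        for tag in self.expand(seed_tags, max_depth):
            result |= self._episodes.get(tag, set())
        return result


tags = TagIndex()
tags.add_edge("travel", "flights")
tags.add_edge("flights", "delays")
tags.tag_episode("e1", ["flights"])
tags.tag_episode("e2", ["delays"])
tags.tag_episode("e3", ["cooking"])
print(sorted(tags.candidates(["travel"])))
```

Only the episodes returned by `candidates` would then be passed to the embedding search, which is where the latency saving would come from. An unknown seed tag yields an empty candidate set, which is the exclusion risk listed under Failure Modes.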
Agent Features
Memory
- episodic memory (timestamped episodes)
- semantic DAG-Tag index (hierarchical tags)
- embedding index with co-consolidation
Tool Use
- LLM-based tag generation (for semantic tags)
Frameworks
- LLM-as-Judge evaluation
Is Agentic
true
Architectures
- three-tier indexing (temporal, DAG-tag, embedding)
Optimization Features
Infra Optimization
- reduced search CPU/GPU workload from smaller candidate sets
System Optimization
- binary-searchable user timelines for O(log N_mem) queries
- hierarchical tag expansion to narrow search
Inference Optimization
- sub-linear retrieval via tag+temporal filtering
- improved cache locality from co-consolidation
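The layout idea behind co-consolidation can be sketched as follows. This is a simplified illustration, not SwiftMem's algorithm: it assumes one tag per episode, treats tags as the semantic clusters, and uses hypothetical helper names (`consolidate`, `search`). Vectors are reordered so that each tag occupies one contiguous slice, and a query scores only the slices for its routed tags.

```python
import math


def consolidate(episodes):
    """Group (episode_id, tag, vector) triples so each tag is contiguous.

    Returns (ids, vectors, offsets), where offsets maps tag -> (start, end).
    """
    ordered = sorted(episodes, key=lambda e: e[1])  # stable sort by tag
    ids = [e[0] for e in ordered]
    vectors = [e[2] for e in ordered]
    offsets = {}
    for pos, (_, tag, _) in enumerate(ordered):
        start, _ = offsets.get(tag, (pos, pos))
        offsets[tag] = (start, pos + 1)
    return ids, vectors, offsets


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, tags, ids, vectors, offsets, k=3):
    """Score only the contiguous slices that belong to the routed tags."""
    scored = []
    for tag in tags:
        start, end = offsets.get(tag, (0, 0))  # unknown tag -> empty slice
        for i in range(start, end):
            scored.append((cosine(query, vectors[i]), ids[i]))
    scored.sort(reverse=True)
    return [episode_id for _, episode_id in scored[:k]]


episodes = [
    ("e1", "travel", [1.0, 0.0]),
    ("e2", "food", [0.0, 1.0]),
    ("e3", "travel", [0.9, 0.1]),
    ("e4", "food", [0.1, 0.9]),
]
ids, vectors, offsets = consolidate(episodes)
print(search([1.0, 0.0], ["travel"], ids, vectors, offsets, k=1))
```

Python lists of lists do not give real cache locality; the sketch only shows the reordering and slice bookkeeping. Any measurable locality gain would require storing the vectors in one contiguous array.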
Reproducibility
Data Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Relies on LLM-based tag generation; tag errors or inconsistent tags can harm routing.
- Temporal index helps only when queries include or allow reliable time cues.
- Evaluation relies on an LLM-as-judge, which can introduce judge bias and may not replace human assessment.
- Source code not yet released, limiting immediate reproducibility.
When Not To Use
- When you must guarantee the highest possible semantic accuracy: FullContext scored higher on some LLM metrics.
- When conversational memory is tiny and exhaustive search cost is negligible.
- If you cannot generate reliable semantic tags or timestamps for episodes.
Failure Modes
- Poor tag mapping causes relevant memories to be excluded from the candidate set.
- Over-aggressive temporal filtering could omit relevant episodes for queries with vague time cues.
- Consolidation may create transient layout changes that affect hot-path performance during reorganization.
Core Entities
Models
- GPT-4.1-mini (LLM-as-judge)
- GPT-4o-mini (evaluation)
Metrics
- LLM Score
- F1
- BLEU-1
- Search latency (ms)
- Total latency (ms)
Datasets
- LoCoMo
- LongMemEval
Benchmarks
- LoCoMo
- LongMemEval
Context Entities
Models
- Nemori
- LangMem
- Mem0
- Zep
- RAG-4096
- FullContext

