Overview
SwiftMem demonstrates clear engineering gains in latency and practical mechanisms (time and tag indexing). Evidence comes from standard benchmarks and ablations, but full system code is not yet released and evaluation relies on LLM-as-judge metrics.
Citations: 0
Evidence Strength: 0.80
Confidence: 0.80
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 6/6
Findings with evidence refs: 6/6
Results with explicit delta: 6/6
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 80%
Production readiness: 70%
Novelty: 70%
Why It Matters For Business
Cut memory search latency into the low tens of milliseconds so memory-augmented agents respond in real time while lowering infrastructure and throughput costs.
Summary TLDR
SwiftMem replaces brute-force memory scans with a three-tier, query-aware index: a temporal index, a semantic DAG-tag index, and an embedding index with co-consolidation. On LoCoMo and LongMemEval, it cuts search latency to ~11 ms (up to 47× faster than baseline memory frameworks), keeps competitive semantic accuracy (overall LLM score ~0.70 on LoCoMo), and raises lexical precision (BLEU-1 0.467). Each query is routed to a small subset of memory using time and topic signals, which makes real-time memory-augmented agents practical.
Problem Statement
Existing agent memory systems scan the entire memory for every query (O(N_mem)). As conversation history grows, search latency balloons and real-time agent responses become impractical. SwiftMem targets this scalability and latency bottleneck by indexing memory by time and semantic tags so queries search only relevant subsets.
Main Contribution
Three-tier query-aware indexing: temporal index, semantic DAG-Tag index, and embedding index with co-consolidation.
Temporal index enabling binary-searchable user timelines for O(log N_mem) time-range queries.
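The binary-searchable timeline idea can be illustrated with a minimal sketch. This is not SwiftMem's code (the full system is not released). The `Timeline` class, its field names, and the use of epoch-second timestamps are assumptions made for illustration, built on Python's standard `bisect` module.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class Timeline:
    """Hypothetical per-user timeline kept sorted by timestamp."""
    timestamps: list = field(default_factory=list)   # sorted epoch seconds
    episode_ids: list = field(default_factory=list)  # parallel to timestamps

    def add(self, ts: float, episode_id: str) -> None:
        # Find the insertion point by binary search so both lists stay sorted.
        pos = bisect_right(self.timestamps, ts)
        self.timestamps.insert(pos, ts)
        self.episode_ids.insert(pos, episode_id)

    def in_range(self, start: float, end: float) -> list:
        # Two binary searches locate the slice boundaries in O(log N).
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return self.episode_ids[lo:hi]


timeline = Timeline()
for ts, eid in [(300.0, "e3"), (100.0, "e1"), (200.0, "e2"), (400.0, "e4")]:
    timeline.add(ts, eid)

recent = timeline.in_range(150.0, 350.0)  # episodes with 150 <= ts <= 350
```

In this sketch only the boundary lookup is logarithmic. Copying the matching slice costs time proportional to the number of matches, and `list.insert` is linear, so a production index would likely use a different container for writes.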
Key Findings
Search latency reduced to ~11 ms per query on LoCoMo.
Search is up to 47× faster than state-of-the-art memory frameworks.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Search latency (SwiftMem) | 11 ms | Nemori 835 ms; Zep 522 ms | ≈47× faster than Zep (522 ms vs. 11 ms) | LoCoMo | Table 2 reports 11 ms search latency for SwiftMem | Table 2 |
| Total end-to-end latency (SwiftMem) | 1,289 ms | FullContext 5,806 ms; RAG-4096 2,884 ms | ≈4.5× faster than FullContext | LoCoMo | Table 2 shows total latency 1,289 ms | Table 2 |
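The deltas in the table are plain ratios of baseline latency to SwiftMem latency. The short sketch below recomputes them from the values listed in the table; the helper name `speedups` and the dictionary layout are illustrative choices, not part of the paper.

```python
# Latencies in milliseconds, copied from the Results table.
search_ms = {"SwiftMem": 11, "Zep": 522, "Nemori": 835}
total_ms = {"SwiftMem": 1289, "RAG-4096": 2884, "FullContext": 5806}


def speedups(latencies: dict, system: str = "SwiftMem") -> dict:
    """Return each baseline's latency divided by the chosen system's latency."""
    ours = latencies[system]
    return {
        name: round(ms / ours, 1)
        for name, ms in latencies.items()
        if name != system
    }


search_speedup = speedups(search_ms)
total_speedup = speedups(total_ms)

for name, ratio in {**search_speedup, **total_speedup}.items():
    print(f"{name}: {ratio}x")
```

The ≈47× figure matches the Zep row (522 / 11), and the ≈4.5× figure matches FullContext (5,806 / 1,289).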
What To Try In 7 Days
Add a lightweight temporal index for user timelines to speed time-based queries.
Tag episodes with a small set of topic tags and route queries to those tags first.
Run embedding-tag consolidation as a periodic batch job to colocate semantically related vectors, and measure the change in latency.
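As a starting point for the tag-routing experiment, the sketch below uses a flat inverted index from tag to episode ids. It is a simplification: SwiftMem's semantic index is described as a DAG of tags, which this sketch does not model. `TagRouter`, its methods, and the sample tags are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class TagRouter:
    """Hypothetical inverted index from topic tag to episode ids."""

    def __init__(self) -> None:
        self.by_tag = defaultdict(set)
        self.all_ids = set()

    def add(self, episode_id: str, tags: list) -> None:
        # Register the episode under each of its (case-normalized) tags.
        self.all_ids.add(episode_id)
        for tag in tags:
            self.by_tag[tag.lower()].add(episode_id)

    def candidates(self, query_tags: list) -> set:
        # Union of episodes filed under any of the query's tags.
        hits = set()
        for tag in query_tags:
            hits |= self.by_tag.get(tag.lower(), set())
        return hits


router = TagRouter()
router.add("e1", ["travel", "food"])
router.add("e2", ["work"])
router.add("e3", ["travel"])

cands = router.candidates(["Travel"])
fraction_searched = len(cands) / len(router.all_ids)
```

Tracking `fraction_searched` alongside latency gives a quick read on how much of memory each query actually touches before the embedding search runs.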
Agent Features
Memory
Tool Use
Frameworks
Is Agentic: Yes
Architectures
Optimization Features
Infra Optimization
System Optimization
Inference Optimization
Risks & Boundaries
Limitations
Relies on LLM-based tag generation; tag errors or inconsistent tags can harm routing.
Temporal index helps only when queries include or allow reliable time cues.
When Not To Use
When you must guarantee the highest possible semantic accuracy: FullContext scored higher on some LLM metrics.
When conversational memory is tiny and exhaustive search cost is negligible.
Failure Modes
Poor tag mapping causes relevant memories to be excluded from the candidate set.
Over-aggressive temporal filtering could omit relevant episodes for queries with vague time cues.
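One possible guard against both failure modes is to fall back to a wider search when filtering leaves too few candidates. The source does not describe such a mechanism; the function `select_candidates` and the `min_candidates` threshold below are assumptions, shown only as a sketch of the idea.

```python
def select_candidates(filtered: list, all_ids: list, min_candidates: int = 3) -> tuple:
    """Return (candidates, used_fallback).

    `filtered` is the candidate list left after tag and time filtering;
    `all_ids` is the full memory. Both are hypothetical inputs.
    """
    if len(filtered) >= min_candidates:
        return list(filtered), False
    # Too few survivors: the tag or time filter may have been too strict,
    # so search everything and flag the query for later inspection.
    return list(all_ids), True


memory = ["e1", "e2", "e3", "e4", "e5"]

narrow, narrow_fallback = select_candidates(["e2"], memory)
healthy, healthy_fallback = select_candidates(["e1", "e2", "e4"], memory)
```

Falling back gives up the latency benefit for that query, so logging how often `used_fallback` is true is a reasonable way to spot tag or time-cue problems early.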

