Overview
Production Readiness
0.6
Novelty Score
0.6
Cost Impact Score
0.5
Citation Count
1
Why It Matters For Business
Text2Mem makes memory commands predictable and auditable. That reduces bugs from inconsistent agent behavior, improves portability across memory backends, and makes long-running agent behavior testable and repeatable.
Summary TLDR
Text2Mem defines a compact JSON-based language and an execution pipeline that converts natural-language memory instructions into validated, typed operations. It standardizes twelve memory verbs (encode, update, promote, demote, merge, split, lock, expire, label, delete, retrieve, summarize), a five-field schema (stage, op, target, args, meta), and a validator → parser → adapter pipeline. A planned companion benchmark (Text2Mem Bench) separates planning (NL→schema) from execution (schema→SQL/backend effects), so systems can be measured on both schema correctness and real backend effects.
Problem Statement
Current agent memory systems expose inconsistent, ad-hoc commands. Natural-language memory requests are ambiguous about scope, action, and lifecycle. This causes unpredictable behavior, poor portability across systems, and hard-to-reproduce experiments.
Main Contribution
A verb-centered operation language (Text2Mem) with twelve mutually exclusive operations covering encoding, storage, and retrieval.
A compact, schema-based JSON contract (five backbone fields: stage, op, target, args, meta) plus a validator → parser → adapter pipeline that enforces safety and determinism before execution.
A reference SQL prototype backend and adapter design to map typed operations to real frameworks, and a planned Text2Mem Bench to measure both schema planning and execution effects.
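The validator → parser → adapter pathway described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the five field names and twelve verbs come from the summary, while the stage names, the rule that all five fields are required, the error messages, and the `EchoAdapter` class are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any

# From the summary: five backbone fields and twelve verbs.
FIELDS = ("stage", "op", "target", "args", "meta")
VERBS = ("encode", "update", "promote", "demote", "merge", "split",
         "lock", "expire", "label", "delete", "retrieve", "summarize")
# Assumption: stage names mirror "encoding, storage, and retrieval".
STAGES = ("encoding", "storage", "retrieval")


@dataclass(frozen=True)
class Operation:
    """Typed form of one validated command."""
    stage: str
    op: str
    target: dict[str, Any]
    args: dict[str, Any]
    meta: dict[str, Any]


def validate(raw: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the command is well formed."""
    errors: list[str] = []
    for name in FIELDS:
        if name not in raw:
            errors.append(f"missing field: {name}")
    for name in raw:
        if name not in FIELDS:
            errors.append(f"unexpected field: {name}")
    if "stage" in raw and raw["stage"] not in STAGES:
        errors.append(f"unknown stage: {raw['stage']!r}")
    if "op" in raw and raw["op"] not in VERBS:
        errors.append(f"unknown op: {raw['op']!r}")
    for name in ("target", "args", "meta"):
        if name in raw and not isinstance(raw[name], dict):
            errors.append(f"{name} must be an object")
    return errors


def parse(raw: dict[str, Any]) -> Operation:
    """Validate first, then build the typed object; nothing invalid gets through."""
    errors = validate(raw)
    if errors:
        raise ValueError("; ".join(errors))
    return Operation(**{name: raw[name] for name in FIELDS})


class EchoAdapter:
    """Hypothetical adapter: a real one would translate to backend calls."""

    def execute(self, operation: Operation) -> str:
        actor = operation.meta.get("actor", "unknown")
        return f"{operation.op} on {operation.target} by {actor}"


def run(raw: dict[str, Any], adapter: EchoAdapter) -> str:
    """Full pathway: validator -> parser -> adapter."""
    return adapter.execute(parse(raw))
```

Because `parse` refuses anything that `validate` flags, the adapter only ever sees well-formed operations, which is the ordering the pipeline is meant to guarantee.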
Key Findings
Text2Mem defines a fixed inventory of twelve memory operations covering encoding, storage, and retrieval.
Every operation is a typed JSON object with a five-field backbone: stage, op, target, args, meta.
The planned benchmark separates planning (NL→schema) from execution (schema→effects) and measures both string-level schema match and effect correctness.
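The difference between string-match and effect correctness can be illustrated with a toy scorer. This is only a sketch under stated assumptions: it does not reproduce the benchmark's actual metric definitions, and the `priority` argument, the `ids` target key, and the `apply_promote` executor are hypothetical names chosen for the example.

```python
import copy
import json


def canonical(schema: dict) -> str:
    """Serialize with sorted keys so key order does not affect the comparison."""
    return json.dumps(schema, sort_keys=True, separators=(",", ":"))


def plan_match(predicted: dict, gold: dict) -> bool:
    """Planning check: does the generated schema match the reference as a string?"""
    return canonical(predicted) == canonical(gold)


def apply_promote(store: dict, schema: dict) -> dict:
    """Toy executor (assumption): set a priority on each targeted memory."""
    new_store = copy.deepcopy(store)
    for memory_id in schema["target"]["ids"]:
        new_store[memory_id]["priority"] = schema["args"]["priority"]
    return new_store


def effect_match(store: dict, predicted: dict, gold: dict) -> bool:
    """Execution check: do both schemas leave the store in the same state?"""
    return apply_promote(store, predicted) == apply_promote(store, gold)


store = {"m1": {"text": "quarterly report due Friday", "priority": 0}}
gold = {"stage": "storage", "op": "promote",
        "target": {"ids": ["m1"]}, "args": {"priority": 1}, "meta": {}}
# Same intent, but the planner emitted a float instead of an integer.
predicted = {"stage": "storage", "op": "promote",
             "target": {"ids": ["m1"]}, "args": {"priority": 1.0}, "meta": {}}

string_ok = plan_match(predicted, gold)
effect_ok = effect_match(store, predicted, gold)
```

Here the two schemas differ as strings (`1` versus `1.0`) yet produce the same stored state, which is one reason to score planning and execution separately.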
Results
Operation inventory
Schema backbone
Evaluation metrics
Who Should Care
What To Try In 7 Days
Map a small set of your agent memory flows to the Text2Mem verbs and enforce the five-field schema for incoming commands.
Run the SQL prototype for one workflow (encode → retrieve → promote) to confirm expected DB effects and logs.
Add validator checks for destructive actions (global writes, hard deletes) and require confirmation/dry-run flags.
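The steps above can be tried end to end with a small SQLite-backed sketch. It is not the paper's SQL prototype: the table layout, the `make_op` and `execute` helpers, the stage labels attached to each verb, and the `dry_run` and `confirm` argument names are all assumptions made for illustration.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with a memory table and an audit log."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE memory (id INTEGER PRIMARY KEY, text TEXT NOT NULL, "
                 "priority INTEGER NOT NULL DEFAULT 0)")
    conn.execute("CREATE TABLE audit_log (actor TEXT NOT NULL, op TEXT NOT NULL, "
                 "detail TEXT NOT NULL)")
    return conn


def make_op(stage, op, target=None, args=None, actor="demo"):
    """Build a command with the five backbone fields."""
    return {"stage": stage, "op": op, "target": target or {},
            "args": args or {}, "meta": {"actor": actor}}


def execute(conn: sqlite3.Connection, op: dict):
    """Map one typed operation to SQL and record it in the audit log."""
    verb, target, args = op["op"], op["target"], op["args"]
    if verb == "encode":
        result = conn.execute("INSERT INTO memory (text) VALUES (?)",
                              (args["text"],)).lastrowid
    elif verb == "retrieve":
        result = conn.execute(
            "SELECT id, text, priority FROM memory WHERE text LIKE ? "
            "ORDER BY priority DESC, id", (f"%{args['query']}%",)).fetchall()
    elif verb == "promote":
        result = conn.execute("UPDATE memory SET priority = priority + 1 WHERE id = ?",
                              (target["id"],)).rowcount
    elif verb == "delete":
        if args.get("dry_run"):
            # Report what would be removed without touching the data.
            result = conn.execute("SELECT COUNT(*) FROM memory WHERE id = ?",
                                  (target["id"],)).fetchone()[0]
        elif args.get("confirm"):
            result = conn.execute("DELETE FROM memory WHERE id = ?",
                                  (target["id"],)).rowcount
        else:
            raise PermissionError("hard delete requires confirm=True or dry_run=True")
    else:
        raise NotImplementedError(f"verb not covered by this sketch: {verb}")
    conn.execute("INSERT INTO audit_log (actor, op, detail) VALUES (?, ?, ?)",
                 (op["meta"].get("actor", "unknown"), verb, json.dumps(op, sort_keys=True)))
    return result


# Workflow: encode -> retrieve -> promote, then check the effect and a dry-run delete.
conn = open_store()
memory_id = execute(conn, make_op("encoding", "encode", args={"text": "renew passport in May"}))
before = execute(conn, make_op("retrieval", "retrieve", args={"query": "passport"}))
execute(conn, make_op("storage", "promote", target={"id": memory_id}))
after = execute(conn, make_op("retrieval", "retrieve", args={"query": "passport"}))
would_delete = execute(conn, make_op("storage", "delete", target={"id": memory_id},
                                     args={"dry_run": True}))
```

Comparing `before` and `after` confirms the database effect of `promote`, and the `audit_log` table gives one row per executed command for later review. A delete without either flag raises before any SQL runs, so nothing is logged or removed.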
Agent Features
Memory
- long-term memory lifecycle controls
- priority and governance (promote/demote/lock/expire)
Planning
- schema generation from natural language
- multi-step workflow planning (schema lists)
Tool Use
- LLM summarization
- embedding services
Frameworks
- validator-parser-adapter
- SQL prototype backend
- adapters for MemOS/mem0/Letta
Is Agentic
true
Architectures
- memory operating layer
- schema-driven adapter
Collaboration
- auditable workflows with actor/meta fields
Optimization Features
System Optimization
- separation of planning and execution to reduce ambiguity
- schema validation moves safety checks earlier
Reproducibility
Open Source Status
- unknown
Risks & Boundaries
Limitations
- No released implementation or evaluation results in this paper; benchmark results are promised later.
- Relies on LLM services for encoding and summarization, which shifts cost and variability to external models.
- Adapter correctness depends on backend capabilities; exact behavior requires careful adapter engineering.
When Not To Use
- When memory needs are simple and ephemeral (no governance or lifecycle controls).
- When you cannot run or afford LLM-based services for encoding/summarization.
- When your backend cannot implement required adapter semantics (locks, priority, lineage).
Failure Modes
- Incorrect schema generation from ambiguous natural language, leading to wrong actions.
- Adapter mismatch, where the backend lacks a semantic equivalent of a verb and produces inconsistent effects.
- Overly strict schema, causing harmless user intents to be rejected or to require frequent confirmations.
Core Entities
Metrics
- SMA
- ESR
- EMR
Datasets
- Text2Mem Bench
Benchmarks
- Text2Mem Bench

