A small, formal language that turns vague memory commands into safe, verifiable operations for LLM agents

September 14, 2025 · 7 min

Overview

Production Readiness

0.6

Novelty Score

0.6

Cost Impact Score

0.5

Citation Count

1

Authors

Yi Wang, Lihai Yang, Boyu Chen, Gongyi Zou, Kerun Xu, Bo Tang, Feiyu Xiong, Siheng Chen, Zhiyu Li

Why It Matters For Business

Text2Mem makes memory commands predictable and auditable. That reduces bugs from inconsistent agent behavior, improves portability across memory backends, and makes long-running agent behavior testable and repeatable.

Summary TLDR

Text2Mem defines a compact JSON-based language and an execution pipeline that converts natural-language memory instructions into validated, typed operations. It standardizes 12 memory verbs (encode, update, promote, demote, merge, split, lock, expire, label, delete, retrieve, summarize), a 5-field schema, and a validator → parser → adapter pathway. A planned companion benchmark (Text2Mem Bench) separates planning (NL→schema) from execution (schema→SQL/backend effects) so systems can be measured for both schema correctness and real backend effects.
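To make the five-field schema concrete, here is a minimal Python sketch of one typed operation and a stage/verb check. The `MemOp` class, the `validate` helper, the stage strings, the assignment of verbs to stages, and the example `target`/`args` keys are assumptions made for illustration; only the twelve verb names and the five field names come from the summary above.

```python
from dataclasses import dataclass, field
from typing import Any

# The twelve verbs from the summary, grouped by stage.
# ASSUMPTION: the stage strings and this grouping are illustrative only.
VERBS_BY_STAGE = {
    "encoding": {"encode"},
    "storage": {"update", "promote", "demote", "merge", "split",
                "lock", "expire", "label", "delete"},
    "retrieval": {"retrieve", "summarize"},
}


@dataclass
class MemOp:
    """One operation with the five backbone fields (hypothetical class)."""
    stage: str
    op: str
    target: dict[str, Any] = field(default_factory=dict)
    args: dict[str, Any] = field(default_factory=dict)
    meta: dict[str, Any] = field(default_factory=dict)


def validate(op: MemOp) -> list[str]:
    """Return a list of problems; an empty list means the operation is well typed."""
    errors = []
    if op.stage not in VERBS_BY_STAGE:
        errors.append(f"unknown stage: {op.stage!r}")
    elif op.op not in VERBS_BY_STAGE[op.stage]:
        errors.append(f"verb {op.op!r} is not valid in stage {op.stage!r}")
    return errors


# Hypothetical target/args keys, used only to show the shape of a command.
ok = MemOp(stage="storage", op="promote", target={"ids": ["m1"]}, args={"weight": 0.9})
bad = MemOp(stage="retrieval", op="delete")

print(validate(ok))
print(validate(bad))
```

Returning a list of problems rather than raising on the first one lets a caller show every issue with a generated command at once.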

Problem Statement

Current agent memory systems expose inconsistent, ad-hoc commands. Natural-language memory requests are ambiguous about scope, action, and lifecycle. This causes unpredictable behavior, poor portability across systems, and hard-to-reproduce experiments.

Main Contribution

A verb-centered operation language (Text2Mem) with twelve mutually exclusive operations covering encoding, storage, and retrieval.

A compact, schema-based JSON contract (five backbone fields: stage, op, target, args, meta) plus a validator → parser → adapter pipeline that enforces safety and determinism before execution.

A reference SQL prototype backend and adapter design to map typed operations to real frameworks, and a planned Text2Mem Bench to measure both schema planning and execution effects.
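The validator → parser → adapter idea can be sketched against SQLite in a few lines. This is not the paper's prototype: the `memory` table, the `validate`/`parse`/`run` functions, the stage strings, and the `text`, `query`, and `id` keys are all assumptions, and only three of the twelve verbs are handled.

```python
import json
import sqlite3

REQUIRED_FIELDS = {"stage", "op", "target", "args", "meta"}
SUPPORTED_OPS = {"encode", "promote", "retrieve"}  # small subset for the sketch


def validate(raw: str) -> dict:
    """Validator: accept only a JSON object with exactly the five backbone fields."""
    obj = json.loads(raw)
    if not isinstance(obj, dict) or set(obj) != REQUIRED_FIELDS:
        raise ValueError(f"expected exactly the fields {sorted(REQUIRED_FIELDS)}")
    if obj["op"] not in SUPPORTED_OPS:
        raise ValueError(f"unsupported op: {obj['op']!r}")
    return obj


def parse(obj: dict) -> tuple[str, tuple]:
    """Parser: map a validated operation to a parameterized SQL statement."""
    op, target, args = obj["op"], obj["target"], obj["args"]
    if op == "encode":
        return "INSERT INTO memory (text, priority) VALUES (?, 0)", (args["text"],)
    if op == "promote":
        return "UPDATE memory SET priority = priority + 1 WHERE id = ?", (target["id"],)
    # Only "retrieve" remains after validation.
    return ("SELECT id, text, priority FROM memory WHERE text LIKE ?",
            (f"%{args['query']}%",))


def run(conn: sqlite3.Connection, raw: str) -> list[tuple]:
    """Adapter: execute the statement on the backend and return any rows."""
    sql, params = parse(validate(raw))
    rows = conn.execute(sql, params).fetchall()
    conn.commit()
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (id INTEGER PRIMARY KEY, text TEXT, priority INTEGER)")

workflow = [
    {"stage": "encoding", "op": "encode", "target": {},
     "args": {"text": "Dentist appointment on Friday"}, "meta": {"actor": "demo"}},
    {"stage": "storage", "op": "promote", "target": {"id": 1},
     "args": {}, "meta": {"actor": "demo"}},
    {"stage": "retrieval", "op": "retrieve", "target": {},
     "args": {"query": "dentist"}, "meta": {"actor": "demo"}},
]
results = [run(conn, json.dumps(step)) for step in workflow]
print(results[-1])
```

Because malformed input is rejected in `validate` before any SQL is built, a bad command never reaches the database, which is the safety property the pipeline is meant to provide.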

Key Findings

Text2Mem defines a fixed inventory of twelve memory operations covering encoding, storage, and retrieval.

Numbers: 12 operations (Table I; encoding/storage/retrieval split)

Every operation is a typed JSON object with a five-field backbone: stage, op, target, args, meta.

Numbers: 5 backbone fields (stage, op, target, args, meta)

The benchmark separates planning (NL→schema) from execution (schema→effects) and measures both string-match correctness of the generated schema and correctness of the resulting effects.

Numbers: 3 core metrics introduced: SMA, ESR, EMR
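The difference between string-match and effect correctness is easiest to see on a toy case. The sketch below does not implement SMA, ESR, or EMR, whose definitions are not given here; `plan_matches`, `effect_matches`, the in-memory `apply_op` executor, and the `soft` argument are hypothetical names used only to show why the two checks can disagree.

```python
import json


def canonical(obj: dict) -> str:
    """Serialize with sorted keys so key order does not affect the comparison."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def plan_matches(predicted: dict, gold: dict) -> bool:
    """Planning check: is the generated schema textually identical to the reference?"""
    return canonical(predicted) == canonical(gold)


def apply_op(state: dict, op: dict) -> dict:
    """Toy executor over a dict-backed store (assumption: only encode and delete)."""
    new_state = dict(state)
    if op["op"] == "encode":
        new_state[op["target"]["id"]] = op["args"]["text"]
    elif op["op"] == "delete":
        new_state.pop(op["target"]["id"], None)
    return new_state


def effect_matches(state: dict, predicted: dict, gold: dict) -> bool:
    """Execution check: do both operations leave the store in the same state?"""
    return apply_op(state, predicted) == apply_op(state, gold)


store = {"m1": "old note", "m2": "keep me"}
gold = {"stage": "storage", "op": "delete", "target": {"id": "m1"},
        "args": {"soft": False}, "meta": {}}
predicted = {"stage": "storage", "op": "delete", "target": {"id": "m1"},
             "args": {}, "meta": {}}

print(plan_matches(predicted, gold))           # schemas differ in args
print(effect_matches(store, predicted, gold))  # but both remove m1
```

Here the predicted schema fails the string comparison yet produces the same store contents, which is the kind of case that motivates scoring planning and execution separately.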

Results

Operation inventory

Value: 12 verbs covering encoding/storage/retrieval

Schema backbone

Value: 5 fields (stage, op, target, args, meta)

Evaluation metrics

Value: SMA / ESR / EMR for planning + execution

Who Should Care

What To Try In 7 Days

Map a small set of your agent memory flows to the Text2Mem verbs and enforce the five-field schema for incoming commands.

Run the SQL prototype for one workflow (encode → retrieve → promote) to confirm expected DB effects and logs.

Add validator checks for destructive actions (global writes, hard deletes) and require confirmation/dry-run flags.
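A destructive-action check of the kind suggested in the last step might look like the sketch below. The `guard` function and the `all`, `soft`, `confirmation`, and `dry_run` keys are assumptions chosen for illustration, not field names taken from Text2Mem.

```python
# Verbs treated as read-only; everything else counts as a write in this sketch.
READ_ONLY_OPS = {"retrieve", "summarize"}


def guard(op: dict) -> list[str]:
    """Return the reasons a command must be held back; an empty list means it may run."""
    reasons = []
    target = op.get("target") or {}
    args = op.get("args") or {}
    meta = op.get("meta") or {}

    # ASSUMPTION: either flag in meta counts as explicit approval.
    approved = bool(meta.get("confirmation") or meta.get("dry_run"))

    is_write = op.get("op") not in READ_ONLY_OPS
    is_global = target.get("all") is True  # hypothetical "match everything" selector
    if is_write and is_global and not approved:
        reasons.append("global write needs confirmation or dry_run")

    # ASSUMPTION: a delete is "hard" unless args.soft is set.
    is_hard_delete = op.get("op") == "delete" and not args.get("soft", False)
    if is_hard_delete and not approved:
        reasons.append("hard delete needs confirmation or dry_run")

    return reasons


wipe = {"stage": "storage", "op": "delete", "target": {"all": True}, "args": {}, "meta": {}}
rehearsal = {**wipe, "meta": {"dry_run": True}}
soft_one = {"stage": "storage", "op": "delete", "target": {"id": "m1"},
            "args": {"soft": True}, "meta": {}}

for command in (wipe, rehearsal, soft_one):
    print(command["meta"], guard(command))
```

Running this check in the validator stage means a risky command is stopped before the parser or adapter ever sees it.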

Agent Features

Memory

  • long-term memory lifecycle controls
  • priority and governance (promote/demote/lock/expire)

Planning

  • schema generation from natural language
  • multi-step workflow planning (schema lists)

Tool Use

  • LLM summarization
  • embedding services

Frameworks

  • validator-parser-adapter
  • SQL prototype backend
  • adapters for MemOS/mem0/Letta

Is Agentic

true

Architectures

  • memory operating layer
  • schema-driven adapter

Collaboration

  • auditable workflows with actor/meta fields

Optimization Features

System Optimization

  • separation of planning and execution to reduce ambiguity
  • schema validation moves safety checks earlier

Reproducibility

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • No released implementation or evaluation results in this paper; benchmark results are promised later.
  • Relies on LLM services for encoding and summarization, which shifts cost and variability to external models.
  • Adapter correctness depends on backend capabilities; exact behavior requires careful adapter engineering.

When Not To Use

  • When memory needs are simple and ephemeral (no governance or lifecycle controls).
  • When you cannot run or afford LLM-based services for encoding/summarization.
  • When your backend cannot implement required adapter semantics (locks, priority, lineage).

Failure Modes

  • Incorrect schema generation from ambiguous natural language leading to wrong actions.
  • Adapter mismatch where backend lacks a semantic equivalent of a verb, producing inconsistent effects.
  • Overly strict schema causing harmless user intents to be rejected or require frequent confirmations.
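One way to catch the adapter-mismatch failure early is to have each adapter declare which verbs it supports and to reject a whole plan before any step runs. The `AdapterProfile` class, the `preflight` function, and the `simple_store` backend are hypothetical; real adapters for MemOS, mem0, or Letta may expose capabilities differently.

```python
from dataclasses import dataclass

ALL_VERBS = frozenset({
    "encode", "update", "promote", "demote", "merge", "split",
    "lock", "expire", "label", "delete", "retrieve", "summarize",
})


@dataclass(frozen=True)
class AdapterProfile:
    """Declares which verbs a backend can honor (hypothetical structure)."""
    name: str
    supported: frozenset

    def missing(self, plan: list[dict]) -> list[str]:
        """List verbs used by the plan that this backend cannot map, in first-seen order."""
        seen: list[str] = []
        for step in plan:
            verb = step["op"]
            if verb not in self.supported and verb not in seen:
                seen.append(verb)
        return seen


def preflight(profile: AdapterProfile, plan: list[dict]) -> None:
    """Refuse the entire plan if any step lacks a semantic equivalent."""
    missing = profile.missing(plan)
    if missing:
        raise NotImplementedError(
            f"{profile.name} has no equivalent for: {', '.join(missing)}"
        )


# ASSUMPTION: an imaginary backend without locking or splitting.
simple_store = AdapterProfile("simple_store", ALL_VERBS - {"lock", "split"})
plan = [{"op": "encode"}, {"op": "lock"}, {"op": "retrieve"}, {"op": "lock"}]

print(simple_store.missing(plan))
```

Rejecting the plan up front avoids the partially applied workflow that would result from discovering the missing verb at step two.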

Core Entities

Metrics

  • SMA
  • ESR
  • EMR

Datasets

  • Text2Mem Bench

Benchmarks

  • Text2Mem Bench