Add an editable triplet memory to LLMs via a read/write API and vector lookup

May 23, 2023 · 7 min

Overview

Production Readiness

0.3

Novelty Score

0.6

Cost Impact Score

0.4

Citation Count

6

Authors

Ali Modarressi, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schütze


Why It Matters For Business

An external, editable memory lets products keep facts up to date, audit which stored facts an LLM used to produce an answer, and combine scattered facts without retraining the model.

Summary TLDR

RET-LLM is a proposed framework for giving language models an external, editable read/write memory. The memory stores extracted facts as triplets ⟨arg1, relation, arg2⟩, stores vector embeddings of each triplet so that locality-sensitive hashing (LSH) can return approximate (fuzzy) matches, and exposes a text-based memory API ([MEM_WRITE], [MEM_READ]). The authors finetune Alpaca-7B with LoRA on synthetic triplet tasks so that the model learns to emit API calls. In qualitative examples, RET-LLM answers questions correctly where the base Alpaca-7B model fails. The paper does not yet provide a large-scale quantitative evaluation.

Problem Statement

LLMs encode knowledge implicitly in their parameters and lack a dedicated, editable memory that can store, update, and aggregate facts across documents and over time. As a result, tracking changing facts, aggregating information, and retrieving facts explicitly are hard to do without retraining.

Main Contribution

Design of an external read/write memory that stores facts as triplets ⟨arg1, relation, arg2⟩ and keeps a mean vector embedding for each triplet field.

A simple text-based memory API (MEM_WRITE / MEM_READ): the LLM generates API calls as text, and a controller executes them against the memory.

A proof-of-concept finetuning recipe: train an instruction-tuned LLM (Alpaca-7B) with LoRA on synthetic triplet QA so it learns to generate memory API calls.

Use of LSH for fast approximate (fuzzy) vector lookup and for aggregating matching triplets.
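A minimal Python sketch of this memory layout, for illustration only: `token_vector`, `mean_embedding`, `Row`, and `TripletMemory` are hypothetical names, and a hash-derived toy vector stands in for the LLM's real token embeddings (an assumption). Each row keeps its three text fields plus one mean vector per field, and `None` acts as a wildcard so a single read can aggregate every matching row.

```python
import hashlib
from dataclasses import dataclass, field

DIM = 16  # toy embedding size (assumption)


def token_vector(token: str) -> list[float]:
    """Hypothetical stand-in for an LLM token embedding: a deterministic
    vector with entries in [-1, 1] derived from a SHA-256 hash of the token."""
    digest = hashlib.sha256(token.lower().encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:DIM]]


def mean_embedding(text: str) -> list[float]:
    """Average the token vectors of one triplet field (whitespace tokens)."""
    vecs = [token_vector(t) for t in text.split()]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


@dataclass
class Row:
    arg1: str
    relation: str
    arg2: str
    # One mean embedding per field, computed once when the row is created.
    vectors: tuple = field(init=False)

    def __post_init__(self) -> None:
        self.vectors = tuple(
            mean_embedding(text) for text in (self.arg1, self.relation, self.arg2)
        )


class TripletMemory:
    def __init__(self) -> None:
        self.rows: list[Row] = []

    def write(self, arg1: str, relation: str, arg2: str) -> Row:
        row = Row(arg1, relation, arg2)
        self.rows.append(row)
        return row

    def read_exact(self, arg1=None, relation=None, arg2=None) -> list[Row]:
        """Return all rows whose given fields match exactly; None is a wildcard."""
        query = (arg1, relation, arg2)
        return [
            row
            for row in self.rows
            if all(q is None or q == v
                   for q, v in zip(query, (row.arg1, row.relation, row.arg2)))
        ]


memory = TripletMemory()
memory.write("Alice Novak", "works at", "Acme Corp")
memory.write("Bob Li", "works at", "Acme Corp")
memory.write("Alice Novak", "lives in", "Berlin")
coworkers = memory.read_exact(relation="works at", arg2="Acme Corp")
print([row.arg1 for row in coworkers])  # ['Alice Novak', 'Bob Li']
```

Exact string matching is only the first lookup path; the fuzzy path would query the stored per-field vectors through an approximate index instead of comparing strings.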

Key Findings

In qualitative examples, RET-LLM produced correct answers where the base Alpaca-7B produced incorrect ones, even though both were given the same context.

The memory stores both text triplets and their mean vector embeddings, using LSH to return semantically similar entries when exact text matches are absent.

A finetuned LLM can learn to emit read/write API calls after training on synthetic triplet examples; a controller then executes those calls, so the memory operations stay transparent to the user.
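The controller's job can be sketched as follows, assuming a bracketed call syntax `[MEM_WRITE{arg1>>relation>>arg2}]` / `[MEM_READ{arg1>>relation>>}]` in which an empty slot is a wildcard. This syntax and the `Controller` class are hypothetical illustrations, not the authors' exact format, and the store here is a plain list of text triplets with exact matching only.

```python
import re

# Assumed call syntax (hypothetical; the paper's exact format may differ):
#   [MEM_WRITE{arg1>>relation>>arg2}]   store one triplet
#   [MEM_READ{arg1>>relation>>}]        empty slots act as wildcards
CALL = re.compile(r"\[(MEM_WRITE|MEM_READ)\{([^}]*)\}\]")


class Controller:
    """Executes the memory calls an LLM emits in its generated text."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, ...]] = []  # plain text triplets

    def execute(self, generated: str) -> list[str]:
        """Run every call found in `generated`; return one result string per call."""
        results = []
        for op, body in CALL.findall(generated):
            fields = [part.strip() for part in body.split(">>")]
            if len(fields) != 3:
                results.append("ERROR: expected 3 fields")
                continue
            if op == "MEM_WRITE":
                self.rows.append(tuple(fields))
                results.append("OK")
            else:
                # A read matches rows that agree on every non-empty slot.
                matches = [
                    row for row in self.rows
                    if all(q == "" or q == v for q, v in zip(fields, row))
                ]
                results.append("; ".join(">>".join(row) for row in matches) or "NONE")
        return results


controller = Controller()
controller.execute("Noted. [MEM_WRITE{Alice Novak>>works at>>Acme Corp}]")
print(controller.execute("Let me check. [MEM_READ{Alice Novak>>works at>>}]"))
# ['Alice Novak>>works at>>Acme Corp']
```

In a full loop, each result string would be appended to the LLM's context so the model can phrase the final answer from it.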

Results

Qualitative QA correctness

  • Value: RET-LLM correct vs. Alpaca-7B incorrect on the provided examples
  • Baseline: Alpaca-7B (zero-shot)

Finetuning resources

  • Value: LoRA finetuning of Alpaca-7B on a single A6000 GPU (48 GB)


What To Try In 7 Days

Prototype a triplet extractor that turns documents into simple ⟨entity, relation, entity⟩ rows.

Store triplets with embeddings and use an off-the-shelf LSH index for fuzzy lookup.

Finetune a small instruction-tuned model with LoRA on synthetic read/write examples so that it emits MEM_READ/MEM_WRITE calls, then test a few QA flows.
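For the finetuning step, synthetic training pairs can be generated from a small made-up population. This is an assumption-laden illustration: the names, relation templates, the `input`/`output` field names, the `make_examples` helper, and the call syntax are all hypothetical, chosen only to mirror the paper's idea of synthetic people, relations, and organizations.

```python
import random

# Toy synthetic population (hypothetical names, organizations, and cities).
PEOPLE = ["Alice Novak", "Bob Li", "Chen Wu", "Dana Ortiz"]
ORGS = ["Acme Corp", "Globex", "Initech"]
CITIES = ["Berlin", "Lagos", "Lima"]

# relation -> (candidate objects, statement template, question template)
RELATIONS = {
    "works at": (ORGS, "{s} works at {o}.", "Where does {s} work?"),
    "lives in": (CITIES, "{s} lives in {o}.", "Where does {s} live?"),
}


def make_examples(seed: int = 0) -> list[dict]:
    """Build paired write/read examples in an assumed input/output format."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    examples = []
    for person in PEOPLE:
        for relation, (objects, statement, question) in RELATIONS.items():
            obj = rng.choice(objects)
            # A statement should be turned into a MEM_WRITE call.
            examples.append({
                "input": statement.format(s=person, o=obj),
                "output": f"[MEM_WRITE{{{person}>>{relation}>>{obj}}}]",
            })
            # A question should become a MEM_READ call with the answer slot empty.
            examples.append({
                "input": question.format(s=person),
                "output": f"[MEM_READ{{{person}>>{relation}>>}}]",
            })
    rng.shuffle(examples)
    return examples


data = make_examples()
print(len(data))  # 16
```

The resulting pairs would then go to a LoRA training run (for example with Hugging Face PEFT); that step is not shown here.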

Agent Features

Memory

  • read-write
  • updatable
  • aggregatable across documents
  • interpretable (triplet rows)
  • scalable by design (claimed; not empirically tested)
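One way to make the memory updatable is to keep a single current value per (arg1, relation) key and recompute the stored embedding on every write, so a vector never describes a superseded value. This overwrite policy, the `UpdatableMemory` class, and the toy `embed` function are hypothetical; the summary does not say how RET-LLM resolves updates, and multi-valued relations would need a different rule.

```python
import hashlib
from typing import Optional

DIM = 8  # toy embedding size (assumption)


def embed(text: str) -> list[float]:
    """Hypothetical toy embedding: hash-derived values in [-1, 1]."""
    digest = hashlib.sha256(text.lower().encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:DIM]]


class UpdatableMemory:
    """Keeps one current value per (arg1, relation) key (hypothetical
    overwrite policy; not suitable for multi-valued relations)."""

    def __init__(self) -> None:
        # (arg1, relation) -> (arg2, embedding of arg2)
        self.facts: dict[tuple[str, str], tuple[str, list[float]]] = {}

    def write(self, arg1: str, relation: str, arg2: str) -> None:
        # Re-embed on every write so the stored vector matches the new value.
        self.facts[(arg1, relation)] = (arg2, embed(arg2))

    def read(self, arg1: str, relation: str) -> Optional[str]:
        entry = self.facts.get((arg1, relation))
        return entry[0] if entry else None


memory = UpdatableMemory()
memory.write("Alice Novak", "works at", "Acme Corp")
memory.write("Alice Novak", "works at", "Globex")  # the fact changed
print(memory.read("Alice Novak", "works at"))  # Globex
```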

Tool Use

  • memory-API (text-based read/write)
  • LSH for vector lookup

Frameworks

  • Davidsonian-style triplet representation (⟨arg1, relation, arg2⟩)

Is Agentic

true

Architectures

  • LLM + external memory + controller

Optimization Features

Training Optimization

  • LoRA

Inference Optimization

  • LSH for fast approximate retrieval
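Random-hyperplane LSH (SimHash) is one standard LSH family for cosine similarity and is shown here as an assumed choice; the summary does not state which LSH family or parameters RET-LLM uses, and `HyperplaneLSH` is a hypothetical class. Each signature bit records which side of a random hyperplane a vector falls on, so two vectors separated by angle θ agree on each bit with probability 1 - θ/π and similar vectors tend to share a bucket.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Random-hyperplane (SimHash-style) LSH for cosine similarity."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each hyperplane is given by a random Gaussian normal vector.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: dict[tuple[int, ...], list[int]] = defaultdict(list)

    def signature(self, vec: list[float]) -> tuple[int, ...]:
        # One bit per hyperplane: 1 if the vector lies on its non-negative side.
        return tuple(
            int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in self.planes
        )

    def add(self, item_id: int, vec: list[float]) -> None:
        self.buckets[self.signature(vec)].append(item_id)

    def query(self, vec: list[float]) -> list[int]:
        # Candidates are the items sharing the query's bucket; [] is an LSH miss.
        return list(self.buckets.get(self.signature(vec), []))


index = HyperplaneLSH(dim=3, n_bits=4, seed=42)
index.add(0, [1.0, 0.2, 0.0])
index.add(1, [-1.0, 0.0, 0.5])
print(index.query([2.0, 0.4, 0.0]))  # includes 0
```

A rescaled copy of a stored vector always lands in the same bucket, since scaling by a positive factor does not change the sign of any dot product. Using fewer bits, or several independent tables, raises recall at the cost of larger candidate sets, which reduces the chance that a semantically similar entry is missed.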

Reproducibility

Open Source Status

  • unknown

Risks & Boundaries

Limitations

  • Only qualitative examples provided; no quantitative benchmarks or large-scale evaluation.
  • Finetuning and evaluation use a synthetic population dataset, not real-world corpora.
  • Triplet extraction quality is critical but not evaluated at scale.
  • Scalability claims lack empirical backing beyond the use of LSH.
  • No discussion of handling complex relations beyond 3-field triplets.

When Not To Use

  • High-stakes or safety-critical settings without rigorous evaluation.
  • Tasks requiring complex relational structures beyond simple triplets.
  • Scenarios where triplet extraction cannot be made reliable.

Failure Modes

  • Poor triplet extraction yields wrong or missing memory entries.
  • LSH misses semantically similar items, leading to empty query results.
  • The LLM generates malformed API calls, or the controller or LLM misinterprets API responses.
  • Aggregation of many noisy triplets can produce incorrect combined answers.
  • Outdated embeddings if memory isn’t re-embedded after updates.

Core Entities

Models

  • Alpaca-7B

Datasets

  • synthetic triplet population (authors generated names, relations, orgs)

Context Entities

Models

  • MemLLM (follow-up work referenced)