Jointly erase private facts from an LLM agent's weights and persistent memory to stop recontamination

February 6, 2026 · 7 min

Overview

Production Readiness

0.6

Novelty Score

0.6

Cost Impact Score

0.5

Citation Count

0

Authors

Bin Wang, Fan Wang, Pingping Wang, Jinyu Cong, Yang Yu, Yilong Yin, Zhongyi Han, Benzheng Wei

Why It Matters For Business

If your product uses agents with persistent memory, forgetting must remove data from both memory stores and model weights; otherwise retrieval can re-expose deleted facts and cause re-encoding, creating compliance and reputational risk.

Summary TLDR

This paper defines "agentic unlearning": removing specified information from both a memory-augmented LLM's parameters (weights) and its persistent external memory to stop a retrieval→generation→rewrite feedback loop called backflow. The authors propose Synchronized Backflow Unlearning (SBU): a dual-pathway protocol that first blocks and prunes dependent memory artifacts, then applies an entropy-regularized parameter update (KL-to-random) so the model becomes uncertain on forgotten items. On medical QA tests SBU raises privacy metrics (MIA Score) by ~24.8% while keeping test accuracy >90% on evaluated benchmarks. The system enforces a blocklist, dependency graph with reference counts, hybrid(

Problem Statement

Memory-augmented LLM agents store sensitive info in two places: explicit external memory and implicit model parameters. Deleting only parameters or only memory can fail because retrieval can re-expose forgotten facts and cause the model to re-encode them (backflow). Existing unlearning methods target stateless models and do not prevent this cross-pathway recontamination.

Main Contribution

Formalize agentic unlearning and identify parameter-memory backflow as the core failure mode for memory-augmented agents.

Introduce Synchronized Backflow Unlearning (SBU), a dual-pathway protocol that (1) dependency-prunes external memory and (2) aligns model outputs on forget queries to a high-entropy random prior.

Empirically show SBU reduces membership leakage and memory exposure on medical QA benchmarks while preserving test/generalization accuracy.

Key Findings

SBU improves privacy vs. strong baselines on MedQA (QF=100).

Numbers: MIA Score 0.8953 vs. baseline 0.7167 (+24.8% relative)

SBU preserves downstream utility while forgetting.

Numbers: Test acc. 92.50% and Gen. 90.50% (MedQA, QF=100)

Memory pathway removes explicit traces in the index.

Numbers: Memory forget accuracy drops 78% → 14% (MedQA, QF=100)

SBU scales to larger forget sets with maintained privacy.

Numbers: MIA Score up to 0.996 on MedMCQA (QF=1000)

Results

MedQA test accuracy (QF=100)

Value: 92.50% ± 2.12

Baseline: Original / Sequential LoRA ~88.67%

MedQA generalization (Gen.)

Value: 90.50% ± 0.71

Baseline: ~87.00%

MIA Score (higher = better privacy)

Value: 0.8953 ± 0.0181

Baseline: Sequential LoRA 0.7167 ± 0.0081

Memory forget accuracy (MedQA, QF=100; lower = less exposure)

Value: 14.0% (after unlearning)

Baseline: 78.0% (before unlearning)

MIA Score at scale (MedMCQA, QF=1000)

Value: 0.9960 ± 0.0020

Baseline: ~0.9020

Who Should Care

What To Try In 7 Days

Add a persistent blocklist and simple dependency graph to your memory store to tag and traverse derived artifacts.

Implement a retrieval filter that checks the blocklist before returning results, and rebuild the index once the number of blocked entries exceeds a threshold.

Prototype a KL-to-random parameter update: in mixed batches, push the model's outputs on forget queries toward a high-entropy random prior while keeping the usual loss on retain samples.
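
As a starting point for the first two items, here is a minimal sketch of a blocklist enforced at retrieval time. It is not the paper's implementation: the `MemoryStore` class, its method names, and the word-overlap ranking are hypothetical stand-ins for a real vector or hybrid search.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy memory store (hypothetical); word overlap stands in for real search."""
    records: dict[str, str] = field(default_factory=dict)  # record id -> text
    blocklist: set[str] = field(default_factory=set)       # ids never to return

    def add(self, record_id: str, text: str) -> None:
        self.records[record_id] = text

    def forget(self, record_id: str) -> None:
        # Block first, so the record is unreachable even before it is
        # physically deleted from the underlying index.
        self.blocklist.add(record_id)

    def search(self, query: str, k: int = 3) -> list[str]:
        # Assumed ranking: number of lowercase words shared with the query.
        q = set(query.lower().split())
        ranked = sorted(
            self.records,
            key=lambda rid: len(q & set(self.records[rid].lower().split())),
            reverse=True,
        )
        # Filter against the blocklist before truncating to k, so blocked
        # hits do not consume result slots.
        return [rid for rid in ranked if rid not in self.blocklist][:k]


store = MemoryStore()
store.add("m1", "patient alice has diabetes")
store.add("m2", "metformin dosage guidance")
store.add("m3", "alice prefers morning appointments")
store.forget("m1")
results = store.search("alice diabetes", k=2)
```

Filtering before truncation matters: if the top-k were cut first, a blocked record could silently shrink the result list instead of being replaced by the next best hit.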

Agent Features

Memory

  • dependency graph (M,S,R,K)
  • episodic / semantic / reflection layers
  • reference counting for provenance
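
The sketch below shows one way a dependency graph with reference counts could drive pruning. The policy is an assumption, not taken from the paper: every artifact derived from a forgotten item is deleted, while shared keys are dropped only when no live node references them. `DependencyGraph` and its methods are hypothetical names.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Hypothetical provenance graph for memory artifacts."""

    def __init__(self) -> None:
        self.children = defaultdict(set)  # node -> artifacts derived from it
        self.keys = defaultdict(set)      # node -> shared keys it references
        self.refcount = defaultdict(int)  # shared key -> number of live referrers

    def derive(self, source: str, artifact: str) -> None:
        self.children[source].add(artifact)

    def reference(self, node: str, key: str) -> None:
        if key not in self.keys[node]:
            self.keys[node].add(key)
            self.refcount[key] += 1

    def forget(self, root: str) -> tuple[set[str], set[str]]:
        """Return (nodes to delete, shared keys whose count reached zero)."""
        doomed, queue = {root}, deque([root])
        while queue:  # breadth-first walk over derived artifacts
            node = queue.popleft()
            for child in self.children.pop(node, ()):
                if child not in doomed:
                    doomed.add(child)
                    queue.append(child)
        orphaned = set()
        for node in doomed:
            for key in self.keys.pop(node, ()):
                self.refcount[key] -= 1
                if self.refcount[key] == 0:
                    orphaned.add(key)
                    del self.refcount[key]
        return doomed, orphaned


g = DependencyGraph()
g.derive("m1", "s1")           # summary s1 was written from memory m1
g.derive("s1", "r1")           # reflection r1 was written from s1
g.derive("m2", "s2")
g.reference("m1", "alice")     # key shared by m1 and m2
g.reference("m2", "alice")
g.reference("s1", "diabetes")  # key used only by s1
doomed, orphaned = g.forget("m1")
```

Here forgetting `m1` removes `s1` and `r1` as well, drops the `diabetes` key, and keeps `alice` because `m2` still references it.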

Tool Use

  • vector store
  • hybrid search (semantic + keyword)
  • blocklist
  • tamper-evident audit log
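
A tamper-evident audit log is commonly built as a hash chain, where each entry commits to the hash of the previous one. The sketch below illustrates that general technique with Python's standard `hashlib` and `json` modules; the entry layout and function names are assumptions, and the paper's log format is not described in this summary.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(event: dict, prev_hash: str) -> str:
    payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if _digest(entry["event"], prev_hash) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


log = []
# Log identifiers only, never the forgotten content itself.
append_entry(log, {"op": "block", "id": "m1"})
append_entry(log, {"op": "prune", "id": "s1"})
ok_before = verify(log)
log[0]["event"]["id"] = "m9"  # simulate tampering
ok_after = verify(log)
```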

Frameworks

  • Synchronized Backflow Unlearning (SBU)

Is Agentic

true

Architectures

  • memory-augmented LLM agent
  • retrieval-augmented agent

Optimization Features

Infra Optimization

  • Reported lower GPU memory usage vs baselines in experiments

Model Optimization

  • Entropy-regularized KL-to-random alignment on forget queries
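
One way to read "KL-to-random" as entropy regularization is sketched below. The choice of a uniform prior U over the |V| candidate outputs and the direction of the KL term are assumptions made for illustration; this summary does not state the paper's exact objective. The symbols D_F (forget set), D_R (retain set), and L_task (the usual task loss) are likewise introduced here.

```latex
\mathcal{L}_{\text{forget}}(\theta)
  = \mathbb{E}_{x \in \mathcal{D}_F}\,
    \mathrm{KL}\!\left(p_\theta(\cdot \mid x)\,\middle\|\,U\right)
  = \log |V| \;-\; \mathbb{E}_{x \in \mathcal{D}_F}\,
    H\!\left(p_\theta(\cdot \mid x)\right),
\qquad
\mathcal{L}(\theta)
  = \mathcal{L}_{\text{task}}(\theta; \mathcal{D}_R)
  + \lambda_F\,\mathcal{L}_{\text{forget}}(\theta)
```

Under this reading, minimizing the forget term is the same as maximizing output entropy on forget queries, and the term reaches zero exactly when the model is uniformly uncertain.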

System Optimization

  • Periodic vector-index rebuild when |B| > τ to prevent stale vectors
  • O(k·r) filtering overhead for top-k results
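
The rebuild rule can be sketched as follows. This is an illustration under assumptions: B is interpreted here as the set of blocked ids whose vectors are still physically stored (`pending`), a separate permanent `blocked` set is kept so forgotten ids cannot be re-inserted, and `FilteredIndex` is a hypothetical class rather than any real vector-store API.

```python
class FilteredIndex:
    """Toy index that purges blocked vectors once enough have piled up."""

    def __init__(self, tau: int) -> None:
        self.tau = tau                  # rebuild threshold on |B|
        self.vectors: dict = {}         # item id -> embedding
        self.blocked: set = set()       # permanent blocklist
        self.pending: set = set()       # B: blocked ids still stored in the index
        self.rebuilds = 0

    def add(self, item_id: str, vector: list) -> bool:
        if item_id in self.blocked:
            return False                # refuse to re-insert forgotten items
        self.vectors[item_id] = vector
        return True

    def block(self, item_id: str) -> None:
        self.blocked.add(item_id)
        if item_id in self.vectors:
            self.pending.add(item_id)
        if len(self.pending) > self.tau:
            self.rebuild()

    def rebuild(self) -> None:
        # Physically drop blocked vectors so no stale embedding survives.
        self.vectors = {
            i: v for i, v in self.vectors.items() if i not in self.blocked
        }
        self.pending.clear()
        self.rebuilds += 1


idx = FilteredIndex(tau=2)
for name in ("a", "b", "c", "d"):
    idx.add(name, [0.0, 1.0])
idx.block("a")
idx.block("b")
rebuilds_at_threshold = idx.rebuilds  # |B| == tau, so no rebuild yet
idx.block("c")                        # |B| > tau triggers the rebuild
```

Choosing τ trades query-time filtering work against rebuild cost: a small τ keeps the index clean but rebuilds often, a large τ does the opposite.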

Training Optimization

  • Mixed-batch trainer combining retain and forget samples
  • Lambda (λF) controls forgetting-utility tradeoff
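
A framework-free sketch of a mixed-batch objective is given below, to make the role of λF concrete. It assumes cross-entropy on retain samples and KL divergence to a uniform distribution on forget samples; both choices, and all function names, are illustrative assumptions rather than the paper's trainer.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def kl_to_uniform(logits):
    """KL(p || U) for p = softmax(logits); zero when p is uniform."""
    p = softmax(logits)
    u = 1.0 / len(p)
    return sum(pi * math.log(pi / u) for pi in p if pi > 0.0)


def mixed_batch_loss(batch, lambda_f):
    """batch: list of (logits, label, is_forget); forget labels are ignored."""
    retain = [cross_entropy(lg, y) for lg, y, is_forget in batch if not is_forget]
    forget = [kl_to_uniform(lg) for lg, _, is_forget in batch if is_forget]
    retain_loss = sum(retain) / len(retain) if retain else 0.0
    forget_loss = sum(forget) / len(forget) if forget else 0.0
    return retain_loss + lambda_f * forget_loss


batch = [
    ([2.0, 0.0], 0, False),  # retain sample: keep predicting label 0
    ([5.0, 0.0], 0, True),   # forget sample: confident now, should become uncertain
]
loss_no_forgetting = mixed_batch_loss(batch, lambda_f=0.0)
loss_with_forgetting = mixed_batch_loss(batch, lambda_f=1.0)
```

With `lambda_f=0.0` the objective reduces to the retain loss alone. Raising it weights the forget term more heavily, which is where the forgetting-utility tradeoff comes from.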

Inference Optimization

  • Blocklist enforced at retrieval time to filter results

Reproducibility

Data Urls

  • MedQA, MedMCQA, MedReason (public datasets referenced in paper)

Data Available

Open Source Status

  • partial

Risks & Boundaries

Limitations

  • Dependency tracking may not capture cross-agent flows in shared knowledge graphs.
  • Index rebuilds and graph cleanup add operational cost and complexity.
  • Experiments use public medical QA datasets, not real patient records or multi-agent production deployments.

When Not To Use

  • Stateless models without external memory (no retrieval pathway).
  • Environments with many collaborating agents sharing a global KG (cross-agent flows unsupported).
  • When operational cost of periodic index rebuilds is prohibitive at massive scale.

Failure Modes

  • Over-unlearning if λF is too large, degrading task accuracy (the paper reports a drop in test accuracy for large λF).
  • Incomplete deletion if dependency graph is incomplete or derived artifacts are missed.
  • Adversarial queries or undeleted proxies could re-expose forgotten facts if blocklist misses identifiers.

Core Entities

Models

  • II-Medical-8B
  • Qwen3-8B
  • text-embedding-ada-002 (OpenAI)

Metrics

  • Accuracy
  • MIA AUC
  • MIA Score
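
For reference, membership-inference AUC can be computed from attack scores with the standard pairwise (Mann-Whitney) formulation sketched below. The function name and the example scores are made up for illustration. The paper's "MIA Score" (higher = better privacy) is a separate quantity whose formula is not given in this summary, so this sketch covers only the AUC.

```python
def mia_auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member.

    0.5 means the attack cannot tell members from non-members;
    1.0 means it separates them perfectly.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical attack scores (for example, negative loss per sample).
leaky = mia_auc([0.9, 0.8], [0.1, 0.2])
indistinguishable = mia_auc([0.9, 0.1], [0.5, 0.5])
```

After successful unlearning, forgotten items should look like non-members to the attacker, so an AUC near 0.5 on the forget set is the desirable outcome.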

Datasets

  • MedMCQA
  • MedQA
  • MedReason

Benchmarks

  • MedQA evaluation
  • MedMCQA evaluation
  • MedReason evaluation