Train agents to skip redundant thoughts and past observations to cut token cost while keeping accuracy

Overview

Decision Snapshot: Needs Validation

The method is practical: cold-start SFT plus an RL phase yields measurable token and accuracy gains on five benchmarks, but it relies on synthetic data and environment-specific RL tuning.

Citations: 0

Evidence Strength: 0.70

Confidence: 0.80

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 2/4

Reproducibility

Status: Code + data available

Open source: Yes

At A Glance

Cost impact: 70%

Production readiness: 60%

Novelty: 60%

Authors

Yansong Ning, Jun Fang, Naiqiang Tan, Hao Liu

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Omitting unnecessary internal reasoning and old tool outputs reduces API token costs and latency while keeping or improving task success, giving a better cost-performance trade-off for production agents.

Who Should Care

ML Engineer, Product Manager, Engineering Lead, CTO

Summary TLDR

The paper studies which agent turns (thoughts and observations) really matter and trains an LLM agent, Agent-Omit, to omit unnecessary internal reasoning and prior tool responses. The authors synthesize cold-start omission data, then apply an omit-aware RL loop (dual sampling, omission reward, KL penalty). On five agent benchmarks, Agent-Omit-8B (RL) keeps or improves Pass@1 accuracy against strong baselines while lowering average token use. Trained agents omit about 3–4 turns on average, mostly middle turns. Code and data are provided.

Problem Statement

Multi-turn LLM agents spend most tokens on internal thoughts and stacked observations. Not all turns matter equally. The problem is to learn when to omit thoughts or past observations to reduce tokens while keeping task accuracy across diverse agent environments.

Main Contribution

Turn-level analysis showing thought and observation token cost dominate agent context and that their utility varies by turn.

Agent-Omit framework: cold-start synthesis of single- and multi-turn omission samples plus omit-aware agentic RL (dual sampling, omission reward, KL penalty).

Key Findings

Thought and observation tokens dominate agent context.

Numbers: Thought 45.1% of tokens; Observation 52.2%; Actions 2.7% (WebShop, Qwen3-8B)

Practical Use: Focus optimization on thoughts and past observations; reducing action tokens yields little benefit.

Evidence Ref: Fig.2(a), Sec.3.1
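To check whether this breakdown holds for your own agent, a minimal sketch like the one below tallies token share by segment type. The `(kind, text)` segment layout and the whitespace-based `count_tokens` default are assumptions for illustration, not the paper's measurement code; swap in your model's tokenizer for real numbers.

```python
from collections import Counter


def token_share(segments, count_tokens=lambda text: len(text.split())):
    """Return each segment kind's share of total tokens, in percent.

    segments: iterable of (kind, text) pairs, e.g. kind in
        {"thought", "observation", "action"} (hypothetical layout).
    count_tokens: stand-in tokenizer; whitespace split is only an approximation.
    """
    totals = Counter()
    for kind, text in segments:
        totals[kind] += count_tokens(text)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {kind: 100.0 * n / grand_total for kind, n in totals.items()}


# Toy trajectory, invented for illustration only.
trajectory = [
    ("thought", "I should search for a red shirt under 30 dollars"),
    ("action", "search[red shirt]"),
    ("observation", "Results: item A 25 dollars, item B 40 dollars, item C 19 dollars"),
]
shares = token_share(trajectory)
```

If thoughts and observations dominate in your logs as they do in the WebShop measurement, omission is the lever worth pulling first.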

Selective omission can cut tokens without hurting accuracy on many turns.

Numbers: Grey regions in Fig.3 show token drop with no accuracy loss on intermediate turns (WebShop, Qwen3-8B)

Practical Use: Implement turn-aware omission (not global compression) to save tokens while preserving task success.

Evidence Ref: Fig.3, Sec.3.2
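As a starting heuristic for turn-aware omission, one can restrict omission candidates to intermediate turns and protect the first and last turns. The sketch below is an assumption-based illustration: the function name and the `protect_head` / `protect_tail` parameters are hypothetical, and the trained Agent-Omit policy learns where to omit rather than following a fixed rule like this.

```python
def omission_candidates(num_turns, protect_head=1, protect_tail=1):
    """Return 0-based indices of turns eligible for omission.

    The first `protect_head` and last `protect_tail` turns are never
    candidates (hypothetical defaults; tune per environment).
    """
    if protect_head < 0 or protect_tail < 0:
        raise ValueError("protected counts must be non-negative")
    start = protect_head
    stop = num_turns - protect_tail
    return list(range(start, stop)) if stop > start else []


middle = omission_candidates(6)
```

A fixed rule like this is useful mainly as a baseline to compare a learned omission policy against.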

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Token composition (WebShop, Qwen3-8B)	Thought 45.1%; Observation 52.2%; Actions 2.7%	—	—	WebShop	Fig.2(a)	Sec.3.1
Agent-Omit-8B-RL Pass@1	WebShop 23.57; TextCraft 87.00; BabyAI 84.36; SciWorld 18.45; DeepSearch 26.56	Various frontier LLM agents (Table 2)	Improves vs many frontier models on WebShop/TextCraft/BabyAI/SciWorld	Multiple (Table 2)	Table 2 main results	Sec.6.2

What To Try In 7 Days

Add a simple omit token/flag to agent outputs and target mid-turn omissions in a dev benchmark.

Synthesize 2–4K cold-start omission examples and fine-tune the agent to emit empty <think> blocks or <omit tool response> tokens.

Implement a lightweight omit-aware reward (saved_tokens ratio with task-correctness gating) and run a few RL rollouts to tune the omission weight.
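The gated reward in the last step can be sketched as follows. This is a minimal illustration under stated assumptions: the additive combination, the `omit_weight` coefficient and its default, and the function name are hypothetical and not taken from the paper. The only detail from the source is that the omission reward is zero whenever the task reward is zero.

```python
def omission_reward(task_reward, tokens_used, tokens_full, omit_weight=0.1):
    """Task reward plus a correctness-gated bonus for tokens saved.

    tokens_full: token count of the same trajectory with nothing omitted.
    omit_weight: hypothetical coefficient to tune; not a value from the paper.
    """
    if tokens_full <= 0:
        raise ValueError("tokens_full must be positive")
    if task_reward <= 0:
        # Gate: a failed task earns no omission bonus, which blocks
        # reward hacking by omitting everything.
        return task_reward
    # Clamp at zero so a longer-than-baseline rollout is not rewarded.
    saved_ratio = max(0.0, (tokens_full - tokens_used) / tokens_full)
    return task_reward + omit_weight * saved_ratio


success = omission_reward(1.0, tokens_used=600, tokens_full=1000)
failure = omission_reward(0.0, tokens_used=100, tokens_full=1000)
```

Sweeping `omit_weight` over a few values on a dev benchmark is one way to find the point where token savings start to cost accuracy.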

Agent Features

Memory

explicit omission of prior tool responses

hierarchical single- and multi-turn omission handling

Planning

adaptive omission policy

turn-level planning with omit decisions

dual sampling (full and partial trajectories)

Tool Use

search engine calls

web navigation actions

game/environment actions

Frameworks

AgentGym-RL

GRPO

Is Agentic

Yes

Architectures

Qwen3-8B

Qwen3-4B

Agent-Omit-8B

Optimization Features

Token Efficiency

explicit omission reward proportional to saved tokens

agents omit 3–4 turns on average

Model Optimization

full-parameter fine-tuning for omission behavior

System Optimization

SFT

Training Optimization

cold-start synthetic omission dataset (2–4K samples)

omit-aware RL with dual sampling

KL penalty to constrain policy shift

Inference Optimization

omit tool responses and empty thoughts to reduce context length

omit-aware action formatting (<omit tool response> tokens)
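At inference time, the effect of these omissions on context length can be illustrated with a small context builder. The sketch below is an assumption throughout: the `Turn` dataclass, the placeholder text, the rendering format, and the idea of passing omitted turn indices as a set are all hypothetical, and the actual Agent-Omit prompt format may differ.

```python
from dataclasses import dataclass

OMITTED_PLACEHOLDER = "[tool response omitted]"  # hypothetical marker text


@dataclass
class Turn:
    thought: str       # may be empty when the agent skipped reasoning
    action: str
    observation: str


def build_context(turns, omitted_observations):
    """Render the history, shortening whatever the agent chose to omit.

    omitted_observations: set of 0-based turn indices whose tool
        responses should be replaced by a short placeholder.
    """
    lines = []
    for index, turn in enumerate(turns):
        if turn.thought.strip():
            # Empty thoughts are skipped, so they cost no context tokens.
            lines.append(f"<think>{turn.thought}</think>")
        lines.append(f"Action: {turn.action}")
        if index in omitted_observations:
            lines.append(f"Observation: {OMITTED_PLACEHOLDER}")
        else:
            lines.append(f"Observation: {turn.observation}")
    return "\n".join(lines)


history = [
    Turn("Find candidate items first.", "search[red shirt]", "long result page " * 50),
    Turn("", "click[item C]", "Item C: red shirt, 19 dollars"),
]
context = build_context(history, omitted_observations={0})
```

Here the long search result from the first turn is replaced by the placeholder, while the latest observation is kept in full.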

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Yes

License: Unknown

Code URLs

https://github.com/usailhkust/Agent-Omit

Data URLs

https://github.com/usailhkust/Agent-Omit

Risks & Boundaries

Limitations

Omission harms initial and final turns, so it must be selective rather than global.

Relies on synthetic cold-start data and environment-specific RL; generalization to unseen environments is untested.

When Not To Use

Tasks where every past observation is safety-critical or legally required.

Single-turn tasks where omission brings no benefit.

Failure Modes

Removing a needed thought/observation and forcing the agent to generate extra recovery reasoning.

Reward hacking if omission reward is not gated by correctness (authors set R_omit=0 when R_task=0 to avoid this).

Core Entities

Models

Agent-Omit-8B-RL

SFT

Agent-Omit-4B-RL

Qwen3-8B

Qwen3-4B

DeepSeek-R1-0528

DeepSeek-V3.2

OpenAI o3

Qwen3-235B-A22B

Metrics

Pass@1

Pass@8

Average Tokens (Avg Tok)

Token Reduction Ratio

Datasets

DeepSearch

WebShop

TextCraft

BabyAI

SciWorld

Benchmarks

WebShop

DeepSearch

TextCraft

BabyAI

SciWorld

Context Entities

Models

DeepSeek-R1-0528

DeepSeek-V3.2

OpenAI o4-mini

Qwen3-32B

Qwen3-Next-80B-A3B
