Overview
The method is practical and plug-and-play: embed the context, rank sentences by query relevance, keep the top-ranked ones intact, compress the rest, and call the LLM. Evidence comes from two datasets with concrete token and ROUGE measurements.
Citations: 2
Evidence Strength: 0.60
Confidence: 0.78
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 5/5
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 80%
Production readiness: 60%
Novelty: 60%
Why It Matters For Business
LeanContext lowers pay-per-use LLM input tokens so small teams can run domain QA faster and cheaper while keeping similar answer quality.
Summary TLDR
LeanContext reduces the tokens sent to pay-per-use LLMs by keeping a small set of query-relevant sentences intact and compressing the rest. A lightweight Q-learning agent chooses how many sentences (top-k) to keep per query. On ArXiv and BBCNews tests LeanContext cut prompt cost 37%–68% with a small ROUGE-1 drop (~0.014–0.026 absolute). Adding top-k sentences to cheap open-source summarizers recovers or improves QA quality while still saving cost.
Problem Statement
Feeding long domain documents into pay-per-use LLMs is expensive because API cost scales with input tokens. Standard summarizers aim at human-readable summaries and can remove details that LLMs need to answer domain questions. The paper asks: how can we reduce prompt tokens for domain QA while keeping answer quality?
Main Contribution
LeanContext: a pipeline that ranks sentences by query relevance, keeps top-k sentences intact, and compresses the remaining text with an open-source summarizer.
Adaptive top-k selection via a small Q-learning agent that picks a reduction threshold per query/context state.
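The pipeline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a bag-of-words cosine score stands in for a real sentence embedding, `truncate_compressor` is a placeholder for an open-source summarizer, and keeping sentences in their original order is an assumption. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bow_vector(text):
    # Stand-in for a sentence embedding (assumption): lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def truncate_compressor(sentence, max_words=6):
    # Placeholder for an open-source summarizer (assumption): keep the first words.
    return " ".join(sentence.split()[:max_words])


def lean_context(context, query, k, compress=truncate_compressor):
    sentences = split_sentences(context)
    q = bow_vector(query)
    scores = [cosine(bow_vector(s), q) for s in sentences]
    # Rank by relevance to the query; ties are broken by original position.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    keep = set(ranked[:k])
    # Keep the top-k sentences verbatim and compress the rest, in original order.
    return " ".join(
        s if i in keep else compress(s) for i, s in enumerate(sentences)
    )
```

Calling `lean_context(context, query, k=1)` returns a shorter context in which only the single most query-relevant sentence is untouched. In a real setup, `bow_vector` and `truncate_compressor` would be swapped for an embedding model and a summarizer of your choice.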
Key Findings
Adaptive LeanContext reduces prompt tokens and saves cost with little accuracy loss
Maximum observed cost savings are large on news-like data
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| ArXiv (N=4) prompt tokens | Adaptive LeanContext: 321 vs Original: 521 | Context (Original) | -200 tokens (~37.29% cost saved) | ArXiv (Table 1) | Table 1, row LeanContext (Adaptive k [RL]) | Table 1 |
| ArXiv ROUGE-1 | Adaptive LeanContext 0.3844 vs Original 0.3985 | Context (Original) | -0.0141 absolute (~1.41 percentage points) | ArXiv (Table 1) | Table 1 ROUGE-1 values | Table 1 |
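
The deltas in the table can be re-derived from the reported values. The check below is only arithmetic on the numbers in the two rows above; the variable names are illustrative. The raw prompt-token reduction works out to about 38.4%, slightly different from the reported ~37.29% cost saving. This summary does not state how the cost figure is computed, so that gap is left unexplained here.

```python
# Values copied from the two table rows above.
original_tokens = 521
lean_tokens = 321
original_rouge1 = 0.3985
lean_rouge1 = 0.3844

# Absolute change in prompt tokens.
token_delta = lean_tokens - original_tokens

# Relative reduction in prompt tokens, as a percentage of the original.
token_reduction_pct = 100 * (original_tokens - lean_tokens) / original_tokens

# Absolute change in ROUGE-1, rounded to the table's four decimals.
rouge_delta = round(lean_rouge1 - original_rouge1, 4)

print(token_delta, round(token_reduction_pct, 1), rouge_delta)
```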
What To Try In 7 Days
Measure current prompt token cost on a small sample of your domain documents.
Implement a simple pipeline: retrieve N chunks, embed the sentences, keep the top 10% of sentences intact, and compress the rest with an open-source summarizer.
If the cost/quality tradeoff is promising, train a small Q-learning agent on ~20–100 example queries to choose adaptive top-k thresholds.
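The last step can be prototyped with a tabular Q-learning loop before any LLM is involved. The sketch below is a hedged illustration, not the paper's agent: the state labels, the candidate ratios in `RATIOS`, the hyperparameters, and `toy_reward` are all assumptions made for the example. Each episode is treated as a single step, so the usual next-state term of the Q-learning update is zero. In a real run, `reward_fn` would score answer quality against tokens saved, which requires LLM calls.

```python
import random

# Candidate top-k fractions the agent can pick from (assumed values).
RATIOS = [0.05, 0.10, 0.20, 0.40]


def train_q_table(samples, reward_fn, episodes=2000, alpha=0.2, epsilon=0.2, seed=0):
    """Tabular Q-learning over (state, ratio-index) pairs.

    Episodes are single-step, so the update reduces to
    Q[s][a] += alpha * (reward - Q[s][a]).
    """
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = rng.choice(samples)
        row = q.setdefault(state, [0.0] * len(RATIOS))
        if rng.random() < epsilon:
            # Explore: try a random ratio.
            action = rng.randrange(len(RATIOS))
        else:
            # Exploit: pick the ratio with the highest current estimate.
            action = max(range(len(RATIOS)), key=row.__getitem__)
        reward = reward_fn(state, RATIOS[action])
        row[action] += alpha * (reward - row[action])
    return q


def choose_ratio(q, state, default=0.10):
    # Greedy choice for a known state; fall back to a fixed ratio otherwise.
    row = q.get(state)
    if row is None:
        return default
    return RATIOS[max(range(len(RATIOS)), key=row.__getitem__)]


def toy_reward(state, ratio):
    # Simulated reward (assumption): quality is 1.0 once enough sentences are
    # kept for this state, and every kept sentence costs tokens.
    needed = {"short": 0.10, "long": 0.20}[state]
    quality = 1.0 if ratio >= needed else 0.0
    return quality - 2 * ratio
```

With the toy reward, `choose_ratio(train_q_table(["short", "long"], toy_reward), "long")` settles on the smallest ratio that still preserves quality for that state. Because real rewards cost LLM calls, it may be worth caching them per (query, ratio) pair when training on a small sample.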
Risks & Boundaries
Limitations
Evaluation limited to two datasets (ArXiv and BBCNews) generated in March 2023.
Reward computation requires LLM calls during RL training, which can be costly; authors train on a small sample.
When Not To Use
When you can afford to fine-tune a domain model or keep everything on an internal model.
When users need human-readable full-text summaries rather than machine-oriented compressed context.
Failure Modes
The RL agent picks too small a k and removes sentences containing the answer, causing 'No answer' or wrong answers.
The open-source compressor drops factual details needed for correctness even when the top-k selection kept the important sentences.

