Overview
The design appears useful for long-running document workflows and is supported by benchmark and ablation data. However, the evidence comes from a focused set of research-oriented tasks; broader generalization is untested.
Citations: 0
Evidence Strength: 0.75
Confidence: 0.80
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 4/4
Findings with evidence refs: 4/4
Results with explicit delta: 2/3
Reproducibility
Status: Partial assets available
Open source: Partial
At A Glance
Cost impact: 60%
Production readiness: 60%
Novelty: 70%
Why It Matters For Business
If your workflows involve long document processing or multi-step knowledge work, state management can matter more than raw model size. A file-centric agent design can make smaller, cheaper models considerably more reliable over long runs and reduce costly re-runs.
Who Should Care
Summary TLDR
InfiAgent keeps an agent's working memory small by storing all persistent task state as files and reconstructing a fixed-size reasoning context from that workspace plus a short buffer of recent actions. This file-centric design, together with a hierarchical agent stack and an "external attention" tool pipeline, improves reliability on long tasks (e.g., an 80-paper literature review) and lets a 20B open model perform competitively with larger systems on the DeepResearch benchmark without fine-tuning.
Problem Statement
Many current LLM agents pack long-term state into the prompt. As tasks grow, prompts bloat and agents degrade: early errors accumulate, relevant information gets lost in the long context, and performance drops on long workflows.
Main Contribution
File-centric state abstraction: treat workspace files as the authoritative persistent state instead of embedding history in the prompt.
Bounded reasoning reconstruction: build each reasoning prompt from a workspace snapshot plus a fixed small window of recent actions (e.g., k=10) so context size stays constant.
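A minimal Python sketch of these two ideas, assuming a local directory serves as the workspace and actions are plain strings. The class and method names (`FileCentricState`, `write_artifact`, `record_action`, `build_prompt`) and the snapshot cap are hypothetical illustrations, not the paper's code; k=10 mirrors the example value above. A `deque(maxlen=k)` keeps only the most recent actions, and the workspace snapshot is truncated, so the rebuilt prompt stays bounded no matter how many steps have run (assuming individual action strings are short).

```python
import tempfile
from collections import deque
from pathlib import Path


class FileCentricState:
    """Hypothetical sketch: persistent state lives as files in `workspace`;
    only the last `k` actions are held in memory for prompt reconstruction."""

    def __init__(self, workspace: Path, k: int = 10, max_snapshot_chars: int = 2000):
        self.workspace = workspace
        self.workspace.mkdir(parents=True, exist_ok=True)
        self.recent_actions = deque(maxlen=k)  # oldest actions drop out automatically
        self.max_snapshot_chars = max_snapshot_chars

    def write_artifact(self, name: str, content: str) -> None:
        # Results are persisted to disk instead of being appended to the prompt.
        (self.workspace / name).write_text(content, encoding="utf-8")

    def record_action(self, action: str) -> None:
        self.recent_actions.append(action)

    def snapshot(self, preview_chars: int = 80) -> str:
        # One line per file (name + short preview), capped at max_snapshot_chars in total.
        lines = []
        for path in sorted(self.workspace.iterdir()):
            if path.is_file():
                preview = path.read_text(encoding="utf-8")[:preview_chars]
                lines.append(f"- {path.name}: {preview}")
        return "\n".join(lines)[: self.max_snapshot_chars]

    def build_prompt(self, task: str) -> str:
        # Rebuilt from scratch each step: task + bounded snapshot + last k actions.
        # Older history is never replayed into the prompt.
        actions = "\n".join(self.recent_actions)
        return f"TASK:\n{task}\n\nWORKSPACE:\n{self.snapshot()}\n\nRECENT ACTIONS:\n{actions}\n"


# Usage: 50 steps, but the prompt only ever carries the last 10 actions.
with tempfile.TemporaryDirectory() as tmp:
    state = FileCentricState(Path(tmp) / "ws", k=10, max_snapshot_chars=500)
    for step in range(50):
        state.write_artifact(f"note_{step:02d}.md", f"summary of paper {step}")
        state.record_action(f"step {step}: summarized paper {step}")
    prompt = state.build_prompt("Write a literature review")
    snapshot_len = len(state.snapshot())
```

The point of the design is that prompt size depends on `k`, the preview length, and the snapshot cap rather than on the number of steps taken. A production system would likely use a more structured snapshot (for example, a plan or index file) than this truncated file listing.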
Key Findings
InfiAgent (gpt-oss-20b) scores 41.45 on DeepResearch without task-specific fine-tuning.
On an 80-paper literature review, InfiAgent (gpt-oss-20b) covered an average of 67.1 papers per run (out of 80).
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| DeepResearch overall | 41.45 | larger proprietary agents (various) | — | DeepResearch benchmark | InfiAgent with gpt-oss-20b achieved 41.45 overall (Table 2) | Table 2 |
| Literature review coverage (avg) | 67.1 papers | No File State ablation (gpt-oss-20b), avg 3.2 papers | +63.9 papers | 80-paper literature review | InfiAgent avg coverage 67.1 vs. ablation 3.2 (Table 1) | Table 1 |
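Expressed as a fraction of the 80 target papers, the Table 1 averages work out as follows (simple arithmetic on the reported numbers, not a figure stated in the source):

```latex
\text{coverage}_{\text{InfiAgent}} = \frac{67.1}{80} \approx 0.839 \;(83.9\%), \qquad
\text{coverage}_{\text{No File State}} = \frac{3.2}{80} = 0.04 \;(4\%), \qquad
\Delta = 67.1 - 3.2 = 63.9 \text{ papers}
```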
What To Try In 7 Days
Prototype workspace-as-state: store intermediate outputs and plans in files rather than appending them to the prompt.
Implement a fixed-size recent-action buffer (e.g., the last 10 actions) to rebuild the context at each step.
Wrap heavy-document reads in an isolated extractor tool (answer_from_pdf) that returns only the extracted facts to the agent.
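The isolated-extractor idea can be sketched as a tool that processes the large document inside its own call and hands back only a short answer. In this sketch the function name `answer_from_document`, the pluggable `extract` callable, and the keyword-overlap extractor are hypothetical stand-ins; a real pipeline would more likely run a separate LLM call with its own context, and the paper's `answer_from_pdf` interface is not reproduced here. The sketch also assumes the document has already been converted to plain text.

```python
import re
from typing import Callable

# Hypothetical extractor signature: (document_text, question) -> short answer.
Extractor = Callable[[str, str], str]


def keyword_extractor(text: str, question: str, max_sentences: int = 2) -> str:
    """Toy stand-in for an LLM extractor: keep the sentences that share
    the most words with the question, ignoring sentences with no overlap."""
    q_words = set(re.findall(r"\w+", question.lower()))
    # Split into sentences and drop exact duplicates while preserving order.
    sentences = list(dict.fromkeys(
        s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()
    ))
    scored = [(len(q_words & set(re.findall(r"\w+", s.lower()))), s) for s in sentences]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    relevant = [s for score, s in ranked if score > 0]
    return " ".join(relevant[:max_sentences])


def answer_from_document(text: str, question: str,
                         extract: Extractor = keyword_extractor,
                         max_answer_chars: int = 500) -> str:
    """Hypothetical isolated tool: the full document stays inside this call;
    only a bounded answer string reaches the main agent's context."""
    return extract(text, question)[:max_answer_chars]


# Usage: a long document goes in, a single relevant sentence comes out.
document = (
    "InfiAgent stores persistent state in files. "
    "The weather was pleasant during the experiments. "
    "A bounded window of recent actions rebuilds the context. "
) * 200
answer = answer_from_document(document, "How does InfiAgent store persistent state?")
```

Keeping the extractor pluggable means the main agent's contract stays the same (question in, short answer out) whether the backend is this toy matcher or a dedicated model.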
Agent Features
Memory
Planning
Tool Use
Frameworks
Is Agentic
Yes
Architectures
Collaboration
Optimization Features
Token Efficiency
Infra Optimization
System Optimization
Inference Optimization
Reproducibility
Risks & Boundaries
Limitations
Does not fix model reasoning errors: wrong outputs can be written into persistent state and propagated (Section 6).
Introduces latency: serial hierarchical execution and file operations raise response time, making it less suitable for real-time use (Section 6).
When Not To Use
Real-time interactive applications where low latency matters.
Workloads that need heavy parallelism across independent subtasks, where strictly serialized execution becomes a bottleneck.
Failure Modes
Hallucination propagation: an early incorrect artifact saved to files can mislead later steps.
Early termination or skipping items in long runs if higher-level orchestration fails (observed in baselines).

