Overview
Production Readiness
0.6
Novelty Score
0.4
Cost Impact Score
0.5
Citation Count
10
Why It Matters For Business
AgentLite reduces code overhead for prototyping LLM agents so engineering teams can test agent ideas quickly without a heavy framework or large code refactor.
Summary TLDR
AgentLite is an open-source, compact Python library (<1k lines) for building task-oriented LLM agents and hierarchical multi-agent systems. It provides four modular components (PromptGen, Actions, LLM wrapper, Memory), a ManagerAgent for task decomposition and orchestration, and simple hooks for adding new reasoning actions (Think, Plan, Reflect) or different LLM backends. The authors reproduce agent-style benchmarks (HotPotQA, WebShop) to show that AgentLite can run standard agent experiments, and they ship ready-to-run demo apps (image Q&A, painter, chess, philosopher chat). AgentLite is a tooling contribution: it speeds up prototyping and experimentation, but it is not a new model or training method.
Problem Statement
Existing agent frameworks are large, rigid, or hard to refactor for new reasoning strategies and agent architectures. Researchers need a small, modular codebase to iterate on new agent designs, plug in custom reasoning actions, and assemble hierarchical multi-agent systems quickly.
Main Contribution
Released AgentLite: compact, research-oriented agent library with ~959 core lines of code.
Defined a task-oriented agent API with four modules: PromptGen, Actions, LLM wrapper, Memory.
Provided ManagerAgent for hierarchical task decomposition and multi-agent orchestration.
Made it easy to add new reasoning types (e.g., Think/Plan/Reflect) and multiple LLM backends per agent.
Demonstrated reproducible experiments on HotPotQA and WebShop and several demo apps.
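The four-module split listed above can be illustrated with a toy agent loop. This is a minimal sketch, not AgentLite's actual API: every name here (`ToyMemory`, `prompt_gen`, `fake_llm`, `ToyAgent`) is hypothetical, and the LLM is replaced by a scripted stub so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyMemory:
    """Stores the action-observation chain for one task (hypothetical)."""
    steps: List[Tuple[str, str, str]] = field(default_factory=list)

    def add(self, action: str, arg: str, observation: str) -> None:
        self.steps.append((action, arg, observation))


def prompt_gen(task: str, memory: ToyMemory, action_names: List[str]) -> str:
    """Builds a prompt from the task, the available actions, and the history."""
    history = "\n".join(f"{a}[{x}] -> {o}" for a, x, o in memory.steps)
    return (
        f"Task: {task}\nActions: {', '.join(action_names)}\n"
        f"History:\n{history}\nNext:"
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM wrapper: think first, then search, then finish."""
    if "Think[" not in prompt:
        return "Think[I should look the answer up]"
    if "Search[" not in prompt:
        return "Search[capital of France]"
    return "Finish[Paris]"


class ToyAgent:
    """Ties the four pieces together in a simple act-observe loop."""

    def __init__(self, llm: Callable[[str], str],
                 actions: Dict[str, Callable[[str], str]]):
        self.llm = llm
        self.actions = actions
        self.memory = ToyMemory()

    def run(self, task: str, max_steps: int = 5) -> str:
        names = list(self.actions) + ["Finish"]
        for _ in range(max_steps):
            reply = self.llm(prompt_gen(task, self.memory, names))
            # Replies are expected in the form Name[argument].
            name, _, rest = reply.partition("[")
            arg = rest.rstrip("]")
            if name == "Finish":
                return arg
            handler = self.actions.get(name)
            observation = handler(arg) if handler else f"unknown action {name}"
            self.memory.add(name, arg, observation)
        return "no answer"


agent = ToyAgent(
    llm=fake_llm,
    actions={
        "Think": lambda thought: "OK",  # reasoning action: no external call
        "Search": lambda query: "Paris is the capital of France.",
    },
)
answer = agent.run("What is the capital of France?")
```

In this sketch a reasoning action such as Think is just another entry in the action table, which is one way to read the claim that new reasoning types are easy to add.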
Key Findings
AgentLite is small and focused: core codebase is under 1,000 lines.
AgentLite runs agent-style QA experiments (HotPotQA) and reproduces the expected performance ordering among the tested models.
AgentLite supports web-interaction benchmarks (WebShop) and reproduces the reward gaps between models.
Results
HotPotQA medium F1-Score
WebShop avg. reward (all tasks)
Who Should Care
What To Try In 7 Days
Clone the AgentLite GitHub repository and run the included HotPotQA or WebShop example to reproduce results.
Add a simple Think action to an agent and measure behavioral changes on one benchmark.
Build a ManagerAgent that delegates a two-step task to two specialized agents (search + action).
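The manager-delegation exercise in the last item can be prototyped in plain Python before wiring in the library. The sketch below is an assumption-laden toy, not AgentLite code: `ToyTask`, `Worker`, and `ToyManager` are hypothetical names, the decomposition into two steps is hard-coded rather than produced by an LLM, and `catalog` is made-up data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task handed from a manager to a worker."""
    instruction: str
    result: str = ""


class Worker:
    """A specialized agent reduced to a single callable skill."""

    def __init__(self, name: str, skill: Callable[[str], str]):
        self.name = name
        self.skill = skill

    def handle(self, task: ToyTask) -> ToyTask:
        task.result = self.skill(task.instruction)
        return task


class ToyManager:
    """Splits a request into two fixed steps and runs them in order."""

    def __init__(self, team: Dict[str, Worker]):
        self.team = team
        self.log: List[str] = []

    def run(self, request: str) -> str:
        # Step 1: the search worker gathers information.
        found = self.team["search"].handle(ToyTask(request))
        self.log.append(f"search -> {found.result}")
        # Step 2: the action worker acts on what the search worker returned.
        done = self.team["action"].handle(ToyTask(found.result))
        self.log.append(f"action -> {done.result}")
        return done.result


catalog = {"red mug": "item-17"}  # made-up data for the example
manager = ToyManager({
    "search": Worker("search", lambda q: catalog.get(q, "not found")),
    "action": Worker("action", lambda item: f"bought {item}"),
})
outcome = manager.run("red mug")
```

Keeping a log of each hand-off makes it easier to spot the case where a worker misreads the sub-task it was given.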
Agent Features
Memory
- action-observation chain memory
Planning
- ReAct-like (Think action)
- Reflection (Reflect action)
- Plan action
Tool Use
- API-wrapped tools
- web search (DuckDuckGo, Wikipedia)
- WolframAlpha solver
- image generation (DALL-E)
Frameworks
- PromptGen
- Actions
- LLM wrapper
- Memory module
Is Agentic
true
Architectures
- hierarchical_multi_agent
- multi_llm_multi_agent
- manager-team
Collaboration
- manager-agent orchestration
- sequential TaskPackage delegation
Reproducibility
Code Available
Data Available
Open Source Status
- yes
Risks & Boundaries
Limitations
- Not a new LLM or training method; improvements depend on chosen LLM backend.
- Communication patterns among agents are basic; richer protocols are future work.
- Relies on external LLM APIs for execution; costs and latency depend on those providers.
- Benchmarks shown are limited to HotPotQA and WebShop-style tasks.
When Not To Use
- If you need a full-featured industrial orchestration stack with heavy integrations.
- If your team requires built-in advanced agent communication protocols not yet implemented.
- If you need an end-to-end production system with SLAs and monitoring out of the box.
Failure Modes
- Agent outputs limited by backend LLM quality (hallucinations or wrong tool calls).
- ManagerAgent may create sub-tasks that subordinate agents misinterpret.
- Memory growth could bloat prompts and exceed token limits on long tasks.
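One common mitigation for the memory-growth failure mode is to keep only the most recent steps that fit a token budget. The sketch below shows that idea under stated assumptions: it is not a feature the source attributes to AgentLite, the function names are hypothetical, and tokens are approximated by whitespace-separated words rather than a real tokenizer.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (action, observation)


def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def recent_steps_within_budget(steps: List[Step], budget: int) -> List[Step]:
    """Keeps the most recent steps whose combined size fits the budget."""
    kept: List[Step] = []
    used = 0
    # Walk backwards so the newest steps are kept first.
    for action, observation in reversed(steps):
        cost = count_tokens(action) + count_tokens(observation)
        if used + cost > budget:
            break
        kept.append((action, observation))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    ("Search[mugs]", "found 40 results across three pages"),
    ("Click[red mug]", "opened product page"),
    ("Click[buy]", "order placed"),
]
trimmed = recent_steps_within_budget(history, budget=8)
```

With a budget of 8 estimated tokens, the oldest step is dropped and the two most recent steps are kept in order.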
Core Entities
Models
- GPT-3.5-Turbo-16k-0613
- GPT-4-0613
- GPT-4-32k-0613
- xLAM-v0.1
- Mixtral 8x7b MoE
Metrics
- F1-Score
- Accuracy
- avg. reward
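The source does not say how the F1-Score is computed. A minimal sketch, assuming the token-overlap F1 commonly used for extractive QA and only lowercasing as normalization (the function name `token_f1` is hypothetical):

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = token_f1("the Eiffel Tower", "Eiffel Tower")
```

Here precision is 2/3 and recall is 1, so the score is 0.8. A benchmark's own evaluation script may normalize answers further, for example by stripping punctuation and articles.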
Datasets
- HotPotQA
- WebShop (AgentBoard tasks)
Benchmarks
- HotPotQA
- WebShop

