OpenHands reduces the engineering work needed to run and compare LLM-driven developer agents by providing a sandboxed runtime, shared skills, and a benchmark harness under an MIT license, so teams can prototype agent integrations faster and more safely.
Key finding
A single generalist agent (CodeAct) performs competitively across software, web, and miscellaneous tasks without benchmark-specific prompt tuning.
Numbers: HumanEvalFix 79.3% (CodeAct v1.5, gpt-4o); SWE-Bench Lite 22–26% (CodeAct v1.8)

