A public index of 67 deployed agentic AI systems that reveals widespread capability documentation but sparse safety disclosure.

February 3, 2025 · 7 min

Overview

Decision Snapshot: Ready For Pilot

The paper compiles public sources and developer feedback into 67 detailed cards; results reliably show high capability disclosure but weak public safety disclosure, though private practices may be underreported.

Citations: 3

Evidence Strength: 0.80

Confidence: 0.86

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/6

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 70%

Production readiness: 60%

Novelty: 50%

Authors

Stephen Casper, Luke Bailey, Rosco Hunter, Carson Ezell, Emma Cabalé, Michael Gerovitch, Stewart Slocum, Kevin Wei, Nikola Jurkovic, Ariba Khan, Phillip J. K. Christoffersen, A. Pinar Ozisik, Rakshit Trivedi, Dylan Hadfield-Menell, Noam Kolt

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Agentic systems are moving into products; you need to verify safety practices before integrating them because public capability docs are common but safety disclosures are rare.

Who Should Care

Summary TLDR

The authors build and publish the AI Agent Index: a curated dataset of 67 deployed "agentic" AI systems (agents that plan and act). For each system they record technical components, intended uses, and safety practices from public sources and developer correspondence. Key takeaways: most developers publish documentation (47/67, 70.1%) and many release code (33/67, 49.3%), but few disclose formal safety policies (13/67, 19.4%) or report external safety audits (6/67, 9%). The index and raw data are available online; the paper is a snapshot as of Dec 31, 2024.
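The percentages quoted above follow directly from the reported counts out of 67 systems. A minimal sketch of that arithmetic, where the dictionary keys and the function name are illustrative labels rather than anything taken from the paper:

```python
# Counts as reported in the summary; key names are illustrative assumptions.
TOTAL_SYSTEMS = 67
reported_counts = {
    "documentation": 47,
    "code_release": 33,
    "safety_policy": 13,
    "external_audit": 6,
}


def disclosure_rate(count: int, total: int = TOTAL_SYSTEMS) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


rates = {name: disclosure_rate(count) for name, count in reported_counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

Rounded to one decimal, the external audit share comes out at 9.0%, which matches the 9% quoted in the summary.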

Problem Statement

There is no structured, public framework documenting deployed agentic AI systems' technical design, uses, and safety practices. That gap makes it hard for users, auditors, and policymakers to compare systems, assess risks, or design governance.

Main Contribution

A structured template (33 fields) for recording technical, safety, and policy-relevant features of deployed agentic systems.

A public index of 67 deployed agentic systems (snapshot as of Dec 31, 2024) summarizing components, domains, openness, and safety practices.
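To make the idea of a structured card concrete, here is a heavily reduced sketch of what one index entry could look like in code. The field names, the `AgentCard` class, and the example entries are all hypothetical; they are not the paper's actual 33-field template, only an assumed subset chosen to mirror the disclosure categories discussed in this summary.

```python
from dataclasses import dataclass, field


@dataclass
class AgentCard:
    """Hypothetical, reduced stand-in for one index entry (not the paper's 33 fields)."""
    name: str
    developer: str
    domains: list[str] = field(default_factory=list)
    has_documentation: bool = False
    code_released: bool = False
    safety_policy_disclosed: bool = False
    external_audit_reported: bool = False


def needs_safety_followup(card: AgentCard) -> bool:
    """Flag entries with neither a public safety policy nor a reported external audit."""
    return not (card.safety_policy_disclosed or card.external_audit_reported)


# Made-up example entries, purely for illustration.
cards = [
    AgentCard("example-agent-a", "Example Lab", ["software engineering"],
              has_documentation=True, code_released=True),
    AgentCard("example-agent-b", "Example Corp", ["computer use"],
              has_documentation=True, safety_policy_disclosed=True),
]

flagged = [card.name for card in cards if needs_safety_followup(card)]
print(flagged)
```

A filter like `needs_safety_followup` is one way to turn the cards into a shortlist of systems whose developers should be asked for more safety information before integration.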

Key Findings

The index catalogs 67 deployed agentic AI systems.

Numbers: n = 67

Practical Use: Use the index as a starting sample to compare deployed agents and prioritize systems for audit or integration work.

Evidence Ref: Abstract, Sec 3

Most developers publish documentation and many release code.

Numbers: 47/67 (70.1%) docs; 33/67 (49.3%) code

Practical Use: For feature analysis or prototyping, start with publicly documented agents; expect roughly half to have usable code.

Evidence Ref: Figure 1, Sec 5

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
indexed_systems | 67 systems | - | - | - | Total count of indexed agentic systems | Sec 3
public_documentation | 70.1% | - | - | 67 agents | 47 of 67 agents publish documentation | Figure 1, Sec 5

What To Try In 7 Days

Browse the index (aiagentindex.mit.edu) and spot agents similar to your use case.

If evaluating an external agent, request its safety policy and audit reports before production.

Run a short red-team or jailbreak test focused on your critical workflows and data flows.

Agent Features

Memory
internal model weights; external storage modules for recall
Planning
chain-of-thought style planning; orchestrator-driven multi-step plans
Tool Use
web browsing and posting; filesystem access and code execution; API calls to external services
Frameworks
AutoGen; Magentic One (example multi-agent system)
Is Agentic

Yes

Architectures
foundation model + scaffolding (reasoning, planning, memory, tools); multi-agent orchestration (orchestrator + subagents)
Collaboration
multi-agent cooperation via orchestrator
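The orchestrator-plus-subagents pattern listed above can be illustrated with a toy sketch. This is not the AutoGen or Magentic One API: the `Orchestrator` class, the subagent functions, and the fixed plan are all hypothetical, and the subagents are plain functions standing in for model-backed agents with real tools.

```python
from typing import Callable

# A subagent is modeled as a function from a task description to a result string.
Subagent = Callable[[str], str]


class Orchestrator:
    """Toy orchestrator: routes each planned step to a registered subagent."""

    def __init__(self) -> None:
        self.subagents: dict[str, Subagent] = {}
        self.memory: list[str] = []  # stands in for an external storage module

    def register(self, role: str, agent: Subagent) -> None:
        self.subagents[role] = agent

    def run(self, plan: list[tuple[str, str]]) -> list[str]:
        """Execute a multi-step plan given as (role, task) pairs, in order."""
        results = []
        for role, task in plan:
            output = self.subagents[role](task)
            self.memory.append(f"{role}: {output}")  # keep a record for later recall
            results.append(output)
        return results


def web_surfer(task: str) -> str:
    # Placeholder for a browsing tool; no network access happens here.
    return f"searched the web for '{task}'"


def coder(task: str) -> str:
    # Placeholder for a code-execution tool; nothing is executed here.
    return f"wrote code to {task}"


orchestrator = Orchestrator()
orchestrator.register("web", web_surfer)
orchestrator.register("code", coder)
outputs = orchestrator.run([("web", "agent benchmarks"), ("code", "summarize results")])
print(outputs)
```

In a real system the plan would typically be produced by a foundation model rather than hard-coded, and each subagent call is a point where tool access (browsing, filesystem, external APIs) would need its own safety review.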

Risks & Boundaries

Limitations

Definition of 'agent' is loose and contested; inclusion choices can be subjective.

Snapshot limited to systems available or announced by Dec 31, 2024; the field moves fast.

When Not To Use

When you need exhaustive or up-to-date coverage of every deployed agentic system.

When assessing internal-only agents or non-English systems not included in the index.

Failure Modes

Selective disclosure by developers can give a false sense of safety.

Index snapshot can become outdated quickly as new agents and audits appear.

Core Entities

Models

gpt-4o; OpenAI o1; ChatGPT-4o; Llama-3.2-90B-VisionInstruct

Metrics

percent_with_documentation; percent_code_release; percent_with_safety_policy; percent_with_external_audit

Benchmarks

GAIA; SWE-bench; WebArena; AssistantBench; SWE-Bench Verified

Context Entities

Models

foundation models (general reference)

Benchmarks

SWE-Bench; GAIA; WebArena; AssistantBench