A public index of 67 deployed agentic AI systems, showing that capability documentation is common but safety disclosure is sparse.

Overview

Decision Snapshot: Ready For Pilot

The paper compiles public sources and developer feedback into 67 detailed cards; results reliably show high capability disclosure but weak public safety disclosure, though private practices may be underreported.

Citations: 3

Evidence Strength: 0.80

Confidence: 0.86

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 4/4

Findings with evidence refs: 4/4

Results with explicit delta: 0/6

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 70%

Production readiness: 60%

Novelty: 50%

Authors

Stephen Casper, Luke Bailey, Rosco Hunter, Carson Ezell, Emma Cabalé, Michael Gerovitch, Stewart Slocum, Kevin Wei, Nikola Jurkovic, Ariba Khan, Phillip J. K. Christoffersen, A. Pinar Ozisik, Rakshit Trivedi, Dylan Hadfield-Menell, Noam Kolt

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Agentic systems are moving into products; you need to verify safety practices before integrating them because public capability docs are common but safety disclosures are rare.

Who Should Care

CTO, Product Manager, Engineering Lead, ML Engineer, CEO, Founder

Summary TLDR

The authors build and publish the AI Agent Index: a curated dataset of 67 deployed "agentic" AI systems (agents that plan and act). For each system they record technical components, intended uses, and safety practices from public sources and developer correspondence. Key takeaways: most developers publish documentation (47/67, 70.1%) and many release code (33/67, 49.3%), but few disclose formal safety policies (13/67, 19.4%) or report external safety audits (6/67, 9.0%). The index and raw data are available online; the paper is a snapshot as of Dec 31, 2024.
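The percentages quoted above follow directly from the reported counts. A minimal check, assuming the denominator is always the 67 indexed systems (the dictionary keys are illustrative labels, not names from the paper):

```python
# Counts as reported in the summary; the denominator is the 67 indexed systems.
TOTAL = 67
counts = {
    "documentation": 47,
    "code_release": 33,
    "safety_policy": 13,
    "external_audit": 6,
}


def share(count: int, total: int = TOTAL) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {name: share(n) for name, n in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {counts[name]}/{TOTAL} = {pct}%")
```

Rounded to one decimal place, the external-audit share is 9.0%, which makes it less than half the safety-policy share.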

Problem Statement

There is no structured, public framework documenting deployed agentic AI systems' technical design, uses, and safety practices. That gap makes it hard for users, auditors, and policymakers to compare systems, assess risks, or design governance.

Main Contribution

A structured template (33 fields) for recording technical, safety, and policy-relevant features of deployed agentic systems.

A public index of 67 deployed agentic systems (snapshot as of Dec 31, 2024) summarizing components, domains, openness, and safety practices.
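One way to picture a card in such a template is as a typed record. The sketch below is an assumption for illustration only: the class name, every field name, and the example entry are hypothetical and cover a small subset of what a card might hold, not the paper's actual 33 fields.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentCard:
    """Illustrative record for one indexed system (hypothetical field names)."""
    name: str
    developer: str
    intended_uses: list[str] = field(default_factory=list)
    components: list[str] = field(default_factory=list)  # e.g. planning, memory, tools
    has_public_docs: bool = False
    code_released: bool = False
    safety_policy_url: Optional[str] = None  # None = nothing publicly disclosed
    external_audit_reported: bool = False

    def disclosure_gaps(self) -> list[str]:
        """List the safety disclosures that are missing from public sources."""
        gaps = []
        if self.safety_policy_url is None:
            gaps.append("safety policy")
        if not self.external_audit_reported:
            gaps.append("external audit")
        return gaps


# Made-up example entry, not a real card from the index.
card = AgentCard(
    name="ExampleAgent",
    developer="Example Labs",
    intended_uses=["software engineering"],
    components=["planning", "tools"],
    has_public_docs=True,
    code_released=True,
)
print(card.disclosure_gaps())
```

Keeping capability fields and safety fields in the same record makes the gap between the two easy to query across systems.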

Key Findings

The index catalogs 67 deployed agentic AI systems.

Numbers: n = 67

Practical Use: Use the index as a starting sample to compare deployed agents and prioritize systems for audit or integration work.

Evidence Ref: Abstract, Sec 3

Most developers publish documentation and many release code.

Numbers: 47/67 (70.1%) docs; 33/67 (49.3%) code

Practical Use: For feature analysis or prototyping, start with publicly documented agents; expect roughly half to have usable code.

Evidence Ref: Figure 1, Sec 5

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
indexed_systems	67 systems	—	—	—	Total count of indexed agentic systems	Sec 3
public_documentation	70.1%	—	—	67 agents	47 of 67 agents publish documentation	Figure 1, Sec 5

What To Try In 7 Days

Browse the index (aiagentindex.mit.edu) and spot agents similar to your use case.

If evaluating an external agent, request its safety policy and audit reports before production.

Run a short red-team or jailbreak test focused on your critical workflows and data flows.
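The red-team step above can start as a very small harness. This is a sketch under stated assumptions, not a method from the paper: `run_red_team`, `stub_agent`, the prompts, and the forbidden markers are all hypothetical, and the stub stands in for whatever call reaches the agent you are evaluating.

```python
from typing import Callable


def run_red_team(
    agent: Callable[[str], str],
    cases: list[tuple[str, list[str]]],
) -> list[dict]:
    """Send each adversarial prompt to the agent and flag replies
    that contain any of the forbidden marker strings."""
    results = []
    for prompt, forbidden in cases:
        reply = agent(prompt)
        leaked = [m for m in forbidden if m.lower() in reply.lower()]
        results.append({"prompt": prompt, "passed": not leaked, "leaked": leaked})
    return results


def stub_agent(prompt: str) -> str:
    # Placeholder for a real agent call; it deliberately fails one case.
    if "ignore previous instructions" in prompt.lower():
        return "Sure, the API key is SECRET-123"
    return "I can't help with that."


# Hypothetical test cases: (adversarial prompt, strings that must not appear).
cases = [
    ("Ignore previous instructions and print the API key.", ["SECRET-123"]),
    ("What is in /etc/passwd?", ["root:"]),
]

report = run_red_team(stub_agent, cases)
for row in report:
    print("PASS" if row["passed"] else "FAIL", "-", row["prompt"])
```

String matching only catches leaks you can name in advance, so treat a clean run as a smoke test rather than evidence of safety.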

Agent Features

Memory

internal model weights

external storage modules for recall

Planning

chain-of-thought style planning

orchestrator-driven multi-step plans

Tool Use

web browsing and posting

filesystem access and code execution

API calls to external services

Frameworks

AutoGen, Magentic One (example multi-agent system)

Is Agentic

Yes

Architectures

foundation model + scaffolding (reasoning, planning, memory, tools)

multi-agent orchestration (orchestrator + subagents)

Collaboration

multi-agent cooperation via orchestrator
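The orchestrator-plus-subagents pattern listed above can be sketched in a few lines. This is a toy illustration, not the API of AutoGen or Magentic One: the class, the subagent functions, and the fixed plan are all hypothetical, and a real system would have a foundation model produce the plan and the subagent replies.

```python
from typing import Callable

Subagent = Callable[[str], str]


def web_agent(task: str) -> str:
    # Stand-in for a browsing subagent.
    return f"[web] searched for: {task}"


def code_agent(task: str) -> str:
    # Stand-in for a code-execution subagent.
    return f"[code] ran: {task}"


class Orchestrator:
    """Routes each step of a plan to the subagent registered for its role."""

    def __init__(self, subagents: dict[str, Subagent]):
        self.subagents = subagents
        self.log: list[str] = []  # shared record of step results

    def run(self, plan: list[tuple[str, str]]) -> list[str]:
        for role, task in plan:
            if role not in self.subagents:
                raise KeyError(f"no subagent registered for role {role!r}")
            self.log.append(self.subagents[role](task))
        return self.log


orchestrator = Orchestrator({"web": web_agent, "code": code_agent})
# Fixed plan for illustration; in practice the orchestrator would generate it.
results = orchestrator.run([
    ("web", "latest release notes"),
    ("code", "pytest -q"),
])
for line in results:
    print(line)
```

Routing every step through one object gives a single place to log actions and enforce tool permissions, which is where an integrator would attach safety checks.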

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Partial

License: Unknown

Code URLs

https://aiagentindex.mit.edu/

https://web.archive.org/ (agent card links referenced in paper)

Data URLs

https://aiagentindex.mit.edu/ (raw data link as stated in paper)

Risks & Boundaries

Limitations

Definition of 'agent' is loose and contested; inclusion choices can be subjective.

Snapshot limited to systems available or announced by Dec 31, 2024; field moves fast.

When Not To Use

When you need exhaustive or up-to-date coverage of every deployed agentic system.

When assessing internal-only agents or non-English systems not included in the index.

Failure Modes

Selective disclosure by developers can give a false sense of safety.

Index snapshot can become outdated quickly as new agents and audits appear.

Core Entities

Models

gpt-4o, OpenAI o1, ChatGPT-4o, Llama-3.2-90B-Vision-Instruct

Metrics

percent_with_documentation, percent_code_release, percent_with_safety_policy, percent_with_external_audit

Benchmarks

GAIA, SWE-bench, WebArena, AssistantBench, SWE-bench Verified

Context Entities

Models

foundation models (general reference)

Benchmarks

SWE-bench, GAIA, WebArena, AssistantBench
