Open-source projects store agent instructions in special README-like files, but those files focus on how to run code and rarely specify non‑

Overview

Decision Snapshot: Ready For Pilot

The study uses a large, real-world corpus and mixed methods. Findings about prevalence and maintenance are well supported. Generalization is limited to public repos and three agent tools.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.85

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 8/8

Findings with evidence refs: 8/8

Results with explicit delta: 3/8

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 45%

Authors

Worawalan Chatlatanagulchai, Hao Li, Yutaro Kashiwa, Brittany Reid, Kundjanasith Thonglek, Pattara Leelaprute, Arnon Rungsawang, Bundit Manaskasemsak, Bram Adams, Ahmed E. Hassan, Hajimu Iida

Links

Abstract / PDF / Code / Data

Why It Matters For Business

Agent context files control what AI coding agents do in your codebase. If they lack security or performance rules, agents will likely produce code that works but is vulnerable or inefficient. Treat these files like configuration and governance documents so agents follow team standards.

Who Should Care

CTO, Product Manager, Engineering Lead, ML Engineer, Founder

Summary TLDR

The authors analyze 2,303 agent context files (e.g., CLAUDE.md, AGENTS.md) from 1,925 repos to show these files are long, hard to read, actively maintained, and biased toward functional instructions (build, testing, implementation). Non-functional concerns like security and performance are rare. Automated labeling of these files is feasible (micro F1 0.79) for concrete topics but struggles with abstract guidance.
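To make the micro F1 figure concrete, here is a minimal sketch of micro-averaged F1 for multi-label classification, where each file can carry several taxonomy labels. The label names come from the taxonomy examples above; the gold and predicted label sets are made-up toy data, and this is the standard definition of the metric, not the authors' evaluation code.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    across all files and labels, then compute a single F1 score."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels correctly predicted
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels the classifier missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): three files, each with a set of labels.
gold = [{"Build & Run", "Testing"}, {"Security"}, {"Architecture", "Testing"}]
pred = [{"Build & Run", "Testing"}, set(), {"Architecture", "Security"}]

score = micro_f1(gold, pred)
print(f"micro F1 = {score:.3f}")
```

Because counts are pooled before averaging, frequent labels dominate the score, so a good micro F1 can coexist with weak performance on rare labels.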

Problem Statement

AI coding agents rely on persistent, project-level instruction files to act correctly. We lack evidence about what those files contain, how they evolve, and whether we can automatically monitor them. Without that evidence, agents can be well‑informed about how to run code but poorly constrained on safety, performance, or quality.

Main Contribution

A large empirical corpus: 2,303 agent context files from 1,925 open-source repositories across Claude Code, OpenAI Codex, and GitHub Copilot.

A 16‑label taxonomy of agent instructions (e.g., Build & Run, Testing, Architecture, Security) and prevalence counts.

Key Findings

Collected 2,303 agent context files across 1,925 repositories.

Numbers: 2,303 files; 1,925 repos

Practical Use: There is enough real-world adoption to study and build tools that manage these files; tool builders should assume manifests are common.

Evidence Ref: Dataset & Table 1, Section 3

Files are long and differ by tool: Copilot and Claude files are longer than Codex.

Numbers: Median words: Copilot 535, Claude 485, Codex 335.5

Practical Use: Expect higher token costs when loading Copilot/Claude manifests; optimize retrieval or compress context for those tools.

Evidence Ref: Figure 3a; Section 4.1.3
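A per-tool median like the one above can be reproduced on your own files with a few lines of code. The sketch below assumes words are split on whitespace (the paper's exact tokenization is not stated here) and uses hypothetical file contents. A half-integer median such as 335.5 simply means the group has an even number of files, so the median is the mean of the two middle counts.

```python
from statistics import median
from typing import Dict, List


def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumed definition of 'word')."""
    return len(text.split())


def median_words_by_tool(files: Dict[str, List[str]]) -> Dict[str, float]:
    """Map each tool name to the median word count of its context files."""
    return {
        tool: median(word_count(text) for text in texts)
        for tool, texts in files.items()
        if texts  # skip tools with no files; median() rejects empty input
    }


# Hypothetical miniature corpus keyed by tool name.
sample = {
    "Codex": ["run make test", "use pytest -q for unit tests"],
    "Claude": ["a b c d e"],
}
medians = median_words_by_tool(sample)
print(medians)
```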

Results

Metric	Value	Baseline	Delta	Split / Dataset	Evidence	Evidence Ref
Corpus size	2,303 agent context files from 1,925 repos	—	—	—	Section 3; Table 1	Table 1
Median words per file	Copilot 535, Claude 485, Codex 335.5	—	Copilot & Claude > Codex	By agent type	Figure 3a; Section 4.1.3	Figure 3a

What To Try In 7 Days

Scan your repo for agent context files (CLAUDE.md, AGENTS.md, copilot-instructions.md).

Add a short 'Non-functional requirements' section that lists mandatory security and performance rules.

Include context-file checks in PR templates: 'Did you update the agent manifest if build or API changed?' and require a CODEOWNER approval for manifest edits.
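The first two steps can be roughed out as a small script. This is a minimal sketch, not a vetted audit tool: the file names are the ones listed above, while the function names and the keyword check (a plain case-insensitive search for "security" and "performance") are assumptions made for illustration. A keyword hit only shows the topic is mentioned, not that the rule is adequate.

```python
from pathlib import Path
from typing import Dict, List

# File names from the list above, lower-cased for case-insensitive matching.
CONTEXT_FILE_NAMES = {"claude.md", "agents.md", "copilot-instructions.md"}
# Hypothetical keyword proxy for non-functional requirement coverage.
NFR_KEYWORDS = ("security", "performance")


def find_context_files(repo_root: str) -> List[Path]:
    """Recursively collect agent context files under repo_root."""
    root = Path(repo_root)
    return [
        p for p in root.rglob("*")
        if p.is_file() and p.name.lower() in CONTEXT_FILE_NAMES
    ]


def missing_nfr_topics(path: Path) -> List[str]:
    """Return the NFR keywords that never appear in the file."""
    text = path.read_text(encoding="utf-8", errors="ignore").lower()
    return [kw for kw in NFR_KEYWORDS if kw not in text]


def audit(repo_root: str) -> Dict[str, List[str]]:
    """Map each context file path to the NFR topics it does not mention."""
    return {str(p): missing_nfr_topics(p) for p in find_context_files(repo_root)}
```

Calling `audit(".")` from a repository root returns one entry per context file; an empty list means both keywords were found somewhere in that file.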

Agent Features

Memory

persistent project-level context files (long-term memory)

Planning

task decomposition, multi-step planning

Tool Use

IDE/tool invocation (run tests, execute scripts), CI/CD and build commands

Frameworks

Claude Code, OpenAI Codex, GitHub Copilot

Is Agentic

Yes

Architectures

LLM-based agents with tool use, agents that combine memory, planning, and tool APIs

Collaboration

human-in-the-loop review and code owners

Optimization Features

Token Efficiency

recommend compressing or selecting sections to reduce token cost

Reproducibility

Code Available: Yes

Data Available: Yes

Open Source Status: Partial

License: Unknown

Code URLs

https://github.com/woraamy/Agent-Context-File-Analysis

Data URLs

https://github.com/woraamy/Agent-Context-File-Analysis

Risks & Boundaries

Limitations

Manual labels record only presence, not the depth of a topic (binary labeling may overstate importance).

Dataset limited to public repos and three agent tools; private corpora may differ.

When Not To Use

Do not generalize prevalence numbers to private or enterprise-only repositories without further sampling.

Avoid using the taxonomy as a strict checklist for highly domain-specific projects without tailoring.

Failure Modes

Agents produce insecure or inefficient code if manifests omit NFRs like security and performance.

Manifests may become append-only and contradictory if not versioned or reviewed.

Core Entities

Models

GPT-5, Claude Opus 4.1, Gemini 2.5 Pro

Metrics

Flesch Reading Ease (FRE), Micro-average F1, Precision/Recall/F1 per label, Median word counts, Mann-Whitney U, Cliff's delta
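Of these metrics, Cliff's delta is probably the least familiar. It is an effect size that says how often values in one group exceed values in another, ranging from -1 to 1. The sketch below follows the standard textbook definition; the word counts are hypothetical and this is not taken from the replication package.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys))."""
    if not xs or not ys:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical word counts for two groups of context files.
group_a = [500, 600, 450]
group_b = [300, 350, 480]
delta = cliffs_delta(group_a, group_b)
print(f"Cliff's delta = {delta:.3f}")
```

A value near 0 means the two groups overlap heavily; values near 1 or -1 mean one group almost always has the larger value. The pairwise loop is quadratic, which is fine for a few thousand files.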

Datasets

AIDev dataset (repos list used for selection), Replication package dataset (Agent-Context-File-Analysis)

Benchmarks

SWE-bench (cited as example benchmark work)
