Open-source projects store agent instructions in special README-like files, but those files focus on how to run code and rarely specify non‑

November 17, 2025 · 8 min

Overview

Decision Snapshot: Ready For Pilot

The study uses a large, real-world corpus and mixed methods. Findings about prevalence and maintenance are well supported. Generalization is limited to public repos and three agent tools.

Citations: 0

Evidence Strength: 0.80

Confidence: 0.85

Risk Signals: 9

Trust Signals

Findings with numeric evidence: 8/8

Findings with evidence refs: 8/8

Results with explicit delta: 3/8

Reproducibility

Status: Code + data available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 60%

Novelty: 45%

Authors

Worawalan Chatlatanagulchai, Hao Li, Yutaro Kashiwa, Brittany Reid, Kundjanasith Thonglek, Pattara Leelaprute, Arnon Rungsawang, Bundit Manaskasemsak, Bram Adams, Ahmed E. Hassan, Hajimu Iida

Why It Matters For Business

Agent context files steer what AI coding agents do in your codebase. If they lack security or performance rules, agents will likely produce code that works but is vulnerable or inefficient. Treat these files like configuration and governance documents so that agents follow team standards.

Summary TLDR

The authors analyze 2,303 agent context files (e.g., CLAUDE.md, AGENTS.md) from 1,925 repos to show these files are long, hard to read, actively maintained, and biased toward functional instructions (build, testing, implementation). Non-functional concerns like security and performance are rare. Automated labeling of these files is feasible (micro F1 0.79) for concrete topics but struggles with abstract guidance.
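
To make the micro F1 figure concrete, here is a minimal sketch of how micro-averaged F1 is computed for multi-label output. The label sets below are invented for illustration and are not drawn from the paper's data; only the label names echo the taxonomy. The function name `micro_f1` is hypothetical.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over multi-label examples.

    gold and predicted are parallel lists of label sets, one set per file.
    True positives, false positives, and false negatives are pooled across
    all labels before precision and recall are computed.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels both assigned and correct
        fp += len(p - g)   # labels assigned but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three files, human labels vs. automated labels.
gold = [{"Build & Run", "Testing"}, {"Architecture"}, {"Security", "Testing"}]
pred = [{"Build & Run", "Testing"}, {"Architecture", "Testing"}, {"Testing"}]
score = micro_f1(gold, pred)
```

Because counts are pooled, frequent labels dominate the micro average, so a strong overall score can coexist with weak results on rare or abstract labels.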

Problem Statement

AI coding agents rely on persistent, project-level instruction files to act correctly. We lack evidence about what those files contain, how they evolve, and whether we can automatically monitor them. Without that evidence, agents can be well‑informed about how to run code but poorly constrained on safety, performance, or quality.

Main Contribution

A large empirical corpus: 2,303 agent context files from 1,925 open-source repositories across Claude Code, OpenAI Codex, and GitHub Copilot.

A 16‑label taxonomy of agent instructions (e.g., Build & Run, Testing, Architecture, Security) and prevalence counts.

Key Findings

Collected 2,303 agent context files across 1,925 repositories.

Numbers: 2,303 files; 1,925 repos

Practical Use: There is enough real-world adoption to study and build tools that manage these files; tool builders should assume manifests are common.

Evidence Ref: Dataset & Table 1, Section 3

Files are long and differ by tool: Copilot and Claude files are longer than Codex files.

Numbers: Median words: Copilot 535, Claude 485, Codex 335.5

Practical Use: Expect higher token costs when loading Copilot/Claude manifests; optimize retrieval or compress context for those tools.

Evidence Ref: Figure 3a; Section 4.1.3
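
A minimal sketch of how per-tool medians like those above could be computed, assuming that whitespace-separated tokens count as words (the exact counting rule used in the study is not stated here). The function name and the sample texts are invented for illustration.

```python
from statistics import median


def median_words_by_tool(files):
    """files: iterable of (tool_name, text) pairs.

    Returns a dict mapping each tool to the median word count of its files,
    where a word is any whitespace-separated token.
    """
    counts = {}
    for tool, text in files:
        counts.setdefault(tool, []).append(len(text.split()))
    return {tool: median(c) for tool, c in counts.items()}


# Hypothetical sample: two Claude files (6 and 2 words), one Codex file (4 words).
sample = [
    ("Claude", "Run make test before every commit"),
    ("Claude", "Use pnpm"),
    ("Codex", "Build with cargo build"),
]
result = median_words_by_tool(sample)
```

With an even number of files the median is the mean of the two middle values, which is presumably how a half-word figure such as 335.5 arises.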

Results

Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref
Corpus size | 2,303 agent context files from 1,925 repos | - | - | - | Section 3; Table 1 | Table 1
Median words per file | Copilot 535, Claude 485, Codex 335.5 | - | Copilot & Claude > Codex | By agent type | Figure 3a; Section 4.1.3 | Figure 3a

What To Try In 7 Days

Scan your repo for agent context files (CLAUDE.md, AGENTS.md, copilot-instructions.md).

Add a short 'Non-functional requirements' section that lists mandatory security and performance rules.

Include context-file checks in PR templates: 'Did you update the agent manifest if build or API changed?' and require a CODEOWNER approval for manifest edits.
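
The first two steps above can be sketched as a small script. This is an illustration under stated assumptions, not a vetted audit tool: the file names come from the list above, while the keyword list, the function names, and the idea of flagging a file by simple substring search are all hypothetical choices.

```python
from pathlib import Path

# File names taken from the list above; matching is case-insensitive.
CONTEXT_FILE_NAMES = {"claude.md", "agents.md", "copilot-instructions.md"}
# Hypothetical keywords signalling non-functional guidance; tune for your team.
NFR_KEYWORDS = ("security", "performance")


def find_context_files(root):
    """Return sorted paths of agent context files under root, skipping .git."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if ".git" in path.relative_to(root).parts:
            continue  # ignore repository internals
        if path.is_file() and path.name.lower() in CONTEXT_FILE_NAMES:
            hits.append(path)
    return sorted(hits)


def missing_nfr_topics(text):
    """Return the NFR keywords that never appear in the file text."""
    lowered = text.lower()
    return [kw for kw in NFR_KEYWORDS if kw not in lowered]


def report(root):
    """Return one status line per context file found under root."""
    lines = []
    for path in find_context_files(root):
        text = path.read_text(encoding="utf-8", errors="replace")
        missing = missing_nfr_topics(text)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        lines.append(f"{path}: {status}")
    return lines
```

Calling `report(".")` from the repository root lists each context file and the topics it never mentions. A substring match is crude (a file that says "performance is out of scope" would pass), so treat the output as a prompt for human review rather than a verdict.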

Agent Features

Memory
persistent project-level context files (long-term memory)
Planning
task decomposition; multi-step planning
Tool Use
IDE/tool invocation (run tests, execute scripts); CI/CD and build commands
Frameworks
Claude Code; OpenAI Codex; GitHub Copilot
Is Agentic

Yes

Architectures
LLM-based agents with tool use; agents that combine memory, planning, and tool APIs
Collaboration
human-in-the-loop review and code owners

Optimization Features

Token Efficiency
recommend compressing or selecting sections to reduce token cost
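
One way to read "selecting sections" is to load only the parts of a context file whose headings match the task at hand. The sketch below assumes the file is Markdown with `#` headings, uses word counts as a rough stand-in for tokens, and does not handle `#` lines inside fenced code. The function names and the keyword heuristic are hypothetical and are not taken from the study.

```python
import re


def split_sections(markdown):
    """Split a Markdown file into (heading, body) pairs on lines starting with '#'."""
    sections, heading, body = [], "(preamble)", []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line):
            # Close the previous section unless it is an empty preamble.
            if body or heading != "(preamble)":
                sections.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body).strip()))
    return sections


def select_sections(markdown, keywords, max_words):
    """Keep sections whose heading mentions a keyword, in file order,
    skipping any section that would push the total past max_words."""
    chosen, used = [], 0
    for heading, body in split_sections(markdown):
        if not any(k.lower() in heading.lower() for k in keywords):
            continue
        cost = len(heading.split()) + len(body.split())
        if used + cost > max_words:
            continue
        chosen.append((heading, body))
        used += cost
    return chosen


doc = (
    "# Build\nrun make\n"
    "# Testing\nrun pytest -q\n"
    "# Security\nnever commit secrets to the repo"
)
picked = select_sections(doc, ["testing", "security"], max_words=5)
```

Here the tight budget keeps only the Testing section; raising `max_words` admits Security as well. A real agent would likely need a proper tokenizer and a smarter relevance signal than heading keywords.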

Reproducibility

Code Available: Yes
Data Available: Yes
Open Source Status: Partial
License: Unknown

Risks & Boundaries

Limitations

Manual labels record only presence, not the depth of a topic (binary labeling may overstate importance).

Dataset limited to public repos and three agent tools; private corpora may differ.

When Not To Use

Do not generalize prevalence numbers to private or enterprise-only repositories without further sampling.

Avoid using the taxonomy as a strict checklist for highly domain-specific projects without tailoring.

Failure Modes

Agents produce insecure or inefficient code if manifests omit non-functional requirements (NFRs) such as security and performance.

Manifests may become append-only and contradictory if not versioned or reviewed.

Core Entities

Models

GPT-5; Claude Opus 4.1; Gemini 2.5 Pro

Metrics

Flesch Reading Ease (FRE); Micro-average F1; Precision/Recall/F1 per label; Median word counts; Mann-Whitney U, Cliff's delta
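
Two of the metrics listed above have short closed-form definitions, sketched here for orientation. The Flesch Reading Ease formula and the Cliff's delta definition are the standard ones; the syllable counter is a rough vowel-run heuristic chosen for brevity, and nothing here is claimed to match the tooling the authors used.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """FRE = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).

    Higher scores mean easier text. Sentence and word splitting are naive.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def cliffs_delta(xs, ys):
    """Cliff's delta: (pairs with x > y minus pairs with x < y) / (len(xs) * len(ys)).

    Ranges from -1 to 1; 0 means neither group tends to be larger.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))
```

Cliff's delta is an effect size that typically accompanies a Mann-Whitney U test: the test indicates whether two groups differ, and delta indicates by how much.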

Datasets

AIDev dataset (repos list used for selection); Replication package dataset (Agent-Context-File-Analysis)

Benchmarks

SWE-bench (cited as example benchmark work)