How LLM-based coding agents must earn developer trust to be useful

February 19, 2025 (6 min read)

Overview

Decision Snapshot: Needs Validation

The paper is a conceptual, opinion-oriented roadmap with examples and proposals but no controlled experiments or quantitative evaluations.

Citations: 1

Evidence Strength: 0.35

Confidence: 0.78

Risk Signals: 10

Trust Signals

Findings with numeric evidence: 0/6

Findings with evidence refs: 6/6

Results with explicit delta: 0/0

Reproducibility

Status: No open assets linked

Open source: Unknown

At A Glance

Cost impact: 60%

Production readiness: 50%

Novelty: 60%

Authors

Abhik Roychoudhury, Corina Pasareanu, Michael Pradel, Baishakhi Ray

Why It Matters For Business

AI coding agents can cut developer time, but only if they earn developer trust through verifiable outputs, provenance, and integrated review processes.

Summary TLDR

This short opinion paper argues that deploying AI "software engineers"—LLM-based agents that write, edit, and test code—depends less on raw capability and more on trust. The authors outline technical (testing, static analysis, formal proofs, guardrails) and human (explainability, provenance, review parity) mechanisms for trust, survey early agent systems, and call for unified, explainable agents that integrate coding, testing, and review into developer workflows.

Problem Statement

LLMs can generate and edit code, but industry adoption of fully autonomous AI software engineers is held back by a lack of developer trust. The paper asks how LLM agents can be designed to earn the same practical, reviewable trust that human contributors have.

Main Contribution

Framing trust as the central barrier to adopting AI software engineers and separating technical vs human trust.

Describing what software-engineering LLM agents are: LLM back-ends + tool interaction + autonomy.

Key Findings

Developer trust, not raw generation skill, is the main barrier to widespread adoption of AI software engineers.

Practical Use: Prioritize systems that make AI outputs verifiable and auditable before trying to fully automate coding tasks.

Evidence Ref: Section 1; cites Forbes blog [3]

Integrating standard engineering tools (tests, linters, program analysis) into LLM agents increases technical trust.

Practical Use: Build agents to produce tests and run linters automatically, and fail fast when checks fail.

Evidence Ref: Section 4 'Testing and Lightweight Static Analysis'; Table 1
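
The fail-fast idea in the practical-use note above can be sketched as a small check runner. This is an illustrative sketch, not something taken from the paper: the names CheckResult, run_checks_fail_fast, and all_passed are hypothetical, and the commands an agent would actually run (a linter, a test runner) depend on the project.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks_fail_fast(checks):
    """Run (name, argv) checks in order and stop at the first failure."""
    results = []
    for name, argv in checks:
        proc = subprocess.run(argv, capture_output=True, text=True)
        passed = proc.returncode == 0
        results.append(CheckResult(name, passed, proc.stdout + proc.stderr))
        if not passed:
            break  # fail fast: remaining checks are skipped
    return results


def all_passed(results, expected_count):
    """True only if every configured check ran and succeeded."""
    return len(results) == expected_count and all(r.passed for r in results)


if __name__ == "__main__":
    # Placeholder commands; substitute the project's real linter and test runner.
    checks = [
        ("lint", [sys.executable, "-c", "print('lint ok')"]),
        ("tests", [sys.executable, "-c", "print('tests ok')"]),
    ]
    results = run_checks_fail_fast(checks)
    for r in results:
        print(r.name, "PASS" if r.passed else "FAIL")
    sys.exit(0 if all_passed(results, len(checks)) else 1)
```

An agent loop could call run_checks_fail_fast after each edit and feed the output of the failing check back into the next prompt, so that unverified code never reaches human review.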

What To Try In 7 Days

Run an LLM agent to generate a small feature and require it to produce tests; run those tests and linters automatically.

Add provenance tags and a short rationale to AI-generated pull requests before human review.

Implement basic guardrails: input sanitizers and output validators around any code-writing agent.
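
As a starting point for the provenance and guardrail items above, here is a minimal sketch of an output validator and a provenance note for Python code produced by an agent. Everything here is an assumption made for illustration: the deny-list BANNED_CALLS, the Verdict type, and the provenance field names are hypothetical and not prescribed by the paper. Input sanitization is not covered.

```python
import ast
from dataclasses import dataclass, field

# Hypothetical deny-list; a real policy would be project-specific.
BANNED_CALLS = {"eval", "exec", "system"}


@dataclass
class Verdict:
    ok: bool
    reasons: list = field(default_factory=list)


def validate_generated_code(source: str) -> Verdict:
    """Reject code that does not parse or that calls a banned function."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return Verdict(False, [f"syntax error: {err.msg}"])
    reasons = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            # Plain names (eval) and attribute calls (os.system) are both checked.
            if isinstance(func, ast.Name):
                name = func.id
            else:
                name = getattr(func, "attr", None)
            if name in BANNED_CALLS:
                reasons.append(f"banned call: {name} (line {node.lineno})")
    return Verdict(not reasons, reasons)


def provenance_block(model: str, prompt_id: str, rationale: str) -> str:
    """Build a short provenance note to prepend to a pull request description."""
    return "\n".join([
        "AI-Generated: yes",
        f"Model: {model}",
        f"Prompt-ID: {prompt_id}",
        f"Rationale: {rationale}",
    ])
```

A name-based deny-list is easy to bypass and only catches obvious cases, so it complements tests, linters, and human review and does not replace them.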

Agent Features

Memory: code search / retrieval for intent inference
Planning: autonomous nondeterministic work-plans, tool invocation planning
Tool Use: file navigation, code editing, test execution, static analysis, shell commands, web browsing
Is Agentic: Yes
Architectures: LLM back-end
Collaboration: AI-human feedback loops, review parity, confidence and provenance reporting

Reproducibility

Code Available: No
Data Available: No
Open Source Status: Unknown
License: Unknown

Risks & Boundaries

Limitations

Opinion piece with no empirical user studies or measurements.

No quantitative evaluation of trust interventions or agent designs.

When Not To Use

As sole decision-maker for safety-critical code without formal verification.

As a replacement for human review in regulated environments.

Failure Modes

Agent hallucinations leading to incorrect code.

Prompt-injection or malicious inputs causing unsafe outputs.

Core Entities

Models

LLMs (general), Codex

Metrics

correctness, security, performance, maintainability, explainability, confidence