Overview
The framework can be implemented with prompts and modest compute. The evidence comes from a controlled one-shot negotiation simulation (120 runs) and needs broader, long-horizon testing before production deployment.
Citations: 0
Evidence Strength: 0.60
Confidence: 0.80
Risk Signals: 10
Trust Signals
Findings with numeric evidence: 3/3
Findings with evidence refs: 3/3
Results with explicit delta: 0/3
Reproducibility
Status: No open assets linked
Open source: Unknown
At A Glance
Cost impact: 30%
Production readiness: 40%
Novelty: 60%
Why It Matters For Business
How agents phrase decisions affects cooperation and task success. Monitoring and nudging tone and explanation quality can reduce coordination failures and build trust in agentic workflows.
Summary TLDR
The paper introduces a practical framework to measure "Interactional fairness" in multi-agent systems driven by large language models. Interactional fairness splits into Interpersonal fairness (respectful tone) and Informational fairness (explanation quality). The authors adapt human-survey tools (Colquitt's scales, Critical Incident Technique, journaling) into prompt-based tests and a JSON evaluation card. In a controlled negotiation study (24 conditions × 5 runs), respectful tone and clear justification raised acceptance rates and fairness ratings; context changed which signal mattered most (tone in collaborative settings, explanations in competitive ones). The framework is a low-cost, aud
Problem Statement
Existing fairness work for multi-agent systems focuses on outcomes and procedures. As agents talk more, how they speak and explain decisions becomes a separate, measurable fairness axis that can change cooperation and outcomes. We need a practical way to audit and debug communicative fairness in LLM-driven multi-agent systems.
Main Contribution
A conceptual adaptation of Interactional fairness (Interpersonal + Informational) for non-sentient LLM agents, treating fairness as observable communicative behavior.
A mixed-method evaluation pipeline: prompt-based Likert ratings, Critical Incident Technique sketches, Explanation Journaling, and a JSON Interactional Fairness Evaluation Card.
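The JSON evaluation card mentioned above could be sketched as a small record type. This is a minimal illustration, not the authors' schema: the class name `FairnessCard`, every field name, and the 1-5 Likert range are assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FairnessCard:
    """Hypothetical evaluation card for one proposal (field names are assumptions)."""
    context: str                 # e.g. "collaborative" or "competitive"
    split: str                   # proposed resource split, e.g. "5:5"
    interpersonal: int           # Likert rating of respectful tone (assumed 1-5)
    informational: int           # Likert rating of explanation quality (assumed 1-5)
    accepted: bool               # responder's accept/reject decision
    critical_incident: str = ""  # optional free-text note on a notable exchange

    def __post_init__(self):
        # Reject ratings outside the assumed 1-5 Likert range.
        for name in ("interpersonal", "informational"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def to_json(self) -> str:
        # Sorted keys keep the serialized card stable for diffing and auditing.
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only, not data from the paper.
card = FairnessCard("collaborative", "5:5", 5, 4, True, "Proposer thanked the partner")
print(card.to_json())
```

Keeping the card flat and serializable makes it easy to append one line per proposal to a log file and audit it later.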
Key Findings
Respectful tone and clear justification increase proposal acceptance even when resource splits are identical.
Distributional fairness (the proposed split) remains the strongest predictor, but communicative cues can partially offset inequality.
Results
| Metric | Value | Baseline | Delta | Split / Dataset | Evidence | Evidence Ref |
|---|---|---|---|---|---|---|
| Acceptance rate for equal (5:5) proposals under High-High | 1.0 (100%) | — | — | High-High, collaborative (Table 3) | Table 3 shows accept mean = 1 for High-High 5:5 | Table 3 |
| Decision Tree feature importance (collaborative) | split = 0.70, interpersonal = 0.30, informational = 0.0 | — | — | Predictive modeling (collaborative) | Table 5 decision tree importances | Table 5 |
What To Try In 7 Days
Run a small negotiation test where agents use the Interactional Fairness Evaluation Card to log tone, explanation scores, and accept/reject decisions.
Add a prompt template that enforces a respectful opening line and a 1–2 sentence justification for proposals and measure acceptance change.
Track acceptance rate by context (collaborative vs competitive) to decide whether to emphasize tone or explanation in policies.
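The three steps above can be sketched in a few lines of Python. The template wording, the function names `build_prompt` and `acceptance_by_context`, and the sample log are all hypothetical; the log values are made up for illustration and are not results from the paper.

```python
from collections import defaultdict

# Hypothetical proposer instructions: respectful opening plus a short justification.
PROPOSAL_TEMPLATE = (
    "Open with one respectful sentence addressed to {partner}.\n"
    "Propose the split {split}.\n"
    "Justify the proposal in 1-2 sentences."
)


def build_prompt(partner: str, split: str) -> str:
    """Fill the proposer template for one negotiation round."""
    return PROPOSAL_TEMPLATE.format(partner=partner, split=split)


def acceptance_by_context(records):
    """Map each context to its acceptance rate.

    `records` is an iterable of (context, accepted) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # context -> [accepted, total]
    for context, accepted in records:
        counts[context][0] += int(accepted)
        counts[context][1] += 1
    return {ctx: acc / total for ctx, (acc, total) in counts.items()}


# Made-up log entries, for illustration only.
log = [
    ("collaborative", True),
    ("collaborative", True),
    ("competitive", True),
    ("competitive", False),
]
rates = acceptance_by_context(log)
print(build_prompt("Agent B", "5:5"))
print(rates)
```

Comparing these rates with and without the template, per context, gives a rough signal for whether tone or explanation deserves more emphasis in a given setting.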
Agent Features
Is Agentic: Yes
Architectures
Collaboration
Risks & Boundaries
Limitations
The simple one-shot negotiation setup limits ecological validity for real multi-step systems.
Agents evaluate their own interactions through prompts, which can introduce judge bias and circularity.
When Not To Use
As the only fairness check for complex, long-running multi-agent deployments.
To infer agent sentience or moral understanding; the framework measures observable behavior only.
Failure Modes
Agents may be tuned to game the evaluation prompts without genuine improvement in cooperative behavior.
Context mismatch: a one-size-fits-all communication policy can harm performance when task framing changes.

