Agentic chatbots need an 'interactional' ethics that centres on respect

Overview

Decision Snapshot: Needs Validation

Conceptually strong and grounded in psychology, but mostly theoretical with limited empirical validation for specific engineering choices.

Citations: 1

Evidence Strength: 0.55

Confidence: 0.80

Risk Signals: 8

Trust Signals

Findings with numeric evidence: 0/3

Findings with evidence refs: 3/3

Results with explicit delta: 0/0

Reproducibility

Status: No open assets linked

Open source: No

At A Glance

Cost impact: 35%

Production readiness: 30%

Novelty: 60%

Authors

Lize Alberts, Geoff Keeling, Amanda McCroskery

Links

Abstract / PDF

Why It Matters For Business

Agentic conversational features can damage user trust, engagement, and wellbeing if systems ignore context and treat people as data points; fixing this protects brand trust and long-term product adoption.

Who Should Care

Product Manager, CTO, CEO, ML Engineer, Engineering Lead, Data Scientist

Summary TLDR

This paper argues that current LLM ethics (helpful, honest, harmless) focuses on words and fails to capture situational, relational harms that arise when conversational systems act like social agents. It proposes 'interactional ethics' centred on respect, operationalised as duties to support users' autonomy, competence, and self-worth. The paper lists three classes of interactional harms (direct, influence, collective) and gives design suggestions: embed respectful assumptions, operationalise respect checks in self-evaluation, and keep/limit memory of sensitive user details.

Problem Statement

As conversational systems become proactive and agent-like, existing alignment criteria (helpful, honest, harmless) miss pragmatic, relational harms that arise in real interactions. We need an ethics that evaluates how systems treat people in context, not only the semantic content of outputs.

Main Contribution

Argues that agentic conversational AI should be evaluated as social actors, not only as output engines.

Defines three interactional harm types: direct (overt/covert), behaviour-influencing (misleading/manipulating), and collective (cumulative relational harms).

Key Findings

Semantic-focused HHH alignment (helpful, honest, harmless) can miss situational disrespect.

Practical Use: Add interactional checks (context, role, timing) to alignment pipelines rather than only filtering output text.

Evidence Ref: Abstract; Intro: critique of HHH criteria

Interactional harms cluster into three types: direct, behaviour-influencing, and collective.

Practical Use: Design evaluations and mitigations tailored to each harm type (e.g., tone controls for direct harms; citation/verification for misleading; memory controls for collective harms).

Evidence Ref: Table 1 and 'Social-interactional harms' section
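The pairing of harm types with mitigations can be sketched as a small lookup plus a per-turn review step. This is a minimal illustration, not the paper's method: the `TurnContext` signals, the `review_turn` function, and the rule that each signal maps to exactly one harm type are all assumptions made for the example. Only the three harm types and the example mitigations come from the summary above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class HarmType(Enum):
    DIRECT = "direct"
    INFLUENCE = "behaviour-influencing"
    COLLECTIVE = "collective"


# Example mitigations named in the summary, one per harm type.
MITIGATIONS = {
    HarmType.DIRECT: "tone controls",
    HarmType.INFLUENCE: "citation/verification",
    HarmType.COLLECTIVE: "memory controls",
}


@dataclass
class TurnContext:
    """Hypothetical signals about a drafted reply (assumed to come from upstream checks)."""
    dismissive_tone: bool = False          # assumed proxy for a direct harm
    unverified_claim: bool = False         # assumed proxy for a misleading/influence harm
    uses_unconsented_memory: bool = False  # assumed proxy for a collective harm


def review_turn(ctx: TurnContext) -> list[tuple[HarmType, str]]:
    """Return each flagged harm type together with its suggested mitigation."""
    flagged = []
    if ctx.dismissive_tone:
        flagged.append(HarmType.DIRECT)
    if ctx.unverified_claim:
        flagged.append(HarmType.INFLUENCE)
    if ctx.uses_unconsented_memory:
        flagged.append(HarmType.COLLECTIVE)
    return [(harm, MITIGATIONS[harm]) for harm in flagged]
```

In practice the boolean signals would need their own classifiers or human review; the sketch only shows how a pipeline could route a flagged turn to a harm-specific mitigation instead of a single output filter.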

What To Try In 7 Days

Audit conversational flows for potential interactional disrespect (tone, timing, assumptions).

Add lightweight memory rules: store user details only with explicit permission, and honour clear 'do-not-remember' flags.

Prototype a consent/negotiation UI that lets users set interaction style and memory preferences.
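The memory rule in the list above could start as an opt-in store with a veto list. The sketch below is one possible shape under stated assumptions: `MemoryPolicy`, `UserMemory`, and the topic-level granularity are hypothetical names and choices for illustration, not something the paper specifies.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryPolicy:
    """Hypothetical per-user memory rules, keyed by topic."""
    allowed_topics: set[str] = field(default_factory=set)   # topics the user opted in to
    do_not_remember: set[str] = field(default_factory=set)  # topics the user vetoed

    def may_store(self, topic: str) -> bool:
        # A veto always wins; otherwise storage requires explicit opt-in.
        if topic in self.do_not_remember:
            return False
        return topic in self.allowed_topics


@dataclass
class UserMemory:
    policy: MemoryPolicy
    facts: dict[str, str] = field(default_factory=dict)

    def remember(self, topic: str, fact: str) -> bool:
        """Store a fact only if the policy permits it; report whether it was stored."""
        if not self.policy.may_store(topic):
            return False
        self.facts[topic] = fact
        return True

    def forget(self, topic: str) -> None:
        """Record a 'do-not-remember' request and drop any fact already stored."""
        self.policy.do_not_remember.add(topic)
        self.facts.pop(topic, None)
```

A user who opts in to `"preferred_name"` only would have `remember("health", ...)` rejected by default, and a later `forget("preferred_name")` both deletes the stored value and blocks future writes. Defaulting to "not stored" keeps the system on the conservative side when preferences are unknown.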

Agent Features

Memory

short-term memory (conversation context)

long-term memory (selective user facts)

Planning

proactive initiation (discussed as perceived agency)

Frameworks

Constitutional AI

Self-correction strategies

Person-centred care

Is Agentic

Yes

Reproducibility

Code Available: No

Data Available: No

Open Source Status: No

License: Unknown

Risks & Boundaries

Limitations

Primarily conceptual: lacks original empirical tests or user studies.

Cultural variation and differing social norms are acknowledged but not operationalised.

When Not To Use

When you need narrow, task-focused performance metrics unrelated to ongoing social interaction.

In systems without any user-facing conversational role or without persistent user relationships.

Failure Modes

Over-personalisation that invades privacy or feels manipulative.

Selective memory leading to perceived insincerity or betrayal.


You May Also Want to Read

AgentAuditor: memory‑augmented RAG + CoT that makes LLM evaluators reach human-level accuracy on agent safety

Metamorphic tests show many LLM agents give different answers to the same problem when phrased differently

R-Judge: a human-curated benchmark (569 agent logs) that tests whether LLMs spot safety risks in agent interactions

A single LLM can role-play homogeneous multi-agent workflows and cut inference cost via KV-cache reuse

DeceptGuard: detect agent deception by reading CoT text and activation probes