Clear taxonomy and practical survey of persona use in LLMs: role-playing vs personalization

Overview

Decision SnapshotNeeds Validation

This is a field survey: it synthesizes existing work and points to methods and gaps, but does not provide new experimental benchmarks or single definitive solutions.

Citations3

Evidence Strength0.60

Confidence0.80

Risk Signals12

Trust Signals

Findings with numeric evidence: 1/6

Findings with evidence refs: 6/6

Results with explicit delta: 0/0

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 40%

Novelty: 50%

Authors

Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Wei-Lin Chen, Chao-Wei Huang, Yu Meng, Yun-Nung Chen

Links

Abstract / PDF / Code

Why It Matters For Business

Personas let LLMs act like domain experts or tune results to users; use prompt personas to quickly prototype role-based workflows and invest in privacy-safe personalization for customer retention.

Who Should Care

CTO Product Manager ML Engineer Data Scientist

Summary TLDR

This survey organizes work that uses "persona" with large language models into two clear streams: (1) role-playing, where the model is given a persona or role to act as; and (2) personalization, where the model encodes and uses a user's persona to tailor outputs. It catalogs environments (software development, games, medical, evaluation), methods (prompt personas, multi-agent frameworks, retrieval for long histories), evaluation tools (Big Five, MBTI, MPI, LLM-as-evaluator), and open problems (long-context memory, dataset gaps, bias, safety, privacy). The paper provides a practical map, representative systems, and future directions.

Problem Statement

Research on using personas with LLMs is growing but fragmented. Practitioners lack a unifying taxonomy, a clear mapping from use cases to methods, and a concise view of evaluation practices and safety/privacy gaps.

Main Contribution

A two-part taxonomy: LLM Role-Playing (model has persona) vs LLM Personalization (model adapts to user persona).

A review of environments and methods: prompts, multi-agent agents, retrieval-memory, and fine-tuning approaches.

Key Findings

The field splits into two distinct goals: role-playing and personalization.

Numbers2 research lines

Practical UseDecide early whether you need the model to 'play a role' (behavioral control) or to 'know a user' (tailored output); choose methods accordingly.

Evidence RefAbstract; Sec 1

Role-playing is often effective with prompt-based, training-free methods.

Practical UsePrototype persona behaviors by prompt templates before investing in fine-tuning or new models.

Evidence RefSec 2 (prompt personas, training-free paradigm)

What To Try In 7 Days

Prototype a persona prompt for a concrete role (support agent, medical reviewer) and test outputs.

Build a simple 2–3 agent pipeline (planner + worker + reviewer) for a multi-step task.

Run a Big Five quick test on role-play outputs and compare to expected traits with a small human panel.

Agent Features

Memory

retrieval-based memoryshort-term context summarizationlong-term memory (via storage/summaries)

Planning

task decompositionWaterfall-like phase pipelinesself-collaboration

Tool Use

retrieval memoryweb navigationexternal knowledge/tools

Frameworks

ChatDevMetaGPTAgentVerseVoyagerMedAgentDR-CoTOPENCHAMALPHEALTHLLM

Is Agentic

Yes

Architectures

single-agentmulti-agentagent pipeline

Collaboration

cooperativeadversarialmessage pools

Reproducibility

Code AvailableYes

Data AvailableNo

Open Source StatusPartial

LicenseUnknown

Code URLs

https://github.com/MiuLab/PersonaLLM-Survey

Risks & Boundaries

Limitations

Heterogeneous metrics across subfields make direct comparisons hard.

Many tasks lack standardized datasets or format-specific benchmarks.

When Not To Use

When legal privacy constraints forbid storing user persona data in prompts or memory.

When strict safety or non-toxicity guarantees are required without additional safeguards.

Failure Modes

Bias amplification when assigning demographic personas.

Increased toxicity or harmful outputs under certain persona prompts.

Core Entities

Models

ChatGPTVoyagerMetaGPTChatDevAgentVerseDR-CoTMedAgentOPENCHAMALPHEALTHLLM

Metrics

Accuracytask success rateinform & success rateBig Five (BFI)MBTIMachine Personality Inventory (MPI)

Datasets

WebShopMind2WebWebArenaVisualWebArenaVisualWebBenchAmazon ReviewMovieLensYelpTripAdvisorMINDMultiWOZPersonaChat

Benchmarks

WebShopMind2WebWebArenaVisualWebArenaVisualWebBench

Overview

Trust Signals

Reproducibility

At A Glance

Authors

Links

Why It Matters For Business

Who Should Care

Summary TLDR

Problem Statement

Main Contribution

Key Findings

The field splits into two distinct goals: role-playing and personalization.

Role-playing is often effective with prompt-based, training-free methods.

What To Try In 7 Days

Agent Features

Reproducibility

Code URLs

Risks & Boundaries

Limitations

When Not To Use

Failure Modes

Core Entities

Models

Metrics

Datasets

Benchmarks

You May Also Want to Read

Chemistry foundation models power structure-focused multimodal RAG inside hierarchical multi-agent workflows

Key finding

Argues that 'agentic' buzzwords mostly rebrand decades-old agent and multi-agent research

Key finding

TRiSM: practical trust, risk and security controls for LLM-based multi-agent systems

Key finding

A dynamic town simulation that tests LLM agents on doing tasks while following local cultural norms

Key finding

A process-aware, auditable multi-agent evaluator that produces more stable, human-aligned scores than a single LLM judge

Key finding