Clear taxonomy and practical survey of persona use in LLMs: role-playing vs personalization

June 3, 20247 min

Overview

Decision SnapshotNeeds Validation

This is a field survey: it synthesizes existing work and points to methods and gaps, but does not provide new experimental benchmarks or single definitive solutions.

Citations3

Evidence Strength0.60

Confidence0.80

Risk Signals12

Trust Signals

Findings with numeric evidence: 1/6

Findings with evidence refs: 6/6

Results with explicit delta: 0/0

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 60%

Production readiness: 40%

Novelty: 50%

Authors

Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Wei-Lin Chen, Chao-Wei Huang, Yu Meng, Yun-Nung Chen

Links

Abstract / PDF / Code

Why It Matters For Business

Personas let LLMs act like domain experts or tune results to users; use prompt personas to quickly prototype role-based workflows and invest in privacy-safe personalization for customer retention.

Who Should Care

Summary TLDR

This survey organizes work that uses "persona" with large language models into two clear streams: (1) role-playing, where the model is given a persona or role to act as; and (2) personalization, where the model encodes and uses a user's persona to tailor outputs. It catalogs environments (software development, games, medical, evaluation), methods (prompt personas, multi-agent frameworks, retrieval for long histories), evaluation tools (Big Five, MBTI, MPI, LLM-as-evaluator), and open problems (long-context memory, dataset gaps, bias, safety, privacy). The paper provides a practical map, representative systems, and future directions.

Problem Statement

Research on using personas with LLMs is growing but fragmented. Practitioners lack a unifying taxonomy, a clear mapping from use cases to methods, and a concise view of evaluation practices and safety/privacy gaps.

Main Contribution

A two-part taxonomy: LLM Role-Playing (model has persona) vs LLM Personalization (model adapts to user persona).

A review of environments and methods: prompts, multi-agent agents, retrieval-memory, and fine-tuning approaches.

Key Findings

The field splits into two distinct goals: role-playing and personalization.

Numbers2 research lines

Practical UseDecide early whether you need the model to 'play a role' (behavioral control) or to 'know a user' (tailored output); choose methods accordingly.

Evidence RefAbstract; Sec 1

Role-playing is often effective with prompt-based, training-free methods.

Practical UsePrototype persona behaviors by prompt templates before investing in fine-tuning or new models.

Evidence RefSec 2 (prompt personas, training-free paradigm)

What To Try In 7 Days

Prototype a persona prompt for a concrete role (support agent, medical reviewer) and test outputs.

Build a simple 2–3 agent pipeline (planner + worker + reviewer) for a multi-step task.

Run a Big Five quick test on role-play outputs and compare to expected traits with a small human panel.

Agent Features

Memory
retrieval-based memoryshort-term context summarizationlong-term memory (via storage/summaries)
Planning
task decompositionWaterfall-like phase pipelinesself-collaboration
Tool Use
retrieval memoryweb navigationexternal knowledge/tools
Frameworks
ChatDevMetaGPTAgentVerseVoyagerMedAgentDR-CoTOPENCHAMALPHEALTHLLM
Is Agentic

Yes

Architectures
single-agentmulti-agentagent pipeline
Collaboration
cooperativeadversarialmessage pools

Reproducibility

Code AvailableYes
Data AvailableNo
Open Source StatusPartial
LicenseUnknown

Risks & Boundaries

Limitations

Heterogeneous metrics across subfields make direct comparisons hard.

Many tasks lack standardized datasets or format-specific benchmarks.

When Not To Use

When legal privacy constraints forbid storing user persona data in prompts or memory.

When strict safety or non-toxicity guarantees are required without additional safeguards.

Failure Modes

Bias amplification when assigning demographic personas.

Increased toxicity or harmful outputs under certain persona prompts.

Core Entities

Models

ChatGPTVoyagerMetaGPTChatDevAgentVerseDR-CoTMedAgentOPENCHAMALPHEALTHLLM

Metrics

Accuracytask success rateinform & success rateBig Five (BFI)MBTIMachine Personality Inventory (MPI)

Datasets

WebShopMind2WebWebArenaVisualWebArenaVisualWebBenchAmazon ReviewMovieLensYelpTripAdvisorMINDMultiWOZPersonaChat

Benchmarks

WebShopMind2WebWebArenaVisualWebArenaVisualWebBench