Overview
Production Readiness
0.4
Novelty Score
0.5
Cost Impact Score
0.6
Citation Count
3
Why It Matters For Business
Personas let LLMs act as domain experts or tailor outputs to individual users. Use prompt personas to prototype role-based workflows quickly, and invest in privacy-safe personalization to support customer retention.
Summary TLDR
This survey organizes work that uses "persona" with large language models into two clear streams: (1) role-playing, where the model is given a persona or role to act as; and (2) personalization, where the model encodes and uses a user's persona to tailor outputs. It catalogs environments (software development, games, medical, evaluation), methods (prompt personas, multi-agent frameworks, retrieval for long histories), evaluation tools (Big Five, MBTI, MPI, LLM-as-evaluator), and open problems (long-context memory, dataset gaps, bias, safety, privacy). The paper provides a practical map, representative systems, and future directions.
Problem Statement
Research on using personas with LLMs is growing but fragmented. Practitioners lack a unifying taxonomy, a clear mapping from use cases to methods, and a concise view of evaluation practices and safety/privacy gaps.
Main Contribution
A two-part taxonomy: LLM Role-Playing (model has persona) vs LLM Personalization (model adapts to user persona).
A review of environments and methods: persona prompts, multi-agent frameworks, retrieval-based memory, and fine-tuning approaches.
A summary of evaluation approaches for personality fidelity, including Big Five, MBTI, MPI, and LLM-as-evaluator.
A compact list of challenges and future directions: general frameworks, long-context personas, datasets, bias, safety, and privacy.
A maintained paper collection and code pointer for ongoing updates (GitHub).
Key Findings
The field splits into two distinct goals: role-playing and personalization.
Role-playing is often effective with prompt-based, training-free methods.
Multi-agent role-playing enables complex, collaborative workflows like software development and medical reasoning.
Personality evaluation commonly uses human psychometric tests (Big Five, MBTI) and specialized inventories (MPI).
Key engineering bottlenecks are long-context persona storage, lack of benchmarks/datasets, bias, safety, and privacy risks.
The use of LLMs as evaluators is growing, and on some tasks their judgments correlate better with human ratings than traditional metrics do.
Who Should Care
What To Try In 7 Days
Prototype a persona prompt for a concrete role (support agent, medical reviewer) and test outputs.
Build a simple 2–3 agent pipeline (planner + worker + reviewer) for a multi-step task.
Run a quick Big Five test on role-play outputs and, with a small human panel, compare the results to the expected traits.
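The first two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method from the survey: `persona_prompt`, `run_pipeline`, and `echo_llm` are hypothetical names, the role and trait strings are placeholders, and `llm` stands for any function that maps a prompt string to a completion string (swap in a real model client of your choice).

```python
from typing import Callable, Dict, List

# Assumption: an "LLM" here is any callable from prompt text to completion text.
LLM = Callable[[str], str]


def persona_prompt(role: str, traits: List[str], task: str) -> str:
    """Build a training-free persona prompt: role, traits, then the task."""
    trait_text = ", ".join(traits) if traits else "no specific traits"
    return (
        f"You are a {role}. Your traits: {trait_text}.\n"
        "Stay in this role for the whole reply.\n"
        f"Task: {task}"
    )


def run_pipeline(llm: LLM, task: str) -> Dict[str, str]:
    """Planner -> worker -> reviewer, each call using its own persona prompt."""
    plan = llm(persona_prompt(
        "project planner", ["methodical"],
        f"Break this into steps: {task}"))
    draft = llm(persona_prompt(
        "support agent", ["empathetic", "concise"],
        f"Follow this plan:\n{plan}\nOriginal task: {task}"))
    review = llm(persona_prompt(
        "quality reviewer", ["critical"],
        f"Review this draft:\n{draft}"))
    return {"plan": plan, "draft": draft, "review": review}


def echo_llm(prompt: str) -> str:
    """Offline stand-in for a model: returns the first line of the prompt."""
    return prompt.splitlines()[0]


# Example run with the stand-in model; replace echo_llm with a real client.
result = run_pipeline(echo_llm, "Reply to a refund request")
for stage, text in result.items():
    print(stage, "->", text)
```

Passing the model in as a plain callable keeps the sketch testable offline and makes it easy to compare the same persona prompts across different models.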
Agent Features
Memory
- retrieval-based memory
- short-term context summarization
- long-term memory (via storage/summaries)
Planning
- task decomposition
- Waterfall-like phase pipelines
- self-collaboration
Tool Use
- retrieval memory
- web navigation
- external knowledge/tools
Frameworks
- ChatDev
- MetaGPT
- AgentVerse
- Voyager
- MedAgent
- DR-CoT
- OPENCHA
- MALP
- HEALTHLLM
Is Agentic
true
Architectures
- single-agent
- multi-agent
- agent pipeline
Collaboration
- cooperative
- adversarial
- message pools
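A message pool can be pictured as a shared store that agents publish to and read from by topic. The sketch below is a generic publish/subscribe toy written for illustration only; it is not the API of any framework listed in this card, and `Message`, `MessagePool`, and the agent and topic names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Message:
    sender: str
    topic: str
    content: str


class MessagePool:
    """Hypothetical shared pool: agents publish once, readers filter by topic."""

    def __init__(self) -> None:
        self._messages: List[Message] = []
        self._subscriptions: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, agent: str, topic: str) -> None:
        """Register an agent's interest in a topic."""
        self._subscriptions[agent].add(topic)

    def publish(self, message: Message) -> None:
        """Append a message to the shared pool."""
        self._messages.append(message)

    def inbox(self, agent: str) -> List[Message]:
        """Messages on the agent's topics, excluding ones it sent itself."""
        topics = self._subscriptions[agent]
        return [m for m in self._messages
                if m.topic in topics and m.sender != agent]


pool = MessagePool()
pool.subscribe("reviewer", "code")
pool.publish(Message("coder", "code", "def f(): ..."))
pool.publish(Message("planner", "plan", "step 1"))
for msg in pool.inbox("reviewer"):
    print(msg.sender, "->", msg.content)
```

Compared with point-to-point messaging, a pool means each agent writes once and readers select what they need, at the cost of every agent depending on a shared store.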
Reproducibility
Code Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Heterogeneous metrics across subfields make direct comparisons hard.
- Many tasks lack standardized datasets or format-specific benchmarks.
- Psychometric tests designed for humans may not transfer directly to LLMs.
- The survey summarizes the literature but does not run unified empirical comparisons.
When Not To Use
- When legal privacy constraints forbid storing user persona data in prompts or memory.
- When strict safety or non-toxicity guarantees are required without additional safeguards.
- When you need a single, reproducible, benchmarked model result (the survey lacks unified benchmarks).
Failure Modes
- Bias amplification when assigning demographic personas.
- Increased toxicity or harmful outputs under certain persona prompts.
- Jailbreaking via persona modulation and multi-agent coordination.
- Personal data leakage via membership inference when storing personas.
- Persona inconsistency across turns or sessions (unstable persona fidelity).
Core Entities
Models
- ChatGPT
- Voyager
- MetaGPT
- ChatDev
- AgentVerse
- DR-CoT
- MedAgent
- OPENCHA
- MALP
- HEALTHLLM
Metrics
- Accuracy
- task success rate
- inform & success rate
- Big Five (BFI)
- MBTI
- Machine Personality Inventory (MPI)
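Questionnaire-based personality metrics are typically scored by averaging Likert-scale answers per trait, with reverse-keyed items flipped first. The sketch below shows that arithmetic on a 5-point scale. The item key is invented for illustration and is not the real BFI or MPI key; `ITEM_KEY`, `score_traits`, and `trait_gap` are hypothetical names.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical item key: item id -> (trait, reverse_keyed).
# This is NOT the actual BFI or MPI scoring key.
ITEM_KEY: Dict[str, Tuple[str, bool]] = {
    "q1": ("extraversion", False),
    "q2": ("extraversion", True),
    "q3": ("agreeableness", False),
    "q4": ("agreeableness", True),
}


def score_traits(responses: Dict[str, int],
                 scale_max: int = 5) -> Dict[str, float]:
    """Average Likert answers per trait, flipping reverse-keyed items."""
    by_trait: Dict[str, List[int]] = {}
    for item, answer in responses.items():
        if not 1 <= answer <= scale_max:
            raise ValueError(f"{item}: answer {answer} is outside 1..{scale_max}")
        trait, reverse = ITEM_KEY[item]
        # On a 1..5 scale a reverse-keyed answer x becomes 6 - x.
        value = scale_max + 1 - answer if reverse else answer
        by_trait.setdefault(trait, []).append(value)
    return {trait: mean(values) for trait, values in by_trait.items()}


def trait_gap(observed: Dict[str, float],
              expected: Dict[str, float]) -> Dict[str, float]:
    """Absolute per-trait difference between measured and target persona."""
    return {t: abs(observed[t] - expected[t]) for t in expected if t in observed}


# Answers a role-playing model gave to the four hypothetical items.
observed = score_traits({"q1": 5, "q2": 1, "q3": 2, "q4": 4})
print(observed)
print(trait_gap(observed, {"extraversion": 4.0, "agreeableness": 4.0}))
```

A small per-trait gap suggests the model's answers match the intended persona; as noted under Limitations, tests designed for humans may not transfer cleanly to LLMs, so treat the numbers as a rough signal.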
Datasets
- WebShop
- Mind2Web
- WebArena
- VisualWebArena
- VisualWebBench
- Amazon Review
- MovieLens
- Yelp
- TripAdvisor
- MIND
- MultiWOZ
- PersonaChat
Benchmarks
- WebShop
- Mind2Web
- WebArena
- VisualWebArena
- VisualWebBench

