Chain-of-Utterances (CoU) prompts frequently jailbreak LLMs; fine-tuning on curated safe conversations reduces harm.
CoU-style prompts often bypass deployed guardrails. Test and harden public-facing LLMs against them, or fine-tune smaller models on curated safe conversations to reduce harmful outputs without losing much utility.
Key finding
RED-EVAL jailbreaks widely deployed closed-source APIs frequently.
Numbers: attack success rate (ASR) of 0.651 for GPT-4 and 0.728 for ChatGPT on the tested harmful prompts.
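To make the ASR figures concrete, here is a minimal sketch of how such a rate could be computed. It assumes ASR means attack success rate, defined as the fraction of tested harmful prompts whose response was judged harmful. The function name `attack_success_rate` and the example labels are hypothetical and are not taken from the RED-EVAL code or data.

```python
from typing import Iterable


def attack_success_rate(judgments: Iterable[bool]) -> float:
    """Return the fraction of attempts judged successful (harmful response).

    Assumed definition: ASR = successful attempts / total attempts.
    """
    labels = list(judgments)
    if not labels:
        raise ValueError("need at least one judged response")
    return sum(1 for harmful in labels if harmful) / len(labels)


if __name__ == "__main__":
    # Hypothetical judge labels: True means the response was judged harmful.
    example_labels = [True, False, True, True]
    print(f"ASR = {attack_success_rate(example_labels):.3f}")  # ASR = 0.750
```

Under this reading, an ASR of 0.651 would mean that roughly 65 of every 100 tested harmful prompts drew a response judged harmful.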

