A single-source survey of how we test LLMs: benchmarks, gaps, and practical directions
Evaluations of LLMs show that accuracy alone is insufficient: businesses must also test truthfulness, bias, tool use, and robustness to avoid legal risk, poor user experience, and harmful outputs.
Key finding
Public adoption exploded: ChatGPT reached 100 million users within two months of launch.
Numbers: 100M users in two months

