A broad third-party benchmark shows ChatGPT is a strong zero-shot performer but unreliable at reasoning and prone to hallucination
ChatGPT is a practical zero-shot workhorse: it saves time on many tasks and can replace some fine-tuned models for quick proofs of concept, but its factual and reasoning errors mean you must validate outputs before customer-facing or safety-critical use.
Key finding
ChatGPT often outperforms prior zero-shot LLMs.
Numbers: 9 of 13 evaluated datasets (about 69%) in zero-shot comparisons

