GPT-4 exceeds USMLE pass threshold and outperforms prior models on medical benchmarks
GPT-4 can reliably answer medical multiple-choice questions and gives better-calibrated confidence scores than earlier models, making it useful for education, drafting clinical notes, and decision-support prototypes, provided its output is subject to human oversight and validation.
Key finding
GPT-4 strongly outperforms GPT-3.5 on USMLE-style multiple-choice tests.
Numbers: on the USMLE Self Assessment overall, GPT-4 scored 83.76% (zero-shot) vs. 49.1% for GPT-3.5.
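To put the two scores in perspective, the gap can be expressed both in percentage points and as a ratio. The snippet below is a minimal sketch of that arithmetic using only the two figures reported above; the function name `compare_accuracy` is a hypothetical helper introduced for illustration, not something from the original study.

```python
def compare_accuracy(new_pct: float, old_pct: float) -> tuple[float, float]:
    """Return (gap in percentage points, ratio of new to old), each rounded to 2 decimals.

    Hypothetical helper for illustration only.
    """
    gap = round(new_pct - old_pct, 2)    # absolute difference in percentage points
    ratio = round(new_pct / old_pct, 2)  # how many times the old score the new one is
    return gap, ratio


# Scores reported above: USMLE Self Assessment overall accuracy.
gpt4_pct = 83.76   # GPT-4, zero-shot
gpt35_pct = 49.1   # GPT-3.5

gap, ratio = compare_accuracy(gpt4_pct, gpt35_pct)
print(f"GPT-4 leads by {gap} percentage points ({ratio}x the GPT-3.5 score)")
```

A difference in percentage points is the more conservative way to report the gap; the ratio is shown only as a secondary view, since ratios of accuracies can overstate gains when the baseline is low.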

