
Model Evaluation Leaderboards

Benchmark scores (%) for leading medical foundation models on three clinical question-answering tasks: PubMedQA, MedQA-USMLE, and MedMCQA. Rank by any benchmark column to see which model leads on that task.

| Model | Developer | Access | Parameters | PubMedQA (%) | MedQA-USMLE (%) | MedMCQA (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Med-PaLM 2 | Google | 🔒 Closed API | Unknown | 81.8 | 86.5 | 72.3 |
| GPT-4 | OpenAI | 🔒 Closed API | Unknown | 80.4 | 81.4 | 73.0 |
| Clinical Llama-3 (8B) | Open Source | 🔓 Open Weights | 8B | 78.2 | 74.5 | 68.9 |
| MedAlpaca (13B) | Open Source | 🔓 Open Weights | 13B | 76.5 | 60.2 | 58.7 |
| BioGPT-Large | Microsoft | 🔓 Open Weights | 1.5B | 81.0 | 50.5 | 45.2 |
| ClinicalBERT | MIT | 🔓 Open Weights | 110M | 65.0 | 45.3 | 42.1 |
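
The ranking can also be reproduced offline. The following is a minimal sketch, not part of the leaderboard itself: it copies the scores listed above into a Python list and sorts them by a chosen benchmark. The names `Entry`, `LEADERBOARD`, `rank_by`, and `best_open_weights` are hypothetical helpers introduced only for this illustration.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    """One leaderboard row; field names are illustrative, not an official schema."""
    model: str
    access: str
    pubmedqa: float
    medqa_usmle: float
    medmcqa: float


# Scores copied from the table above (all values are percentages).
LEADERBOARD: List[Entry] = [
    Entry("Med-PaLM 2", "Closed API", 81.8, 86.5, 72.3),
    Entry("GPT-4", "Closed API", 80.4, 81.4, 73.0),
    Entry("Clinical Llama-3 (8B)", "Open Weights", 78.2, 74.5, 68.9),
    Entry("MedAlpaca (13B)", "Open Weights", 76.5, 60.2, 58.7),
    Entry("BioGPT-Large", "Open Weights", 81.0, 50.5, 45.2),
    Entry("ClinicalBERT", "Open Weights", 65.0, 45.3, 42.1),
]


def rank_by(benchmark: str, entries: List[Entry] = LEADERBOARD) -> List[Entry]:
    """Return entries sorted from highest to lowest score on one benchmark field."""
    return sorted(entries, key=lambda e: getattr(e, benchmark), reverse=True)


def best_open_weights(benchmark: str) -> Entry:
    """Return the top-scoring open-weights entry for one benchmark field."""
    open_entries = [e for e in LEADERBOARD if e.access == "Open Weights"]
    return rank_by(benchmark, open_entries)[0]


if __name__ == "__main__":
    for e in rank_by("medqa_usmle"):
        print(f"{e.model:<24} {e.medqa_usmle:5.1f}")
```

Sorting on a single column at a time mirrors the per-benchmark ordering of the table; no combined or averaged score is computed, since the page does not define one.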