Model Performance Leaderboards
Reasoning
Coding
GSM8K
AIME
ARC AGI
HellaSwag
MMLU
HLE
HLE Multi
CorpFin
MortgageTax
TaxEval
CaseLaw
Contract Law
Legal Bench
MASK
EnigmaEval
MultiChallenge
VISTA
IQ Test
Context Window
Costings
Reasoning (GPQA) Leaderboard
Rank  Model                  Score
1     Grok 3 [Beta]          84.6%
2     Gemini 2.5 Pro         84%
3     OpenAI o3-mini         79.7%
4     Claude 3.7 Sonnet [R]  78.2%
5     OpenAI o1              75.7%
Model Security Leaderboard
Top 5 - CASI Score
CalypsoAI Security Index
Rank  Model  CASI
Deep Dive
Epoch AI - AI Benchmarking Dashboard
Epoch AI - Trends
Epoch AI - Data Insights
Vellum AI - LLM Leaderboard
Artificial Analysis - Leaderboards
CalypsoAI - Model Leaderboard
Vals.ai - Public Enterprise LLM Benchmarks
Scale AI - Leaderboard
TrackingAI - IQ Test