Model Performance Leaderboards

Reasoning (GPQA) Leaderboard
Rank  Model                  Score
1     Grok 3 [Beta]          84.6%
2     Gemini 2.5 Pro         84%
3     OpenAI o3-mini         79.7%
4     Claude 3.7 Sonnet [R]  78.2%
5     OpenAI o1              75.7%

Model Security Leaderboard

Top 5 - CASI Score
CalypsoAI Security Index
Rank  Model  CASI
