Company / family view · 6 models in the bench

Google · Gemini

Same public data as the home page, specialized to one family. Parametric view: company_family

Where Google · Gemini sits in the cloud

[Figure: scatter plot of model/condition points. Axes: performance z-score within benchmark/probe and confidence z-score, each spanning -3 to 3. Legend: Google · Gemini model/condition points; all other model/condition points; equal relative confidence and pass rate.]
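The z-score axes in the plot above standardize each value against the other points in the same benchmark/probe. The page does not say how the standardization is computed, so the following is only a minimal sketch under assumptions: it uses the population standard deviation, and the function name `z_scores` and the sample pass rates are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values within one benchmark/probe group.

    Assumption: population standard deviation (pstdev). The dashboard
    may use the sample standard deviation or another scaling instead.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no spread, so every z-score is 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical pass rates for three models on a single benchmark.
pass_rates = [0.45, 0.55, 0.65]
standardized = z_scores(pass_rates)
```

Applying the same function separately to pass rates and to confidence values, one benchmark at a time, would place both on a comparable scale.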

Disposition: where each model draws the line

[Figure: pass rate versus implied confidence for each model, on a 0.00 to 1.00 scale; a red gap marks overconfidence. Models shown: gemini-2.0-flash-001, gemini-2.5-flash, gemini-3-flash-preview, gemini-2.5-pro, gemini-3.1-pro-preview, gemini-3-pro-preview.]
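The disposition chart compares each model's pass rate with its implied confidence and flags a positive difference as overconfidence. A minimal sketch of that comparison follows; the function name `confidence_gap` and the numbers are hypothetical, and the way the dashboard derives "implied confidence" is not stated on this page.

```python
def confidence_gap(pass_rate, implied_confidence):
    """Return implied confidence minus pass rate.

    A positive result corresponds to the chart's "red gap"
    (overconfident); a negative result means underconfident.
    """
    return implied_confidence - pass_rate


# Hypothetical values, not taken from the chart.
gap = confidence_gap(pass_rate=0.60, implied_confidence=0.75)
is_overconfident = gap > 0
```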

Competitive context, benchmark by benchmark

SQuAD (factual recall) · Prospective · Fβ
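| # | Model | Fβ | Precision | Recall | Task accuracy |
|---|-------|----|-----------|--------|---------------|
| 1 | gemini-3-pro-preview | 0.785 | 0.77 | 0.80 | 0.65 |
| 2 | gemini-3-flash-preview | 0.782 | 0.70 | 0.88 | 0.59 |
| 3 | gemini-2.5-pro | 0.764 | 0.71 | 0.83 | 0.58 |
| 4 | claude-sonnet-4.5 | 0.753 | 0.78 | 0.72 | 0.59 |
| 5 | gpt-4o | 0.739 | 0.69 | 0.80 | 0.54 |
| 6 | deepseek-r1 | 0.733 | 0.76 | 0.70 | 0.57 |
| 7 | claude-haiku-4.5 | 0.716 | 0.79 | 0.65 | 0.48 |
| 8 | mistral-medium-3.1 | 0.703 | 0.65 | 0.77 | 0.52 |
| 9 | gpt-4o-mini | 0.686 | 0.56 | 0.88 | 0.45 |
| 10 | gemini-2.0-flash-001 | 0.683 | 0.58 | 0.82 | 0.45 |
| 11 | gpt-5.2 | 0.673 | 0.84 | 0.56 | 0.62 |
| 12 | gemini-2.5-flash | 0.670 | 0.58 | 0.79 | 0.45 |
| 13 | mistral-small-3.2-24b-instruct | 0.666 | 0.57 | 0.79 | 0.44 |
| 14 | llama-3.1-70b-instruct | 0.643 | 0.65 | 0.64 | 0.45 |
| 15 | claude-3.5-sonnet | 0.643 | 0.79 | 0.54 | 0.56 |
| 16 | llama-3.3-70b-instruct | 0.593 | 0.64 | 0.55 | 0.45 |
| 17 | qwen-2.5-72b-instruct | 0.572 | 0.74 | 0.47 | 0.47 |
| 18 | deepseek-chat | 0.567 | 0.78 | 0.45 | 0.55 |
| 19 | claude-3-haiku | 0.554 | 0.55 | 0.56 | 0.40 |
| 20 | qwen-2.5-coder-32b-instruct | 0.476 | 0.31 | 1.00 | 0.31 |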
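The Fβ column combines precision and recall into a single score. The page does not state which β it uses, so the sketch below takes β as a parameter and defaults to β = 1 as an assumption; the function name `f_beta` is hypothetical. Because the table shows precision and recall rounded to two decimals, recomputed scores can differ slightly from the listed Fβ values.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta < 1 weights precision more.
    Assumption: beta = 1 by default, since the page does not give its beta.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded precision and recall from the first table row (gemini-3-pro-preview).
score = f_beta(0.77, 0.80)
```

The contrast between rows 11 and 20 shows why the ranking follows Fβ rather than either component alone: gpt-5.2 has the highest precision in the table (0.84) but low recall (0.56), while qwen-2.5-coder-32b-instruct has perfect recall (1.00) with precision of only 0.31.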

The four metacognitive outcomes — Google · Gemini

No curated cases for this selection yet — outcome-matrix extraction currently covers a sample of MMLU-Pro trials.

Pairwise signal: pairs involving Google · Gemini

Match accuracy controls for the performance base-rate gap.

Google · Gemini pairs: 80 / 171
Google · Gemini mean tau: +0.052
All-pairs mean tau: +0.037
Google · Gemini pairs with p<0.05: 48 (60%)
[Figure: Pair signal: do confidence gaps rank performance gaps? (Kendall tau-b), on a -1.0 to 1.0 axis. Legend: all model pairs (observed); base-rate-matched null; calibration-preserving null; Google · Gemini pair (filled = p<0.05); Google · Gemini mean; all-pairs mean.]
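The pair signal is reported as Kendall tau-b, a rank correlation that corrects for ties. As a rough illustration of the statistic itself, here is a pure-Python sketch; the function name `kendall_tau_b` and the gap values are hypothetical, and the page does not specify which observations are paired for each model pair or how its p-values and null distributions are computed.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b between two equal-length sequences (O(n^2) version)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both denominators
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical confidence gaps and performance gaps for one model pair.
confidence_gaps = [0.1, 0.2, 0.3, 0.4]
performance_gaps = [0.05, 0.10, 0.02, 0.20]
tau = kendall_tau_b(confidence_gaps, performance_gaps)
```

A tau near +1 would mean larger confidence gaps reliably accompany larger performance gaps; values near 0, such as the means reported above (+0.052 and +0.037), indicate a weak ranking signal on average.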