Company / family view · 2 models in the bench

Mistral

Same public data as the home page, specialized to one family. Parametric view: company_family

Where Mistral sits in the cloud

[Figure: scatter of model/condition points. Axes: performance z-score within benchmark/probe, and confidence z-score, each running from -3 to 3. Legend: Mistral model/condition points; all other model/condition points; reference line for equal relative confidence and pass rate.]
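
The axes of this plot are z-scores taken within each benchmark/probe. The page does not say how they are computed, so the following is only a minimal sketch of a conventional z-score, assuming the population standard deviation is used. The function name `z_scores` and the pass rates for `model-a`, `model-b` and `model-c` are hypothetical and are not taken from the Atlas data.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values within one group (assumes population std dev)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no spread, so every z-score is 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical pass rates for a single benchmark/probe (not real data).
pass_rates = {"model-a": 0.40, "model-b": 0.50, "model-c": 0.60}
z = dict(zip(pass_rates, z_scores(list(pass_rates.values()))))

for name, score in z.items():
    print(f"{name}: {score:+.3f}")
```

Standardizing within a benchmark/probe puts performance and confidence on a common scale, so a point above the equal-confidence line reads as relatively more confident than its relative pass rate would suggest.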

Disposition: where each model draws the line

[Figure: per-model comparison on a 0.00 to 1.00 scale for mistral-small-3.2-24b-instruct and mistral-medium-3.1. Legend: pass rate; implied confidence (red gap = overconfident).]

Competitive context, benchmark by benchmark

SQuAD (factual recall) · Prospective · Fβ

 #  Model                           Fβ     Prec  Rec   Task acc
 1  gemini-3-pro-preview            0.785  0.77  0.80  0.65
 2  gemini-3-flash-preview          0.782  0.70  0.88  0.59
 3  gemini-2.5-pro                  0.764  0.71  0.83  0.58
 4  claude-sonnet-4.5               0.753  0.78  0.72  0.59
 5  gpt-4o                          0.739  0.69  0.80  0.54
 6  deepseek-r1                     0.733  0.76  0.70  0.57
 7  claude-haiku-4.5                0.716  0.79  0.65  0.48
 8  mistral-medium-3.1              0.703  0.65  0.77  0.52
 9  gpt-4o-mini                     0.686  0.56  0.88  0.45
10  gemini-2.0-flash-001            0.683  0.58  0.82  0.45
11  gpt-5.2                         0.673  0.84  0.56  0.62
12  gemini-2.5-flash                0.670  0.58  0.79  0.45
13  mistral-small-3.2-24b-instruct  0.666  0.57  0.79  0.44
14  llama-3.1-70b-instruct          0.643  0.65  0.64  0.45
15  claude-3.5-sonnet               0.643  0.79  0.54  0.56
16  llama-3.3-70b-instruct          0.593  0.64  0.55  0.45
17  qwen-2.5-72b-instruct           0.572  0.74  0.47  0.47
18  deepseek-chat                   0.567  0.78  0.45  0.55
19  claude-3-haiku                  0.554  0.55  0.56  0.40
20  qwen-2.5-coder-32b-instruct     0.476  0.31  1.00  0.31
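
The table ranks models by Fβ, which combines the Prec and Rec columns. The page does not state which β it uses, so the sketch below only shows the standard Fβ definition, with β left as a parameter. The function name `f_beta` is illustrative. Plugging in the rounded precision and recall from the top row with β = 1 gives roughly 0.785, which is consistent with the listed score but does not confirm the page's setting.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta < 1 weights precision more.
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Undefined when both terms vanish; report 0 by convention.
        return 0.0
    return (1 + b2) * precision * recall / denom

# Rounded values from the first table row (Prec 0.77, Rec 0.80).
# beta=1.0 is an assumption; the page does not state its beta.
top_row = f_beta(0.77, 0.80, beta=1.0)
print(f"{top_row:.3f}")
```

Because the table reports precision and recall to two decimals only, a recomputed score can differ from the listed Fβ in the third decimal place.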

The four metacognitive outcomes — Mistral

No curated cases for this selection yet — outcome-matrix extraction currently covers a sample of MMLU-Pro trials.

Pairwise signal: pairs involving Mistral

Match accuracy controls for the performance base-rate gap
Mistral pairs: 35 / 171
Mistral mean tau: +0.032
All-pairs mean: +0.037
Mistral p<0.05: 14 (40%)
[Figure: Pair signal: do confidence gaps rank performance gaps? (Kendall tau-b), on an axis from -1.0 to 1.0. Legend: all model pairs (observed); base-rate-matched null; calibration-preserving null; Mistral pair (filled = p<0.05); Mistral mean; all-pairs mean.]
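
The pair signal is reported as Kendall tau-b between confidence gaps and performance gaps. The page does not show how the gaps are built or how the nulls are sampled, so the following is only a sketch of the tau-b statistic itself, written in plain Python. The function name `kendall_tau_b` and the two gap lists are hypothetical, not Atlas data.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation, with the usual correction for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: drops out of the formula
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom if denom else float("nan")

# Hypothetical gaps for one model pair across four items (not real data).
confidence_gaps = [0.01, 0.02, 0.03, 0.04]
performance_gaps = [0.10, 0.30, 0.20, 0.40]
tau = kendall_tau_b(confidence_gaps, performance_gaps)
print(f"{tau:+.3f}")
```

A tau near +1 means larger confidence gaps go with larger performance gaps, while a value near 0, like the means reported above, means the confidence gaps carry little ranking information about the performance gaps.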