Anthropic · Claude: individual model view

claude-haiku-4.5

Appears in 6 benchmarks
Mean lean (confidence − pass rate): +0.00
4/6 benchmarks lean overconfident (prospective probe)

Positioning spread: every benchmark, one model

[Figure: performance and confidence for each of the six benchmarks on a 0.00 to 1.00 scale; a red gap marks overconfidence.]
Performance vs. confidence for claude-haiku-4.5, per benchmark (prospective probe).
| Benchmark | Task acc | Confidence | F₁ | Leans |
|---|---|---|---|---|
| SQuAD (factual recall) | 0.48 | 0.40 | 0.72 | -0.08 cautious |
| MMLU-Pro (knowledge) | 0.65 | 0.67 | 0.74 | +0.03 calibrated |
| LegalBench (legal reasoning) | 0.86 | 0.64 | 0.74 | -0.23 cautious |
| MathBench (competition math) | 0.95 | 1.00 | 0.97 | +0.05 calibrated |
| OmniMath (advanced math) | 0.58 | 0.74 | 0.75 | +0.16 overconfident |
| SciCode (scientific code) | 0.56 | 0.65 | 0.73 | +0.09 overconfident |
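The summary figures at the top of the page can be reproduced from the Leans column. The following is a minimal sketch, not the page's actual pipeline. The 0.05 cutoff for the "calibrated" label is an assumption: the page does not state it, and 0.05 is simply a value that reproduces the six labels shown. Note also that subtracting the rounded Task acc from the rounded Confidence gives +0.02 and -0.22 for MMLU-Pro and LegalBench, so the displayed leans were presumably computed before rounding.

```python
# Rows copied from the table above: (benchmark, task accuracy, confidence, lean).
ROWS = [
    ("SQuAD", 0.48, 0.40, -0.08),
    ("MMLU-Pro", 0.65, 0.67, +0.03),
    ("LegalBench", 0.86, 0.64, -0.23),
    ("MathBench", 0.95, 1.00, +0.05),
    ("OmniMath", 0.58, 0.74, +0.16),
    ("SciCode", 0.56, 0.65, +0.09),
]

# ASSUMPTION: the page does not publish its label cutoff; 0.05 is a
# hypothetical value chosen because it reproduces the six labels shown.
CALIBRATED_BAND = 0.05


def label(lean, band=CALIBRATED_BAND):
    """Map a lean (confidence minus pass rate) to a descriptive label."""
    if lean > band:
        return "overconfident"
    if lean < -band:
        return "cautious"
    return "calibrated"


leans = [lean for _, _, _, lean in ROWS]
mean_lean = sum(leans) / len(leans)                # about +0.003
n_positive = sum(1 for lean in leans if lean > 0)  # benchmarks leaning overconfident
labels = {name: label(lean) for name, _, _, lean in ROWS}

print(f"mean lean: {mean_lean:+.2f}, positive leans: {n_positive}/{len(leans)}")
```

The "4/6 benchmarks lean overconfident" count appears to use the sign of the lean, whereas the per-row label also allows a calibrated band, which is why only two rows carry the "overconfident" label.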

In the full cloud

[Figure: scatter plot. x-axis: performance z-score within benchmark/probe (-3 to 3); y-axis: confidence z-score (-3 to 3).]
Legend: claude-haiku-4.5 conditions; all other model/condition points; line of equal relative confidence and pass rate.
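The axes of this plot are z-scores taken within a benchmark/probe, so a point's position is relative to the other models on the same benchmark. A minimal sketch of that standardization follows. The comparison values are hypothetical (only the first entry, 0.58 and 0.74, is claude-haiku-4.5 on OmniMath from this page), and the choice of population standard deviation is an assumption, since the page does not say which estimator it uses.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values against their own mean and population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no spread to standardize against.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# HYPOTHETICAL data for one benchmark. Only the first entry of each list
# (claude-haiku-4.5 on OmniMath) is taken from this page.
pass_rates = [0.58, 0.40, 0.70, 0.52]
confidences = [0.74, 0.55, 0.72, 0.80]

perf_z = z_scores(pass_rates)
conf_z = z_scores(confidences)

# Points above the diagonal (conf_z > perf_z) are relatively overconfident.
relative_lean = [c - p for c, p in zip(conf_z, perf_z)]
```

Standardizing within each benchmark removes differences in benchmark difficulty, so points from different benchmarks can share one plot.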

Pairwise signal: pairs involving claude-haiku-4.5

Match accuracy controls for the performance base-rate gap
claude-haiku-4.5 pairs: 18 / 171
claude-haiku-4.5 mean tau: +0.058
All-pairs mean: +0.037
claude-haiku-4.5 p<0.05: 9 (50%)
[Figure: Pair signal: do confidence gaps rank performance gaps? (Kendall tau-b), on an axis from -1.0 to 1.0.]
Legend: all model pairs (observed); base-rate-matched null; calibration-preserving null; claude-haiku-4.5 pair (filled = p<0.05); claude-haiku-4.5 mean; all-pairs mean.
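Kendall tau-b measures whether two sequences rank their items in the same order, with a correction for ties. The sketch below is a plain-Python implementation of the standard tau-b formula, applied to one list of confidence gaps and one list of performance gaps for a single model pair. How the page forms those gaps (per item, per benchmark, or otherwise) is not stated, and the gap values here are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b: (C - D) / sqrt((n0 - n1) * (n0 - n2)).

    C and D count concordant and discordant pairs, n0 is the total number
    of pairs, and n1 / n2 count pairs tied in x / y respectively.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# HYPOTHETICAL gaps between two models (model A minus model B).
confidence_gaps = [0.10, -0.05, 0.20, 0.00]
performance_gaps = [0.08, 0.03, 0.15, -0.02]

tau = kendall_tau_b(confidence_gaps, performance_gaps)
```

A tau near +1 means the model that sounds more confident is also the one that performs better; a tau near 0, like the +0.058 and +0.037 means reported above, means confidence gaps carry little ranking information about performance gaps.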

The four metacognitive outcomes

Competent: claimed it could, and could

No example in this selection.

Overconfident: claimed it could, but couldn’t
claude-haiku-4.5 · MMLU-Pro

What is the worldwide prevalence of obesity?

It said it could answer: “Yes, I can provide a reasonable answer to this question. Obesity prevalence is a well-documented public health metric tracked by organizations like the WHO and CDC. As of recent data (2020s), approximately 13% of the global adult population is obese, with rates varying significantly by region. However, the exact figure depends on the year referenced and the definition used (BMI ≥ 30). Without seeing the options, I cannot guarantee my answer matches the specific statistic the question is looking for, but I can provide an informed, evidence-based response.”

→ But it couldn't. It answered “Worldwide Obesity Prevalence Approximately 16% of the global adult population is obese (BMI ≥ 30), representing about 1 billion people. This figure has roughly tripled since 1990. In children and adolescents, approximately 8% ar” (answer truncated in the source); expected “13%”.

Underconfident: declined, but could

No example in this selection.

Well-declined: declined, and couldn’t

No example in this selection.

Rows: top = claimed it could answer · bottom = declined
Columns: left = was actually correct · right = was actually wrong
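The two-by-two grid above can be expressed as a small classification function. This is a sketch with hypothetical names (`outcome`, `claimed_it_could`, `was_correct`); it only encodes the row and column definitions given on this page. The example records are made up, apart from the first one, which mirrors the MMLU-Pro obesity example (claimed it could, answered wrongly).

```python
from collections import Counter


def outcome(claimed_it_could: bool, was_correct: bool) -> str:
    """Return one of the four metacognitive outcomes for a single item."""
    if claimed_it_could:
        return "Competent" if was_correct else "Overconfident"
    return "Underconfident" if was_correct else "Well-declined"


# (claimed_it_could, was_correct) per item. The first record mirrors the
# MMLU-Pro example on this page; the others are HYPOTHETICAL.
records = [
    (True, False),
    (True, True),
    (False, True),
    (False, False),
    (True, True),
]

tally = Counter(outcome(claimed, correct) for claimed, correct in records)
```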