Positioning spread: every benchmark, one model
| Benchmark | Task accuracy | Confidence | F₁ | Leans |
|---|---|---|---|---|
| SQuAD (factual recall) | 0.48 | 0.40 | 0.72 | -0.08 cautious |
| MMLU-Pro (knowledge) | 0.65 | 0.67 | 0.74 | +0.03 calibrated |
| LegalBench (legal reasoning) | 0.86 | 0.64 | 0.74 | -0.23 cautious |
| MathBench (competition math) | 0.95 | 1.00 | 0.97 | +0.05 calibrated |
| OmniMath (advanced math) | 0.58 | 0.74 | 0.75 | +0.16 overconfident |
| SciCode (scientific code) | 0.56 | 0.65 | 0.73 | +0.09 overconfident |
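The "Leans" column appears to be confidence minus task accuracy, with a label attached to the size and sign of that gap. The sketch below recomputes it from the rounded figures in the table. The formula and the 0.05 cutoff are assumptions inferred from the rows shown; the report states neither. Any cutoff from 0.05 up to just under 0.08 reproduces the same labels.

```python
# Rows copied from the table: (benchmark, task accuracy, confidence).
ROWS = [
    ("SQuAD", 0.48, 0.40),
    ("MMLU-Pro", 0.65, 0.67),
    ("LegalBench", 0.86, 0.64),
    ("MathBench", 0.95, 1.00),
    ("OmniMath", 0.58, 0.74),
    ("SciCode", 0.56, 0.65),
]

# Hypothetical cutoff: the report does not state one.
THRESHOLD = 0.05


def lean(accuracy: float, confidence: float) -> float:
    """Signed gap between stated confidence and task accuracy (assumed definition)."""
    return round(confidence - accuracy, 2)


def label(gap: float, threshold: float = THRESHOLD) -> str:
    """Map a signed gap to one of the three labels used in the table."""
    if gap > threshold:
        return "overconfident"
    if gap < -threshold:
        return "cautious"
    return "calibrated"


if __name__ == "__main__":
    for name, accuracy, confidence in ROWS:
        gap = lean(accuracy, confidence)
        print(f"{name:<10} {gap:+.2f} {label(gap)}")
```

Under these assumptions every label matches the table. The recomputed gaps for MMLU-Pro and LegalBench differ from the listed values by 0.01, which would be consistent with the report computing the gap from unrounded inputs.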
Pairwise signal: pairs involving claude-haiku-4.5
The four metacognitive outcomes
What is the worldwide prevalence of obesity?
It said it could answer: “Yes, I can provide a reasonable answer to this question. Obesity prevalence is a well-documented public health metric tracked by organizations like the WHO and CDC. As of recent data (2020s), approximately 13% of the global adult population is obese, with rates varying significantly by region. However, the exact figure depends on the year referenced and the definition used (BMI ≥ 30). Without seeing the options, I cannot guarantee my answer matches the specific statistic the question is looking for, but I can provide an informed, evidence-based response.”
→ But it couldn't. It answered “Worldwide Obesity Prevalence Approximately 16% of the global adult population is obese (BMI ≥ 30), representing about 1 billion people. This figure has roughly tripled since 1990. In children and adolescents, approximately 8% ar”; expected “13%”.
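The obesity example falls in one cell of a two-by-two grid: whether the model said it could answer, crossed with whether its answer was correct. The sketch below shows one way to assign those four outcomes and to derive an F₁ score from them. The outcome names and function names are hypothetical, and treating "said it could answer" as a prediction of "answered correctly" is an assumption about how the F₁ column might be computed, not something the report confirms.

```python
from collections import Counter

# Hypothetical names for the four outcomes; the report does not list them.
COULD_AND_DID = "said it could, and did"
COULD_BUT_DID_NOT = "said it could, but couldn't"
COULD_NOT_BUT_DID = "said it couldn't, but did"
COULD_NOT_AND_DID_NOT = "said it couldn't, and didn't"


def outcome(said_can_answer: bool, was_correct: bool) -> str:
    """Place one question in the two-by-two grid."""
    if said_can_answer:
        return COULD_AND_DID if was_correct else COULD_BUT_DID_NOT
    return COULD_NOT_BUT_DID if was_correct else COULD_NOT_AND_DID_NOT


def f1_from_outcomes(outcomes: list[str]) -> float:
    """F1 of the self-assessment, treating a correct answer as the positive class."""
    counts = Counter(outcomes)
    true_pos = counts[COULD_AND_DID]
    false_pos = counts[COULD_BUT_DID_NOT]
    false_neg = counts[COULD_NOT_BUT_DID]
    denominator = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denominator if denominator else 0.0


if __name__ == "__main__":
    # The obesity question: the model said it could answer, then answered
    # 16% where 13% was expected.
    print(outcome(said_can_answer=True, was_correct=False))
```

A single number such as F₁ hides which cell the errors land in, which is presumably why the report pairs it with a signed lean for each benchmark.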