Positioning spread: every benchmark, one model
| Benchmark | Task acc | Confidence | F₁ | Leans (confidence − task acc) |
|---|---|---|---|---|
| SQuAD (factual recall) | 0.59 | 0.55 | 0.75 | -0.04 calibrated |
| MMLU-Pro (knowledge) | 0.70 | 0.79 | 0.82 | +0.09 overconfident |
| LegalBench (legal reasoning) | 0.86 | 0.73 | 0.79 | -0.13 cautious |
| MathBench (competition math) | 0.93 | 1.00 | 0.97 | +0.06 overconfident |
| OmniMath (advanced math) | 0.57 | 0.75 | 0.72 | +0.18 overconfident |
| SciCode (scientific code) | 0.57 | 0.29 | 0.43 | -0.29 cautious |
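The "Leans" column appears to be the signed gap between confidence and task accuracy. The sketch below recomputes it from the rounded values in the table. The names `lean`, `label`, and `TOLERANCE`, and the 0.05 cutoff for "calibrated", are assumptions chosen for illustration; the page does not state its actual threshold.

```python
# Rows copied from the table above: (benchmark, task accuracy, confidence).
ROWS = [
    ("SQuAD", 0.59, 0.55),
    ("MMLU-Pro", 0.70, 0.79),
    ("LegalBench", 0.86, 0.73),
    ("MathBench", 0.93, 1.00),
    ("OmniMath", 0.57, 0.75),
    ("SciCode", 0.57, 0.29),
]

# Hypothetical cutoff: gaps no larger than this count as "calibrated".
TOLERANCE = 0.05


def lean(task_acc: float, confidence: float) -> float:
    """Signed gap; positive means confidence exceeds accuracy."""
    return confidence - task_acc


def label(gap: float, tolerance: float = TOLERANCE) -> str:
    """Map a signed gap to one of the three labels used in the table."""
    if abs(gap) <= tolerance:
        return "calibrated"
    return "overconfident" if gap > 0 else "cautious"


for name, acc, conf in ROWS:
    gap = lean(acc, conf)
    print(f"{name:<11} {gap:+.2f} {label(gap)}")
```

With this cutoff the labels match the table for all six rows. The recomputed gaps for MathBench and SciCode differ from the listed values by 0.01, presumably because the page works from unrounded accuracy and confidence.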
Pairwise signal: pairs involving claude-sonnet-4.5
The four metacognitive outcomes
What happens during the 'incubation period'?
It said it could answer: “The term 'incubation period' has a clear, well-established definition in medicine and biology - it refers to the time between exposure to a pathogen and the appearance of symptoms. I can provide an accurate answer without needing to see multiple choice options.”
→ It said it could, and it did. It answered “During the incubation period, a pathogen multiplies inside the host after infection but before symptoms appear.”; expected “Virus replication happens at an intracellular level”.
It said it could, but it couldn't: no example in this selection.
Find the generator for the finite field Z_11.
It said it couldn't answer: “The question asks for 'the' generator of Z_11, but Z_11 (the multiplicative group of the finite field) has multiple generators - specifically φ(φ(11)) = φ(10) = 4 generators. Without seeing the multiple choice options, I cannot determine which specific generator they want (e.g., 2, 6, 7, or 8 are all primitive roots mod 11). The question is ambiguous as stated.”
→ It said it couldn't, but it could. It answered “A generator for Z_11 (the multiplicative group of non-zero elements) is 2. (Other generators include 6, 7, and 8)”; expected “2”.
What is the worldwide prevalence of obesity?
It said it couldn't answer: “Obesity prevalence is a well-documented global health statistic that is regularly tracked by organizations like the WHO. I can provide accurate information about worldwide obesity rates (approximately 13-16% of adults globally as of recent years, with the figure having roughly tripled since 1975). However, without seeing the multiple-choice options, I cannot be certain if the question is asking for a specific year, age group, definition threshold, or precise percentage range that would match one of the provided answers.”
→ It said it couldn't, and it couldn't. It answered “According to WHO data, worldwide obesity prevalence in 2022 was approximately 16% of adults (over 890 million people), with overweight affecting about 43% of adults globally.”; expected “13%”.
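The four outcomes cross the model's self-assessment ("could answer" or "couldn't") with whether its answer was graded correct. A minimal sketch of that bookkeeping follows. The function names and the record layout are hypothetical, and treating F₁ as the F-score of the "could answer" prediction against graded correctness is an assumption, not something the page confirms.

```python
from collections import Counter


def outcome(said_can_answer: bool, was_correct: bool) -> str:
    """Name the metacognitive outcome for one question."""
    if said_can_answer:
        if was_correct:
            return "said it could, and it did"
        return "said it could, but it couldn't"
    if was_correct:
        return "said it couldn't, but it could"
    return "said it couldn't, and it couldn't"


def f1(records: list[tuple[bool, bool]]) -> float:
    """F1 of 'said it could' as a predictor of 'was correct' (assumed definition)."""
    tp = sum(1 for said, ok in records if said and ok)
    fp = sum(1 for said, ok in records if said and not ok)
    fn = sum(1 for said, ok in records if not said and ok)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# The three examples shown above as (said_can_answer, was_correct) pairs,
# using the grading reported on the page.
EXAMPLES = [
    (True, True),    # incubation period
    (False, True),   # generator for Z_11
    (False, False),  # worldwide prevalence of obesity
]

counts = Counter(outcome(said, ok) for said, ok in EXAMPLES)
for name, n in counts.items():
    print(f"{name}: {n}")
print(f"F1 over these examples: {f1(EXAMPLES):.2f}")
```

Three examples are far too few for the score to mean anything; they only show how each card maps to a cell of the two-by-two grid.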