Model Calibration
How well do our predictions match reality? Calibration metrics show whether models are overconfident, underconfident, or well-calibrated.
Calibration Results
Calibration Curve
Predicted probability vs. actual outcome rate across 159 resolved markets. Points on the diagonal line indicate perfect calibration.
Bucket details (Gap = Actual Rate minus Avg Predicted, in percentage points)
| Bucket | Count | Avg Predicted | Actual Rate | Gap |
|---|---|---|---|---|
| 0–10% | 5 | 6.6% | 0.0% | -6.6pp |
| 10–20% | 13 | 13.2% | 15.4% | +2.2pp |
| 20–30% | 21 | 23.7% | 28.6% | +4.8pp |
| 30–40% | 14 | 34.5% | 35.7% | +1.2pp |
| 40–50% | 14 | 42.6% | 57.1% | +14.6pp |
| 50–60% | 34 | 55.5% | 64.7% | +9.2pp |
| 60–70% | 28 | 65.6% | 78.6% | +13.0pp |
| 70–80% | 15 | 75.3% | 86.7% | +11.4pp |
| 80–90% | 12 | 83.4% | 91.7% | +8.3pp |
| 90–100% | 3 | 90.7% | 100.0% | +9.3pp |
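As a rough cross-check of the table, the sketch below recomputes each bucket's gap as actual rate minus average predicted probability, then summarizes the gaps with count-weighted averages. The weighted summaries are an illustration and are not reported on this page; `ROWS` and the function names are hypothetical, and the values are the rounded figures copied from the table, so a recomputed gap can differ from the published one by about 0.1pp.

```python
# (bucket, count, avg predicted %, actual rate %), copied from the table above.
ROWS = [
    ("0-10%", 5, 6.6, 0.0),
    ("10-20%", 13, 13.2, 15.4),
    ("20-30%", 21, 23.7, 28.6),
    ("30-40%", 14, 34.5, 35.7),
    ("40-50%", 14, 42.6, 57.1),
    ("50-60%", 34, 55.5, 64.7),
    ("60-70%", 28, 65.6, 78.6),
    ("70-80%", 15, 75.3, 86.7),
    ("80-90%", 12, 83.4, 91.7),
    ("90-100%", 3, 90.7, 100.0),
]


def bucket_gaps(rows):
    """Return (bucket, gap) pairs, where gap = actual rate - avg predicted."""
    return [(label, round(actual - predicted, 1))
            for label, _, predicted, actual in rows]


def weighted_mean_gap(rows, absolute=False):
    """Count-weighted mean of the gaps, in percentage points.

    With absolute=True the sign of each gap is dropped, so over- and
    under-prediction cannot cancel each other out.
    """
    total = sum(count for _, count, _, _ in rows)
    acc = 0.0
    for _, count, predicted, actual in rows:
        gap = actual - predicted
        acc += count * (abs(gap) if absolute else gap)
    return acc / total


if __name__ == "__main__":
    for label, gap in bucket_gaps(ROWS):
        print(f"{label:>8}: {gap:+.1f}pp")
    print(f"signed:   {weighted_mean_gap(ROWS):+.2f}pp")
    print(f"absolute: {weighted_mean_gap(ROWS, absolute=True):.2f}pp")
```

A positive gap means the events in that bucket occurred more often than predicted. With the rounded table values, both weighted summaries come out at roughly 8 to 9 percentage points.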
Resolved Markets
Scoring Methodology
Brier Score (Binary)
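For binary yes/no predictions, we use the Brier Score: the squared difference between the predicted probability and the actual outcome (1 if the event occurred, 0 if it did not). Lower is better: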
- 0.00 — Perfect prediction (100% confident and correct)
- 0.25 — Maximally uncertain (50% prediction)
- 1.00 — Worst possible (100% confident and wrong)
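The three reference values above follow directly from the standard definition of the Brier score for a single binary prediction, (p - o)^2, where p is the predicted probability and o is the outcome (0 or 1). A minimal sketch, with hypothetical function names that are not taken from this page's implementation:

```python
def brier_score(predicted: float, outcome: int) -> float:
    """Squared error between a predicted probability and a 0/1 outcome."""
    if not 0.0 <= predicted <= 1.0:
        raise ValueError("predicted must be a probability in [0, 1]")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (predicted - outcome) ** 2


def mean_brier_score(predictions, outcomes) -> float:
    """Average Brier score over a set of resolved predictions."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not predictions:
        raise ValueError("need at least one prediction")
    scores = [brier_score(p, o) for p, o in zip(predictions, outcomes)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(brier_score(1.0, 1))  # 100% confident and correct
    print(brier_score(0.5, 1))  # 50% prediction, either outcome
    print(brier_score(1.0, 0))  # 100% confident and wrong
```

Averaging over markets, as in `mean_brier_score`, is the usual way to report a single score for a model; whether this page averages per market or per model run is not stated.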
Calibration Curve
A perfectly calibrated model's predictions should match outcomes:
- Events predicted at 30% should occur ~30% of the time
- Events predicted at 70% should occur ~70% of the time
We group predictions into buckets (0–10%, 10–20%, etc.) and compare each bucket's average predicted probability to the rate at which its events actually occurred.
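The bucketing step can be sketched as follows. This is an illustration under stated assumptions, not the page's actual code: buckets are assumed to include their lower bound and exclude their upper bound (with 100% placed in the top bucket), and the function and key names are hypothetical.

```python
from collections import defaultdict


def calibration_buckets(predictions, outcomes, n_buckets=10):
    """Group (probability, 0/1 outcome) pairs into equal-width buckets.

    Returns one dict per non-empty bucket with the count, the average
    predicted probability, the actual outcome rate, and their gap.
    """
    groups = defaultdict(list)
    for p, y in zip(predictions, outcomes):
        # Assumed rule: lower bound inclusive; p == 1.0 goes in the top bucket.
        idx = min(int(p * n_buckets), n_buckets - 1)
        groups[idx].append((p, y))

    rows = []
    for idx in sorted(groups):
        pairs = groups[idx]
        count = len(pairs)
        avg_predicted = sum(p for p, _ in pairs) / count
        actual_rate = sum(y for _, y in pairs) / count
        rows.append({
            "bucket": f"{idx * 100 // n_buckets}-{(idx + 1) * 100 // n_buckets}%",
            "count": count,
            "avg_predicted": avg_predicted,
            "actual_rate": actual_rate,
            "gap": actual_rate - avg_predicted,  # positive: occurred more often than predicted
        })
    return rows


if __name__ == "__main__":
    preds = [0.12, 0.18, 0.72, 0.78, 1.0]
    results = [0, 1, 1, 1, 1]
    for row in calibration_buckets(preds, results):
        print(row)
```

Plotting `avg_predicted` against `actual_rate` for each row gives the calibration curve; points above the diagonal correspond to positive gaps.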
Model Comparison
Each market receives predictions from multiple models:
- Deep reasoning, handles edge cases and complex scenarios
- Balanced approach, good at pattern recognition
- Fast, pattern-focused, captures obvious signals
The aggregate prediction is the median across all model runs. Model agreement is calculated as 1 minus the normalized standard deviation of those runs; higher agreement suggests more confidence in the prediction.
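A minimal sketch of the aggregation step, using Python's standard `statistics` module. The page does not say how the standard deviation is normalized, so this example assumes it is divided by 0.5, the largest population standard deviation possible for values in [0, 1]. That choice, and the function names, are assumptions made for illustration only.

```python
from statistics import median, pstdev

# Assumption: normalize by the largest possible population standard
# deviation for probabilities in [0, 1] (half the runs at 0, half at 1).
MAX_STDEV = 0.5


def aggregate_prediction(run_predictions):
    """Median of the predicted probabilities across all model runs."""
    return median(run_predictions)


def model_agreement(run_predictions):
    """1 minus the normalized standard deviation of the runs.

    1.0 means every run gave the same probability; values near 0.0 mean
    the runs are split between the two extremes.
    """
    return 1.0 - pstdev(run_predictions) / MAX_STDEV


if __name__ == "__main__":
    runs = [0.6, 0.7, 0.9]
    print(aggregate_prediction(runs))
    print(model_agreement(runs))
```

The median is less sensitive than the mean to a single outlying run, which is one plausible reason to prefer it for the aggregate.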