Model Calibration
How well do our predictions match reality? Calibration metrics show whether models are overconfident, underconfident, or well-calibrated.
Scoring Methodology
Brier Score (Binary)
For binary yes/no predictions, we use the Brier Score:
- 0.00 — Perfect prediction (100% confident and correct)
- 0.25 — Maximally uncertain (a 50% prediction scores 0.25 whether the event occurs or not)
- 1.00 — Worst possible (100% confident and wrong)
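Concretely, the Brier Score is the mean squared error between predicted probabilities and binary outcomes (1 if the event occurred, 0 if not). A minimal sketch; the `brier_score` helper is illustrative, not this site's actual implementation:

```python
def brier_score(predictions, outcomes):
    """Mean squared error between predicted probabilities (0..1)
    and binary outcomes (0 or 1). Lower is better."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# The three reference points from the list above:
print(brier_score([1.0], [1]))  # 0.0  — perfect prediction
print(brier_score([0.5], [1]))  # 0.25 — maximally uncertain
print(brier_score([1.0], [0]))  # 1.0  — confidently wrong
```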
Calibration Curve
A perfectly calibrated model's predictions should match outcomes:
- Events predicted at 30% should occur ~30% of the time
- Events predicted at 70% should occur ~70% of the time
We group predictions into buckets (0-10%, 10-20%, etc.) and compare predicted rates to actual outcomes.
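The bucketing above can be sketched as follows. The `calibration_buckets` helper and its percent-range labels are hypothetical; the site's actual binning code is not shown in this document:

```python
def calibration_buckets(predictions, outcomes, n_bins=10):
    """Group (prediction, outcome) pairs into equal-width probability
    buckets and compare the mean predicted probability to the actual
    outcome rate in each bucket."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(predictions, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the top bucket
        bins[idx].append((p, o))

    report = {}
    for idx, pairs in enumerate(bins):
        if not pairs:
            continue  # skip empty buckets
        label = f"{idx * 100 // n_bins}-{(idx + 1) * 100 // n_bins}%"
        mean_pred = sum(p for p, _ in pairs) / len(pairs)  # predicted rate
        actual = sum(o for _, o in pairs) / len(pairs)     # observed rate
        report[label] = (mean_pred, actual)
    return report
```

For a well-calibrated model, the two numbers in each bucket should be close: events predicted around 25% should resolve yes roughly a quarter of the time.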
Model Comparison
Each market receives predictions from multiple models:
- Deep reasoning; handles edge cases and complex scenarios
- Balanced approach; good at pattern recognition
- Fast and pattern-focused; captures obvious signals
The aggregate prediction is the median across all model runs. Model agreement is computed as 1 minus the normalized standard deviation of those runs; higher agreement indicates stronger consensus among the models and thus more confidence in the prediction.
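The aggregation step can be sketched as below. Note one assumption: the document does not say how the standard deviation is normalized, so this sketch divides by 0.5, the maximum possible standard deviation for values in [0, 1]:

```python
import statistics

def aggregate(model_probs):
    """Combine per-model probabilities into a single prediction
    (the median) plus an agreement score in [0, 1]."""
    prediction = statistics.median(model_probs)
    spread = statistics.pstdev(model_probs)
    # Assumption: normalize by 0.5, the max std-dev for values in [0, 1];
    # the normalization constant is not specified in the text.
    agreement = 1 - spread / 0.5
    return prediction, agreement

# Identical runs agree perfectly; a maximal split agrees not at all.
print(aggregate([0.6, 0.6, 0.6]))  # (0.6, 1.0)
print(aggregate([0.0, 1.0]))       # (0.5, 0.0)
```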