Model Calibration

How well do our predictions match reality? Calibration metrics show whether models are overconfident, underconfident, or well-calibrated.

Total Markets: 350
Resolved: 25
Active: 325

Calibration Results

Markets Resolved: 24
Binary (Brier): 23
Continuous (WIS): 1
Avg Brier Score: 0.167
Brier Initial: 0.167
Brier Final: 0.167
Brier Improvement: +0.000
WIS Initial: 30.1
WIS Final: 30.1
WIS Improvement: 0.0

Resolved Markets

| Market | Type | Initial | Final | Change |
|---|---|---|---|---|
| ASTS: Will AST SpaceMobile conduct an additional dilutive equity or convertible offering exceeding $200M by December 31, 2026? | Brier | 0.090 | 0.090 | |
| CRM: Will Salesforce report Q4 FY2026 revenue growth of 10% or higher (constant currency)? | Brier | 0.578 | 0.578 | |
| CRM: Will Salesforce disclose AgentForce ARR at or above $1B by Q4 FY2026 earnings? | Brier | 0.053 | 0.053 | |
| CRM: Will Salesforce's current RPO (cRPO) growth exceed 11% YoY in Q4 FY2026? | Brier | 0.212 | 0.212 | |
| CRM: Will Salesforce guide FY2027 non-GAAP operating margin at 34% or higher? | Brier | 0.053 | 0.053 | |
| CVNA: Will Carvana's Q4 2025 'Other' gross profit per unit remain above $420? | Brier | 0.230 | 0.230 | |
| CVNA: Will Carvana's trailing-twelve-month operating cash flow to net income conversion ratio fall below 50% as of Q4 2025? | Brier | 0.032 | 0.032 | |
| DDOG: Will Datadog's Q4 2025 year-over-year revenue growth rate fall below 25%? | Brier | 0.144 | 0.144 | |
| DDOG: Will Datadog disclose or confirm that a single customer represents more than 5% of total revenue in Q4 2025 earnings or Investor Day? | Brier | 0.012 | 0.006 | +0.006 |
| DDOG: Will Datadog's initial FY2026 full-year revenue guidance imply growth of 22% or above? | Brier | 0.176 | 0.176 | |
| HIMS: Will HIMS report Q4 2025 year-over-year subscriber growth below 15%? | Brier | 0.123 | 0.123 | |
| HIMS: Will HIMS management withdraw or materially reduce its $6.5B 2030 revenue target at Q4 2025 earnings? | Brier | 0.078 | 0.078 | |
| HOOD: Will Robinhood's year-over-year total revenue growth fall below 30% in any quarter by Q2 2026? | Brier | 0.144 | 0.144 | |
| HOOD: Will Robinhood's crypto transaction revenue decline more than 30% quarter-over-quarter in Q4 2025? | Brier | 0.040 | 0.040 | |
| LMND: Will LMND report Q4 2025 gross loss ratio below 65%? | Brier | 0.116 | 0.116 | |
| LMND: Will LMND report Q4 2025 in-force premium (IFP) growth above 30% YoY? | Brier | 0.230 | 0.230 | |
| MOH: What will Molina Healthcare's 2026 full-year diluted EPS guidance midpoint be? | WIS | 30.1 | 30.1 | |
| MRNA: Will Moderna report Q4 2025 year-end cash below $7.0B or issue 2026 revenue guidance below $1.5B? | Brier | 0.102 | 0.102 | |
| NVO: Will CagriSema demonstrate >=20% body weight loss in the REDEFINE 4 trial results reported by June 30, 2026? | Brier | 0.384 | 0.384 | |
| REZI: Will REZI report a goodwill impairment charge in its 2025 10-K? | Brier | 0.102 | 0.102 | |
| SNOW: Will Snowflake's initial FY2027 full-year product revenue guidance imply growth below 25%? | Brier | 0.348 | 0.348 | |
| TWLO: Will Twilio report Q4 2025 organic revenue growth below 10% YoY? | Brier | 0.012 | 0.012 | |
| TWLO: Will Twilio guide FY2026 organic revenue growth at 10% or above? | Brier | 0.462 | 0.462 | |
| TWLO: Will Twilio's Q4 2025 non-GAAP gross margin fall below 50%? | Brier | 0.116 | 0.116 | |
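
As a consistency check, the Brier summary figures can be reproduced from the per-market scores listed for the 23 binary markets. This sketch assumes the averages are simple unweighted means and that improvement is initial minus final (which matches the one market showing a change); the variable names are illustrative and not taken from the site.

```python
from statistics import mean

# Initial Brier scores for the 23 binary markets, in listed order
# (the single WIS market, MOH, is excluded).
initial = [
    0.090, 0.578, 0.053, 0.212, 0.053, 0.230, 0.032, 0.144,
    0.012, 0.176, 0.123, 0.078, 0.144, 0.040, 0.116, 0.230,
    0.102, 0.384, 0.102, 0.348, 0.012, 0.462, 0.116,
]

# Final scores are identical except for the DDOG single-customer market,
# which moved from 0.012 to 0.006.
final = list(initial)
final[8] = 0.006

# Assumption: the summary is an unweighted mean, and
# improvement = initial - final (positive means the score got better).
avg_initial = mean(initial)
avg_final = mean(final)
improvement = avg_initial - avg_final

print(f"Brier Initial:     {avg_initial:.3f}")
print(f"Brier Final:       {avg_final:.3f}")
print(f"Brier Improvement: {improvement:+.3f}")
```

Under these assumptions both averages round to 0.167 and the improvement rounds to +0.000, in line with the summary figures.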

vs External Consensus

With Consensus: 1
Beat Consensus: 0/1
Avg Score Delta: -1.250

Scoring Methodology

Brier Score (Binary)

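For binary yes/no predictions, we use the Brier Score: the squared difference between the predicted probability and the outcome (1 if the event occurred, 0 if it did not):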

Brier = (probability - outcome)²
  • 0.00 — Perfect prediction (100% confident and correct)
  • 0.25 — Maximally uncertain (50% prediction)
  • 1.00 — Worst possible (100% confident and wrong)

Weighted Interval Score (Continuous)

For continuous predictions (revenue, margins, etc.), we use the Weighted Interval Score (WIS), which evaluates the full distribution:

WIS = |median - actual| + Σ(interval penalties)
Intervals: p10-p90, p25-p75 (wider intervals penalized less)
  • Rewards narrow intervals when the actual value falls within them
  • Penalizes overconfidence (narrow but wrong)
  • Lower score = better prediction
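
The exact weights and normalization behind the WIS figures are not stated here, so the following is only a sketch using the commonly cited weighted interval score definition, applied to the two intervals named above. It treats p10-p90 as an 80% interval (alpha = 0.2) and p25-p75 as a 50% interval (alpha = 0.5). The function names and the example numbers are assumptions for illustration.

```python
def interval_score(lower: float, upper: float, actual: float, alpha: float) -> float:
    """Score for one central (1 - alpha) prediction interval.

    Width is always charged; a miss adds a penalty scaled by 2 / alpha.
    """
    width = upper - lower
    below = (2.0 / alpha) * max(lower - actual, 0.0)
    above = (2.0 / alpha) * max(actual - upper, 0.0)
    return width + below + above


def weighted_interval_score(median, intervals, actual):
    """Standard-form WIS; `intervals` is a list of (alpha, lower, upper).

    Assumption: each interval is weighted by alpha / 2 and the total is
    divided by (K + 0.5). The site may use a different convention.
    """
    total = 0.5 * abs(actual - median)
    for alpha, lower, upper in intervals:
        total += (alpha / 2.0) * interval_score(lower, upper, actual, alpha)
    return total / (len(intervals) + 0.5)


# Hypothetical forecast: median 10, p10-p90 = 6..14, p25-p75 = 8..12.
forecast = [(0.2, 6.0, 14.0), (0.5, 8.0, 12.0)]

print(round(weighted_interval_score(10.0, forecast, actual=11.0), 2))  # 0.92
print(round(weighted_interval_score(10.0, forecast, actual=20.0), 2))  # 8.32
```

In this form the p10-p90 interval carries weight 0.1 and the p25-p75 interval weight 0.25, which is one way to read "wider intervals penalized less". The second call shows the overconfidence penalty: once the actual value lands outside both intervals, the score rises sharply.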

Calibration Curve

A perfectly calibrated model's predictions should match outcomes:

  • Events predicted at 30% should occur ~30% of the time
  • Events predicted at 70% should occur ~70% of the time

We group predictions into buckets (0-10%, 10-20%, etc.) and compare predicted rates to actual outcomes.
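
The bucketing step can be sketched as follows. The sample forecasts are made up for illustration, and `calibration_table` is a hypothetical helper rather than the site's implementation.

```python
from collections import defaultdict


def calibration_table(forecasts, n_buckets=10):
    """Group (probability, outcome) pairs into equal-width buckets.

    Returns one row per non-empty bucket:
    (bucket_low, bucket_high, mean_predicted, observed_rate, count).
    """
    buckets = defaultdict(list)
    for probability, outcome in forecasts:
        # A probability of exactly 1.0 is placed in the top bucket.
        index = min(int(probability * n_buckets), n_buckets - 1)
        buckets[index].append((probability, outcome))

    rows = []
    for index in sorted(buckets):
        items = buckets[index]
        mean_predicted = sum(p for p, _ in items) / len(items)
        observed_rate = sum(o for _, o in items) / len(items)
        rows.append((index / n_buckets, (index + 1) / n_buckets,
                     mean_predicted, observed_rate, len(items)))
    return rows


# Made-up forecasts: (predicted probability, outcome as 0 or 1).
sample = [
    (0.05, 0), (0.08, 0),
    (0.32, 0), (0.35, 1), (0.38, 0),
    (0.72, 1), (0.75, 1), (0.78, 0),
]

for low, high, predicted, observed, count in calibration_table(sample):
    print(f"{low:.0%}-{high:.0%}: predicted {predicted:.2f}, "
          f"observed {observed:.2f} (n={count})")
```

A well-calibrated set of forecasts has predicted and observed values close to each other in every bucket. With only a few resolved markets per bucket, the observed rates are noisy and should be read with caution.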

Model Comparison

Each market receives predictions from multiple models:

  • Opus: deep reasoning, handles edge cases and complex scenarios
  • Sonnet: balanced approach, good at pattern recognition
  • Haiku: fast, pattern-focused, captures obvious signals

The aggregate prediction is the median across all model runs. Model agreement is calculated as 1 minus the normalized standard deviation of those runs; higher agreement suggests more confidence in the prediction.
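
A rough sketch of the aggregation step for binary markets. The text does not say how the standard deviation is normalized, so this example assumes division by 0.5, the largest population standard deviation possible for values in [0, 1]. That constant, the function name, and the sample runs are all assumptions.

```python
import statistics

# Assumption: normalize by the maximum population standard deviation
# for probabilities in [0, 1] (half the runs at 0, half at 1).
MAX_STD = 0.5


def aggregate(predictions):
    """Return (median prediction, agreement) for a list of probabilities."""
    median = statistics.median(predictions)
    spread = statistics.pstdev(predictions)
    agreement = 1.0 - spread / MAX_STD
    return median, agreement


# Hypothetical model runs for a single market.
median, agreement = aggregate([0.30, 0.35, 0.40])
print(f"median={median:.2f}, agreement={agreement:.3f}")

# Identical runs give full agreement; maximally split runs give none.
print(aggregate([0.6, 0.6, 0.6]))
print(aggregate([0.0, 1.0]))
```

Using the median rather than the mean keeps a single outlying run from dragging the aggregate, while the agreement figure records how far the runs diverged.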