Macro Calibration Board · Apr 19 – May 3, 2026
Price-blind macro forecasts logged publicly with timestamps and outcomes. Each forecast is captured before PRISM sees the Kalshi price, so the comparison reflects calibration discipline — not after-the-fact tuning.
How to read · Lower Brier is better. The board scores PRISM’s price-blind probability against the Kalshi price at forecast time — measuring whether PRISM’s judgment improves on the market over time. The score is live and expected to evolve as the sample grows.
40 CLASSIFIED FORECASTS · 48 TOTAL RESOLVED · PRISM v0.9.17 → v0.9.92-deepseek · PRICE-BLIND MODEL · EVIDENCE LAYER · PRIORITY COVERAGE · EVENT-FAMILY DEDUP
Early calibration window. The Brier score below is computed across 40 resolved forecasts since Apr 19, with the sample still small enough that the headline number is expected to move materially as more markets resolve. Several individual forecasts show large blind-vs-market divergences — see the resolved case studies further down.
Engine disclosure · This BSS reflects forecasts generated on PRISM’s Anthropic Opus engine. DeepSeek V4 Flash forecasts are being logged now and will be broken out in upcoming releases.
PRISM is the challenger — this early window shows Kalshi’s price ahead on average, and we keep the ledger live so you can watch whether the blind model closes the gap over time.
Brier Skill Score -0.45 across 40 resolved blind-classified forecasts (Finance category, Apr 19 → today). Positive means PRISM (blind)’s forecast was more accurate than the Kalshi price at forecast time; a negative score, as here, means the Kalshi price was more accurate. The reliability curve below shows how each forecast-probability bucket tracked realized outcomes: the closer to the diagonal, the better calibrated.
Priority 5 · Live tracker
The five macro releases PRISM treats as priority coverage: shorter cooldowns, higher refresh frequency, and guaranteed inclusion in every cron run. Each card shows the latest blind forecast vs the Kalshi price, with the outcome added once the market resolves.
Will real GDP increase by more than 2.5% in Q2 2026?
Will CPI rise more than 0.5% in May 2026?
Will the United States Producer Price Index for final demand for April 2026 be above 4.4%?
Will above 40000 jobs be added in April 2026?
ADP employment change in Apr 2026?
Where the price-blind model saw it differently
Resolved markets where PRISM (blind)’s probability disagreed with the Kalshi price by 10 percentage points or more, and the resolution proved PRISM (blind) right. Picked from the largest disagreements in the calibration window.
“Verified March 2026 gas prices at $3.638 require 11% spike to breach $4.025 threshold by tomorrow. Social sentiment and EIA forecasts strong…”
“Gold trades around $2700 requiring impossible 75% appreciation to $4715 in 3 hours. Market severely overpricing mathematically implausible o…”
“Federal grand jury indicted Comey with arrest warrant issued April 28, one day before deadline. Historical federal warrant execution rates e…”
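The selection rule for these case studies can be sketched in a few lines. This is an illustrative sketch only, not the board's actual code: the field names (`blind_p`, `market_p`, `outcome`) are hypothetical, and "proved right" is assumed here to mean that the blind probability ended up closer to the 0/1 outcome than the Kalshi price did.

```python
def case_studies(rows, min_gap=0.10):
    """Pick resolved rows where blind and market disagreed by >= min_gap
    and the blind forecast was closer to the outcome (assumed definition
    of 'proved right'). Largest disagreements come first."""
    picked = []
    for r in rows:
        gap = abs(r["blind_p"] - r["market_p"])
        blind_err = abs(r["blind_p"] - r["outcome"])
        market_err = abs(r["market_p"] - r["outcome"])
        # Small tolerance so a gap of exactly 10pp is not lost to float rounding.
        if gap >= min_gap - 1e-9 and blind_err < market_err:
            picked.append({**r, "gap": gap})
    return sorted(picked, key=lambda r: r["gap"], reverse=True)


# Hypothetical rows, not taken from the ledger.
sample = [
    {"id": "a", "blind_p": 0.05, "market_p": 0.40, "outcome": 0},  # big gap, blind right
    {"id": "b", "blind_p": 0.60, "market_p": 0.55, "outcome": 1},  # gap under 10pp
    {"id": "c", "blind_p": 0.30, "market_p": 0.70, "outcome": 1},  # big gap, blind wrong
    {"id": "d", "blind_p": 0.90, "market_p": 0.70, "outcome": 1},  # 20pp gap, blind right
]
print([r["id"] for r in case_studies(sample)])
```

Because only the winning disagreements are shown, a list like this is a highlight reel and says nothing about overall skill; the Brier metrics cover the losses too.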
By-category breakdown · n ≥ 3
Brier scores per category for resolved blind-classified forecasts in the calibration window. Categories with fewer than 3 resolutions are hidden until the sample matures.
| Category | n | Blind Brier | Market Brier | BSS |
|---|---|---|---|---|
| Finance | 40 | 0.312 | 0.216 | -0.45 |
| Culture | 17 | 0.233 | 0.191 | -0.22 |
| Politics | 11 | 0.398 | 0.167 | -1.38 |
| Tech | 5 | 0.322 | 0.155 | -1.08 |
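A breakdown like the table above could be produced by grouping resolved rows by category and hiding any group with fewer than 3 resolutions. The sketch below assumes hypothetical field names (`category`, `blind_p`, `market_p`, `outcome`) and is not the board's actual pipeline.

```python
from collections import defaultdict


def category_breakdown(rows, min_n=3):
    """Return (category, n, blind Brier, market Brier, BSS) per category,
    skipping categories with fewer than min_n resolved forecasts."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["category"]].append(r)

    table = []
    for category, items in groups.items():
        n = len(items)
        if n < min_n:
            continue  # hidden until the sample matures
        blind = sum((r["blind_p"] - r["outcome"]) ** 2 for r in items) / n
        market = sum((r["market_p"] - r["outcome"]) ** 2 for r in items) / n
        # BSS is undefined when the market Brier is exactly zero.
        bss = 1 - blind / market if market > 0 else float("nan")
        table.append((category, n, blind, market, bss))
    # Largest samples first, matching the ordering of the table above.
    return sorted(table, key=lambda row: row[1], reverse=True)


# Hypothetical rows, not taken from the ledger.
demo_rows = [
    {"category": "Finance", "blind_p": 0.8, "market_p": 0.6, "outcome": 1},
    {"category": "Finance", "blind_p": 0.2, "market_p": 0.4, "outcome": 0},
    {"category": "Finance", "blind_p": 0.5, "market_p": 0.5, "outcome": 1},
    {"category": "Tech", "blind_p": 0.7, "market_p": 0.3, "outcome": 1},
    {"category": "Tech", "blind_p": 0.1, "market_p": 0.2, "outcome": 0},
]
for row in category_breakdown(demo_rows):
    print(row)
```

In this toy input only Finance clears the threshold, so Tech (n = 2) is omitted from the output.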
Coming to this board
Methodology
PRISM (blind). The truth model sees the market question, resolution criteria, and retrieved evidence — but market prices are redacted from the prompt during analysis.
PRISM (anchored). The legacy variant that includes market price in its prompt. Kept running in parallel so we can measure what the market anchor does to the forecast.
Brier score. Proper scoring rule for probabilistic forecasts. 0 = perfect, 0.25 = coin flip. Lower is better.
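As a minimal illustration of the definition above, the Brier score for binary markets is the mean squared difference between the forecast probability and the 0/1 outcome. The function name and sample inputs are illustrative, not part of PRISM.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# A constant 50% forecast scores 0.25 whatever the outcomes are.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1]))  # 0.25

# Confident forecasts that turn out right score close to 0.
print(brier_score([0.9, 0.1], [1, 0]))
```

The coin-flip benchmark follows directly: (0.5 − 0)² and (0.5 − 1)² both equal 0.25.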
Brier Skill Score. 1 − blind_brier / market_brier. Positive = PRISM (blind) more accurate than the Kalshi price at forecast time. Zero = matched the price. Negative = behind the price. The institutional reading: skill above the reference, normalized.
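The formula can be checked against the by-category table using the rounded Brier values published there. The function name is illustrative.

```python
def brier_skill_score(blind_brier, market_brier):
    """BSS = 1 - blind_brier / market_brier, with the market as reference."""
    if market_brier <= 0:
        raise ValueError("market_brier must be positive")
    return 1.0 - blind_brier / market_brier


# (blind Brier, market Brier) as displayed in the by-category table.
published = {
    "Finance": (0.312, 0.216),
    "Culture": (0.233, 0.191),
    "Politics": (0.398, 0.167),
    "Tech": (0.322, 0.155),
}
for category, (blind, market) in published.items():
    print(category, round(brier_skill_score(blind, market), 2))
# Finance -0.44
# Culture -0.22
# Politics -1.38
# Tech -1.08
```

Recomputing from the rounded Finance scores gives -0.44 rather than the published -0.45, which is consistent with the board computing BSS from unrounded Brier values.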
Reliability curve. Forecasts are bucketed by forecast probability (5 bins) and plotted against the realized YES rate. Perfect calibration sits on the diagonal — “when we said 70%, it happened 70% of the time.” Each point is a bucket; both PRISM (blind) and Kalshi price are plotted.
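A reliability curve of this kind can be sketched as follows. The board states 5 bins but not where the edges fall, so equal-width bins on [0, 1] are an assumption here, as are the function name and the sample data.

```python
def reliability_curve(probs, outcomes, n_bins=5):
    """Return (mean forecast, realized YES rate, count) for each non-empty bin.

    Assumes equal-width bins: [0, 0.2), [0.2, 0.4), ..., [0.8, 1.0].
    """
    buckets = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        buckets[idx].append((p, o))

    curve = []
    for rows in buckets:
        if rows:
            mean_p = sum(p for p, _ in rows) / len(rows)
            yes_rate = sum(o for _, o in rows) / len(rows)
            curve.append((mean_p, yes_rate, len(rows)))
    return curve


# Hypothetical forecasts and outcomes, not taken from the ledger.
probs = [0.10, 0.15, 0.70, 0.75, 0.90]
outcomes = [0, 0, 1, 0, 1]
for mean_p, yes_rate, n in reliability_curve(probs, outcomes):
    print(f"forecast ~{mean_p:.2f}  realized {yes_rate:.2f}  n={n}")
```

Running the same function once on the PRISM (blind) probabilities and once on the Kalshi prices yields the two series to plot against the diagonal. With a sample this small, individual buckets hold only a few forecasts, so points far from the diagonal may be noise.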
Forecast filter. Headline metrics (Brier, BSS, reliability) score only rows where PRISM (blind) staked a real position: |capped_edge_blind| ≥ 5pp. Rows where the blind forecast landed within 5pp of the market price are tracked but excluded from headline metrics; including them would artificially pull every metric toward the diagonal.
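The filter amounts to a single threshold test. In the sketch below, `capped_edge_blind` is the field named above and is assumed to be already computed and expressed in percentage points; how the cap itself is applied is not described on this page, so it is not modeled. The row layout is hypothetical.

```python
def headline_rows(rows, min_edge_pp=5.0):
    """Keep rows where the blind forecast sits at least min_edge_pp away
    from the market price (in either direction)."""
    return [r for r in rows if abs(r["capped_edge_blind"]) >= min_edge_pp]


# Hypothetical rows; edges are in percentage points.
ledger = [
    {"id": "a", "capped_edge_blind": 12.0},   # counted
    {"id": "b", "capped_edge_blind": -4.9},   # tracked, excluded from headline
    {"id": "c", "capped_edge_blind": -5.0},   # counted: threshold is inclusive
]
print([r["id"] for r in headline_rows(ledger)])  # ['a', 'c']
```

A gap between classified and total resolved counts, such as 40 versus 48 in the header, is what a filter of this shape would produce.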
Scope. Macro markets on Kalshi (Finance category): GDP, CPI, jobs, gold, PPI, productivity, deficit, and related series. Window opens 2026-04-19 and scores every resolved forecast from that date forward.