Macro Calibration Board · Apr 19 – May 3, 2026

PUBLIC CALIBRATION LEDGER
LIVE SINCE APR 19

Price-blind macro forecasts logged publicly with timestamps and outcomes. Each forecast is captured before PRISM sees the Kalshi price, so the comparison reflects calibration discipline — not after-the-fact tuning.

How to read · Lower Brier is better. The board scores PRISM’s price-blind probability against the Kalshi price at forecast time — measuring whether PRISM’s judgment improves on the market over time. The score is live and expected to evolve as the sample grows.

40 CLASSIFIED FORECASTS · 48 TOTAL RESOLVED · PRISM v0.9.17 → v0.9.92-deepseek
PRICE-BLIND MODEL · EVIDENCE LAYER · PRIORITY COVERAGE · EVENT-FAMILY DEDUP

Early calibration window. The Brier score below is computed across 40 resolved forecasts since Apr 19, with the sample still small enough that the headline number is expected to move materially as more markets resolve. Several individual forecasts show large blind-vs-market divergences — see the resolved case studies further down.

Engine disclosure · This BSS reflects forecasts generated on PRISM’s Anthropic Opus engine. DeepSeek V4 Flash forecasts are being logged now and will be broken out in upcoming releases.

Brier Skill Score · PRISM (blind) vs Kalshi market
-0.45
KALSHI PRICE AHEAD · 40 CLASSIFIED FORECASTS

PRISM is the challenger — this early window shows Kalshi’s price ahead on average, and we keep the ledger live so you can watch whether the blind model closes the gap over time.

PRISM (blind)
0.312
Independent probability
PRISM (anchored)
0.213
Market-anchored baseline
Kalshi market
0.216
Benchmark price at forecast

Brier Skill Score -0.45 across 40 resolved blind-classified forecasts (Finance category, Apr 19 → today). A positive score means PRISM (blind) was more accurate than the Kalshi price at forecast time; a negative score means the Kalshi price was more accurate. The reliability curve below shows how each forecast-probability bucket tracked realized outcomes: the closer to the diagonal, the better calibrated.

Reliability Curve · PRISM (blind) vs Kalshi
Closer to the diagonal = better calibrated. Each point is a forecast bucket; outcome frequencies on the y-axis.
Legend: Perfect · PRISM (blind) · Kalshi market

Priority 5 · Live tracker

The five macro releases PRISM treats as priority coverage: shorter cooldowns, higher refresh frequency, and inclusion in every scheduled (cron) run. Each card shows the latest blind forecast vs the Kalshi price, with the outcome once the market resolves.

📊
GDP
KXGDP
RESOLVES IN 80D

Will **real GDP** increase by more than 2.5% in Q2 2026?

Blind
27%
Market
35%
Edge
-8.0pp
6 ANALYZED · LAST 8D AGO
🛒
CPI
KXCPI
RESOLVES IN 30D

Will CPI rise more than 0.5% in May 2026?

Blind
43%
Market
43%
Edge
+0.0pp
7 ANALYZED · LAST 8D AGO
🏭
PPI YoY
KXUSPPIYOY
RESOLVES IN 2D

Will the United States Producer Price Index for final demand for April 2026 be above 4.4%?

Awaiting first PRISM (blind) analysis on this release.
0 ANALYZED · NO FORECAST YET
👷
Nonfarm Payrolls
KXPAYROLLS
RESOLVED

Will above 40000 jobs be added in April 2026?

Blind
100%
Market
64%
Edge
+36.0pp
BLIND RIGHT · RESOLVED YES
14 ANALYZED · LAST 3D AGO
💼
ADP Employment
KXADP
RESOLVED

ADP employment change in Apr 2026?

Blind
27%
Market
23%
Edge
+4.0pp
BLIND RIGHT · RESOLVED NO
1 ANALYZED · LAST 13D AGO

Where the price-blind model saw it differently

Resolved markets where PRISM (blind)’s probability disagreed with the Kalshi price by 10 percentage points or more, and the resolution proved PRISM (blind) right. Picked from the largest disagreements in the calibration window.

BLIND RIGHT · -65.0pp

Will average **gas prices** be above $4.025?

Blind
17%
Market
82%
Outcome
NO

Verified March 2026 gas prices at $3.638 require 11% spike to breach $4.025 threshold by tomorrow. Social sentiment and EIA forecasts strong…

BLIND < MARKET · RESOLVED APR 21
BLIND RIGHT · -64.5pp

Will the gold close price be above 4715 USD/t.oz on Apr 23, 2026 at 5pm EDT?

Blind
3%
Market
67%
Outcome
NO

Gold trades around $2700 requiring impossible 75% appreciation to $4715 in 3 hours. Market severely overpricing mathematically implausible o…

BLIND < MARKET · RESOLVED APR 23
BLIND RIGHT · +61.2pp

James Comey arrested by April 29?

Blind
91%
Market
30%
Outcome
YES

Federal grand jury indicted Comey with arrest warrant issued April 28, one day before deadline. Historical federal warrant execution rates e…

BLIND > MARKET · RESOLVED APR 29

By-category breakdown · n ≥ 3

Brier scores per category for resolved blind-classified forecasts in the calibration window. Categories with fewer than 3 resolutions are hidden until the sample matures.

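| Category | n | Blind Brier | Market Brier | BSS |
|----------|----|-------------|--------------|-------|
| Finance | 40 | 0.312 | 0.216 | -0.45 |
| Culture | 17 | 0.233 | 0.191 | -0.22 |
| Politics | 11 | 0.398 | 0.167 | -1.38 |
| Tech | 5 | 0.322 | 0.155 | -1.08 |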

Coming to this board

  • Time-horizon split: ≤7-day vs >7-day forecast skill
  • PRISM version diff: which build moved which metric

Methodology

PRISM (blind). The truth model sees the market question, resolution criteria, and retrieved evidence — but market prices are redacted from the prompt during analysis.

PRISM (anchored). The legacy variant that includes market price in its prompt. Kept running in parallel so we can measure what the market anchor does to the forecast.

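Brier score. Proper scoring rule for probabilistic forecasts: the mean squared difference between the forecast probability and the 0/1 outcome. 0 = perfect, 0.25 = always forecasting 50% (coin flip), 1 = confidently wrong every time. Lower is better.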
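As a rough illustration of the definition above, the sketch below computes a Brier score as the mean squared difference between forecast probabilities and 0/1 outcomes. The function name `brier_score` is an assumption made for this example, not PRISM's code; the single-forecast inputs are taken from the gas-price case study on this page (blind 17%, market 82%, resolved NO).

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be equal-length and non-empty")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# A constant 50% forecast scores 0.25 whatever the outcomes are.
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1])

# Single-forecast contributions from the gas-price case (outcome NO = 0).
gas_blind = brier_score([0.17], [0])   # 0.17 ** 2
gas_market = brier_score([0.82], [0])  # 0.82 ** 2
```

On that one market the blind forecast contributes far less squared error than the market price; the headline Brier is the average of such terms over all classified forecasts.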

Brier Skill Score. 1 − blind_brier / market_brier. Positive = PRISM (blind) more accurate than the Kalshi price at forecast time. Zero = matched the price. Negative = behind the price. Read it as the blind model's improvement over the market reference, normalized by the market's own Brier score.
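The formula can be checked against the by-category table with a few lines of Python. This is a sketch only: `brier_skill_score` is a hypothetical helper name, and the inputs are the rounded Brier values printed on this page, so a recomputed figure can differ from the displayed one in the last digit (Finance recomputes to about -0.44 from the rounded inputs, while the board shows -0.45, presumably from unrounded scores).

```python
def brier_skill_score(blind_brier: float, market_brier: float) -> float:
    """BSS = 1 - blind_brier / market_brier, with the market as reference."""
    if market_brier <= 0:
        raise ValueError("market_brier must be positive to normalize against")
    return 1.0 - blind_brier / market_brier


# (blind Brier, market Brier) per category, as displayed on the board.
rows = {
    "Finance": (0.312, 0.216),
    "Culture": (0.233, 0.191),
    "Politics": (0.398, 0.167),
    "Tech": (0.322, 0.155),
}

bss = {name: round(brier_skill_score(b, m), 2) for name, (b, m) in rows.items()}
```

Every category comes out negative because the blind Brier exceeds the market Brier in each row, which matches the "Kalshi price ahead" reading.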

Reliability curve. Forecasts are bucketed by forecast probability (5 bins) and plotted against the realized YES rate. Perfect calibration sits on the diagonal — “when we said 70%, it happened 70% of the time.” Each point is a bucket; both PRISM (blind) and Kalshi price are plotted.
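A minimal sketch of the bucketing step, assuming five equal-width probability bins (the page states 5 bins but not their edges, so the equal-width choice is an assumption). The function name `reliability_curve` is hypothetical, and the sample inputs are a handful of blind forecasts and outcomes shown elsewhere on this page, not the full 40-forecast sample.

```python
from typing import List, Optional, Sequence, Tuple

Bucket = Optional[Tuple[float, float, int]]  # (mean forecast, YES rate, count)


def reliability_curve(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 5
) -> List[Bucket]:
    """Bucket forecasts into equal-width bins; empty bins are returned as None."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[idx] += p
        hits[idx] += o
        counts[idx] += 1
    return [
        (sums[i] / counts[i], hits[i] / counts[i], counts[i]) if counts[i] else None
        for i in range(n_bins)
    ]


# Blind probabilities and outcomes from five resolved markets on this page:
# gold (3%, NO), gas (17%, NO), ADP (27%, NO), Comey (91%, YES), payrolls (100%, YES).
curve = reliability_curve([0.03, 0.17, 0.27, 0.91, 1.0], [0, 0, 0, 1, 1])
```

Each non-empty bucket is one point on the plot: x is the mean forecast in the bin, y is the realized YES rate. A point on the diagonal has x equal to y.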

Forecast filter. Headline metrics (Brier, BSS, reliability) score only rows where PRISM (blind) staked a real position: |capped_edge_blind| ≥ 5pp. Rows where blind landed within 5pp of the market price are tracked but excluded from headline metrics; including them would artificially pull the blind model's metrics toward the market's.
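The filter can be sketched as follows. The page does not describe how `capped_edge_blind` is capped, so this example uses the raw edge (blind minus market, in percentage points) as a stand-in; the `Row` class and `classified` helper are hypothetical names. The sample rows are the four Priority 5 cards above that have a blind forecast.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    ticker: str
    blind_pct: float   # blind probability, in percent
    market_pct: float  # Kalshi price at forecast time, in percent

    @property
    def edge_pp(self) -> float:
        """Raw edge in percentage points (assumption: no capping applied)."""
        return self.blind_pct - self.market_pct


def classified(rows: List[Row], min_edge_pp: float = 5.0) -> List[Row]:
    """Keep only rows where the blind forecast is at least min_edge_pp from the market."""
    return [r for r in rows if abs(r.edge_pp) >= min_edge_pp]


cards = [
    Row("KXGDP", 27, 35),        # edge -8.0pp
    Row("KXCPI", 43, 43),        # edge +0.0pp
    Row("KXPAYROLLS", 100, 64),  # edge +36.0pp
    Row("KXADP", 27, 23),        # edge +4.0pp
]
headline_rows = classified(cards)
```

Under this reading, the CPI and ADP cards would be tracked but left out of the headline metrics, since both sit within 5pp of the market price.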

Scope. Macro markets on Kalshi (Finance category): GDP, CPI, jobs, gold, PPI, productivity, deficit, and related series. Window opens 2026-04-19 and scores every resolved forecast from that date forward.