Macro Calibration Board · Apr 19 – May 3, 2026

PUBLIC CALIBRATION LEDGER
LIVE SINCE APR 19

Price-blind macro forecasts logged publicly with timestamps and outcomes. Each forecast is captured before PRISM sees the Kalshi price, so the comparison reflects calibration discipline — not after-the-fact tuning.

How to read · Lower Brier is better. The board scores PRISM’s price-blind probability against the Kalshi price at forecast time — measuring whether PRISM’s judgment improves on the market over time. The score is live and expected to evolve as the sample grows.

41 CLASSIFIED FORECASTS · 49 TOTAL RESOLVED · PRISM v0.9.17 → v0.9.92-deepseek
PRICE-BLIND MODEL · EVIDENCE LAYER · PRIORITY COVERAGE · EVENT-FAMILY DEDUP

Early calibration window. The Brier score below is computed across 41 resolved forecasts since Apr 19, with the sample still small enough that the headline number is expected to move materially as more markets resolve. Several individual forecasts show large blind-vs-market divergences — see the resolved case studies further down.

Engine disclosure · This BSS reflects forecasts generated on PRISM’s Anthropic Opus engine. DeepSeek V4 Flash forecasts are being logged now and will be broken out in upcoming releases.

Brier Skill Score · PRISM (blind) vs Kalshi market

-0.44

KALSHI PRICE AHEAD · 41 CLASSIFIED FORECASTS

PRISM is the challenger — this early window shows Kalshi’s price ahead on average, and we keep the ledger live so you can watch whether the blind model closes the gap over time.

PRISM (blind)

0.305

Independent probability

PRISM (anchored)

0.210

Market-anchored baseline

Kalshi market

0.212

Benchmark price at forecast

Brier Skill Score -0.44 across 41 resolved blind-classified forecasts (Finance category, Apr 19 → today). A positive score means PRISM (blind) was more accurate than the Kalshi price at forecast time; the current negative score means the Kalshi price was ahead. The reliability curve below shows how each forecast-probability bucket tracked realized outcomes: the closer to the diagonal, the better calibrated.

Reliability Curve · PRISM (blind) vs Kalshi

Closer to the diagonal = better calibrated. Each point is a forecast bucket; outcome frequencies on the y-axis.

Legend: Perfect · PRISM (blind) · Kalshi market

Priority 5 · Live tracker

The five macro releases PRISM treats as priority coverage — shorter cooldowns, higher refresh frequency, dedicated inclusion every cron. Each card shows the latest blind forecast vs the Kalshi price, with outcome once the market resolves.

📊

GDP

KXGDP

RESOLVES IN 32D

Will real GDP increase by more than 1.0% in Q2 2026?

Blind

85%

Market

81%

Edge +4.0pp

6 ANALYZED · LAST 57D AGO

🛒

CPI

KXCPI

RESOLVED

Will CPI rise more than 0.6% in May 2026?

Blind

25%

Market

20%

Edge +5.0pp

BLIND RIGHT · RESOLVED NO

7 ANALYZED · LAST 38D AGO

🏭

PPI YoY

KXUSPPIYOY

RESOLVED

Will the United States Producer Price Index for final demand for April 2026 be above 4.4%?

Blind

37%

Market

82%

Edge -45.0pp

BLIND WRONG · RESOLVED YES

1 ANALYZED · LAST 47D AGO

👷

Nonfarm Payrolls

KXPAYROLLS

RESOLVED

Will above 70000 jobs be added in May 2026?

Blind

78%

Market

58%

Edge +20.0pp

BLIND RIGHT · RESOLVED YES

23 ANALYZED · LAST 33D AGO

💼

ADP Employment

KXADP

RESOLVED

ADP employment change in May 2026?

Blind

52%

Market

71%

Edge -19.0pp

BLIND WRONG · RESOLVED NO

2 ANALYZED · LAST 33D AGO

Where the price-blind model saw it differently

Resolved markets where PRISM (blind)’s probability disagreed with the Kalshi price by 10 percentage points or more, and the resolution proved PRISM (blind) right. Picked from the largest disagreements in the calibration window.

BLIND RIGHT · -65.0pp

Will average gas prices be above $4.025?

Blind

17%

Market

82%

Outcome

“Verified March 2026 gas prices at $3.638 require 11% spike to breach $4.025 threshold by tomorrow. Social sentiment and EIA forecasts strong…”

BLIND < MARKET · RESOLVED APR 21

BLIND RIGHT · -64.5pp

Will the gold close price be above 4715 USD/t.oz on Apr 23, 2026 at 5pm EDT?

Blind

Market

67%

Outcome

“Gold trades around $2700 requiring impossible 75% appreciation to $4715 in 3 hours. Market severely overpricing mathematically implausible o…”

BLIND < MARKET · RESOLVED APR 23

BLIND RIGHT · +61.2pp

James Comey arrested by April 29?

Blind

91%

Market

30%

Outcome

YES

“Federal grand jury indicted Comey with arrest warrant issued April 28, one day before deadline. Historical federal warrant execution rates e…”

BLIND > MARKET · RESOLVED APR 29

By-category breakdown · n ≥ 3

Brier scores per category for resolved blind-classified forecasts in the calibration window. Categories with fewer than 3 resolutions are hidden until the sample matures.

Category	n	Blind Brier	Market Brier	BSS
Finance	41	0.305	0.212	-0.44
Culture	18	0.270	0.184	-0.47
Politics	12	0.365	0.155	-1.35
Tech	6	0.286	0.133	-1.15

Coming to this board

— Time-horizon split: ≤7-day vs >7-day forecast skill
— PRISM version diff: which build moved which metric

Methodology

PRISM (blind). The truth model sees the market question, resolution criteria, and retrieved evidence — but market prices are redacted from the prompt during analysis.

PRISM (anchored). The legacy variant that includes market price in its prompt. Kept running in parallel so we can measure what the market anchor does to the forecast.

Brier score. Proper scoring rule for probabilistic forecasts. 0 = perfect, 0.25 = coin flip. Lower is better.
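As a minimal sketch of the scoring rule described above: the Brier score is the mean squared difference between forecast probabilities and 0/1 outcomes. The function name `brier_score` is illustrative and is not taken from PRISM's code. The single-forecast example uses the Nonfarm Payrolls card from this board (blind 78%, market 58%, resolved YES).

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# A constant 50% forecast scores 0.25 regardless of how the markets resolve.
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1])

# Nonfarm Payrolls card: resolved YES (outcome = 1).
blind_single = brier_score([0.78], [1])   # (0.78 - 1)^2, roughly 0.048
market_single = brier_score([0.58], [1])  # (0.58 - 1)^2, roughly 0.176

print(coin_flip, blind_single, market_single)
```

On this one market the blind forecast scores lower (better) than the market price; the headline figures average the same quantity over all classified forecasts.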

Brier Skill Score. 1 − blind_brier / market_brier. Positive = PRISM (blind) more accurate than the Kalshi price at forecast time. Zero = matched the price. Negative = behind the price. Read it as the fractional improvement in Brier score over the reference: at -0.44, the blind Brier is about 44% higher than the market's.
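The formula above can be checked against the numbers published on this board. The sketch below is illustrative only (the function name `brier_skill_score` is an assumption, not PRISM's code); the inputs are the Blind Brier and Market Brier columns from the by-category table.

```python
def brier_skill_score(model_brier, reference_brier):
    """1 - model/reference: positive means the model beat the reference."""
    if reference_brier <= 0:
        raise ValueError("reference Brier must be positive")
    return 1.0 - model_brier / reference_brier


# (blind Brier, market Brier) as published in the by-category table.
published = {
    "Finance": (0.305, 0.212),
    "Culture": (0.270, 0.184),
    "Politics": (0.365, 0.155),
    "Tech": (0.286, 0.133),
}

bss = {name: round(brier_skill_score(b, m), 2) for name, (b, m) in published.items()}
print(bss)
```

Rounded to two decimals, the results reproduce the BSS column of the table (-0.44, -0.47, -1.35, -1.15).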

Reliability curve. Forecasts are bucketed by forecast probability (5 bins) and plotted against the realized YES rate. Perfect calibration sits on the diagonal — “when we said 70%, it happened 70% of the time.” Each point is a bucket; both PRISM (blind) and Kalshi price are plotted.
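A minimal sketch of the bucketing step, assuming five equal-width probability bins (the board states 5 bins but not their edges, so equal width is an assumption). The function name and the sample data are hypothetical and are not drawn from the ledger.

```python
def reliability_curve(probs, outcomes, n_bins=5):
    """Return (mean forecast, realized YES rate, count) for each non-empty bin."""
    buckets = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        # Equal-width bins; a forecast of exactly 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        buckets[idx].append((p, o))

    curve = []
    for rows in buckets:
        if rows:
            mean_p = sum(p for p, _ in rows) / len(rows)
            yes_rate = sum(o for _, o in rows) / len(rows)
            curve.append((mean_p, yes_rate, len(rows)))
    return curve


# Made-up forecasts and outcomes, for illustration only.
sample_probs = [0.10, 0.15, 0.30, 0.55, 0.70, 0.75, 0.90, 0.95]
sample_outcomes = [0, 0, 1, 1, 1, 0, 1, 1]

curve = reliability_curve(sample_probs, sample_outcomes)
for mean_p, yes_rate, n in curve:
    print(f"forecast ~{mean_p:.2f} -> realized {yes_rate:.2f} (n={n})")
```

Each tuple is one point on the curve: the x-value is the bucket's mean forecast and the y-value is its realized YES rate, so a perfectly calibrated bucket has the two equal.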

Forecast filter. Headline metrics (Brier, BSS, reliability) score only rows where PRISM (blind) staked a real position: |capped_edge_blind| ≥ 5pp. Rows where blind landed within 5pp of the market price are tracked but excluded from headline metrics; including them would pull every metric toward the diagonal artificially.
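The filter can be sketched as follows, assuming edge is the blind probability minus the market price in percentage points (which matches the signs on the Priority 5 cards). How `capped_edge_blind` is capped is not described here, so this sketch uses the raw edge; the `Forecast` class and `headline_rows` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    ticker: str
    blind_pct: float   # blind probability, in percent
    market_pct: float  # Kalshi price at forecast time, in percent

    @property
    def edge_pp(self) -> float:
        # Assumed definition: blind minus market, in percentage points.
        return self.blind_pct - self.market_pct


def headline_rows(rows, min_edge_pp=5.0):
    """Keep only rows where the blind forecast is at least min_edge_pp from the market."""
    return [r for r in rows if abs(r.edge_pp) >= min_edge_pp]


# Values from the Priority 5 cards on this board.
cards = [
    Forecast("KXGDP", 85, 81),
    Forecast("KXCPI", 25, 20),
    Forecast("KXUSPPIYOY", 37, 82),
    Forecast("KXPAYROLLS", 78, 58),
    Forecast("KXADP", 52, 71),
]

kept = [r.ticker for r in headline_rows(cards)]
print(kept)
```

Under these assumptions the GDP card (+4.0pp) falls below the threshold and is dropped, while the CPI card (+5.0pp) sits exactly on it and is kept.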

Scope. Macro markets on Kalshi (Finance category): GDP, CPI, jobs, gold, PPI, productivity, deficit, and related series. Window opens 2026-04-19 and scores every resolved forecast from that date forward.
