In plain English
Suppose you posted 10 calls on Polymarket last month and 7 of them came in correct. A 70% strike rate sounds good. The trouble is that the number tells you almost nothing about whether you actually have skill at forecasting, or whether you got lucky on a friendly question set.
Brier Score is the metric that closes that gap. It is the simplest, oldest, most useful answer to one question every prediction-market trader needs to ask: are the probabilities I put on outcomes any good?
Why hit rate fails you
A forecaster who says ‘Yes’ to every contract priced above 50 cents on Polymarket or Kalshi will, over time, look like they win about 7 trades out of 10. Not because they have any insight. Because the prices they are agreeing with are usually well calibrated. The market did the work; the forecaster just nodded along.
That is the failure mode hit rate hides. It rewards agreeing with the consensus and punishes nothing. Two forecasters with the same 70% strike rate can be running radically different quality books. One might be calling 90% on markets that resolve Yes 90% of the time. The other might be calling 65% on markets that resolve Yes 70% of the time. The first is doing real work. The second is along for the ride. Hit rate cannot tell them apart.
The fix is to stop counting wins and start measuring how close your stated probabilities were to the truth.
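A toy simulation makes the gap visible. The numbers below are invented: a batch of contracts that resolve Yes 70% of the time, one forecaster who states 70% on each and one who states 95%. Both favour Yes, so both post the same hit rate; only the squared-error measure defined in the next section separates them.

```python
import random

random.seed(7)

# Invented data: 1,000 contracts that each resolve Yes with probability 0.70.
outcomes = [1 if random.random() < 0.70 else 0 for _ in range(1000)]

# Both forecasters favour Yes on every contract, so both share the same hit rate.
hit_rate = sum(outcomes) / len(outcomes)
print(f"hit rate for both forecasters: {hit_rate:.0%}")

for name, stated in [("A states 0.70", 0.70), ("B states 0.95", 0.95)]:
    # Mean squared error of the stated probability -- the Brier Score defined below.
    brier = sum((stated - y) ** 2 for y in outcomes) / len(outcomes)
    print(f"{name}: mean Brier {brier:.3f}")

# A lands near 0.21; B lands at or above the 0.25 coin-flip level,
# even though the scoreboard of wins looks identical.
```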
How Brier Score is actually calculated
The Brier Score for a single prediction is the squared difference between the probability you stated and the outcome that occurred. Outcomes on prediction markets are binary: a contract resolves to 0 (No) or 1 (Yes).
BS = (your probability − outcome)²
Average that across every prediction you ever logged and you have your mean Brier Score. Lower is better.
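In code, the whole metric is a few lines. A minimal sketch in Python, with a hypothetical logged book (the `brier` and `mean_brier` names are just for illustration):

```python
def brier(forecast, outcome):
    """Squared error between a stated probability and a 0/1 resolution."""
    return (forecast - outcome) ** 2

def mean_brier(pairs):
    """Average Brier Score over a book of (stated probability, outcome) pairs."""
    return sum(brier(p, y) for p, y in pairs) / len(pairs)

# Hypothetical logged book: (stated probability of Yes, resolution).
book = [(0.80, 1), (0.35, 0), (0.60, 1), (0.90, 0)]
print(mean_brier(book))  # ~0.283 -- lower is better
```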
Worked example · 3 calls, 3 Briers
A confident right call earns a great score. A confident wrong call is punished far harder than a hedged one, because the penalty grows with the square of the miss. There is no upside to overstating your conviction.
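To make that concrete, here are three illustrative calls and the Brier each one earns (the probabilities are invented for the example):

```python
# Three illustrative calls: (label, stated probability of Yes, resolution).
calls = [
    ("Confident and right", 0.95, 1),   # (0.95 - 1)^2 = 0.0025
    ("Confident and wrong", 0.95, 0),   # (0.95 - 0)^2 = 0.9025
    ("Hedged and wrong",    0.60, 0),   # (0.60 - 0)^2 = 0.3600
]

for label, p, outcome in calls:
    print(f"{label}: Brier {(p - outcome) ** 2:.4f}")
```

Here the confident miss costs 361 times the confident hit and two and a half times the hedged miss.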
This is what mathematicians call a strictly proper scoring rule. Glenn Brier introduced the score in 1950 (Monthly Weather Review, Vol. 78, No. 1), and under a strictly proper rule the way to minimise your expected score is simply to report the probability you actually believe. Rounding off, hedging toward 50/50 to look modest, all of it gets punished in expectation. There is nowhere to hide.
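One way to see the property: suppose your true belief is 70%, and the event really does occur 70% of the time. Report anything else and your expected Brier goes up. A quick check:

```python
def expected_brier(reported, true_p):
    # Expected score when the event occurs with probability true_p
    # but you report `reported`.
    return true_p * (reported - 1) ** 2 + (1 - true_p) * (reported - 0) ** 2

true_p = 0.70
for reported in (0.50, 0.60, 0.70, 0.80, 0.90):
    print(f"report {reported:.2f} -> expected Brier {expected_brier(reported, true_p):.4f}")

# The minimum lands at reported == true_p: honesty is the optimal strategy.
```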
What counts as a good Brier Score?
The scale runs from 0 (perfect calls every time) to 1 (confidently wrong every time). Four anchor points are worth memorising.
Scale is non-linear. The skill region (0 to 0.25) is expanded so the band where real forecasters live is visible.
Top human superforecasters on ForecastBench, the leading public benchmark, score around 0.086 on its most recent leaderboard. The best frontier LLMs (CassiAI ensemble and xAI Grok 4.20 in early 2026) sit at 0.103, closing the gap with humans but not yet there. 0.25 is what you get by calling 50% on every contract, the coin flip baseline. Above 0.25 is genuinely worse than guessing.
For prediction markets specifically, Polymarket reports an aggregate Brier of around 0.084 across all resolved markets on its public accuracy page. In other words, market prices themselves are extraordinarily well calibrated. That is the bar an individual forecaster has to clear before they can call their edge real.
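Clearing that bar is a direct comparison: grade your stated probabilities and the market's entry prices against the same resolutions. A hedged sketch, with made-up positions:

```python
# Hypothetical resolved positions: (your stated probability, market price at entry, outcome).
positions = [
    (0.80, 0.72, 1),
    (0.30, 0.41, 0),
    (0.65, 0.55, 0),
    (0.90, 0.88, 1),
]

def mean_brier(pairs):
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)

yours = mean_brier([(p, y) for p, _, y in positions])
market = mean_brier([(m, y) for _, m, y in positions])
print(f"your Brier {yours:.3f} vs market Brier {market:.3f}")

# Edge is only credible if your number sits consistently below the market's
# across a meaningful sample, not a handful of trades.
```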
What Brier Score does not tell you
Brier is one number. It tells you, on average, how close your probabilities were to outcomes. It does not tell you where your errors live. A forecaster who is well calibrated at low probabilities but systematically overconfident at high ones can post a respectable Brier and still be the trader most likely to blow up. Two forecasters with identical Brier scores can be running very different books, in different directions, on different categories.
Brier also depends on the questions you took on. A forecaster who only takes long-shot tail bets sees different scores than one who works the middle of the distribution, even at the same skill level. Comparing raw Brier across very different question mixes is misleading without adjustments.
And critically, Brier only works if you logged your own probability before the market resolved. Polymarket and Kalshi store the market’s price. They do not store yours. Without your own record, the gap between your view and the market’s is hindsight, not measurement.
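The record does not need to be elaborate. A minimal sketch of the kind of entry that makes the measurement possible; the fields here are illustrative, not a nijinn schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class LoggedCall:
    market: str              # e.g. a Polymarket or Kalshi contract identifier
    stated_prob: float       # YOUR probability, recorded before resolution
    market_price: float      # the market's price at the same moment, for later comparison
    logged_at: datetime
    outcome: Optional[int] = None   # filled in at resolution: 1 (Yes) or 0 (No)

    def brier(self) -> Optional[float]:
        return None if self.outcome is None else (self.stated_prob - self.outcome) ** 2

call = LoggedCall("example-contract", 0.62, 0.55, datetime.now(timezone.utc))
call.outcome = 1                     # contract resolved Yes
print(call.brier())                  # (0.62 - 1) ** 2 ~= 0.1444
```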
How this lives in nijinn
Brier Score is the headline metric on the Performance Audit Surface. nijinn captures your stated probability at the moment you size a position, locks it, and grades it against the realised outcome at resolution. The score is decomposed so you can see whether you are losing points on calibration (your numbers are systematically off in one direction) or on resolution (your numbers do not carry information). Different problems, different fixes.
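That split is usually computed with something like the Murphy decomposition, which bins forecasts and separates a reliability (calibration) term from a resolution term. A sketch of the idea, not nijinn's internal implementation:

```python
from collections import defaultdict

def murphy_decomposition(pairs, n_bins=10):
    """pairs: list of (stated probability, outcome).
    Returns (reliability, resolution, uncertainty), where
    mean Brier ~= reliability - resolution + uncertainty
    (exact when forecasts are identical within each bin)."""
    n = len(pairs)
    base_rate = sum(y for _, y in pairs) / n

    bins = defaultdict(list)
    for p, y in pairs:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))

    reliability = resolution = 0.0
    for members in bins.values():
        k = len(members)
        mean_forecast = sum(p for p, _ in members) / k
        realised = sum(y for _, y in members) / k
        reliability += k / n * (mean_forecast - realised) ** 2   # calibration error
        resolution += k / n * (realised - base_rate) ** 2        # information carried
    uncertainty = base_rate * (1 - base_rate)                    # set by the question mix
    return reliability, resolution, uncertainty
```

A large reliability term means your numbers are systematically off, a calibration fix; a small resolution term means they carry little information beyond the base rate, a research fix.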
The metric pairs naturally with the Calibration Curve, which shows where in the probability range your error lives, and with sizing discipline like Quarter Kelly, which translates the skill measurement into bankroll survival.
Brier’s lasting virtue is that it makes honesty cheaper than dishonesty. A trader who logs probabilities and grades them weekly has nowhere to hide a story. The distinction between a desk that knows it has edge and one that hopes it does is the whole point of the metric.