Brier Score

A common formulation of the Brier score is:

$$BS = \frac{1}{N}\sum_{t=1}^{N}(f_t - o_t)^2$$

where $f_t$ is the forecast probability, $o_t$ is the outcome (1 if the event happens, 0 otherwise), and $N$ is the number of forecasting instances. It can be thought of either as a measure of the “calibration” of a set of probabilistic predictions or as a “cost function”.
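
As a quick check of the definition (the mean of the squared differences between each forecast probability and its 0/1 outcome), here is a minimal sketch that computes the score by hand. The three forecasts and outcomes are made-up values chosen only for illustration.

```python
import numpy as np

# Hypothetical forecasts and outcomes, chosen only for illustration.
forecasts = np.array([0.9, 0.7, 0.2])  # forecast probabilities f_t
outcomes = np.array([1, 0, 0])         # observed outcomes o_t

# Squared error per forecast, then the mean over all N = 3 instances.
squared_errors = (forecasts - outcomes) ** 2
brier = squared_errors.mean()
print(brier)
```

The confident but wrong second forecast (0.7 for an event that did not happen) contributes most of the score.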

As an example, suppose a forecaster always predicts an 80% probability of rain.

forcast_prob = 0.8

Let's compare the forecaster's Brier scores when the actual probability of rain is 80% and when it is 20%.

import numpy as np
actual_probs = np.array([0.8, 0.2])

def brierScore(preds, outcomes):
    # Mean squared difference between forecast probabilities and 0/1 outcomes.
    n = float(len(preds))
    return 1 / n * np.sum((preds - outcomes)**2)

Simulate rainy days and compare Brier scores.

number_of_days = 100

def initArray(size, value):
    # Array of length `size` with every entry set to `value`.
    return np.full(size, value)

prediction_probs = initArray(number_of_days, forcast_prob)
# One boolean array per actual probability: True marks a rainy day.
rainy_days = [np.random.random_sample(number_of_days) < p for p in actual_probs]

brier_scores = [brierScore(prediction_probs, outcomes) for outcomes in rainy_days]
brier_scores
[0.13, 0.52600000000000025]

We can see that a Brier score closer to zero is better: the forecaster scores lower when the actual probability of rain matches the forecast. The best achievable score is 0 and the worst is 1.
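
The simulated scores can be sanity-checked against their expected values. If it rains with probability $p$ and the forecast is always $f$, each day contributes $(1-f)^2$ with probability $p$ and $f^2$ with probability $1-p$, so the expected Brier score is $p(1-f)^2 + (1-p)f^2$. The sketch below evaluates this expression; the helper name `expected_brier` is hypothetical and introduced only for illustration.

```python
def expected_brier(forecast, p_rain):
    """Expected Brier score of a constant forecast when it rains with probability p_rain."""
    # Rainy day (outcome 1) with probability p_rain, dry day (outcome 0) otherwise.
    return p_rain * (1 - forecast) ** 2 + (1 - p_rain) * forecast ** 2

# Constant 80% forecast against actual rain probabilities of 80% and 20%.
expected_scores = [expected_brier(0.8, p) for p in (0.8, 0.2)]
print(expected_scores)
```

This gives roughly 0.16 and 0.52. The simulated scores fluctuate around these values because only 100 days are sampled.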