A common formulation of the Brier score is:

$$BS = \frac{1}{N}\sum_{t=1}^{N}(f_t - o_t)^2$$

where $f_t$ is the forecast probability, $o_t$ the outcome (1 if the event happens, 0 otherwise), and $N$ is the number of forecasting instances. It can be thought of either as a measure of the “calibration” of a set of probabilistic predictions, or as a “cost function”.
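As a quick worked instance of the formula (the numbers here are illustrative, not from the original):

```python
import numpy as np

# Two forecasts: 90% chance of rain (it rained), 30% chance (it stayed dry)
preds = np.array([0.9, 0.3])
outcomes = np.array([1.0, 0.0])

# ((0.9 - 1)^2 + (0.3 - 0)^2) / 2 = (0.01 + 0.09) / 2
bs = np.mean((preds - outcomes) ** 2)
print(bs)  # ≈ 0.05
```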
As an example, suppose a forecaster always predicts an 80% probability of rain.
```python
forcast_prob = 0.8
```
Let’s compare the forecaster’s Brier scores when the actual probability of rain is 80% and when it is 20%.
```python
import numpy as np

actual_probs = np.array([0.8, 0.2])

def brierScore(preds, outcomes):
    n = float(len(preds))
    return 1 / n * np.sum((preds - outcomes) ** 2)
```
Simulate rainy days and compare Brier scores.
```python
number_of_days = 100

def initArray(size, value):
    a = np.empty(size)
    a.fill(value)
    return a

prediction_probs = initArray(number_of_days, forcast_prob)
rainy_days = [np.random.random_sample(number_of_days) < p
              for p in actual_probs]
brier_scores = [brierScore(prediction_probs, outcomes)
                for outcomes in rainy_days]
```
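It helps to know what to expect from the simulation analytically. For a constant forecast $f$ scored against events that occur with true rate $p$, the expected Brier score works out to $p(1-f)^2 + (1-p)f^2$ (this sanity check is mine, not from the original):

```python
# Expected Brier score for a constant forecast f against true rate p:
# E[BS] = p * (1 - f)**2 + (1 - p) * f**2
f = 0.8
expected = {p: p * (1 - f) ** 2 + (1 - p) * f ** 2 for p in (0.8, 0.2)}
print(expected)  # ≈ 0.16 when p = 0.8, ≈ 0.52 when p = 0.2
```

With 100 simulated days the scores above should land near these expectations: low when the true rain rate matches the forecast, much higher when it does not.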
We can see that a Brier score closer to zero is better, with 0 being the best achievable and 1 the worst.
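A minimal sketch of those two extremes (my own example, not from the original): a perfectly confident, perfectly correct forecaster scores 0, while one that is confidently wrong every day scores 1.

```python
import numpy as np

outcomes = np.array([1.0, 1.0, 0.0, 0.0])

# Forecasting exactly the outcome each day gives the best possible score ...
perfect = np.mean((outcomes - outcomes) ** 2)
# ... while forecasting the opposite with full confidence gives the worst.
worst = np.mean(((1.0 - outcomes) - outcomes) ** 2)
print(perfect, worst)  # 0.0 1.0
```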