A common formulation of the Brier score is:

$$BS = \frac{1}{N}\sum_{t=1}^{N}(f_t - o_t)^2$$

where $f_t$ is the forecast probability, $o_t$ the outcome (1 if the event happens, 0 otherwise), and $N$ is the number of forecasting instances. It can be thought of either as a measure of the “calibration” of a set of probabilistic predictions, or as a “cost function”.

As an example, suppose a forecaster always predicts the probability of rain as 80%.

`forcast_prob = 0.8`

Let’s compare the Brier scores for the forecaster when the actual probability of rain is 80% and when it is 20%.

```
import numpy as np

actual_probs = np.array([0.8, 0.2])

def brierScore(preds, outcomes):
    # Mean squared difference between forecast probabilities and outcomes
    n = float(len(preds))
    return 1 / n * np.sum((preds - outcomes) ** 2)
```
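As a quick sanity check, we can compute the score by hand for a hypothetical toy case: two days forecast at 80%, with rain on the first day only (`brierScore` is repeated here so the snippet runs on its own):

```
import numpy as np

def brierScore(preds, outcomes):
    n = float(len(preds))
    return 1 / n * np.sum((preds - outcomes) ** 2)

preds = np.array([0.8, 0.8])
outcomes = np.array([1, 0])  # rain on day 1, no rain on day 2
# ((0.8 - 1)^2 + (0.8 - 0)^2) / 2 = (0.04 + 0.64) / 2 = 0.34
score = brierScore(preds, outcomes)
```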

Simulate rainy days under each actual probability and compare Brier scores. (No random seed is set, so results will vary slightly from run to run.)

```
number_of_days = 100

def initArray(size, value):
    # Array of `size` entries, all set to `value`
    a = np.empty(size)
    a.fill(value)
    return a

prediction_probs = initArray(number_of_days, forcast_prob)
# One boolean outcome array per actual rain probability
rainy_days = [np.random.random_sample(number_of_days) < p for p in actual_probs]
brier_scores = [brierScore(prediction_probs, outcomes) for outcomes in rainy_days]
```

`brier_scores`

```
[0.13, 0.52600000000000025]
```

We can see that a Brier score closer to zero is better, with 0 being the best achievable and 1 the worst.
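The simulated scores can also be checked analytically: a constant forecast $f$ against a true rain probability $p$ has expected Brier score $p(f-1)^2 + (1-p)f^2$. A short sketch, reusing the variable names from the code above:

```
forcast_prob = 0.8
actual_probs = [0.8, 0.2]

def expected_brier(f, p):
    # Expected squared error of a constant forecast f when rain occurs
    # with probability p: rain contributes (f - 1)^2, no rain f^2
    return p * (f - 1) ** 2 + (1 - p) * f ** 2

expected_scores = [expected_brier(forcast_prob, p) for p in actual_probs]
# approximately [0.16, 0.52]
```

These expected values (0.16 and 0.52) agree closely with the simulated scores, which differ only due to sampling noise over 100 days.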