# Chebyshev's Inequality

## May 05, 2016

The inequality states that, for a random variable $X$ with mean $\mu$ and finite variance $\sigma^2$, and for any $c > 0$,

$$P(|X - \mu| \geq c\sigma) \leq \frac{1}{c^2}.$$

This guarantees that, in any probability distribution with finite variance, most values are ‘close’ to the mean. For example, at most $\frac{1}{9}$ of the values lie $3$ or more standard deviations from the mean.
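The $\frac{1}{9}$ figure can be traced back to Markov's inequality applied to the non-negative variable $(X - \mu)^2$. The following is a short sketch of that standard argument, assuming $0 < \sigma^2 < \infty$ and writing $c > 0$ for the number of standard deviations:

$$
\begin{aligned}
P(|X - \mu| \geq c\sigma)
  &= P\big((X - \mu)^2 \geq c^2\sigma^2\big) \\
  &\leq \frac{E\big[(X - \mu)^2\big]}{c^2\sigma^2}
   = \frac{\sigma^2}{c^2\sigma^2}
   = \frac{1}{c^2}.
\end{aligned}
$$

Setting $c = 3$ gives the bound $\frac{1}{9}$. Note that the bound is only informative for $c > 1$, since for $c \leq 1$ it is at least $1$.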

```python
import numpy as np
import pandas as pd

size = 1000

# Samples from three distributions, stored as [name, values] pairs
data = [
    ['Gamma', np.random.gamma(1., 2., size)],
    ['Normal', np.random.normal(0, 2., size)],
    ['Exponential', np.random.exponential(0.9, size)],
]


def countNumberFromCStdOfMean(values, c):
    """Count the values lying at least c standard deviations from the mean."""
    std = np.std(values)
    mean = np.mean(values)
    return np.sum(np.absolute(values - mean) >= c * std)


c = 3.0
results = [
    [name, np.mean(values), np.std(values), countNumberFromCStdOfMean(values, c)]
    for name, values in data
]
df = pd.DataFrame(results, columns=['Distribution', 'Mean', 'Std', 'Number c stds from mean'])
df
```

| | Distribution | Mean | Std | 3 $\sigma$ from $\mu$ count |
|---|---|---|---|---|
| 0 | Gamma | 1.888626 | 1.801146 | 20 |
| 1 | Normal | -0.051050 | 2.027534 | 1 |
| 2 | Exponential | 0.930995 | 0.931055 | 16 |

Chebyshev’s inequality implies that no more than $1000/9 \approx 111$ data points lie $3$ or more standard deviations from the mean when the sample size is $1000$. The observed counts (20, 1 and 16) are all far below this bound. If we know the probability distribution, much tighter bounds can be produced.
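To illustrate how much tighter a distribution-specific bound can be, here is a minimal sketch that compares the Chebyshev bound with the exact two-sided tail probability for a normal and an exponential distribution. The helper names (`chebyshev_bound`, `normal_tail`, `exponential_tail`) are illustrative and not part of the original notebook. The exponential formula assumes $c \geq 1$, and it also covers the Gamma sample above because a Gamma distribution with shape $1$ is an exponential distribution.

```python
import math


def chebyshev_bound(c):
    """Distribution-free upper bound on P(|X - mu| >= c * sigma)."""
    return min(1.0, 1.0 / c**2)


def normal_tail(c):
    """Exact P(|X - mu| >= c * sigma) for any normal distribution."""
    return math.erfc(c / math.sqrt(2.0))


def exponential_tail(c):
    """Exact P(|X - mu| >= c * sigma) for an exponential distribution, c >= 1.

    The mean and the standard deviation both equal the scale, so for c >= 1
    the lower tail is empty and only P(X >= (1 + c) * scale) = exp(-(1 + c))
    remains, independent of the scale.
    """
    if c < 1:
        raise ValueError("this closed form assumes c >= 1")
    return math.exp(-(1.0 + c))


c = 3.0
size = 1000

# Expected number of points (out of `size`) at least c standard deviations away
expected_counts = {
    "Chebyshev bound": size * chebyshev_bound(c),
    "Normal": size * normal_tail(c),
    "Exponential (and Gamma with shape 1)": size * exponential_tail(c),
}

for name, count in expected_counts.items():
    print(f"{name}: {count:.1f}")
# Chebyshev bound: 111.1
# Normal: 2.7
# Exponential (and Gamma with shape 1): 18.3
```

These expected counts are in line with the simulated ones in the table (1 for the normal sample, 16 and 20 for the exponential and Gamma samples), while the Chebyshev bound of about 111 is loose for all three.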