I needed to install the DAFT package for Python 2.7 on my Dockerised Jupyter server. Running pip install daft in the terminal installed DAFT into the Python 3 environment only. This is how to install it for Python 2.7.

Open a terminal into the Jupyter container with docker exec -it notebooks bash and list the Conda environments:

conda info -e

python2                  /opt/conda/envs/python2
root                  *  /opt/conda

Activate the Python 2.7 environment with source activate python2, then pip install daft:

source activate python2
pip install daft

DAFT will now be available to Python 2.7.
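
As a quick sanity check (a minimal sketch; it assumes nothing beyond the environment paths listed above), importing the package from the Python 2 kernel should resolve to the python2 environment rather than the root one:

import daft
print(daft.__file__)  # should point somewhere under /opt/conda/envs/python2/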

These are notes on how to build and run a Dockerised Jupyter Notebook server on Windows. Jupyter Notebook is an open-source web application that lets people create and share scientific computing results:

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

Docker on Windows

Docker runs on Windows inside a Linux VM, the Docker Machine, typically hosted on the VirtualBox hypervisor. To run Docker on Windows you will need VirtualBox and Docker Machine; both are helpfully installed by Docker Toolbox.

Running the container on Windows

Install Docker Toolbox and run the Docker Quickstart Terminal. This creates a ‘default’ Docker Machine and configures it for use. Next, clone the JupyterServer repo, which is based on docker-stacks.

git clone https://github.com/bayesjumping/JupyterServer.git

In the Docker Quickstart Terminal, cd into the source directory. Stop the Docker Machine, create a shared folder between Windows and VirtualBox, and start the machine again.

docker-machine stop default
/C/Program\ Files/Oracle/VirtualBox/VBoxManage sharedfolder add default --name notebooks --hostpath `pwd` --automount
docker-machine start default

Create a folder inside the Docker Machine to hold the notebooks, then mount the VirtualBox shared folder onto it so it stays in sync with the Windows folder.

docker-machine ssh default 'mkdir notebooks'
docker-machine ssh default 'sudo mount -t vboxsf -o defaults,uid=`id -u docker`,gid=`id -g docker` notebooks /home/docker/notebooks'

Note: you may need to run this mount step again whenever you restart the Docker Machine.

Build the image from the Dockerfile. This may take some time on the first build.

docker build -t bayesjumping/jupyterserver:0.1 .

Run the container.

docker-machine ssh default 'docker run -d -p 8888:8888 -v /home/docker/notebooks:/home/jovyan/work --name=notebooks bayesjumping/jupyterserver:0.1 start-notebook.sh'

Check the Docker Machine’s IP with docker-machine ip default and browse to that IP on port 8888 to use Jupyter (e.g. http://192.168.99.100:8888/). You should now have a working Jupyter server with both the Python and R kernels.
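
To confirm the server is reachable from the Windows host, a quick check like the sketch below can help (substitute the address reported by docker-machine ip default for the example IP):

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

# Expect HTTP 200 from the Jupyter front page.
print(urlopen("http://192.168.99.100:8888/", timeout=5).getcode())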


Kullback-Leibler Divergence

What’s it about?

Wikipedia describes the Kullback–Leibler divergence (KLD) as follows:

In probability theory and information theory, the Kullback–Leibler divergence (also information divergence, information gain, relative entropy, KLIC, or KL divergence) is a measure of the difference between two probability distributions $P$ and $Q$. It is not symmetric in $P$ and $Q$. In applications, $P$ typically represents the “true” distribution of data, observations, or a precisely calculated theoretical distribution, while $Q$ typically represents a theory, model, description, or approximation of $P$.

The discrete KLD is defined as

$$D_{KL}(P \parallel Q) = \sum_i P(i) \log_2 \frac{P(i)}{Q(i)}$$

Measured with the base-2 logarithm, it is the average number of bits that are wasted by encoding events from distribution $P$ with a code based on an approximate distribution $Q$.

So what does that mean?

This is best explained with an example. Say we have two distributions, $f$ and $g$, both describing the outcomes of repeated coin tosses.

Distribution $f$ produces as many heads as tails. It describes the outcomes for a fair coin.

Distribution $g$ is more likely to produce heads than tails, with a frequency of 3 heads in every 4 tosses. It describes a biased coin.

If we were to encode distribution $f$ in terms of $g$, the divergence is calculated as

$$D_{KL}(f \parallel g) = \sum_i f(i) \log_2 \frac{f(i)}{g(i)} = \tfrac{1}{2}\log_2\frac{1/2}{3/4} + \tfrac{1}{2}\log_2\frac{1/2}{1/4}$$

If we were to encode distribution $g$ in terms of $f$, the divergence is calculated as

$$D_{KL}(g \parallel f) = \sum_i g(i) \log_2 \frac{g(i)}{f(i)} = \tfrac{3}{4}\log_2\frac{3/4}{1/2} + \tfrac{1}{4}\log_2\frac{1/4}{1/2}$$

To compute the numerical values we can use a small Python script (warning: the code below is not robust):

import math

# Outcome probabilities for the fair coin (f) and the biased coin (g).
f = {'H': 0.5, 'T': 0.5}
g = {'H': 0.75, 'T': 0.25}

def kld(f, g):
    """Discrete KL divergence of f from g, in bits."""
    kld = 0
    for i in f.keys():
        kld += f[i] * math.log(f[i] / g[i])
    # Dividing by log(2) converts the sum from nats to bits.
    return kld / math.log(2)

yielding results

kld(f, g)
0.20751874963942185

kld(g, f)
0.18872187554086714

We can see that the KLD is not symmetric and that encoding distribution $f$ in terms of $g$ wastes more bits than the other way around. This intuitively makes sense, as $f$ is less predictable than $g$.
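
As a cross-check (assuming SciPy is available in the environment), scipy.stats.entropy computes the relative entropy when given a second distribution and a base, and should reproduce the values above:

from scipy.stats import entropy

f = [0.5, 0.5]    # fair coin: P(H), P(T)
g = [0.75, 0.25]  # biased coin: P(H), P(T)

print(entropy(f, g, base=2))  # ~0.2075, matches kld(f, g)
print(entropy(g, f, base=2))  # ~0.1887, matches kld(g, f)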