I want to calculate the mutual information $I(X; Y)$ of continuous random variables $X$ and $Y$ in Python.
```python
import numpy as np

def mutual_information(X, Y, bins=10):
    # Joint probability density p(x, y)
    p_xy, xedges, yedges = np.histogram2d(X, Y, bins=bins, density=True)
    # Marginal densities p(x) and p(y), using the same bin edges
    p_x, _ = np.histogram(X, bins=xedges, density=True)
    p_y, _ = np.histogram(Y, bins=yedges, density=True)
    # Product of the marginals p(x)p(y) as a 2-D array
    p_x_y = p_x[:, np.newaxis] * p_y
    # Bin widths dx and dy
    dx = xedges[1] - xedges[0]
    dy = yedges[1] - yedges[0]
    # Integrand; np.ma.log masks the bins where the argument is zero
    elem = p_xy * np.ma.log(p_xy / p_x_y)
    # Return the mutual information together with p(x, y) and p(x)p(y)
    return np.sum(elem * dx * dy), p_xy, p_x_y
```
If you just want to compute the mutual information, the function above is enough. Below I note a few points that matter for the implementation.
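As a quick sanity check (my addition, not in the original post): for a bivariate normal with correlation $\rho$, the mutual information has the closed form $-\frac{1}{2}\log(1-\rho^2)$, so the histogram estimate can be compared against a known value. The snippet repeats the function so it runs standalone, and adds `np.ma.masked_invalid` as a small hardening against the `0/0` cells that empty edge bins can produce.

```python
import numpy as np

def mutual_information(X, Y, bins=10):
    # Same estimator as above; masked_invalid additionally guards the
    # 0/0 cells that empty edge bins can produce (a small hardening).
    p_xy, xedges, yedges = np.histogram2d(X, Y, bins=bins, density=True)
    p_x, _ = np.histogram(X, bins=xedges, density=True)
    p_y, _ = np.histogram(Y, bins=yedges, density=True)
    p_x_y = p_x[:, np.newaxis] * p_y
    dx = xedges[1] - xedges[0]
    dy = yedges[1] - yedges[0]
    with np.errstate(divide='ignore', invalid='ignore'):
        elem = np.ma.masked_invalid(p_xy * np.ma.log(p_xy / p_x_y))
    return np.sum(elem * dx * dy), p_xy, p_x_y

rng = np.random.default_rng(0)
rho = 0.8
cov = [[1.0, rho], [rho, 1.0]]
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

mi, _, _ = mutual_information(X, Y, bins=30)
true_mi = -0.5 * np.log(1 - rho ** 2)  # closed form, ~0.511 nats
print(float(mi), true_mi)
```

With this many samples the estimate lands close to the analytic value; a histogram estimator like this is biased slightly upward for finite samples and downward by discretization, so exact agreement is not expected.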
## `density` of np.histogram2d

I had vaguely assumed that setting `density=True` would return probabilities, so I was briefly alarmed when `np.sum(p_xy)` did not come out to 1. The point to note is that `p_xy` is a **probability density**, not a probability.
Since $X$ and $Y$ are continuous variables, the histogram approximates the probability density; the sum only becomes 1 once the bin widths are taken into account.
`np.histogram` and `np.histogram2d` return the probability density together with the bin edges (`edges` in the code), and `dx` and `dy` have to be computed from those edges.
```python
import numpy as np

N = 1000
X = np.random.normal(loc=0, scale=1, size=N)
p_x, edges = np.histogram(X, bins=10, density=True)

# Summing the densities alone does not, of course, give 1.
print(np.sum(p_x))  # Output example: 1.580769264599771

# Summing with the bin width taken into account gives 1.
dx = edges[1] - edges[0]
print(np.sum(p_x * dx))  # Output example: 1.0000000000000002
```
## `p_x_y`
`p_x_y` in the code is meant to compute $p(x)p(y)$. I actually computed it with the following line first, and it did not work:

```python
p_x_y = p_x * p_y
```

The correct version is

```python
p_x_y = p_x[:, np.newaxis] * p_y
```

In the former, `p_x_y` is a one-dimensional array (an elementwise product); in the latter, broadcasting makes it a two-dimensional array holding the product for every pair of bins.
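To see the difference concretely, here is a tiny broadcasting sketch with made-up marginals (the numbers are illustrative only):

```python
import numpy as np

p_x = np.array([0.2, 0.5, 0.3])  # hypothetical marginal over 3 bins
p_y = np.array([0.4, 0.4, 0.2])  # hypothetical marginal over 3 bins

wrong = p_x * p_y                 # elementwise: shape (3,), only "diagonal" pairs
right = p_x[:, np.newaxis] * p_y  # broadcasting: shape (3, 3), every (i, j) pair

print(wrong.shape)  # (3,)
print(right.shape)  # (3, 3)
print(np.allclose(right, np.outer(p_x, p_y)))  # True
```

Reshaping `p_x` to a column via `np.newaxis` makes NumPy broadcast it against the row `p_y`, which is equivalent to `np.outer(p_x, p_y)` and matches the shape of `p_xy`.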
When the two variables are not independent, $p(x, y)$ and $p(x)p(y)$ differ, and the mutual information becomes large.
```python
import matplotlib.pyplot as plt

# Sine and cosine waves
t = np.linspace(-5, 5, num=1000)
X = np.sin(2 * np.pi * t)
Y = np.cos(3 * np.pi * t)

# Compute the mutual information
mi, p_xy, p_x_y = mutual_information(X, Y, bins=30)

# Plot the results
plt.figure(dpi=100)
ax1 = plt.subplot(121)
ax2 = plt.subplot(122)
ax1.set_title(r'$P_{XY}(x, y)$')
ax1.imshow(p_xy)
ax2.set_title(r'$P_{X}(x) P_{Y}(y)$')
ax2.imshow(p_x_y)
plt.suptitle('MI = {}'.format(mi))
plt.show()
```
When the two variables are independent, $p(x, y)$ and $p(x)p(y)$ match, and the mutual information becomes small.
```python
import matplotlib.pyplot as plt

# Two independent normal distributions
N = 10000
X = np.random.normal(size=N)
Y = np.random.normal(size=N)

# Compute the mutual information
mi, p_xy, p_x_y = mutual_information(X, Y, bins=30)

# Plot the results
plt.figure(dpi=100)
ax1 = plt.subplot(121)
ax2 = plt.subplot(122)
ax1.set_title(r'$P_{XY}(x, y)$')
ax1.imshow(p_xy)
ax2.set_title(r'$P_{X}(x) P_{Y}(y)$')
ax2.imshow(p_x_y)
plt.suptitle('MI = {}'.format(mi))
plt.show()
```