PRML Figure 1.15: Bias in the Maximum Likelihood Estimate of a Gaussian Distribution

What we want to do

Consider a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, expressed as follows.

\mathcal{N}(x\,\lvert\,\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right)
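As a quick sanity check, this density formula can be evaluated directly and compared against scipy.stats.norm.pdf (note that SciPy is parameterized by the standard deviation $\sigma$, not the variance $\sigma^2$). A minimal sketch, assuming SciPy is available:

import numpy as np
from scipy.stats import norm

mu, sig2 = 3.0, 1.0
x = np.linspace(1, 5, 5)

# Density from the formula above
pdf_manual = np.exp(-(x - mu)**2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)

# SciPy's implementation (scale is the standard deviation)
pdf_scipy = norm.pdf(x, loc=mu, scale=np.sqrt(sig2))

print(np.allclose(pdf_manual, pdf_scipy))  # True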

Suppose we observe $N$ data points $x_1, x_2, \cdots, x_N$. Assuming that each data point is generated independently from the Gaussian distribution above, we estimate the mean $\mu$ and variance $\sigma^2$ of the Gaussian distribution from the data.

We then evaluate how these estimates compare to the true values.

Estimating mean and variance

We denote the expectation by $\mathbb{E}[\cdot]$. In other words,

\mathbb{E}[x] = \mu\\
\mathbb{E}[x^2] = \mu^2 + \sigma^2\\
\mathbb{E}[(x-\mu)^2] = \sigma^2
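These identities can be spot-checked by sampling; a minimal sketch using one million draws:

import numpy as np

mu, sig2 = 3.0, 1.0
x = np.sqrt(sig2) * np.random.randn(1000000) + mu

print(x.mean())              # close to mu
print((x**2).mean())         # close to mu^2 + sig2
print(((x - mu)**2).mean())  # close to sig2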

We also write the observed data collectively as a vector:

{\bf x} = (x_1, x_2, \cdots, x_N)^T

When $N$ data points are generated from this Gaussian distribution, the probability that they match the observed data ${\bf x}$ is the product of the probabilities of generating each data point $x_1, \ldots, x_N$:

p({\bf x}\,\lvert\,\mu,\sigma^2) = \prod_{n=1}^N\mathcal{N}(x_n\,\lvert\,\mu,\sigma^2)

This is called the "likelihood". We would like to find the $\mu, \sigma^2$ that maximize this likelihood, but the expression is awkward to work with directly because it contains $\exp$ and $\prod$.

Therefore, we take the logarithm of both sides, which makes the calculation easier:

\ln p({\bf x}\,\lvert\,\mu,\sigma^2) = -\frac{1}{2\sigma^2}\sum_{n=1}^N(x_n -\mu)^2 - \frac{N}{2}\ln{\sigma^2} - \frac{N}{2}\ln{2\pi}

$y = \ln(x)$ is a monotonically increasing function of $x$, as shown below.

(Figure: plot of $y = \ln x$)

So maximizing $p({\bf x}\,\lvert\,\mu,\sigma^2)$ and maximizing $\ln p({\bf x}\,\lvert\,\mu,\sigma^2)$ are equivalent.
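A practical aside: for large $N$ the product of many small densities underflows to zero in floating point, while the log-likelihood stays finite, so the logarithm also helps numerically. A minimal sketch:

import numpy as np

mu, sig2 = 3.0, 1.0
N = 1000
x = np.sqrt(sig2) * np.random.randn(N) + mu

# Direct product of densities: underflows to 0.0 for large N
dens = np.exp(-(x - mu)**2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)
print(np.prod(dens))

# Log-likelihood from the formula above: stays finite
loglik = -((x - mu)**2).sum() / (2 * sig2) - N / 2 * np.log(sig2) - N / 2 * np.log(2 * np.pi)
print(loglik)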

We now differentiate the log-likelihood with respect to $\mu$ and $\sigma^2$ and set each derivative to zero:
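\frac{\partial}{\partial\mu}\ln p({\bf x}\,\lvert\,\mu,\sigma^2) = \frac{1}{\sigma^2}\sum_{n=1}^N(x_n - \mu) = 0\\
\frac{\partial}{\partial\sigma^2}\ln p({\bf x}\,\lvert\,\mu,\sigma^2) = \frac{1}{2\sigma^4}\sum_{n=1}^N(x_n - \mu)^2 - \frac{N}{2\sigma^2} = 0

Solving these conditions yields: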

\mu_{ML} = \frac{1}{N}\sum_{n}x_n\\
\sigma^2_{ML} = \frac{1}{N}\sum_{n}(x_n - \mu_{ML})^2

The obtained values are written $\mu_{ML}, \sigma^2_{ML}$ to distinguish them from the true values.
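For reference, these estimators coincide with NumPy's defaults: np.mean and np.var (whose default ddof=0 divides by $N$) compute exactly $\mu_{ML}$ and $\sigma^2_{ML}$. A minimal check on some made-up data:

import numpy as np

x = np.array([2.1, 3.3, 2.8, 3.9, 2.5])

mu_ml = x.sum() / len(x)
sig2_ml = ((x - mu_ml)**2).sum() / len(x)

print(np.isclose(mu_ml, np.mean(x)))   # True
print(np.isclose(sig2_ml, np.var(x)))  # True: ddof=0 is the ML (biased) estimate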

Comparison with true value

Let the true values of the mean and variance be $\mu$ and $\sigma^2$.

When $\mu_{ML}, \sigma^2_{ML}$ are estimated from many different data sets ${\bf x}$, let us find out what values they are likely to take. Specifically, we consider the expected values of $\mu_{ML}$ and $\sigma^2_{ML}$.

For the mean, we have

\begin{align}
\mathbb{E}[\mu_{ML}] &= \frac{1}{N}\sum_{n}\mathbb{E}[x_n]\\
&=\mu
\end{align}

so the maximum likelihood estimate matches the true mean. For the variance, on the other hand,

\begin{align}
\mathbb{E}[\sigma^2_{ML}] &= \frac{1}{N}\sum_{n}\mathbb{E}[(x_n - \mu_{ML})^2]\\
&=\frac{1}{N}\sum_{n}\left(\mathbb{E}[x_n^2] - 2\mathbb{E}[x_n\mu_{ML}] + \mathbb{E}[\mu_{ML}^2]\right)
\end{align}

Note that $\mu_{ML}$ is itself computed from the data points, so the cross terms must be evaluated carefully:

\begin{align}
\frac{1}{N}\sum_{n}\mathbb{E}[x_n\mu_{ML}] &= \frac{1}{N}\sum_{n}\mathbb{E}[x_n\frac{1}{N}\sum_{m}x_m]\\
&=\frac{1}{N^2}\sum_{n}\sum_{m}\mathbb{E}[x_nx_m]\\
&=\frac{1}{N}\left(\sigma^2 + N\mu^2\right)
\end{align}
\begin{align}
\frac{1}{N}\sum_{n}\mathbb{E}[\mu_{ML}^2] &= \frac{1}{N}\sum_{n}\mathbb{E}[\frac{1}{N}\sum_{l}x_l\frac{1}{N}\sum_{m}x_m]\\
&=\frac{1}{N^3}\sum_{n}\sum_{l}\sum_{m}\mathbb{E}[x_lx_m]\\
&=\frac{1}{N}\left(\sigma^2 + N\mu^2\right)
\end{align}

Substituting these into the expression above gives

\mathbb{E}[\sigma^2_{ML}] = (\mu^2 + \sigma^2) - \frac{2}{N}\left(\sigma^2 + N\mu^2\right) + \frac{1}{N}\left(\sigma^2 + N\mu^2\right) = \sigma^2 - \frac{\sigma^2}{N}

and therefore

\mathbb{E}[\sigma^2_{ML}] = \frac{N-1}{N}\sigma^2

It turns out that the variance estimate tends to be smaller than the true value. This phenomenon is called bias.

Implementation 1

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt

# x values for plotting
X = np.arange(1, 5, 0.001)

def gaussian(x, mu, sig2):
    y = np.exp(((x - mu)**2) / (-2 * sig2)) / np.sqrt(2 * np.pi * sig2)
    return y

# True mean and variance
mu = 3
sig2 = 1.0

# Number of data points used for one maximum likelihood estimate
N = 2

# Number of maximum likelihood estimates
M = 10000

# Generate M sets of N data points from the true distribution
x = np.sqrt(sig2) * np.random.randn(N, M) + mu

# Maximum likelihood estimation on each of the M sets
mu_ml = (1 / N) * x.sum(0)
sig2_ml = (1 / N) * ((x - mu_ml)**2).sum(0)

# True distribution (thick red line)
plt.plot(X, gaussian(X, mu, sig2), 'r', lw=15, alpha=0.5)

# Distribution using the ML estimates averaged over the M runs (green line)
plt.plot(X, gaussian(X, mu_ml.sum(0) / M, sig2_ml.sum(0) / M), 'g')

# Multiply the ML variance by N/(N-1) to remove the bias (blue line)
plt.plot(X, gaussian(X, mu_ml.sum(0) / M, sig2_ml.sum(0) / M * N / (N - 1)), 'b')

plt.show()

Execution result 1

You can see that when the variance of the maximum likelihood estimate (green line) is multiplied by $\frac{N}{N-1}$ (blue line), it matches the true distribution (thick red line).

(Figure: the true distribution and the two estimated distributions)
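The same check can be made numerically instead of visually. The following standalone sketch (same hypothetical settings as above, with more repetitions) prints the average ML variance next to the theoretical value $\frac{N-1}{N}\sigma^2$:

import numpy as np

mu, sig2, N, M = 3.0, 1.0, 2, 100000
x = np.sqrt(sig2) * np.random.randn(N, M) + mu

mu_ml = x.mean(0)
sig2_ml = ((x - mu_ml)**2).mean(0)

print(sig2_ml.mean())      # approximately 0.5
print((N - 1) / N * sig2)  # 0.5 exactly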

Implementation 2

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt

# x values for plotting
plotS = 1000
X = np.linspace(-1, 1, plotS)

# Gaussian distribution probability density function
def gaussDist(x, mu, sig2):
    ret = np.exp(-(x - mu)**2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)
    return ret

# True distribution
mu_r = 0
sig2_r = 0.05
Y_r = gaussDist(X, mu_r, sig2_r)

np.random.seed(10)
for i in range(3):
    plt.subplot(3, 1, i + 1)
    plt.ylim([0, 7])

    # Training data: N points drawn from the true distribution
    N = 2
    x_t = np.random.randn(N) * np.sqrt(sig2_r) + mu_r
    plt.plot(x_t, np.zeros(x_t.shape[0]), 'bo')

    # Distribution with maximum likelihood estimates
    mu_ML = x_t.sum() / N
    sig2_ML = ((x_t - mu_ML)**2).sum() / N
    Y_ml = gaussDist(X, mu_ML, sig2_ML)
    plt.plot(X, Y_ml, 'r')

    # True distribution
    plt.plot(X, Y_r, 'g')

plt.show()

Execution result 2

It can be seen that the estimated distribution (red) tends to have a smaller variance than the true distribution (green).

(Figure: three sampled data sets with their estimated and true distributions)
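To remove this bias in practice, one divides by $N-1$ instead of $N$ (Bessel's correction), which exactly cancels the $\frac{N-1}{N}$ factor derived above. A minimal sketch on a single data set from the same setup (equivalent to np.var(x_t, ddof=1)):

import numpy as np

np.random.seed(10)
mu_r, sig2_r, N = 0, 0.05, 2
x_t = np.random.randn(N) * np.sqrt(sig2_r) + mu_r

sig2_ml = ((x_t - x_t.mean())**2).sum() / N        # biased ML estimate
sig2_ub = ((x_t - x_t.mean())**2).sum() / (N - 1)  # Bessel-corrected estimate
print(sig2_ml, sig2_ub)  # the corrected value is N/(N-1) = 2x larger here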
