Until maximum likelihood estimation finds the true parameter

Introduction

We observe that the value estimated by maximum likelihood estimation approaches the true value as the number of trials increases.

Table of contents

  1. What is the maximum likelihood estimation method?
  2. What is a consistent estimator?
  3. Simulation with Python

1. What is the maximum likelihood estimation method?

When estimating the true mean of a Bernoulli distribution by maximum likelihood, the estimate $\mu_{ML}$ is

\mu_{ML} = \frac{1}{N} \sum_{i=1}^N x_i

The drawback of maximum likelihood estimation is that it overfits when the number of trials is small. However, as the number of trials increases, the estimate approaches the true value. An estimator that approaches the population parameter (such as the population mean or population variance) as the number of trials grows is called a consistent estimator. The following shows that $\mu_{ML}$ is a consistent estimator.
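As a minimal sketch of computing this estimate (assuming NumPy; the true mean of 0.3 is an illustrative value, not one from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true = 0.3  # illustrative true Bernoulli parameter (assumption)

for N in [10, 100, 10_000]:
    x = rng.binomial(1, mu_true, size=N)  # N Bernoulli trials
    mu_ml = x.mean()                      # mu_ML = (1/N) * sum(x_i)
    print(f"N={N:>6}: mu_ML = {mu_ml:.4f}")
```

With few trials the estimate can land far from 0.3; with many trials it settles close to it.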

2. What is a consistent estimator?

If, for any $\epsilon > 0$,

\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| > \epsilon) = 0

holds, then $\hat{\theta}_n$ is called a consistent estimator of the parameter $\theta$.

Roughly speaking, this says: "as the number of trials goes to infinity, the probability that the difference between $\hat{\theta}_n$ and $\theta$ exceeds an arbitrarily small number $\epsilon$ goes to $0$."

We will show that the maximum likelihood estimator $\mu_{ML}$ of the Bernoulli distribution is a consistent estimator of the population mean $\mu$. Chebyshev's inequality,

P(|Y - E[Y]| > \epsilon) \leq \frac{V[Y]}{\epsilon^2}

is convenient for proving consistency; the idea is to substitute $\mu_{ML}$ for $Y$.
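As an aside, the inequality itself can be checked numerically. A minimal sketch, taking $Y$ to be the mean of 20 Bernoulli(0.3) trials (illustrative values, not from the original):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, eps, trials = 20, 0.3, 0.2, 100_000  # illustrative values

# Y is the mean of n Bernoulli(p) trials: E[Y] = p, V[Y] = p(1-p)/n
Y = rng.binomial(n, p, size=trials) / n
lhs = np.mean(np.abs(Y - p) > eps)  # Monte Carlo estimate of P(|Y - E[Y]| > eps)
rhs = p * (1 - p) / n / eps**2      # Chebyshev bound V[Y] / eps^2
print(f"P = {lhs:.4f} <= bound = {rhs:.4f}")
```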

First, $E[\mu_{ML}]$ (corresponding to $E[Y]$ in Chebyshev's inequality) is

\begin{eqnarray}
E[\mu_{ML}] &=& E\left[\frac{1}{N}\sum_{i=1}^N x_i\right]\\
&=&\frac{1}{N}E\left[\sum_{i=1}^N x_i\right]\\
&=&\frac{1}{N}\sum_{i=1}^N E[x_i]\\
&=&\frac{1}{N} N \mu\\
&=&\mu
\end{eqnarray}

and $V[\mu_{ML}]$ is

\begin{eqnarray}
V[\mu_{ML}] &=& V\left[\frac{1}{N}\sum_{i=1}^N x_i\right]\\
&=&\frac{1}{N^2}\sum_{i=1}^N V[x_i]\\
&=&\frac{1}{N^2} N \sigma^2\\
&=&\frac{\sigma^2}{N}
\end{eqnarray}

where $\sigma^2 = V[x_i]$ is the population variance and the second equality uses the independence of the $x_i$ (for a Bernoulli distribution, $\sigma^2 = \mu(1-\mu)$).
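Both results can be checked numerically. A minimal sketch with illustrative values (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, N, trials = 0.3, 50, 200_000  # illustrative values

# each of the `trials` experiments yields one mu_ML from N Bernoulli draws
mu_ml = rng.binomial(N, mu, size=trials) / N

print(mu_ml.mean())          # close to mu           -> E[mu_ML] = mu
print(mu_ml.var())           # close to mu(1-mu)/N   -> V[mu_ML] = sigma^2 / N
print(mu * (1 - mu) / N)
```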

Substituting $\mu_{ML}$ for $Y$ in Chebyshev's inequality above gives

\begin{eqnarray}
P(|\mu_{ML} - E[\mu_{ML}]| > \epsilon) &\leq& \frac{V[\mu_{ML}]}{\epsilon^2} \\
\Leftrightarrow P(|\mu_{ML} - \mu| > \epsilon) &\leq& \frac{1}{\epsilon^2} \frac{\sigma^2}{N}
\end{eqnarray}

Since the right-hand side tends to $0$ as $N \to \infty$,

\lim_{N \to \infty} P(|\mu_{ML} - \mu| > \epsilon) = 0

holds. Therefore, $\mu_{ML}$ is a consistent estimator of $\mu$.
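This limit can also be observed empirically. A minimal sketch estimating $P(|\mu_{ML} - \mu| > \epsilon)$ by Monte Carlo for growing $N$ (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, eps, trials = 0.3, 0.05, 10_000  # illustrative values

for N in [10, 100, 1_000, 10_000]:
    # `trials` independent experiments, each producing one mu_ML from N draws
    mu_ml = rng.binomial(N, mu, size=trials) / N
    prob = np.mean(np.abs(mu_ml - mu) > eps)
    print(f"N={N:>6}: P(|mu_ML - mu| > {eps}) ~= {prob:.4f}")
```

The estimated probability shrinks toward $0$ as $N$ grows, as the proof predicts.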

3. Simulation with Python

Section 2 showed that $\mu_{ML}$ is a consistent estimator of $\mu$, that is, that $\mu_{ML}$ converges to $\mu$ as $N \to \infty$. The result of simulating this with Python is as follows.

[Figure: ml_Bernoulli.png. The horizontal axis is $N$, the vertical axis is $\mu_{ML}$, and the purple line is the population mean $\mu$.]

The estimate is rough at first, but as $N$ increases it approaches $\mu$. The code is here: https://github.com/kthimuo/blog/blob/master/ml_Bernoulli_plot.py
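The linked script is the author's; as a rough sketch of what such a simulation could look like (assuming NumPy and Matplotlib, with an illustrative $\mu = 0.3$ since the value used in the plot is not stated):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
mu = 0.3          # illustrative population mean (assumption)
N = 10_000
x = rng.binomial(1, mu, size=N)

# running estimate mu_ML after each new trial
mu_ml = np.cumsum(x) / np.arange(1, N + 1)

plt.plot(mu_ml, label=r"$\mu_{ML}$")
plt.axhline(mu, color="purple", label=r"population mean $\mu$")
plt.xlabel("N")
plt.ylabel(r"$\mu_{ML}$")
plt.legend()
plt.show()
```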

That's all. If you have any suggestions, please leave a comment.
