Machine Learning Super Introduction: Probability Models and Maximum Likelihood Estimation


This series is a memorandum of my own studies, but I am posting it in the hope of sharing what I have learned with you. It mainly organizes terms that come up while studying machine learning and deep learning. This time, I summarize the basics of probabilistic models and maximum likelihood estimation as they appear in machine learning models.

Probabilistic model

A probabilistic model is a model that assumes the variable x is generated from a probability distribution P(x|θ) with parameter θ.

Probabilistic model

x ~ P(x|\theta)
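To make x ~ P(x|θ) concrete, here is a minimal Python sketch of a probabilistic model viewed as a sampler. The original has no code, so the language choice is an assumption. Once θ is fixed, data can be drawn from the distribution. The example uses a normal distribution with θ = (μ, σ), where σ is the standard deviation, and draws samples with the standard library's random.gauss. The helper name sample_model is hypothetical.

```python
import random

def sample_model(theta, n, seed=0):
    """Hypothetical helper: draw n samples x ~ P(x | theta) for a normal model.

    theta is assumed to be a pair (mu, sigma), with sigma the standard deviation.
    """
    mu, sigma = theta
    rng = random.Random(seed)  # seeded generator so the draws are reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Draw a few observations from the model with theta = (2.0, 1.0).
xs = sample_model((2.0, 1.0), n=5)
print(xs)

# With many draws, the sample mean approaches mu.
big = sample_model((2.0, 1.0), n=10_000)
print(sum(big) / len(big))
```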
Example) Normal distribution

When x is a continuous variable, a common choice is the normal distribution.

normal distribution

N(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp \left[ - \frac{(x-\mu)^2}{2\sigma^2} \right]
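The density can be written out directly from the formula. The minimal Python sketch below keeps the formula's parameterization by the variance σ². The name normal_pdf is hypothetical. The result is cross-checked against statistics.NormalDist from the standard library, which takes the standard deviation σ instead of σ².

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """N(x | mu, sigma^2) = 1/sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))."""
    coef = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))

# NormalDist expects sigma (standard deviation), so sigma2 = 4.0 corresponds to sigma = 2.0.
print(normal_pdf(0.5, mu=0.0, sigma2=4.0))
print(NormalDist(mu=0.0, sigma=2.0).pdf(0.5))
```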
Example) Bernoulli distribution

For a discrete variable that takes only the values 0 or 1, such as the outcome of a coin toss, the Bernoulli distribution is used.

Bernoulli distribution

B(x|p) = p^x(1-p)^{1-x}
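The exponents in B(x|p) act as a switch: x = 1 leaves p, and x = 0 leaves 1 - p. Here is a short Python sketch of that behavior. The name bernoulli_pmf is hypothetical.

```python
def bernoulli_pmf(x, p):
    """B(x | p) = p^x * (1 - p)^(1 - x), defined for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p ** x * (1 - p) ** (1 - x)

# x = 1 selects p, x = 0 selects 1 - p.
print(bernoulli_pmf(1, 0.3))
print(bernoulli_pmf(0, 0.3))
```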


Given N mutually independent data points X = (x_1, x_2, ..., x_N), take the product of the probability function values of the individual data points and view it as a function of θ. This product expresses how plausible ("likely") θ is given the data, and it is called the likelihood.


L(\theta) = \prod_{n}P(x_n|\theta)
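As an illustration, take a normal model with σ² held fixed. The likelihood can then be evaluated for several candidate values of μ to see which one makes the observed data most probable. The data below are made-up toy values and the function names are hypothetical. math.prod requires Python 3.8 or later.

```python
import math

def normal_pdf(x, mu, sigma2):
    """N(x | mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def likelihood(data, mu, sigma2):
    """L(theta) = prod_n N(x_n | mu, sigma^2), assuming the x_n are independent."""
    return math.prod(normal_pdf(x, mu, sigma2) for x in data)

data = [1.2, 0.8, 1.5, 0.9, 1.1]  # hypothetical observations (mean 1.1)
for mu in (0.0, 1.0, 2.0):
    print(mu, likelihood(data, mu, sigma2=1.0))
```

Among the three candidates, μ = 1.0 gives the largest likelihood because it is closest to the data's mean.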

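The likelihood is the central quantity of a probabilistic model, and finding the parameter θ that maximizes it is called maximum likelihood estimation (MLE). In practice it is usually handled as the log-likelihood shown below, because the logarithm turns the product into a sum that is easier to differentiate. Since ln is monotonically increasing, the θ that maximizes the log-likelihood also maximizes the likelihood.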

Log likelihood

\ln L(\theta) = \sum_n \ln P(x_n|\theta)
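The practical benefit shows up with larger N. A product of many densities below 1 underflows to 0.0 in floating point, while the sum of logs stays finite. The minimal Python sketch below assumes a normal model and uses synthetic data drawn with random.gauss. The function names are hypothetical.

```python
import math
import random

def log_normal_pdf(x, mu, sigma2):
    """ln N(x | mu, sigma^2), computed without calling exp."""
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)

def log_likelihood(data, mu, sigma2):
    """ln L(theta) = sum_n ln N(x_n | mu, sigma^2)."""
    return sum(log_normal_pdf(x, mu, sigma2) for x in data)

def likelihood(data, mu, sigma2):
    """L(theta) as a raw product of densities."""
    return math.prod(math.exp(log_normal_pdf(x, mu, sigma2)) for x in data)

rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # synthetic data

print(likelihood(data, 1.0, 1.0))      # the product underflows to 0.0
print(log_likelihood(data, 1.0, 1.0))  # the log-likelihood is a finite negative number

# On a small sample where nothing underflows, ln of the product equals the sum of logs.
small = data[:5]
print(math.log(likelihood(small, 1.0, 1.0)), log_likelihood(small, 1.0, 1.0))
```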
Example) Maximum likelihood estimation of the expected value parameter μ of the normal distribution

The estimate is obtained in three steps: take the partial derivative of the log-likelihood with respect to μ, set it to 0, and solve for μ. As a result, the maximum likelihood estimate of the expected value parameter μ is the mean of all the x_n.

Maximum likelihood estimation of the expected value parameter μ of the normal distribution

\ln L(\theta) = - \frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_n(x_n-\mu)^2\\
\frac{\partial}{\partial \mu}\ln L(\theta) = \frac{1}{\sigma^2}\sum_n(x_n - \mu) = 0 \\
\mu = \frac{1}{N}\sum_n x_n = \bar{x}
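This derivation can be checked numerically. The minimal Python sketch below uses made-up data with σ² fixed at 1. It shows that the derivative formula from the second line vanishes at the sample mean. A brute-force grid search over μ lands on the same value; the grid search is only a sanity check, not the method used in the text. The function names are hypothetical.

```python
import math

def log_likelihood(data, mu, sigma2):
    """ln L = -(N/2) ln(2*pi*sigma^2) - (1 / (2*sigma^2)) * sum_n (x_n - mu)^2."""
    n = len(data)
    sq = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sq / (2.0 * sigma2)

def dlogl_dmu(data, mu, sigma2):
    """Partial derivative of ln L w.r.t. mu: (1 / sigma^2) * sum_n (x_n - mu)."""
    return sum(x - mu for x in data) / sigma2

data = [1.2, 0.8, 1.5, 0.9, 1.1]  # hypothetical observations
mu_hat = sum(data) / len(data)     # closed-form MLE: the sample mean

print(dlogl_dmu(data, mu_hat, 1.0))  # zero up to rounding error

grid = [i / 1000 for i in range(3001)]  # candidate mu values 0.000 ... 3.000
mu_grid = max(grid, key=lambda m: log_likelihood(data, m, 1.0))
print(mu_hat, mu_grid)
```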
Example) Maximum likelihood estimation of p of the Bernoulli distribution

Similarly, for the Bernoulli distribution, the maximum likelihood estimate of p is obtained as follows. Here, M is the number of data points with x = 1 out of N.

Maximum likelihood estimation of the Bernoulli distribution

\sum_n x_n = M \\
\ln L(\theta) = \sum_n \left[ x_n \ln p + (1 - x_n)\ln(1 - p) \right] \\
= M \ln p + (N - M)\ln(1 - p) \\
\frac{\partial}{\partial p}\ln L(\theta) = \frac{M}{p} - \frac{N - M}{1 - p} = 0 \\
p = \frac{M}{N}

That is, the maximum likelihood estimate of p is the fraction of the data for which x = 1.
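The same kind of check works for the Bernoulli case. The minimal Python sketch below uses a hypothetical sequence of ten coin tosses with M = 6 ones out of N = 10. It compares the closed-form estimate M/N with a grid search over p. The grid is restricted to the open interval (0, 1) so that ln p and ln(1 - p) stay defined. The function name is hypothetical.

```python
import math

def bernoulli_log_likelihood(data, p):
    """ln L(p) = M ln p + (N - M) ln(1 - p), where M counts the ones in data."""
    m = sum(data)
    n = len(data)
    return m * math.log(p) + (n - m) * math.log(1.0 - p)

data = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]  # hypothetical tosses: M = 6, N = 10
p_hat = sum(data) / len(data)          # closed-form MLE M / N

grid = [i / 1000 for i in range(1, 1000)]  # p = 0.001 ... 0.999
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(data, p))
print(p_hat, p_grid)
```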

In conclusion

In this series, I will try to cover only the essential points, at roughly this length. Next time, I will summarize stochastic gradient descent, so please take a look at that as well. Thank you for reading to the end.
