Let's try it again: Maximum likelihood estimation and model (probability distribution) fitting ① Discrete probability distributions

Introduction

Data analysis is, at its core, the construction of a statistical model that can explain the relationships hidden in observed data, and understanding probability distributions is essential for building such models. In data analysis we regard the data of a variable as arising from a stochastic phenomenon; conversely, we ask what kind of probability distribution can represent the observed data pattern.

Maximum likelihood estimation method

One way to answer that question is to fit a probability distribution to the data. However, the shape of a probability distribution changes depending on its parameters, so for the fit we need to determine appropriate parameter values from the observed data. Maximum likelihood estimation is a major method for computing such parameter values.

Roughly speaking, maximum likelihood estimation assumes that the obtained data are N values $y_i$ drawn from a probability distribution with some parameter θ, and that the probability of drawing each value is $p(y_i|θ)$. The likelihood function L is the product of these N probabilities: $L(θ|\{y_i\}) = p(y_1|θ) × p(y_2|θ) × \cdots × p(y_N|θ) = \prod_{i=1}^{N}p(y_i|θ)$. We then look for the parameter value that maximizes L. The usual flow is to convert the product into a form that is easier to handle (take the logarithm), partially differentiate with respect to each parameter, and solve analytically.
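Concretely, taking the logarithm turns the product into a sum, and the maximum is found where the derivative of the log-likelihood with respect to each parameter equals zero:

$$\log L(θ|\{y_i\}) = \sum_{i=1}^{N} \log p(y_i|θ), \qquad \frac{\partial \log L(θ|\{y_i\})}{\partial θ} = 0$$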

Trying it out

Probability distributions come in two types, discrete and continuous, depending on the data. This time, as an example of a discrete probability distribution, we perform maximum likelihood estimation for the Poisson distribution and fit it to data.

- Discrete probability distribution: Poisson distribution (this time)
- Continuous probability distribution: Normal distribution (next time)

Discrete probability distribution: Poisson distribution

The Poisson distribution is mainly used for count data. Its probability (mass) function is $P(y) = \frac{λ^y e^{-λ}}{y!}$, where λ is the parameter representing the mean of y, and y = 0, 1, 2, 3, ....
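As a quick sanity check of this formula, scipy.stats provides the same probability mass function (a minimal sketch for comparison; nothing later depends on it):

Python

"""Poisson pmf via scipy.stats, for comparison with the formula above"""
from scipy.stats import poisson
# P(y) for y = 0..4 with λ = 2.4
print(poisson.pmf([0, 1, 2, 3, 4], mu=2.4))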

To check our estimates against a known answer, the test data for estimation and fitting is generated in advance from the target probability distribution with known parameters.

Python


"""Random numbers from Poisson distribution"""
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(seed=10)
#Parameter λ=2.4
poisson_values = np.random.poisson(lam=2.4, size=1000)
p_y, bin_edges, patches = plt.hist(poisson_values , bins=11, range=[-0.5, 10.5], normed=True)
y_k = 0.5*(bin_edges[1:] + bin_edges[:-1])
y_k #y
p_y #p(y)

(Figure: histogram of the 1,000 generated Poisson values)

This shows the values and relative frequencies of the 1,000 pseudo-random data points. Now let's fit a Poisson distribution to this data and estimate its parameter by maximum likelihood.

Python


"""Parameter estimation by partial differentiation of the log-likelihood function"""
import sympy #Library for algebraic calculations (solving derivatives and equations)
#Define variables
sympy.var('λ y ')
#Likelihood p(Parameters|y)Define
f = (λ**y)*sympy.exp(-λ)/sympy.factorial(y)
#Logarithmic
logf=sympy.log(f)
#Partially differentiate f and expand the equation
pdff = sympy.expand(sympy.diff(logf, λ))
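Printing pdff shows the resulting derivative (the exact printed form may vary with the SymPy version):

Python

print(pdff)  # expected: y/λ - 1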

The expression pdff, the partial derivative of the log-likelihood with respect to the parameter λ, is $\frac{y}{λ} - 1$. Summing this over the data and setting $\sum_{i=1}^{1000}\left(\frac{y_i}{λ} - 1\right) = 0$ gives an equation that can be solved for λ.
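For reference, solving this by hand already gives a closed form, the sample mean:

$$\sum_{i=1}^{N}\left(\frac{y_i}{λ} - 1\right) = \frac{1}{λ}\sum_{i=1}^{N} y_i - N = 0 \quad\Rightarrow\quad \hat{λ} = \frac{1}{N}\sum_{i=1}^{N} y_i$$

The code below lets SymPy do the same calculation on the actual data.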

Python


def L_sympy(f, var, values):
    eq = 0  # Accumulator for the summed derivative terms
    for i in np.arange(len(values)):  # Over all data points
        eq += f.subs(var, values[i])  # Substitute each observed value for y
    param = sympy.solve(eq, λ)  # Solve the summed equation for λ
    return param

L_sympy(pdff, "y", poisson_values)

Parameter estimate: [289/125] = 2.312

The parameter λ was estimated to be 2.312. Since this pseudo-data was generated with λ = 2.4, the estimate comes out nicely close to the true value.
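As derived above, the Poisson MLE is just the sample mean, so the same value can be obtained directly:

Python

# Closed form: the MLE of λ is the sample mean of the data
print(poisson_values.mean())  # 2.312, matching 289/125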

If the statistical model is complex

With the textbook approach above, solving the partially differentiated expression (the derivative) directly yields a formula for the parameters. In general statistical modeling, however, there are multiple parameters to estimate and the probability distribution formulas are more complicated, so an analytical solution is not easy.

Statistical model and probability distribution

In the commonly used generalized linear model (GLM), the statistical model describing the data is expressed as a combination of a probability distribution and a linear predictor (plus a link function).

(Figure: statistical model = probability distribution + linear predictor + link function)
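For example, Poisson regression, a typical GLM, combines a Poisson distribution with a log link and a linear predictor (here $x_i$ is an explanatory variable and $β_0, β_1$ are the parameters to estimate):

$$\log(λ_i) = β_0 + β_1 x_i, \qquad y_i \sim \mathrm{Poisson}(λ_i)$$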

The example given above, where y is modeled by a single constant parameter with no explanatory variables, can be regarded as the simplest case of such a statistical model.

Parameter estimation when the statistical model is complex

As noted, with a general statistical model it is not easy to solve analytically: there are multiple parameters to estimate and the equations are complicated (not linear expressions like $\frac{y}{λ} - 1$ above, but nonlinear ones like $λ^2 + θ^3 + \cdots$). We therefore turn to numerical methods for solving nonlinear equations.

Python


"""Function of discrete probability distribution (probability mass function)"""
def probability_poisson(y,λ):
    from scipy.special import factorial
    # λ:Parameters
    # y:Data y: Occurs y times
    # return:Probability P of data Y(y|Parameters) 
    return (λ**y)*np.exp(-λ)/factorial(y)

"""Log-likelihood function: discrete type"""
def L_func(param,y):       
    likelihood = 0 #Initial value of likelihood
    for i in np.arange(len(y)):
        # model output 
        p = probability_poisson(y[i], param)
        # likelihood
        likelihood += -np.log(p) #Likelihood
    return likelihood 

"""Parameter estimation"""
"""
Quasi-Newton method (BFGS, L)-BFGS method: Memory can be saved in complicated cases)
"""
from scipy import optimize
x0 = [2] #Initial values of parameters
bound = [(0, None)] #Scope of optimal parameter search(min,max)

params_MLE = optimize.minimize(L_func,x0,args=(poisson_values),method='l-bfgs-b',
                       jac=None, bounds=bound, tol=None, callback=None,
                        options={'disp': None, 'maxls': 20, 'iprint': -1,
                                 'gtol': 1e-05, 'eps': 1e-08, 'maxiter': 15000,
                                 'ftol': 2.220446049250313e-09, 'maxcor': 10,
                                 'maxfun': 15000})

#Maximum likelihood estimator of parameters
print('Parameters:',params_MLE.x)
#Number of parameters
k=1
#AIC (the model with the smallest value is a good fit)
print('AIC:',params_MLE.fun*(2)+2*k)

Parameters:[2.31200054]
AIC: [3607.0081302]

Similarly, the parameter λ was estimated to be 2.312.
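As an aside, the Python loop in L_func becomes slow for large datasets; a vectorized version (a sketch, mathematically identical) computes the same negative log-likelihood in a single NumPy expression:

Python

"""Vectorized negative log-likelihood (sketch)"""
def L_func_vec(param, y):
    # Same quantity as L_func, computed without a Python loop
    return -np.sum(np.log(probability_poisson(y, param)))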

If you just want the parameters quickly ...

Python


"""curve_There is also a parameter calculation with fit for the time being"""
from scipy.optimize import curve_fit
parameters, cov_matrix = curve_fit(f=probability_poisson, xdata=y_k, ydata=p_y) 
print("Parameters:",parameters, "Covariance:",cov_matrix)

Parameters: [2.31643586] Covariance: [[0.00043928]]

Note that curve_fit performs a least-squares fit to the histogram rather than maximum likelihood estimation, which is why its estimate differs slightly from the MLE.

Applying the fit to the data and plotting

Let's overlay the model (Poisson distribution) with the estimated parameter λ = 2.312 on the data.

Python


"""Illustration of whether it applies"""
acc_mle = probability_poisson(y_k, params_MLE.x)
plt.hist(poisson_values , bins=11, range=[-0.5, 10.5], normed=True)
plt.plot(y_k,acc_mle)

(Figure: histogram with the fitted Poisson distribution overlaid)

The fit looks good.
