This time we will implement maximum likelihood estimation of the Student's t distribution. The Student's t distribution is well known for being more robust to outliers than the Gaussian distribution, but as far as I can remember I have never actually used it, so this is a good opportunity to check its robustness for real. For the first PRML implementation I set myself the rule that nothing but numpy may be used besides the standard library, but this time not-so-common functions such as the digamma function appear, so I also used scipy. Reaching for a third-party package other than numpy as early as the second post of this project makes me worry about what lies ahead.
The Student's t distribution is obtained from the Gaussian distribution by placing a Gamma prior on the precision $\tau$ and integrating $\tau$ out.
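Written out, this marginalization is (in PRML's parameterization, the result is a Student's t distribution with precision $\lambda=a/b$ and degrees of freedom $\nu=2a$):

$$p(x|\mu,a,b) = \int_0^\infty \mathcal{N}(x|\mu,\tau^{-1})\,{\rm Gam}(\tau|a,b)\,{\rm d}\tau = {\rm St}(x|\mu,\lambda=a/b,\nu=2a)$$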
This is the E (expectation) step of the EM algorithm. In this step we fix the parameters to be estimated ($\mu, a, b$) and compute, for each sample point $x_i$, the precision $\tau_i$ (equivalently, the variance) of the Gaussian distribution it was sampled from.
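Concretely, the posterior over each precision is again a Gamma distribution, and its mean is what the code below stores as precisions:

$$p(\tau_i|x_i,\mu,a,b) = {\rm Gam}\left(\tau_i\Big|a+{1\over2},\,b+{1\over2}(x_i-\mu)^2\right),\qquad \mathbb{E}[\tau_i] = {a+{1\over2}\over b+{1\over2}(x_i-\mu)^2}$$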
This is the M (maximization) step of the EM algorithm. Compute the log-likelihood function for the complete data $\{x_i,\tau_i\}$ and maximize it with respect to the parameters $\mu, a, b$. For the precision parameters $\tau_i$ here, use the values obtained in the E step.
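Keeping only the parameter-dependent terms of the complete-data log-likelihood $\sum_i\{\ln\mathcal{N}(x_i|\mu,\tau_i^{-1})+\ln{\rm Gam}(\tau_i|a,b)\}$ gives

$$-{1\over2}\sum_i\tau_i(x_i-\mu)^2 + a\sum_i\ln\tau_i + aN\ln b - b\sum_i\tau_i - N\ln\Gamma(a).$$

Setting the derivatives with respect to $\mu$ and $b$ to zero, and taking a single gradient-ascent step for $a$ (which has no closed-form maximizer), yields the three update equations implemented in the code below ($\psi$ is the digamma function, $\eta$ the learning rate):

$$\mu = {\sum_i\tau_i x_i\over\sum_i\tau_i},\qquad a \leftarrow a + \eta\left(N\ln b + \sum_i\ln\tau_i - N\psi(a)\right),\qquad b = {aN\over\sum_i\tau_i}$$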
If you look at the solutions for $a$ and $b$, they depend on each other, so the three equations above probably do not fully maximize the log-likelihood function. An EM algorithm whose M step does not completely maximize the log-likelihood in this way is called a generalized EM algorithm. When performing maximum likelihood estimation of the Student's t distribution, the degrees-of-freedom parameter $\nu\,(=2a)$ is usually fixed at some value, probably so that the M step can be carried out exactly. If you fix the degrees-of-freedom parameter, the remaining estimation targets are $\mu$ and $\lambda$, and I think the log-likelihood function can then be maximized easily, as in the sketch below.
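For reference, a minimal sketch of that fixed-$\nu$ variant, using the standard scale-mixture EM updates; the function name fit_fixed_nu and the default $\nu=4$ are my own choices for illustration, not something from this article:

import numpy as np

def fit_fixed_nu(x, nu=4.0, n_iter=100):
    # EM for a Student's t distribution with fixed degrees of freedom nu;
    # returns the location mu and the precision lambda_
    mu, lambda_ = np.mean(x), 1.0 / np.var(x)
    for _ in range(n_iter):
        # E step: posterior mean of each latent precision weight
        w = (nu + 1) / (nu + lambda_ * (x - mu) ** 2)
        # M step: both updates are available in closed form
        mu = np.sum(w * x) / np.sum(w)
        lambda_ = len(x) / np.sum(w * (x - mu) ** 2)
    return mu, lambda_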
I will make some corrections and supplements based on the comments I received. In the proper E step, we only go as far as computing the posterior distribution of the precision parameter $\tau_i$,

$$p(\tau_i|x_i,\mu,a,b) = {\rm Gam}\left(\tau_i\Big|a+{1\over2},\,b+{1\over2}(x_i-\mu)^2\right),$$

and in the subsequent M step we compute the expectation of the complete-data log-likelihood $\ln p(x_i,\tau_i|\mu,a,b)$ with respect to that posterior, $\mathbb{E}[\sum_i\ln p(x_i,\tau_i|\mu,a,b)]$. Extracting only the parameter-dependent part of that calculation gives

$$-{1\over2}\sum_i\mathbb{E}[\tau_i](x_i-\mu)^2 + a\sum_i\mathbb{E}[\ln\tau_i] + aN\ln b - b\sum_i\mathbb{E}[\tau_i] - N\ln\Gamma(a),$$

so the shape of the function to be maximized is partially different. The EM algorithm in the original article amounts to a sample approximation of the expectation, $\mathbb{E}[\sum_i\ln p(x_i,\tau_i|\mu,a,b)] \approx \sum_i\ln p(x_i,\tau_i^{({\rm sample})}|\mu,a,b)$, where there is always exactly one sample per point, $\tau_i^{({\rm sample})} = \mathbb{E}[\tau_i]$; I would appreciate it if you could interpret it that way. Judging from the figure below it seems to work to some extent, so this sample approximation may not hurt the accuracy that much (I would be really happy if that is the case).
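For reference, the expectations needed for the exact M step follow from standard Gamma identities: for $\tau\sim{\rm Gam}(\tau|a',b')$,

$$\mathbb{E}[\tau] = {a'\over b'},\qquad \mathbb{E}[\ln\tau] = \psi(a') - \ln b',$$

where $\psi$ is the digamma function. Since the sample approximation replaces $\mathbb{E}[\ln\tau_i]$ with $\ln\mathbb{E}[\tau_i]$, the two objectives differ only in the $a\sum_i\mathbb{E}[\ln\tau_i]$ term.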
To summarize the above: in the E step, set each $\tau_i$ to its posterior mean $\mathbb{E}[\tau_i]$ under the current parameters; in the M step, update $\mu$ and $b$ with their closed-form maximizers and take one gradient-ascent step on $a$. We use this generalized EM algorithm to find the parameters of the Student's t distribution.
Import matplotlib and numpy to illustrate the results, and in addition the gamma and digamma functions from scipy.
import matplotlib.pyplot as plt
import numpy as np
from scipy.special import gamma, digamma
For comparison with the Student's t distribution, here is a class that performs maximum likelihood estimation with a plain Gaussian distribution.
class Gaussian(object):

    def fit(self, x):
        # maximum likelihood estimates: sample mean and sample variance
        self.mean = np.mean(x)
        self.var = np.var(x)

    def predict_proba(self, x):
        # Gaussian probability density at each point in x
        return (np.exp(-0.5 * (x - self.mean) ** 2 / self.var)
                / np.sqrt(2 * np.pi * self.var))
This is the code that performs maximum likelihood estimation with the Student's t distribution. Maximum likelihood estimation is performed by the fit method, which repeats the E and M steps and finishes when the parameters are no longer updated.
class StudentsT(object):

    def __init__(self, mean=0, a=1, b=1, learning_rate=0.01):
        self.mean = mean
        self.a = a
        self.b = b
        self.learning_rate = learning_rate

    def fit(self, x):
        # generalized EM: repeat E and M steps until the parameters converge
        while True:
            params = [self.mean, self.a, self.b]
            self._expectation(x)
            self._maximization(x)
            if np.allclose(params, [self.mean, self.a, self.b]):
                break

    def _expectation(self, x):
        # E step: posterior mean of each precision tau_i
        self.precisions = (self.a + 0.5) / (self.b + 0.5 * (x - self.mean) ** 2)

    def _maximization(self, x):
        # M step: closed-form update of the mean
        self.mean = np.sum(self.precisions * x) / np.sum(self.precisions)
        a = self.a
        b = self.b
        # single gradient-ascent step on a (no closed-form solution);
        # np.sum(np.log(...)) avoids the overflow/underflow risk of
        # np.log(np.prod(...))
        self.a = a + self.learning_rate * (
            len(x) * np.log(b)
            + np.sum(np.log(self.precisions))
            - len(x) * digamma(a))
        # closed-form update of b given a
        self.b = a * len(x) / np.sum(self.precisions)

    def predict_proba(self, x):
        # Student's t density with nu = 2a and lambda = a / b
        return ((1 + (x - self.mean) ** 2 / (2 * self.b)) ** (-self.a - 0.5)
                * gamma(self.a + 0.5)
                / (gamma(self.a) * np.sqrt(2 * np.pi * self.b)))
students_t_mle.py
import matplotlib.pyplot as plt
import numpy as np
from scipy.special import gamma, digamma


class Gaussian(object):

    def fit(self, x):
        # maximum likelihood estimates: sample mean and sample variance
        self.mean = np.mean(x)
        self.var = np.var(x)

    def predict_proba(self, x):
        # Gaussian probability density at each point in x
        return (np.exp(-0.5 * (x - self.mean) ** 2 / self.var)
                / np.sqrt(2 * np.pi * self.var))


class StudentsT(object):

    def __init__(self, mean=0, a=1, b=1, learning_rate=0.01):
        self.mean = mean
        self.a = a
        self.b = b
        self.learning_rate = learning_rate

    def fit(self, x):
        # generalized EM: repeat E and M steps until the parameters converge
        while True:
            params = [self.mean, self.a, self.b]
            self._expectation(x)
            self._maximization(x)
            if np.allclose(params, [self.mean, self.a, self.b]):
                break

    def _expectation(self, x):
        # E step: posterior mean of each precision tau_i
        self.precisions = (self.a + 0.5) / (self.b + 0.5 * (x - self.mean) ** 2)

    def _maximization(self, x):
        # M step: closed-form update of the mean
        self.mean = np.sum(self.precisions * x) / np.sum(self.precisions)
        a = self.a
        b = self.b
        # single gradient-ascent step on a (no closed-form solution)
        self.a = a + self.learning_rate * (
            len(x) * np.log(b)
            + np.sum(np.log(self.precisions))
            - len(x) * digamma(a))
        # closed-form update of b given a
        self.b = a * len(x) / np.sum(self.precisions)

    def predict_proba(self, x):
        # Student's t density with nu = 2a and lambda = a / b
        return ((1 + (x - self.mean) ** 2 / (2 * self.b)) ** (-self.a - 0.5)
                * gamma(self.a + 0.5)
                / (gamma(self.a) * np.sqrt(2 * np.pi * self.b)))


def main():
    # create toy data including outliers and plot histogram
    x = np.random.normal(size=20)
    x = np.concatenate([x, np.random.normal(loc=20., size=3)])
    # density=True replaces the old normed argument removed from matplotlib
    plt.hist(x, bins=50, density=True, label="samples")

    # prepare models
    students_t = StudentsT()
    gaussian = Gaussian()

    # maximum likelihood estimation
    students_t.fit(x)
    gaussian.fit(x)

    # plot results
    x = np.linspace(-5, 25, 1000)
    plt.plot(x, students_t.predict_proba(x), label="student's t", linewidth=2)
    plt.plot(x, gaussian.predict_proba(x), label="gaussian", linewidth=2)
    plt.legend()
    plt.show()


if __name__ == '__main__':
    main()
If you run the above code, you get the following result. As in Figure 2.16 of PRML, the fit by the Student's t distribution is indeed robust even in the presence of outliers. The mean of the Student's t distribution stays around 0, while the mean of the Gaussian distribution is pulled to around 2.5 by the outliers.
This confirmed that maximum likelihood estimation with the Student's t distribution is more robust than modeling with a Gaussian distribution. If I get the chance, I would like to try solving a curve-fitting problem with the Student's t distribution as well.