I've been looking into the multivariate t distribution, so here is a summary.
(Right as I finished writing this article, I blew away the draft and had to rewrite it from scratch... I want to die...)
I follow the notation of the Multivariate t-distribution article on Wikipedia, since it was the first hit when I googled. The probability density function of the multivariate t distribution is defined as follows.
So what is the t distribution in the first place? It can be viewed as a normal distribution whose variance is scaled by an extra parameter: that parameter is given a gamma distribution, and integrating it out yields the t distribution. We could assume an inverse-gamma or chi-square distribution instead of the gamma, but with a small change of variables they should all turn out to be equivalent. (I haven't tried.)
This allows you to see the average behavior when you are not sure about the value of the variance.
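The claim above, that integrating a gamma-distributed precision out of a normal yields a t distribution, can be checked numerically in one dimension. A minimal sketch (the values of `nu` and `y` below are arbitrary choices of mine, and the trapezoid helper is just to keep it NumPy-version-agnostic):

```python
import numpy as np
from math import gamma


def trapezoid(f, x):
    """Trapezoid-rule integral of sampled values f over grid x."""
    return np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2


nu = 5.0  # degrees of freedom (illustrative value)
y = 1.3   # evaluation point (illustrative value)

# Closed-form pdf of the 1-d Student's t distribution with nu degrees of freedom.
t_pdf = gamma((nu + 1) / 2) / (gamma(nu / 2) * np.sqrt(nu * np.pi)) \
    * (1 + y ** 2 / nu) ** (-(nu + 1) / 2)

# The same pdf as a gamma mixture of normals: numerically integrate
# N(y; 0, 1/v) against v ~ Gam(nu/2, nu/2) (rate parameterization).
v = np.linspace(1e-8, 60, 200000)
gam_pdf = (nu / 2) ** (nu / 2) / gamma(nu / 2) * v ** (nu / 2 - 1) * np.exp(-nu / 2 * v)
norm_pdf = np.sqrt(v / (2 * np.pi)) * np.exp(-v * y ** 2 / 2)
mix_pdf = trapezoid(gam_pdf * norm_pdf, v)

print(t_pdf, mix_pdf)  # the two values agree
```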
Consider two random variables $V \sim \mathrm{Gam}(\alpha, \beta)$ and $X \sim N(\textbf{0}, \Sigma)$, where $X$ follows a $d$-dimensional normal distribution.
These two random variables are transformed into $Y$ and $U$ by the functions $f$ and $g$:
Here, $g$ is the identity function on $V$, $\mu \in \mathbb{R}^{d}$ is a $d$-dimensional vector representing the mean, and $V$ is assumed to be independent of $X$.
When $V$ is fixed, this $f(\cdot)$ is an affine (linear) map, and it is a one-to-one correspondence. Therefore the inverse function $f^{-1}(\cdot)$ exists, and
We find the distribution of $Y$ from $V$ and $X$, whose distributions are already known. Since there are two random variables, $U$ and $Y$, we consider the joint distribution of $U$ and $Y$ and obtain the distribution of $Y$ by integrating $U$ out.
Pr(u,y) = Pr(U < u, Y < y) = Pr(g(V) < u,f(X) < y) \\
= Pr(V < g^{-1}(u), X < f^{-1}(y)) \\
= Pr(V < u, X<\sqrt{u}(y - \mu)) = Pr(V < u) Pr(X < \sqrt{u}(y - \mu)). \tag{5}
Here, the last step uses the assumption that $V$ and $X$ are independent.
Let $p_{V}$ and $p_{X}$ be the probability density functions of $V$ and $X$.
Here,
J = \begin{pmatrix} \frac{\partial x}{\partial y^{T}} & \frac{\partial x}{\partial u} \\ \frac{\partial v}{\partial y^{T}} & \frac{\partial v}{\partial u} \end{pmatrix} =
\begin{pmatrix} \sqrt{u}I_{d} & \frac{1}{2\sqrt{u}}(y-\mu) \\ \textbf{0} & 1 \end{pmatrix}. \tag{8}
Since the Jacobian matrix $J$ is upper triangular, its determinant is the product of its diagonal elements: $\det J = u^{\frac{d}{2}}$.
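As a quick sanity check, the determinant of this Jacobian can be compared against $u^{\frac{d}{2}}$ by finite differences on the map $(y, u) \mapsto (\sqrt{u}(y - \mu), u)$. This is only a sketch with illustrative values for $d$, $u$, and $y$:

```python
import numpy as np

# Numerically check det(J) = u^{d/2} for the map (y, u) -> (x, v) = (sqrt(u)(y - mu), u).
d = 3
mu = np.zeros(d)
rng = np.random.default_rng(0)
y = rng.normal(size=d)  # arbitrary evaluation point
u = 1.7                 # arbitrary positive value


def forward(z):
    """Map the stacked vector (y, u) to (x, v)."""
    yy, uu = z[:d], z[d]
    return np.concatenate([np.sqrt(uu) * (yy - mu), [uu]])


# Build the (d+1) x (d+1) Jacobian by central differences.
eps = 1e-6
z0 = np.concatenate([y, [u]])
J = np.empty((d + 1, d + 1))
for j in range(d + 1):
    dz = np.zeros(d + 1)
    dz[j] = eps
    J[:, j] = (forward(z0 + dz) - forward(z0 - dz)) / (2 * eps)

print(np.linalg.det(J), u ** (d / 2))  # the two should match
```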
p_{X}(\sqrt{u}(\textbf{y} - \mu)) = \frac{1}{(2\pi)^{\frac{d}{2}}} \frac{1}{\sqrt{\det(\Sigma)}} \exp{\Bigl( - \frac{1}{2} \sqrt{u}(\textbf{y} - \mu)^{T} \Sigma^{-1} \sqrt{u}(\textbf{y} - \mu) \Bigr)} \\
= \frac{1}{(2\pi)^{\frac{d}{2}}} u^{-\frac{d}{2}} u^{\frac{d}{2}} \frac{1}{\sqrt{\det(\Sigma)}} \exp{\Bigl( - \frac{1}{2} (\textbf{y} - \mu)^{T} (u \Sigma^{-1}) (\textbf{y} - \mu) \Bigr)} \\
= u^{-\frac{d}{2}} \frac{1}{(2\pi)^{\frac{d}{2}}} \frac{1}{\sqrt{\det(\frac{\Sigma}{u})}} \exp{\Bigl( - \frac{1}{2} (\textbf{y} - \mu)^{T} (u \Sigma^{-1}) (\textbf{y} - \mu) \Bigr)} = u^{-\frac{d}{2}} N(\mu, \frac{\Sigma}{u}). \tag{13}
The probability density function $p_{U,Y}(u,y)$ corresponding to $\Pr(u,y)$ is obtained by the change-of-variables formula as $p_{U,Y}(u,y) = p_{V}(u)\, p_{X}(\sqrt{u}(\textbf{y} - \mu))\, \lvert \det J \rvert$.
By integrating $p_{U,Y}(u,y)$ with respect to $u$, we get the probability density function $p_{Y}$ of $Y$.
Here, to simplify the notation, the Mahalanobis distance of the normal distribution is written as $\Delta^{2} = (\textbf{y} - \mu)^{T} \Sigma^{-1} (\textbf{y} - \mu)$.
Extract only the part that depends on $ u $
Here, substituting $z = u(\beta + \frac{\Delta^{2}}{2})$ gives $u = z(\beta + \frac{\Delta^{2}}{2})^{-1}$ and $du = (\beta + \frac{\Delta^{2}}{2})^{-1} dz$. When $u = 0$, $z = 0$, and as $u \to \infty$, $z \to \infty$.
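The integral this substitution handles is of the form $\int_0^\infty u^{a-1} e^{-cu}\, du = \Gamma(a)/c^{a}$. A numerical sanity check of that identity, with arbitrary stand-in values for $a = \alpha + \frac{d}{2}$ and $c = \beta + \frac{\Delta^{2}}{2}$:

```python
import numpy as np
from math import gamma

# Verify: integral_0^inf u^{a-1} e^{-c u} du = Gamma(a) / c^a
a = 7.0  # stands in for alpha + d/2
c = 2.3  # stands in for beta + Delta^2/2

u = np.linspace(1e-8, 80, 400000)
f = u ** (a - 1) * np.exp(-c * u)

# Trapezoid rule over the grid (written out to avoid version-specific helpers).
lhs = np.sum((f[1:] + f[:-1]) * np.diff(u)) / 2
rhs = gamma(a) / c ** a
print(lhs, rhs)  # the two should agree
```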
From the definition of the gamma function
Considering the $u$-independent terms of $(17)$,
p_{Y}(\textbf{y}) = \frac{\Gamma \Bigl( \frac{d}{2} + \alpha \Bigr)}{\Gamma(\alpha)} \frac{\beta^{\alpha}}{(2\pi)^{\frac{d}{2}}} \frac{1}{\sqrt{\det(\Sigma)}} \Bigl( \beta + \frac{\Delta^{2}}{2} \Bigr)^{-\frac{d}{2} -\alpha} \\
= \frac{\Gamma \Bigl( \frac{d}{2} + \alpha \Bigr)}{\Gamma(\alpha)}\frac{\beta^{\alpha}}{(2\pi)^{\frac{d}{2}}} \frac{1}{\sqrt{\det(\Sigma)}} \beta^{-\frac{d}{2}-\alpha} \Bigl( 1 + \frac{\Delta^{2}}{2 \beta} \Bigr)^{-\frac{d}{2} -\alpha} \\
= \frac{\Gamma \Bigl( \frac{d}{2} + \alpha \Bigr)}{\Gamma(\alpha)}\frac{1}{(2 \beta \pi)^{\frac{d}{2}}} \frac{1}{\sqrt{\det(\Sigma)}} \Bigl( 1 + \frac{\Delta^{2}}{2 \beta} \Bigr)^{-\frac{d}{2} -\alpha}. \tag{19}
Here, if we set $\alpha = \beta = \frac{\nu}{2}$ and $p = d$,
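Writing out that substitution explicitly (using $\Gamma(\frac{d}{2} + \frac{\nu}{2}) = \Gamma(\frac{\nu + d}{2})$, $2\beta = \nu$, and exponent $-\frac{d}{2} - \alpha = -\frac{\nu + d}{2}$), $(19)$ should reduce to the Wikipedia form of the multivariate t density:

```latex
p_{Y}(\textbf{y}) = \frac{\Gamma \Bigl( \frac{\nu + d}{2} \Bigr)}{\Gamma \Bigl( \frac{\nu}{2} \Bigr)} \frac{1}{(\nu \pi)^{\frac{d}{2}}} \frac{1}{\sqrt{\det(\Sigma)}} \Bigl( 1 + \frac{\Delta^{2}}{\nu} \Bigr)^{-\frac{\nu + d}{2}}
```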
Since expression $(2)$ is itself a recipe for random number generation, let's implement it in Python.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

# Set the parameters.
N = 5000           # number of samples
d = 10             # dimension
df = 4             # degrees of freedom nu
mean = np.zeros(d)
cov = np.eye(d)

# Random number generation from the gamma distribution.
# Gam(nu/2, nu/2) has rate nu/2; NumPy takes a scale parameter, so scale = 2/nu.
V = np.random.gamma(shape=df / 2., scale=2. / df, size=N)

# Random number generation from the multivariate normal distribution.
X = np.random.multivariate_normal(mean=mean, cov=cov, size=N)

# Transform X: Y = mu + X / sqrt(V), broadcasting V over the d columns.
Y = mean + X / np.sqrt(V)[:, np.newaxis]

# Generate random numbers from a multivariate normal distribution for comparison.
# (Question: is it okay for the comparison to look like this?)
cov_ = ((df - 2) / float(df)) * np.corrcoef(Y.T)
X_ = np.random.multivariate_normal(mean=mean, cov=cov_, size=N)

# Plot histograms of the first coordinate.
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(X_[:, 0], color="blue", density=True, label="Normal dist", bins=30)
ax.hist(Y[:, 0], color="red", density=True, label="t dist", bins=30, alpha=0.5)
ax.set_title("Comparison of normal and t distribution")
ax.legend()

# Scatter plot of the first two coordinates.
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(X_[:, 0], X_[:, 1], color="blue", label="Normal dist")
ax.scatter(Y[:, 0], Y[:, 1], color="red", label="t dist", alpha=0.5)
ax.set_title("Scatter plot of t and normal dist")
ax.legend()
plt.show()
The variance is larger, and it does look like a t distribution? (I wanted to compare more than one dimension at a time, but after the rewrite I'm exhausted, so this is about it...)
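On the question of whether the comparison is fair: for $\nu > 2$ the covariance of a multivariate t is $\frac{\nu}{\nu - 2} \Sigma$, which is presumably why the comparison normal's covariance is rescaled by $\frac{\nu - 2}{\nu}$. A minimal sketch checking that covariance fact (with illustrative values of `N`, `d`, and `df` of my own choosing):

```python
import numpy as np

# For a multivariate t with nu > 2 degrees of freedom, Cov(Y) = nu / (nu - 2) * Sigma.
rng = np.random.default_rng(1)
N, d, df = 200000, 3, 6.0

X = rng.multivariate_normal(np.zeros(d), np.eye(d), size=N)
V = rng.gamma(shape=df / 2., scale=2. / df, size=N)  # rate nu/2 -> scale 2/nu
Y = X / np.sqrt(V)[:, None]

emp = np.cov(Y.T)
print(np.diag(emp))  # each entry is close to df / (df - 2) = 1.5
```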
I only do math by feel, so please let me know if I've made a mistake. If I have time, I'd like to trace the connections among the various distributions.