This article compares probabilistic principal component analysis (PPCA), Bayesian principal component analysis (BPCA), and kernel principal component analysis (KPCA), all of which are extensions of principal component analysis (PCA).
There are various ways to reduce high-dimensional data to a low-dimensional representation, but PCA is easiest to interpret through the singular value decomposition. Decomposing the (centered) data matrix as

X = USV^T

the dimensionality-reduced vectors can then be obtained with

X_{pca} = XV_{pca}

where $V_{pca}$ consists of the columns of $V$ corresponding to the reduced number of dimensions (for reduction to two dimensions, $V_{pca} = V[:, [0,1]]$).
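As a minimal sketch of this procedure (assuming NumPy and that the rows of $X$ are the data points):

```python
import numpy as np

def pca_svd(X, n_components=2):
    """PCA via singular value decomposition (rows of X are data points)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_pca = Vt.T[:, :n_components]               # first columns of V
    return Xc @ V_pca                            # dimensionality-reduced data
```

For reduction to two dimensions this reproduces the $V_{pca} = V[:, [0,1]]$ selection above.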
Probabilistic PCA performs probabilistic dimensionality reduction by assuming a Gaussian latent variable model. There are several ways to estimate the parameters; when using the EM algorithm, the E-step computes
M = W^TW+\sigma^2I \\
E[z_n] = M^{-1}W^T(x_n-\bar{x}) \\
E[z_{n}z_{n}^T]=\sigma^2M^{-1}+E[z_n]E[z_n]^T
where $\bar{x} = \frac{1}{N}\sum_{n=1}^{N}x_n$ is the sample mean of the data. In the M-step,
W = \bigl[\sum_{n=1}^{N}(x_n-\bar{x})E[z_n]^T\bigr]\bigl[\sum_{n=1}^{N}E[z_nz_n^T]\bigr]^{-1}\\
\sigma^{2} = \frac{1}{ND}\sum_{n=1}^{N}\bigl\{||x_n-\bar{x}||^2 - 2E[z_n]^TW^T(x_n-\bar{x}) + Tr(E[z_nz_n^T]W^TW)\bigr\}
where $N$ is the number of data points and $D$ is the dimensionality of the data. By iterating the E-step and M-step until convergence, $W$ and $\sigma^2$ can be obtained.
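A minimal NumPy sketch of this EM loop might look as follows (the random initialization, iteration count, and function signature are illustrative assumptions, not the repository's implementation):

```python
import numpy as np

def ppca_em(X, n_components=2, n_iter=100, seed=0):
    """EM algorithm for probabilistic PCA (rows of X are data points)."""
    N, D = X.shape
    Xc = X - X.mean(axis=0)                          # x_n - x_bar
    W = np.random.default_rng(seed).normal(size=(D, n_components))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: M = W^T W + sigma^2 I, then the expectations
        M = W.T @ W + sigma2 * np.eye(n_components)
        Minv = np.linalg.inv(M)
        Ez = Xc @ W @ Minv                           # rows are E[z_n] (M is symmetric)
        Szz = N * sigma2 * Minv + Ez.T @ Ez          # sum_n E[z_n z_n^T]
        # M-step: update W first, then sigma^2 using the new W
        W = (Xc.T @ Ez) @ np.linalg.inv(Szz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(Szz @ W.T @ W)) / (N * D)
    # final E-step so the latent means match the converged parameters
    M = W.T @ W + sigma2 * np.eye(n_components)
    Ez = Xc @ W @ np.linalg.inv(M)
    return Ez, W, sigma2
```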
Bayesian PCA performs Bayesian estimation by introducing hyperparameters $\alpha_i$ as a prior over the columns $w_i$ of $W$. Compared with probabilistic PCA, only the M-step changes:
\alpha_i = \frac{D}{w_i^Tw_i} \\
W = \bigl[\sum_{n=1}^{N}(x_n-\bar{x})E[z_n]^T\bigr]\bigl[\sum_{n=1}^{N}E[z_nz_n^T] + \sigma^2A \bigr]^{-1}\\
\sigma^{2} = \frac{1}{ND}\sum_{n=1}^{N}\bigl\{||x_n-\bar{x}||^2 - 2E[z_n]^TW^T(x_n-\bar{x}) + Tr(E[z_nz_n^T]W^TW)\bigr\}
where $A = \mathrm{diag}(\alpha_i)$ is the diagonal matrix of the hyperparameters.
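Since only the M-step differs, here is a sketch of the modified update, reusing the E-step quantities `Ez` and `Szz` from the PPCA loop above (the function boundary is my own choice for illustration):

```python
import numpy as np

def bpca_m_step(Xc, Ez, Szz, W, sigma2):
    """BPCA M-step: the PPCA update with the extra sigma^2 * A regularizer."""
    N, D = Xc.shape
    alpha = D / np.sum(W ** 2, axis=0)               # alpha_i = D / (w_i^T w_i)
    A = np.diag(alpha)
    W_new = (Xc.T @ Ez) @ np.linalg.inv(Szz + sigma2 * A)
    sigma2_new = (np.sum(Xc ** 2)
                  - 2.0 * np.sum(Ez * (Xc @ W_new))
                  + np.trace(Szz @ W_new.T @ W_new)) / (N * D)
    return W_new, sigma2_new
```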
Kernel PCA converts the (number of data) × (number of dimensions) data matrix into a (number of data) × (number of data) Gram matrix using a kernel function, and then performs principal component analysis on it.
The Gram matrix $K$ is then centered:

\tilde{K} = K - 1_NK - K1_N + 1_NK1_N

where $1_N$ is the $N \times N$ matrix whose elements are all $1/N$.
For the $\tilde{K}$ obtained in this way, dimensionality reduction is performed by finding its eigenvalues and eigenvectors, just as in ordinary principal component analysis.
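A minimal sketch of this procedure, assuming an RBF kernel (the kernel choice and the `gamma` parameter are illustrative assumptions, not from the original code):

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA with an RBF kernel (gamma is an assumed hyperparameter)."""
    # Gram matrix K: (number of data) x (number of data)
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    # Center in feature space: K~ = K - 1N K - K 1N + 1N K 1N
    N = K.shape[0]
    one_n = np.full((N, N), 1.0 / N)
    K_tilde = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigendecomposition; keep the eigenvectors of the largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(K_tilde)
    idx = np.argsort(eigvals)[::-1][:n_components]
    # Project: scale each eigenvector by the square root of its eigenvalue
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))
```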
Dimensionality reduction is performed using principal component analysis (PCA), probabilistic principal component analysis (PPCA), Bayesian principal component analysis (BPCA), and kernel principal component analysis (KPCA).
The data used is the iris dataset (three plant species, each sample represented by a 4-dimensional vector, with 50 samples per species).
The code is here: https://github.com/kenchin110100/machine_learning
The figures below plot the data after reduction to two dimensions.
(Figures: two-dimensional scatter plots for PCA, PPCA, BPCA, and KPCA)
The boundaries between the species are clearer with PPCA and BPCA than with plain PCA. KPCA produces a rather different-looking embedding, but each species still forms its own cluster.
Four kinds of principal component analysis were tried, and BPCA seems to be the easiest to use. PCA can be extended along two axes, probabilistic formulation and kernels, so there may be an even stronger principal component analysis that combines the two...