Aidemy 2020/10/29
Hello, this is Yope! I'm a liberal arts student, but I was interested in the possibilities of AI, so I went to the AI-specialized school "Aidemy" to study. I'd like to share the knowledge I gained there, so I'm summarizing it on Qiita. I'm very happy that many people have read my previous summary articles. Thank you! This is the third post on unsupervised learning. Nice to meet you.
What to learn this time
・About principal component analysis
・About kernel principal component analysis
・Principal component analysis (PCA) is a method of representing the original data with a smaller amount of data, that is, of summarizing (compressing) the data.
・When principal component analysis is performed, an "axis that explains all the data most efficiently (the first principal component axis)" and an "axis that most efficiently explains the data the first axis cannot explain (the second principal component axis)" are created.
・By using only these leading principal components, the extra information can be discarded and the data compressed.
・Since principal component analysis performs dimensionality reduction, it can be used to visualize data by reducing it to two or three dimensions, or as preprocessing for regression analysis.
The procedure for principal component analysis is as follows.
(1) Standardize the data X.
(2) Calculate the correlation matrix between the features.
(3) Find the eigenvalues and eigenvectors of the correlation matrix.
(4) Select the k largest eigenvalues (k = the number of dimensions after compression) and the corresponding eigenvectors.
(5) Create the feature transformation matrix W from the selected k eigenvectors.
(6) Calculate the matrix product of the data X and W to obtain the data X' converted to k dimensions.
・Standardization converts each feature of the data so that its mean is 0 and its variance is 1.
・By standardizing, features with different units and scales can be handled in the same way.
・Standardization is performed as follows: (data − mean) ÷ standard deviation

X = (X - X.mean(axis=0)) / X.std(axis=0)
・The correlation matrix is the matrix that collects the correlation coefficients between every pair of feature columns (for M features it is M × M). The correlation coefficient expresses the strength of the linear relationship between two variables: the closer it is to 1, the stronger the tendency toward a positive linear relationship, that is, the stronger the positive correlation; the closer it is to -1, the stronger the tendency toward a negative linear relationship, that is, the stronger the negative correlation.
・A correlation coefficient close to 0 indicates that there is little linear relationship.
・The correlation matrix R is calculated as follows.

R = np.corrcoef(X.T)
・The transposed data X.T is passed to np.corrcoef(), which computes the correlation matrix, because if X were passed as-is, the correlations between the data samples (rows) would be calculated. Here we want the correlation matrix of the feature data (columns), so in such a case the data must be transposed.
・When eigenvalue decomposition is performed on the correlation matrix R obtained in (2), it is decomposed into eigenvalues and eigenvectors. Each is obtained in the same number as the dimensions of the matrix.
・An eigenvector indicates a direction in which the information in the matrix R is concentrated, and the corresponding eigenvalue indicates the degree of that concentration.
・The eigenvalue decomposition can be obtained as follows. The eigenvalues are stored in the variable eigvals and the eigenvectors in eigvecs, in ascending order of eigenvalue.

eigvals, eigvecs = np.linalg.eigh(R)
・Here we look at the procedure for converting the data to an arbitrary k dimensions.
・From the eigenvalues obtained by the decomposition in (3), use the k largest ones (step (4)). Specifically, the feature transformation matrix W is created by concatenating the eigenvectors corresponding to these k eigenvalues (step (5)). Finally, multiplying the data X by this W gives the data X' converted to k dimensions (step (6)).
・The transformation matrix W is created as follows (when converting to 2 dimensions). Since eigh() returns the eigenvalues in ascending order, the last columns of eigvecs correspond to the largest eigenvalues.

W = np.c_[eigvecs[:, -1], eigvecs[:, -2]]
・Since the product of X and W is a matrix product, it is calculated with X.dot(W).
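Putting steps (1) to (6) together, a minimal from-scratch sketch looks like the following (the random toy data is just a stand-in for illustration, not the wine data used later):

```python
import numpy as np

# Toy data: N = 100 samples, M = 5 features (illustrative only)
rng = np.random.RandomState(0)
X = rng.rand(100, 5)

# (1) Standardize each feature to mean 0, variance 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# (2) Correlation matrix of the features (hence the transpose)
R = np.corrcoef(X_std.T)

# (3) Eigendecomposition; eigh() returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(R)

# (4)(5) The eigenvectors of the two largest eigenvalues form the columns of W
W = np.c_[eigvecs[:, -1], eigvecs[:, -2]]

# (6) The matrix product projects the data onto the k = 2 new axes
X_prime = X_std.dot(W)
print(X_prime.shape)  # (100, 2)
```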
・Principal component analysis can be performed with steps (1) to (6) above, but it can also be done easily by using the scikit-learn class PCA.
・Code![Screenshot 2020-10-29 12.26.14.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/9a9707d8-31fa-a2f8-47b6-2ffde6aee689.png)
・Code 2 (compress the 3 classes of wine data to 2 dimensions)![Screenshot 2020-10-29 12.29.05.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/4363f1c6-abbd-6a19-dd3f-25f58873b41c.png)
・Result![Screenshot 2020-10-29 12.30.00.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/8baa658a-b482-d492-1ebe-960dcf6b42da.png)
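A minimal sketch of this flow with the PCA class (assuming the wine dataset bundled with scikit-learn; the plotting details in the screenshots may differ):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA

# Load the wine data (13 features, 3 classes) and compress it to 2 dimensions
X, y = load_wine(return_X_y=True)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Plot the 3 wine classes on the first and second principal component axes
for label in (0, 1, 2):
    plt.scatter(X_pca[y == label, 0], X_pca[y == label, 1], label=f"class {label}")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.legend()
plt.show()
```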
・Before performing regression analysis with LogisticRegression(), compressing the data by principal component analysis makes it possible to build a more generalizable model.
・In the following, standardization and principal component analysis are applied to the split data X_train and X_test. Use the StandardScaler class for standardization and the PCA class for principal component analysis. Training data and test data must be processed according to a common standard.
・Since the training data is what the transformers must be fitted on, use fit_transform() for it, and apply transform() as-is to the test data.
・Code![Screenshot 2020-10-29 12.36.21.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/041bc221-3f6e-1675-3ab2-5ee84508ee6e.png)
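A sketch of that train/test flow (again assuming the wine data; train_test_split and the model hyperparameters are illustrative assumptions):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training data only, then apply the same standard
# to the test data with transform()
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# Same pattern for PCA: fit_transform() on train, transform() on test
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)

# Regression analysis on the compressed data
model = LogisticRegression()
model.fit(X_train_pca, y_train)
print(model.score(X_test_pca, y_test))
```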
・Machine learning methods such as regression analysis presuppose linear separability, so they cannot handle data that is not linearly separable. Such data can be handled by using Kernel Principal Component Analysis (kernel PCA), which converts non-linearly-separable data into linearly separable data.
・In kernel PCA, the given N (number of samples) × M (number of features) data is recast as N × M' data K with new features M'. This transformation is called the kernel trick, and K is called the kernel matrix.
・Principal component analysis then becomes possible on this kernel matrix K.
・To perform the kernel trick, you first need to calculate the kernel matrix K. If the original data is N (samples) × M (features), then K is N × N.
・The kernel matrix is a matrix of a kernel function that calculates the "similarity of each pair of data points".
・There are several types of kernel function; this time we look at the Gaussian kernel, a radial basis function (RBF) kernel. For a pair of data points xi and xj it is k(xi, xj) = exp(-γ‖xi − xj‖²).
・The Gaussian kernel can be calculated as follows.
# Calculate the squared Euclidean distance between every pair of data points
M = np.sum((X - X[:, np.newaxis]) ** 2, axis=2)
# Compute the kernel matrix from M (gamma is the hyperparameter γ)
K = np.exp(-gamma * M)
・As mentioned above, principal component analysis can be performed on the kernel matrix K obtained by the kernel trick. By doing so, data X that was originally not linearly separable can be converted into linearly separable data X'.
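Putting the kernel trick and the subsequent decomposition together, a from-scratch sketch might look like this (assumptions: the make_moons toy data and gamma = 15; the centering of K before the eigendecomposition is a standard step not spelled out above):

```python
import numpy as np
from sklearn.datasets import make_moons

# Toy data that is not linearly separable (illustrative assumption)
X, y = make_moons(n_samples=100, random_state=0)
gamma = 15

# Kernel trick: pairwise squared Euclidean distances, then the Gaussian kernel
M = np.sum((X - X[:, np.newaxis]) ** 2, axis=2)
K = np.exp(-gamma * M)

# Center the N x N kernel matrix, since the features in the
# kernel-induced space are not automatically zero-mean
N = K.shape[0]
one_n = np.ones((N, N)) / N
K = K - one_n.dot(K) - K.dot(one_n) + one_n.dot(K).dot(one_n)

# Principal component analysis on K: keep the eigenvectors belonging
# to the two largest eigenvalues as the new 2-dimensional data X'
eigvals, eigvecs = np.linalg.eigh(K)
X_kpca = np.c_[eigvecs[:, -1], eigvecs[:, -2]]
print(X_kpca.shape)  # (100, 2)
```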
・Code![Screenshot 2020-10-29 12.38.10.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/06e6a936-b50b-98c2-d9e0-5c5c80514f44.png)
・Result![Screenshot 2020-10-29 12.38.42.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/6d86ceb5-21ac-a9a8-aa8d-29176b572726.png)
・By using the scikit-learn class KernelPCA, kernel principal component analysis can be performed easily.
・As for the arguments, n_components is the number of dimensions after compression, kernel specifies the kernel ("rbf" for the radial basis function), and gamma is the value of γ used to calculate the kernel matrix.
from sklearn.decomposition import KernelPCA

# Create a KernelPCA instance and perform kernel principal component analysis
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
X_kpca = kpca.fit_transform(X)
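As a usage sketch (again assuming the make_moons toy data), the transformed X_kpca can be plotted to confirm that the two classes have become linearly separable:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA

# Non-linearly-separable toy data (illustrative assumption)
X, y = make_moons(n_samples=100, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
X_kpca = kpca.fit_transform(X)

# After kernel PCA the two moon-shaped classes separate along the first axis
plt.scatter(X_kpca[y == 0, 0], X_kpca[y == 0, 1], label="class 0")
plt.scatter(X_kpca[y == 1, 0], X_kpca[y == 1, 1], label="class 1")
plt.legend()
plt.show()
```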
・By compressing data with principal component analysis (dimensionality reduction), data can be drawn on a plane and the accuracy of regression analysis can be improved.
・Principal component analysis can be performed easily by using the PCA class.
・By converting the data with a radial basis function (kernel), principal component analysis can also be applied to data that cannot be linearly separated. This makes non-linearly-separable data linearly separable, so machine learning becomes possible. This is called kernel principal component analysis.
・Kernel principal component analysis can be performed easily by using the KernelPCA class.
That's all for this time. Thank you for reading to the end.