Aidemy 2020/10/29
Hello, this is Yope! I'm a liberal arts student, but I was interested in the possibilities of AI, so I went to the AI-specialized school "Aidemy" to study. I'd like to share the knowledge I gained there, so I'm summarizing it on Qiita. I'm very happy that many people have read my previous summary articles. Thank you! This is the third post on unsupervised learning. Nice to meet you.
What to learn this time
・About principal component analysis
・About kernel principal component analysis
・Principal component analysis (PCA) is a method of representing the original data with a smaller amount of data, that is, of summarizing (compressing) the data.
・When principal component analysis is performed, an "axis that explains all the data most efficiently (the first principal component axis)" and an "axis that most efficiently explains the data the first axis cannot explain (the second principal component axis)" are created.
・By using only these leading principal components, the extra information can be discarded and the data compressed.
・Since principal component analysis performs dimensionality reduction, it can be used to visualize data by reducing it to two or three dimensions, or as preprocessing for regression analysis.
The procedure for principal component analysis is as follows.
(1) Standardize the data X.
(2) Calculate the correlation matrix between the features.
(3) Find the eigenvalues and eigenvectors of the correlation matrix.
(4) Select the k largest eigenvalues (k = the number of dimensions after compression) and the corresponding eigenvectors.
(5) Create the feature transformation matrix W from the selected k eigenvectors.
(6) Calculate the matrix product of the data X and W to obtain the data X' converted to k dimensions.
・Standardization converts each feature of the data so that its mean is 0 and its variance is 1.
・By standardizing, features with different units and scales can be handled in the same way.
・Standardization is performed as follows: (data − mean) ÷ standard deviation

X = (X - X.mean(axis=0)) / X.std(axis=0)
・The correlation matrix is the matrix that collects the correlation coefficients between every pair of feature columns (for M features it is M × M). The correlation coefficient expresses the strength of the linear relationship between two variables: the closer it is to 1, the stronger the tendency toward a positive linear relationship, that is, the stronger the positive correlation; the closer it is to -1, the stronger the tendency toward a negative linear relationship, that is, the stronger the negative correlation.
・A correlation coefficient close to 0 indicates that there is little linear relationship.
・The correlation matrix R is calculated as follows.

R = np.corrcoef(X.T)
・The transposed data X.T is passed to np.corrcoef(), which computes the correlation matrix, because if X were passed as-is, the correlations between the data samples (rows) would be calculated. Here we want the correlation matrix of the feature data (columns), so in such a case the data must be transposed.
・When eigenvalue decomposition is performed on the correlation matrix R obtained in (2), it is decomposed into eigenvalues and eigenvectors. Each is obtained in the same number as the dimensions of the matrix.
・An eigenvector indicates a direction in which the information in the matrix R is concentrated, and the corresponding eigenvalue indicates the degree of that concentration.
・The eigenvalue decomposition can be obtained as follows. The eigenvalues are stored in the variable eigvals and the eigenvectors in eigvecs, in ascending order of eigenvalue.

eigvals, eigvecs = np.linalg.eigh(R)
・Here we look at the procedure for converting the data to an arbitrary k dimensions.
・From the eigenvalues obtained by the decomposition in (3), use the k largest ones (step (4)). Specifically, the feature transformation matrix W is created by concatenating the eigenvectors corresponding to these k eigenvalues (step (5)). Finally, multiplying the data X by this W gives the data X' converted to k dimensions (step (6)).
・The transformation matrix W is created as follows (when converting to 2 dimensions). Since eigh() returns the eigenvalues in ascending order, the last columns of eigvecs correspond to the largest eigenvalues.

W = np.c_[eigvecs[:, -1], eigvecs[:, -2]]
・Since the product of X and W is a matrix product, it is calculated with X.dot(W).
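Putting steps (1) to (6) together, a minimal from-scratch sketch looks like the following (the random toy data is just a stand-in for illustration, not the wine data used later):

```python
import numpy as np

# Toy data: N = 100 samples, M = 5 features (illustrative only)
rng = np.random.RandomState(0)
X = rng.rand(100, 5)

# (1) Standardize each feature to mean 0, variance 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# (2) Correlation matrix of the features (hence the transpose)
R = np.corrcoef(X_std.T)

# (3) Eigendecomposition; eigh() returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(R)

# (4)(5) The eigenvectors of the two largest eigenvalues form the columns of W
W = np.c_[eigvecs[:, -1], eigvecs[:, -2]]

# (6) The matrix product projects the data onto the k = 2 new axes
X_prime = X_std.dot(W)
print(X_prime.shape)  # (100, 2)
```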
・Principal component analysis can be performed with steps (1) to (6) above, but it can also be done easily by using the scikit-learn class PCA.
・Code![Screenshot 2020-10-29 12.26.14.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/9a9707d8-31fa-a2f8-47b6-2ffde6aee689.png)
・Code 2 (compress the 3 classes of wine data to 2 dimensions)![Screenshot 2020-10-29 12.29.05.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/4363f1c6-abbd-6a19-dd3f-25f58873b41c.png)
・Result![Screenshot 2020-10-29 12.30.00.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/8baa658a-b482-d492-1ebe-960dcf6b42da.png)
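A minimal sketch of this flow with the PCA class (assuming the wine dataset bundled with scikit-learn; the plotting details in the screenshots may differ):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA

# Load the wine data (13 features, 3 classes) and compress it to 2 dimensions
X, y = load_wine(return_X_y=True)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Plot the 3 wine classes on the first and second principal component axes
for label in (0, 1, 2):
    plt.scatter(X_pca[y == label, 0], X_pca[y == label, 1], label=f"class {label}")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.legend()
plt.show()
```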
・Before performing regression analysis with LogisticRegression(), compressing the data by principal component analysis makes it possible to build a more generalizable model.
・In the following, standardization and principal component analysis are applied to the split data X_train and X_test. Use the StandardScaler class for standardization and the PCA class for principal component analysis. Training data and test data must be processed according to a common standard.
・Since the training data is what the transformers must be fitted on, use fit_transform() for it, and apply transform() as-is to the test data.
・Code![Screenshot 2020-10-29 12.36.21.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/041bc221-3f6e-1675-3ab2-5ee84508ee6e.png)
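A sketch of that train/test flow (again assuming the wine data; train_test_split and the model hyperparameters are illustrative assumptions):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training data only, then apply the same standard
# to the test data with transform()
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# Same pattern for PCA: fit_transform() on train, transform() on test
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)

# Regression analysis on the compressed data
model = LogisticRegression()
model.fit(X_train_pca, y_train)
print(model.score(X_test_pca, y_test))
```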
・Machine learning methods such as regression analysis presuppose linear separability, so they cannot handle data that is not linearly separable. Such data can be handled by using Kernel Principal Component Analysis (kernel PCA), which converts non-linearly-separable data into linearly separable data.
・In kernel PCA, the given N (number of samples) × M (number of features) data is recast as N × M' data K with new features M'. This transformation is called the kernel trick, and K is called the kernel matrix.
・Principal component analysis then becomes possible on this kernel matrix K.
・To perform the kernel trick, you first need to calculate the kernel matrix K. If the original data is N (samples) × M (features), then K is N × N.
・The kernel matrix is a matrix of a kernel function that calculates the "similarity of each pair of data points".
・There are several types of kernel function; this time we look at the Gaussian kernel, a radial basis function (RBF) kernel. For a pair of data points xi and xj it is k(xi, xj) = exp(-γ‖xi − xj‖²).
・The Gaussian kernel can be calculated as follows.
# Calculate the squared Euclidean distance between every pair of data points
M = np.sum((X - X[:, np.newaxis]) ** 2, axis=2)
# Compute the kernel matrix from M (gamma is the hyperparameter γ)
K = np.exp(-gamma * M)
・As mentioned above, principal component analysis can be performed on the kernel matrix K obtained by the kernel trick. By doing so, data X that was originally not linearly separable can be converted into linearly separable data X'.
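Putting the kernel trick and the subsequent decomposition together, a from-scratch sketch might look like this (assumptions: the make_moons toy data and gamma = 15; the centering of K before the eigendecomposition is a standard step not spelled out above):

```python
import numpy as np
from sklearn.datasets import make_moons

# Toy data that is not linearly separable (illustrative assumption)
X, y = make_moons(n_samples=100, random_state=0)
gamma = 15

# Kernel trick: pairwise squared Euclidean distances, then the Gaussian kernel
M = np.sum((X - X[:, np.newaxis]) ** 2, axis=2)
K = np.exp(-gamma * M)

# Center the N x N kernel matrix, since the features in the
# kernel-induced space are not automatically zero-mean
N = K.shape[0]
one_n = np.ones((N, N)) / N
K = K - one_n.dot(K) - K.dot(one_n) + one_n.dot(K).dot(one_n)

# Principal component analysis on K: keep the eigenvectors belonging
# to the two largest eigenvalues as the new 2-dimensional data X'
eigvals, eigvecs = np.linalg.eigh(K)
X_kpca = np.c_[eigvecs[:, -1], eigvecs[:, -2]]
print(X_kpca.shape)  # (100, 2)
```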
・Code![Screenshot 2020-10-29 12.38.10.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/06e6a936-b50b-98c2-d9e0-5c5c80514f44.png)
・Result![Screenshot 2020-10-29 12.38.42.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/6d86ceb5-21ac-a9a8-aa8d-29176b572726.png)
・By using the scikit-learn class KernelPCA, kernel principal component analysis can be performed easily.
・As for the arguments, n_components is the number of dimensions after compression, kernel specifies the kernel ("rbf" for the radial basis function), and gamma is the value of γ used to calculate the kernel matrix.
from sklearn.decomposition import KernelPCA

# Create a KernelPCA instance and perform kernel principal component analysis
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
X_kpca = kpca.fit_transform(X)
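As a usage sketch (again assuming the make_moons toy data), the transformed X_kpca can be plotted to confirm that the two classes have become linearly separable:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA

# Non-linearly-separable toy data (illustrative assumption)
X, y = make_moons(n_samples=100, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
X_kpca = kpca.fit_transform(X)

# After kernel PCA the two moon-shaped classes separate along the first axis
plt.scatter(X_kpca[y == 0, 0], X_kpca[y == 0, 1], label="class 0")
plt.scatter(X_kpca[y == 1, 0], X_kpca[y == 1, 1], label="class 1")
plt.legend()
plt.show()
```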
・By compressing data with principal component analysis (dimensionality reduction), data can be drawn on a plane and the accuracy of regression analysis can be improved.
・Principal component analysis can be performed easily by using the PCA class.
・By converting the data with a radial basis function (kernel), principal component analysis can also be applied to data that cannot be linearly separated. This makes non-linearly-separable data linearly separable, so machine learning becomes possible. This is called kernel principal component analysis.
・Kernel principal component analysis can be performed easily by using the KernelPCA class.
That's all for this time. Thank you for reading to the end.