Clustering analysis (k-means method)

・ Pass the data frame as df and the number of clusters as num.
・ The random seed is set through the random_state argument of KMeans (an integer; fixed to 0 in the code below).

def clustering_analytics(df, num):
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Work on a copy so the caller's data frame is left unchanged
    df_temp = df.copy()

    # Standardization (zero mean, unit variance per column)
    sc = StandardScaler()
    df_std = sc.fit_transform(df_temp)

    # Fit k-means and attach each row's cluster label as a new column
    kmeans = KMeans(n_clusters=num, random_state=0)
    clusters = kmeans.fit(df_std)
    df_temp["cluster"] = clusters.labels_
    return df_temp
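
As a minimal usage sketch, the same StandardScaler and KMeans steps can be run on a small synthetic data frame. The column names and values below are made up for illustration, and n_init=10 is set explicitly here only as an assumption to keep the behavior the same across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data: two well-separated groups of three rows each.
df = pd.DataFrame({
    "height": [150.0, 152.0, 151.0, 180.0, 182.0, 181.0],
    "weight": [50.0, 52.0, 51.0, 80.0, 82.0, 81.0],
})

# Same steps as clustering_analytics(df, 2): standardize, then fit k-means.
df_std = StandardScaler().fit_transform(df)
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(df_std)

# Attach the cluster label of each row to a copy of the input.
result = df.copy()
result["cluster"] = kmeans.labels_
print(result)
```

The first three rows should share one label and the last three the other. Which group gets label 0 and which gets label 1 is arbitrary.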

Principal component analysis (PCA)

・ Pass the data frame as df and the number of principal components as num. The function returns the principal component scores as a new data frame.

def PCA_analytics(df, num):
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    import numpy as np
    import pandas as pd

    df_temp = df.copy()

    # Standardization
    sc = StandardScaler()
    df_std = sc.fit_transform(df_temp)

    # Fit PCA and project the standardized data onto the principal components
    pca = PCA(n_components=num)
    pca.fit(df_std)
    df_temp_pca = pca.transform(df_std)
    pca_df = pd.DataFrame(df_temp_pca)

    print('principal components')
    print(pca.components_)
    print('mean')
    print(pca.mean_)
    print('covariance matrix')
    print(pca.get_covariance())

    # np.linalg.eig returns (eigenvalues, eigenvectors); eigenvectors are the columns of v
    W, v = np.linalg.eig(pca.get_covariance())
    print('eigenvectors')
    print(v)
    print('eigenvalues')
    print(W)
    return pca_df
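
The following sketch shows how the printed quantities relate to each other, assuming a synthetic data frame whose column names and values are made up for illustration. It uses np.linalg.eigh instead of np.linalg.eig (a swap made here because the covariance matrix is symmetric) and also prints explained_variance_ratio_, which the function above does not.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data: "b" is strongly correlated with "a", "c" is independent.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
df = pd.DataFrame({
    "a": x,
    "b": 2.0 * x + rng.normal(scale=0.1, size=50),
    "c": rng.normal(size=50),
})

# Same steps as PCA_analytics(df, 3): standardize, fit, transform.
df_std = StandardScaler().fit_transform(df)
pca = PCA(n_components=3).fit(df_std)
scores = pd.DataFrame(pca.transform(df_std), index=df.index)

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(pca.get_covariance())

print(pca.explained_variance_ratio_)  # share of variance per component
print(eigenvalues[::-1])              # eigenvalues, largest first
print(pca.explained_variance_)        # matches the eigenvalues above
```

The eigenvalues of the covariance matrix equal explained_variance_, so the eigen-decomposition printed by the function is a second view of what scikit-learn already stores on the fitted PCA object.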
