Please note: this article is written by a beginner in machine learning.
The previous article is here. The next article is here.
I've linked the articles together on my own. For more background (though it isn't very detailed), please refer to the previous article (https://qiita.com/sorax/items/8663906fae41798a00b8). The one-line summary is: "I tried using kernel density estimation as a classifier for supervised learning!".
I reworked the script from the previous article into object-oriented form. I call it the "Gaussian kernel-density estimate classifier", or "GKDE Classifier" for short. The name is just my own invention.
↓ Script ↓
import numpy as np
from scipy.stats import gaussian_kde

class GKDEClassifier(object):

    def __init__(self, bw_method=None, weights=None):
        # Kernel bandwidth (passed through to gaussian_kde)
        self.bw_method = bw_method
        # Kernel weights (passed through to gaussian_kde)
        self.weights = weights

    def fit(self, X, y):
        # Number of classes in y
        self.y_num = len(np.unique(y))
        # List holding the estimated probability density functions
        self.kernel_ = []
        # Estimate and store one density function per class
        for i in range(self.y_num):
            kernel = gaussian_kde(X[y == i].T,
                                  bw_method=self.bw_method,
                                  weights=self.weights)
            self.kernel_.append(kernel)
        return self

    def predict(self, X):
        # List to store the predicted labels
        pred = []
        # Per-class probability densities of the test data
        self.p_ = []
        # Evaluate each class's density at the test points
        for i in range(self.y_num):
            self.p_.append(self.kernel_[i].evaluate(X.T).tolist())
        # Convert to ndarray of shape (n_classes, n_samples)
        self.p_ = np.array(self.p_)
        # Assign each sample the label with the highest density
        for j in range(self.p_.shape[1]):
            pred.append(np.argmax(self.p_.T[j]))
        return pred
Note that the class labels must be consecutive non-negative integers starting from 0 (0, 1, 2, ...). If yours aren't, scikit-learn's LabelEncoder can convert them; see the sketch below.
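For example, LabelEncoder maps arbitrary labels to consecutive integers starting at 0 (the string labels here are made up just for illustration):
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels, for illustration only
y_raw = ["cat", "dog", "cat", "bird"]

le = LabelEncoder()
y_encoded = le.fit_transform(y_raw)
print(y_encoded)  # [1 2 1 0] -- classes are sorted: bird=0, cat=1, dog=2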
(Added on 2020/8/5: Part 3 has released the modified code)
__init__ initializes the object. Here you specify the parameters needed for the kernel density estimation, that is, the arguments passed on to SciPy's gaussian_kde. This time I kept them at gaussian_kde's own default values.
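For reference, a minimal sketch of the bandwidth options gaussian_kde accepts, as far as I know (None for Scott's rule, "scott", "silverman", a fixed scalar, or a callable); the toy data is made up:
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.normal(size=200)  # toy one-dimensional sample

kde_default = gaussian_kde(data)                           # None -> Scott's rule
kde_silverman = gaussian_kde(data, bw_method="silverman")  # Silverman's rule
kde_fixed = gaussian_kde(data, bw_method=0.1)              # fixed scalar factor
print(kde_default.factor, kde_silverman.factor, kde_fixed.factor)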
fit trains on the training data. A kernel density is estimated with gaussian_kde for each class, and the estimated density functions are stored in order: label 0 first, then label 1, and so on.
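As a quick sanity check (toy data made up for illustration), after fit the kernel_ list should hold one estimated density function per class:
import numpy as np

X_toy = np.random.randn(60, 2)                    # 60 samples, 2 features
y_toy = np.array([0] * 20 + [1] * 20 + [2] * 20)  # 3 classes

clf = GKDEClassifier().fit(X_toy, y_toy)
print(len(clf.kernel_))  # 3: one gaussian_kde object per class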
predict classifies the test data.
for i in range(self.y_num):
    self.p_.append(self.kernel_[i].evaluate(X.T).tolist())
Here the estimated density functions are taken out of kernel_ one by one, and each is evaluated on the test data to get its probability density.
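To make the shapes concrete: gaussian_kde expects data of shape (n_features, n_samples), which is why X is transposed, and evaluate returns one density value per sample. A minimal sketch with made-up data:
import numpy as np
from scipy.stats import gaussian_kde

X_train_toy = np.random.randn(50, 2)  # 50 samples, 2 features
kernel = gaussian_kde(X_train_toy.T)  # transpose to (n_features, n_samples)

X_test_toy = np.random.randn(5, 2)
densities = kernel.evaluate(X_test_toy.T)
print(densities.shape)  # (5,): one probability density per test sample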
The script after this point is a mess. I wanted to write it more concisely, but it didn't behave the way I expected... I'm a beginner at coding. It works, though. As long as it works, that's fine.
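For what it's worth, predict could probably be condensed to something like the following (my untested sketch; the subclass name is made up). np.argmax along the class axis picks the highest-density label for all samples at once:
import numpy as np

class GKDEClassifierConcise(GKDEClassifier):
    def predict(self, X):
        # Stack per-class densities into shape (n_classes, n_samples)
        self.p_ = np.array([k.evaluate(X.T) for k in self.kernel_])
        # For each column (sample), take the class with the highest density
        return np.argmax(self.p_, axis=0).tolist()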
And with that, the object-oriented Gaussian kernel density estimation classifier is complete.
The wine dataset has 13 features. We standardize them and then reduce them to 4 dimensions with PCA. Let's train and classify on the dimensionality-reduced data.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
# Data set loading
wine = datasets.load_wine()
X = wine.data
y = wine.target
# Data split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)
# Standardization
sc = StandardScaler()
sc = sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
# Dimensionality reduction
pca = PCA(n_components=4)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)
# Learning and prediction
f = GKDEClassifier()
f.fit(X_train_pca, y_train)
y_pred = f.predict(X_test_pca)
And the result is……?
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))
0.9722222222222222
Hooray. There are 36 test samples, so this accuracy works out to 35/36 correct. Pretty good.
What happens if we skip the dimensionality reduction and train on all 13 standardized features?
# Learning and prediction
f = GKDEClassifier()
f.fit(X_train_std, y_train)
y_pred = f.predict(X_test_std)
print(accuracy_score(y_test, y_pred))
0.9722222222222222
Result: Same.
I made a circular dataset.
from sklearn.datasets import make_circles
from matplotlib import pyplot as plt
X, y = make_circles(n_samples=1000, random_state=1, noise=0.1, factor=0.2)
plt.scatter(X[y==0, 0], X[y==0, 1], c="red", marker="^", alpha=0.5)
plt.scatter(X[y==1, 0], X[y==1, 1], c="blue", marker="o", alpha=0.5)
plt.show()
The center cluster and the outer ring carry different labels. Can the classifier separate them correctly?
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
f = GKDEClassifier()
f.fit(X_train, y_train)
y_pred = f.predict(X_test)
print(accuracy_score(y_test, y_pred))
0.9933333333333333
Conclusion: Great victory.
The classifier performs well, but I've been putting off something important: whether this classification method is academically sound. I'll take that up next time.
Continued to Part 3