Please note that this article was written by a machine learning beginner.
A working example is here. The background of the idea and a revised version are here.
Frankly, it is faster to just look at [Wikipedia](https://ja.wikipedia.org/wiki/%E3%82%AB%E3%83%BC%E3%83%8D%E3%83%AB%E5%AF%86%E5%BA%A6%E6%8E%A8%E5%AE%9A).
Imagine a simple histogram. Where the histogram is high, values are ***relatively likely to occur***, and where it is low, values are ***relatively unlikely to occur***. Have you heard a similar story somewhere?
This is the same idea as a probability density function. A histogram is, in a sense, an estimate of the ***true probability density function*** built from ***measured values***. ***Kernel density estimation*** uses kernel functions to produce a more continuous, smoother estimate.
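As a rough illustration (a minimal sketch with made-up sample data, not from this article), a histogram and a kernel density estimate of the same data can be compared like this:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)  # made-up measured values

# Histogram: a piecewise-constant estimate of the underlying density
hist, bin_edges = np.histogram(samples, bins=20, density=True)
print(hist)                # heights of the histogram bins

# Kernel density estimation: a smooth, continuous estimate of the same density
kde = gaussian_kde(samples)
grid = np.linspace(-4, 4, 9)
print(kde.evaluate(grid))  # smooth density values on a few grid points
```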
For supervised learning, see [Wikipedia](https://ja.wikipedia.org/wiki/%E6%95%99%E5%B8%AB%E3%81%82%E3%82%8A%E5%AD%A6%E7%BF%92) or read someone else's Qiita article.
A "teacher" in supervised learning is a set of "data" and "correct labels".
Consider a dataset whose correct labels are 0, 1, and 2. Split it into label-0 data, label-1 data, and label-2 data. If you perform kernel density estimation on the teacher data whose correct label is 0, you obtain an estimate of the probability density function for data belonging to label 0.
Estimate the probability density function for every label from the teacher data, compute each test point's probability density under each of them, and classify by whichever value is largest. That is what this article attempts.
Strictly speaking, we should also take into account the proportion of each label in the population ... I would like to cover that more rigorous story another time.
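For reference, a minimal sketch of what that correction would look like (all numbers below are made up): the estimated density for each label is multiplied by that label's share of the population before comparing.

```python
import numpy as np

# Made-up class-conditional densities p(x | label) for one test point
densities = np.array([0.12, 0.05, 0.30])
# Made-up class priors P(label), e.g. each label's share of the training data
priors = np.array([0.4, 0.35, 0.25])

# Bayes' rule up to a constant: P(label | x) is proportional to p(x | label) * P(label)
scores = densities * priors
print(int(np.argmax(scores)))  # label with the largest score
```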
What a wonderful world this is: kernel density estimation with a Gaussian kernel is already implemented in SciPy.
Here is a brief summary of how to use SciPy's gaussian_kde.
kernel = gaussian_kde(X, bw_method=None, weights=None)
- X: The dataset used for kernel density estimation.
- bw_method: The kernel bandwidth. Scott's rule (scotts_factor) is used if not specified.
- weights: Weights for the kernel density estimation. All points are weighted equally if not specified.
Feed new data into the estimated probability density function to calculate its probability density.
pd = kernel.evaluate(Z)
- Z: The data point(s) at which you want to evaluate the probability density.
The probability densities at the points in Z are returned as an array.
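Putting the two calls together, a minimal sketch with made-up one-dimensional data looks like this:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
X = rng.normal(size=500)        # made-up data for the estimate

kernel = gaussian_kde(X)        # bandwidth defaults to Scott's rule
Z = np.array([-1.0, 0.0, 1.0])  # new points to evaluate
pd = kernel.evaluate(Z)         # probability densities at the points in Z
print(pd)
```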
Let's try it with scikit-learn's iris dataset!
The flow is as follows: load the iris dataset → split into training and test data with train_test_split → standardize the training and test data → perform kernel density estimation for each label using the training data → calculate the probability density of the test data under each label's estimate → output the label with the largest value.
↓ Script ↓
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from scipy.stats import gaussian_kde
# Loading iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Division of training data and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=1, stratify=y)
# Standardization
sc = StandardScaler()
sc = sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
# Kernel density estimation
kernel0 = gaussian_kde(X_train_std[y_train==0].T)
kernel1 = gaussian_kde(X_train_std[y_train==1].T)
kernel2 = gaussian_kde(X_train_std[y_train==2].T)
# Calculate the probability density of test data
p0s = kernel0.evaluate(X_test_std.T)
p1s = kernel1.evaluate(X_test_std.T)
p2s = kernel2.evaluate(X_test_std.T)
# Prediction label output
y_pred = []
for p0, p1, p2 in zip(p0s, p1s, p2s):
    if max(p0, p1, p2) == p0:
        y_pred.append(0)
    elif max(p0, p1, p2) == p1:
        y_pred.append(1)
    else:
        y_pred.append(2)
The test data is standardized using the mean and standard deviation of the training data. If the two sets were standardized separately, they would be scaled inconsistently and the comparison would be skewed.
If you feed the dataset to gaussian_kde as is, it treats each ***column vector as one sample***. In the iris dataset, however, each ***row vector is one sample***, so the data is transposed. The same applies when calculating the probability density of the test data.
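To illustrate with made-up data that has four features like iris, the transposition looks like this:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
data = rng.normal(size=(100, 4))    # one sample per row, as in the iris dataset

# gaussian_kde treats each column as one sample, so pass shape (n_features, n_samples)
kde = gaussian_kde(data.T)
print(kde.evaluate(data.T[:, :3]))  # densities of the first three samples
```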
y_pred = []
for p0, p1, p2 in zip(p0s, p1s, p2s):
    if max(p0, p1, p2) == p0:
        y_pred.append(0)
    elif max(p0, p1, p2) == p1:
        y_pred.append(1)
    else:
        y_pred.append(2)
The probability densities of the test data are stored in p0s, p1s, and p2s, one array per label. Taking one value from each per test point, we assign:
- 0 if the value for label 0 is the largest
- 1 if the value for label 1 is the largest
- 2 otherwise
The results are stored in the list y_pred in the same order as the test data.
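Incidentally, the same selection could probably be written more compactly with NumPy's argmax; a sketch reusing the p0s, p1s, and p2s from the script above:

```python
import numpy as np

# Stack the per-label densities into shape (3, n_test) and take the argmax per test point
y_pred = list(np.argmax(np.vstack([p0s, p1s, p2s]), axis=0))
```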
Let's check the accuracy of the predicted labels with scikit-learn's accuracy_score. Fingers crossed.
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))
1.0
Hooray.
I tried using the results of kernel density estimation as a classifier for supervised learning. In practice, this kind of technique is rarely used: it is computationally expensive, and depending on the data the accuracy can drop significantly. Still, as this trial shows, some datasets can be classified quite quickly and neatly this way.
Continue to Part 2