Hello, this is Motty. This time, I tried clustering in Python.
In statistics and machine learning, clustering means grouping data points that have similar features. Because it is done without labels or criteria given in advance, it is a type of "unsupervised learning."
The K-means method is an algorithm that partitions data into a given number of clusters (k) using cluster means. Each data point is assigned to its nearest cluster centroid, the centroids are then recomputed, and repeating these two steps optimizes the cluster structure.
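Before the scikit-learn example below, here is a minimal NumPy sketch of that assign-and-update loop. The function name and defaults are my own illustration, not the library implementation, and empty clusters are not handled.
import numpy as np

def kmeans_sketch(X, k, n_iter=10, seed=0):
    # Illustrative sketch only, assuming X is an (n, d) array
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # pick k points as initial centroids
    for _ in range(n_iter):
        # Assignment step: label each point with the index of its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of the points assigned to it
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers
The scikit-learn example that follows does the same thing through the library.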
KMeans.py
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs as mb

N = 100  # Number of samples
# Generate sample data consisting of 3 blob-shaped clusters
features, _ = mb(n_samples=N, centers=3)
# Cluster the data into k = 3 clusters
clf = KMeans(n_clusters=3)
pred = clf.fit_predict(features)
# Plot each cluster in a different color
for i in range(3):
    labels = features[pred == i]
    plt.scatter(labels[:, 0], labels[:, 1])
plt.show()
I was able to cluster the data neatly.
Note that this works because the data itself is clean, the value of k is appropriate, and the algorithm choice is appropriate. If these conditions are not met, the data may not separate this neatly. For example, one can add an outlier to the data, or generate more clusters than k:
# Case 1: add an outlier far away from the clusters
NOISE = [25, 25]
features = np.append(features, NOISE).reshape(-1, 2)
# Case 2: generate 4 clusters while still using n_clusters = 3
dataset = mb(centers=4)
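As a self-contained sketch of the mismatched-k case, the snippet below (my own variable names, with a fixed random_state for reproducibility) generates four blobs while still asking K-means for three clusters:
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs as mb

# Generate 4 clusters but ask K-means for only 3
features, _ = mb(n_samples=100, centers=4, random_state=0)
pred = KMeans(n_clusters=3).fit_predict(features)
# Plot the 3 predicted clusters; the 4 true blobs cannot each get their own cluster
for i in range(3):
    labels = features[pred == i]
    plt.scatter(labels[:, 0], labels[:, 1])
plt.show()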
K-means also struggles when the clusters are not blob-shaped, for example the two interleaving half-moons produced by make_moons.
makemoons.py
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Generate two interleaving half-moon shaped clusters
X1, y1 = make_moons(noise=0.05, random_state=0)
# Cluster with K-means into k = 2 clusters
clf = KMeans(n_clusters=2)
pred1 = clf.fit_predict(X1)
# Plot each predicted cluster in a different color
for i in range(2):
    labels = X1[pred1 == i]
    plt.scatter(labels[:, 0], labels[:, 1])
plt.show()
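Because K-means assigns each point to its nearest centroid, it tends to cut the two moons with a roughly straight boundary rather than tracing their curved shapes. As a point of comparison only (not part of the original experiment), a density-based method such as scikit-learn's DBSCAN can follow the moon shapes; the eps value below is simply an assumed setting for this noise level:
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X1, y1 = make_moons(noise=0.05, random_state=0)
# eps = 0.3 is an assumed neighborhood radius chosen for this data
pred2 = DBSCAN(eps=0.3).fit_predict(X1)
# Plot each detected cluster (label -1 would mark noise points, if any)
for i in set(pred2):
    labels = X1[pred2 == i]
    plt.scatter(labels[:, 0], labels[:, 1])
plt.show()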
There are various classification and clustering algorithms, and this time I covered one of them, the K-means method. I would like to write about classification with SVM and random forest later.