This article summarizes how machine learning algorithms are classified, together with simple library-based implementations of each. Every code sample loads its own sample data, so it can be run as-is. Only the minimum required parameters are set, so please refer to the official documentation and similar resources for detailed settings.
Each algorithm is given a brief explanation, but not a detailed one.
This article is for readers who:

- want to know how machine learning algorithms are classified
- want to implement and run machine learning algorithms

By the end, you should:

- understand how machine learning algorithms are classified
- be able to implement machine learning algorithms
Machine learning is categorized as follows.
- Supervised learning
  - Regression
  - Classification
- Unsupervised learning
- Reinforcement learning
This article does not cover the implementation of reinforcement learning.
Supervised learning is a method of learning the answer to a problem from data that represents its characteristics (features / explanatory variables) paired with data that gives the answer (labels / objective variables).
Supervised learning can be divided into the following two categories.
- Regression: predicts continuous values (e.g., height prediction)
- Classification: predicts unordered labels (e.g., gender prediction)
Let me briefly describe the data used in the implementations. We will use scikit-learn's sample datasets; the data for regression and classification are as follows.
- Regression: Boston house prices
  - 13 features
  - The objective variable is the house price
- Classification: Wine quality
  - 13 features
  - 3 classes to classify

(The snippet below shows a quick way to inspect both datasets.)
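As a quick sanity check before starting, the shapes and classes of both datasets can be inspected with a few lines like the following (a minimal sketch; the dictionary keys are the standard fields of scikit-learn's dataset objects):

from sklearn.datasets import load_boston, load_wine
import numpy as np

# Boston house prices: 506 samples x 13 features, continuous target
boston = load_boston()
print(boston['data'].shape, boston['target'][:3])
# Wine quality: 178 samples x 13 features, 3 discrete classes
wine = load_wine()
print(wine['data'].shape, np.unique(wine['target']))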
We will now introduce supervised learning algorithms. For each one, we note whether it can be applied to regression, classification, or both.
Linear Regression

- Regression

A method that models the relationship in which the objective variable increases (or decreases) as a feature value increases.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the data
boston = load_boston()
X = boston['data']
y = boston['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Train the model
model = LinearRegression()
model.fit(X_train, y_train)
# Evaluate (score is the coefficient of determination, R^2)
score = model.score(X_test, y_test)
print('score is', score)
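Because linear regression learns one weight per feature plus an intercept, the fitted relationship can be inspected directly. Continuing from the snippet above (coef_ and intercept_ are standard scikit-learn attributes):

# Learned weights: one coefficient per feature, plus an intercept
print('coefficients:', model.coef_)
print('intercept:', model.intercept_)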
Logistic Regression

- Classification

A method of learning the probability that an event occurs; despite the name, it is a classification algorithm.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load the data
wine = load_wine()
X = wine['data']
y = wine['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Train the model
model = LogisticRegression()
model.fit(X_train, y_train)
# Evaluate (score is the mean accuracy)
score = model.score(X_test, y_test)
print('score is', score)
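Since the model learns probabilities, they can be inspected directly. Continuing from the snippet above (predict_proba is the standard scikit-learn API):

# Per-class probabilities for the first three test samples
print(model.predict_proba(X_test[:3]))
# The predicted label is the class with the highest probability
print(model.predict(X_test[:3]))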
Random Forest

- Regression
- Classification

A method that makes predictions by majority vote over multiple decision trees (averaging their outputs, for regression).
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# Load the data
boston = load_boston()
X = boston['data']
y = boston['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Train an ensemble of decision trees
model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)
# Evaluate (score is the coefficient of determination, R^2)
score = model.score(X_test, y_test)
print('score is', score)
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load the data
wine = load_wine()
X = wine['data']
y = wine['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Train an ensemble of decision trees
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)
# Evaluate (score is the mean accuracy)
score = model.score(X_test, y_test)
print('score is', score)
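The ensemble's individual trees and the features they rely on can be inspected. Continuing from the classification snippet above (estimators_ and feature_importances_ are standard scikit-learn attributes):

# The decision trees participating in the majority vote
print('number of trees:', len(model.estimators_))
# Relative contribution of each of the 13 features to the predictions
print('feature importances:', model.feature_importances_)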
Support Vector Machine (SVM)

- Regression
- Classification

A method that obtains better decision boundaries by maximizing the margin.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Load the data
boston = load_boston()
X = boston['data']
y = boston['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Train a support vector regressor with a linear kernel
model = SVR(kernel='linear', gamma='auto')
model.fit(X_train, y_train)
# Evaluate (score is the coefficient of determination, R^2)
score = model.score(X_test, y_test)
print('score is', score)
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the data
wine = load_wine()
X = wine['data']
y = wine['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Train a support vector classifier
model = SVC(gamma='auto')
model.fit(X_train, y_train)
# Evaluate (score is the mean accuracy)
score = model.score(X_test, y_test)
print('score is', score)
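The margin is determined by the training points closest to the decision boundary (the support vectors), which the fitted model exposes. Continuing from the SVC snippet above:

# Number of support vectors per class, and in total
print('support vectors per class:', model.n_support_)
print('total support vectors:', model.support_vectors_.shape[0])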
kNN
- Regression
- Classification

A method that memorizes all the training data and predicts from the k stored data points closest to the point being predicted, by majority vote (or by averaging, for regression).
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Load the data
boston = load_boston()
X = boston['data']
y = boston['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Predict by averaging the 3 nearest neighbors
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X_train, y_train)
# Evaluate (score is the coefficient of determination, R^2)
score = model.score(X_test, y_test)
print('score is', score)
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the data
wine = load_wine()
X = wine['data']
y = wine['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Predict by majority vote among the 3 nearest neighbors
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
# Evaluate (score is the mean accuracy)
score = model.score(X_test, y_test)
print('score is', score)
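The neighbors behind a prediction can be retrieved directly. Continuing from the classification snippet above (kneighbors is the standard scikit-learn API):

# Distances to, and training-set indices of, the 3 nearest neighbors of the first test point
distances, indices = model.kneighbors(X_test[:1])
print('distances:', distances)
print('neighbor labels:', y_train[indices[0]])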
Neural Network

- Regression
- Classification

A method modeled on the neural circuits of the human brain, with a structure consisting of an input layer, hidden layers, and an output layer.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from tensorflow.keras import models
from tensorflow.keras import layers

# Load the data
boston = load_boston()
X = boston['data']
y = boston['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Two hidden layers of 64 units, and a single linear output unit for regression
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
# Train (with no epochs argument this runs for a single epoch)
model.fit(X_train, y_train)
# Evaluate: mean squared error (the loss) and mean absolute error
mse, mae = model.evaluate(X_test, y_test)
print('MSE is', mse)
print('MAE is', mae)
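Note that calling fit with only the data trains for a single epoch, which is rarely enough for a neural network. A minimal sketch of training longer (epochs and batch_size are standard Keras parameters; the values here are arbitrary):

# Continue training: 100 epochs in mini-batches of 16, without per-step logging
model.fit(X_train, y_train, epochs=100, batch_size=16, verbose=0)
mse, mae = model.evaluate(X_test, y_test)
print('MSE is', mse)
print('MAE is', mae)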
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import utils

# Load the data
wine = load_wine()
X = wine['data']
# One-hot encode the 3 class labels for the softmax output
y = utils.to_categorical(wine['target'])
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Two hidden layers of 64 units, and a 3-unit softmax output for the 3 classes
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(3, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# Train (with no epochs argument this runs for a single epoch)
model.fit(X_train, y_train)
# Evaluate: categorical cross-entropy (the loss) and accuracy
crossentropy, acc = model.evaluate(X_test, y_test)
print('Categorical Crossentropy is', crossentropy)
print('Accuracy is', acc)
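The softmax output gives one probability per class, and the predicted class is the one with the highest probability. Continuing from the snippet above:

import numpy as np

# Class probabilities from the softmax output for the first three test samples
probs = model.predict(X_test[:3])
print('probabilities:', probs)
# Predicted class = index of the highest probability
print('predicted classes:', np.argmax(probs, axis=1))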
Gradient Boosting (LightGBM)

- Regression
- Classification

One of the ensemble learning methods that train multiple models: subsets of the data are repeatedly drawn and multiple decision tree models are trained sequentially, each correcting the errors of the ones before it.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import lightgbm as lgb

# Load the data
boston = load_boston()
X = boston['data']
y = boston['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Wrap the data in LightGBM datasets
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test)
# Minimal parameters: regression with mean squared error
params = {
    'objective': 'regression',
    'metric': 'mse',
}
num_round = 100
model = lgb.train(
    params,
    lgb_train,
    valid_sets=lgb_eval,
    num_boost_round=num_round,
)
# Evaluate with mean squared error (lower is better)
y_pred = model.predict(X_test)
score = mean_squared_error(y_test, y_pred)
print('score is', score)
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import lightgbm as lgb
import numpy as np

# Load the data
wine = load_wine()
X = wine['data']
y = wine['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Wrap the data in LightGBM datasets
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test)
# Minimal parameters: multiclass classification with 3 classes
params = {
    'objective': 'multiclass',
    'num_class': 3,
}
num_round = 100
model = lgb.train(
    params,
    lgb_train,
    valid_sets=lgb_eval,
    num_boost_round=num_round,
)
# predict() returns per-class probabilities; take the argmax as the predicted class
pred = model.predict(X_test)
y_pred = []
for p in pred:
    y_pred.append(np.argmax(p))
score = accuracy_score(y_test, y_pred)
print('score is', score)
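As with random forests, you can check which features the boosted trees actually use. Continuing from the snippet above (feature_importance is a standard method of LightGBM's Booster):

# How many times each of the 13 features is used in a split, across all trees
print('feature importances:', model.feature_importance())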
Unsupervised learning is a method of learning from data that only represents characteristics, with no answer data given.
The unsupervised learning examples use the same wine quality data as the supervised learning ones:

- Wine quality
  - 13 features
  - 3 classes to classify

We will now introduce unsupervised learning algorithms.
K-means
One of the clustering methods. A method of grouping the data into k clusters.
from sklearn.datasets import load_wine
from sklearn.cluster import KMeans

# Load the data (only the features; no labels are used)
wine = load_wine()
X = wine['data']
# Group the data into 3 clusters
model = KMeans(n_clusters=3, random_state=0)
model.fit(X)
print("labels: \n", model.labels_)
print("cluster centers: \n", model.cluster_centers_)
print("predict result: \n", model.predict(X))
Gaussian Mixture Model

One of the clustering methods. Assuming the data is generated from multiple Gaussian distributions, it classifies each data point according to which Gaussian distribution it belongs to.
from sklearn.datasets import load_wine
from sklearn.mixture import GaussianMixture

# Load the data (only the features; no labels are used)
wine = load_wine()
X = wine['data']
# Fit a mixture of 4 Gaussian distributions
model = GaussianMixture(n_components=4)
model.fit(X)
print("means: \n", model.means_)
print("predict result: \n", model.predict(X))
Principal Component Analysis (PCA)

One of the dimensionality reduction methods. A method of expressing data with many variables using fewer variables (principal components) while preserving the characteristics of the data.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA

# Load the data (only the features; no labels are used)
wine = load_wine()
X = wine['data']
# Reduce the 13 features to 4 principal components
model = PCA(n_components=4)
model.fit(X)
print('Before Transform:', X.shape[1])
print('After Transform:', model.transform(X).shape[1])
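You can check how much of the data's variance the kept components preserve. Continuing from the snippet above (explained_variance_ratio_ is the standard scikit-learn attribute):

# Fraction of total variance carried by each of the 4 principal components
print('explained variance ratio:', model.explained_variance_ratio_)
print('total preserved:', model.explained_variance_ratio_.sum())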
- Supervised learning can be divided into regression and classification
- There are several types of unsupervised learning
- If you try it, machine learning takes surprisingly little code
- Each algorithm has several parameters, so refer to the official documentation as needed (a minimal tuning sketch follows)
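As one example of going beyond the defaults, scikit-learn's GridSearchCV can try candidate parameter values with cross-validation. A minimal sketch for the random forest classifier (the parameter grid here is an arbitrary illustration, not a recommendation):

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Load and split the data as in the examples above
wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine['data'], wine['target'], test_size=0.3, random_state=0)
# Evaluate every combination of these candidate values with 5-fold cross-validation
param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)
print('best parameters:', search.best_params_)
print('test score:', search.score(X_test, y_test))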
I have not yet studied unsupervised learning and reinforcement learning in enough depth, so I will continue studying them.