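Hyperparameter tuning is a technique used to improve the accuracy of a model. If you build a model with scikit-learn without setting any parameters, default values are used that give the model a reasonable complexity, but these defaults are not necessarily the best choice for your data.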
Hyperparameters are parameters that are specified before training and that determine the training method, the training speed, and the complexity of the model.
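As a minimal illustration (a hypothetical toy, not scikit-learn code), the stdlib-only sketch below fits the slope `w` of `y = w * x` by gradient descent. `learning_rate` and `n_epochs` are hyperparameters fixed before training, while `w` is the parameter learned from the data. Changing the hyperparameters changes how fast, or whether, training converges.

```python
def fit_slope(xs, ys, learning_rate=0.01, n_epochs=100):
    """Fit w in y = w * x by gradient descent on the mean squared error.

    learning_rate and n_epochs are hyperparameters: they are chosen before
    training and control how the training proceeds.
    """
    w = 0.0  # learned parameter, updated during training
    n = len(xs)
    for _ in range(n_epochs):
        # Gradient of mean((w*x - y)^2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with the true slope w = 2

print(fit_slope(xs, ys, learning_rate=0.01, n_epochs=100))   # close to 2.0
print(fit_slope(xs, ys, learning_rate=0.001, n_epochs=100))  # roughly 1.56, not yet converged
```

With a much larger learning rate such as 0.2, each update overshoots and `w` diverges instead of converging, which is why such values have to be chosen (tuned) rather than left to chance.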
Source: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
Grid search is a method in which you choose candidate values for each parameter and try every combination of them. Because every combination is tried, the number of trials is the product of the numbers of candidates, so you cannot afford to add many candidates per parameter.
Random search is a method in which you choose candidate values for each parameter and then try n randomly chosen combinations of them. Because not every combination is tried, it may miss a better combination of parameters.
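The key argument of the paper linked above (Bergstra and Bengio, JMLR 2012) can be sketched in a few lines. Assume, hypothetically, that only the first of two parameters actually affects the score. With a budget of 9 trials, a 3 x 3 grid tries only 3 distinct values of that important parameter, whereas 9 random trials try 9 distinct values. The search space [0, 1] and the seed are illustrative assumptions.

```python
import itertools
import random

budget = 9  # total number of trials allowed

# Grid search: 3 candidate values per parameter -> 3 x 3 = 9 combinations
candidates = [0.0, 0.5, 1.0]
grid_trials = list(itertools.product(candidates, candidates))

# Random search: 9 independent uniform draws from [0, 1) for each parameter
rng = random.Random(0)  # fixed seed for reproducibility
random_trials = [(rng.random(), rng.random()) for _ in range(budget)]

# If only the first parameter matters, what counts is how many distinct values of it were tried
grid_distinct = len({p1 for p1, _ in grid_trials})
random_distinct = len({p1 for p1, _ in random_trials})

print(len(grid_trials), grid_distinct)      # 9 3
print(len(random_trials), random_distinct)  # 9 9
```

Under this assumption, the grid wastes two thirds of its budget re-testing the same values of the parameter that matters, while random search explores it three times more finely for the same cost.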
import numpy as np

params_list01 = [1, 3, 5, 7]
params_list02 = [1, 2, 3, 4, 5]

# Grid search: try every (p1, p2) combination, 4 x 5 = 20 pairs
grid_search_params = []
for p1 in params_list01:
    for p2 in params_list02:
        # append(): add one element (here a tuple) to the end of the list
        grid_search_params.append((p1, p2))
# Random search: draw `count` random combinations (the same pair may be drawn twice)
random_search_params = []
count = 10
for _ in range(count):
    p1 = np.random.choice(params_list01)  # np.random.choice(): pick one element of the array at random
    p2 = np.random.choice(params_list02)
    random_search_params.append((p1, p2))
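The same two lists can also be built with the standard library alone. This is an alternative sketch, not the author's code: `itertools.product` generates the grid directly, and `random.sample` draws the random-search candidates from that grid without replacement, so unlike repeated `np.random.choice` calls it never tries the same pair twice. The seed value 42 is an arbitrary assumption for reproducibility.

```python
import itertools
import random

params_list01 = [1, 3, 5, 7]
params_list02 = [1, 2, 3, 4, 5]

# Grid search: every (p1, p2) pair, 4 x 5 = 20 combinations
grid_search_params = list(itertools.product(params_list01, params_list02))

# Random search: 10 distinct pairs sampled from the grid without replacement
rng = random.Random(42)
count = 10
random_search_params = rng.sample(grid_search_params, count)

print(len(grid_search_params))    # 20
print(len(random_search_params))  # 10
```

Sampling from the full grid requires enumerating it first, which is fine for small grids like this one; for very large or continuous search spaces, drawing each parameter independently (as in the code above) is the more practical choice.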
scikit-learn
See the scikit-learn reference documentation for the details of GridSearchCV and RandomizedSearchCV.
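from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# X_train_valid, y_train_valid: training data prepared beforehand (not shown here).
# max_features up to 10 assumes the data has at least 10 features.
params = {
    "max_depth": [2, 4, 6, 8, None],
    "n_estimators": [50, 100, 200, 300, 400, 500],
    "max_features": range(1, 11),
    "min_samples_split": range(2, 11),
    "min_samples_leaf": range(1, 11),
}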
# Grid search: evaluate every combination in params with 3-fold cross-validation
gscv = GridSearchCV(RandomForestRegressor(), params, cv=3, n_jobs=-1, verbose=1)
gscv.fit(X_train_valid, y_train_valid)
print("Best score: {}".format(gscv.best_score_))
print("Best parameters: {}".format(gscv.best_params_))
# Random search: evaluate n_iter=10 randomly sampled combinations with 3-fold cross-validation
rscv = RandomizedSearchCV(RandomForestRegressor(), params, cv=3, n_iter=10, n_jobs=-1, verbose=1)
rscv.fit(X_train_valid, y_train_valid)
print("Best score: {}".format(rscv.best_score_))
print("Best parameters: {}".format(rscv.best_params_))
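It helps to count how much work each of the two searches above does. The stdlib-only sketch below reuses the same candidate values in `params` together with `cv=3` and `n_iter=10`, and computes the number of cross-validation fits. It ignores the one extra fit that both classes perform by default to refit the best model on all the data. `math.prod` requires Python 3.8 or later.

```python
import math

# Same candidate values as the params dict passed to GridSearchCV above
params = {
    "max_depth": [2, 4, 6, 8, None],
    "n_estimators": [50, 100, 200, 300, 400, 500],
    "max_features": range(1, 11),
    "min_samples_split": range(2, 11),
    "min_samples_leaf": range(1, 11),
}
cv = 3
n_iter = 10

# Grid search tries every combination: the product of the candidate counts
n_grid_candidates = math.prod(len(values) for values in params.values())
n_grid_fits = n_grid_candidates * cv

# Random search tries only n_iter sampled combinations
n_random_fits = n_iter * cv

print(n_grid_candidates, n_grid_fits)  # 27000 81000
print(n_random_fits)                   # 30
```

Training 81,000 random forests (some with 500 trees) is usually impractical, while 30 fits is cheap, and `n_iter` can be raised as far as the time budget allows.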
So which one should you use? Random search is the recommended choice, since it seems to find a good combination of parameters efficiently.
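One way to see why random search tends to be efficient is a back-of-the-envelope calculation, sketched here under simplifying assumptions: if each trial is an independent uniform draw and a fraction `good_fraction` of the search space counts as "good", the chance that at least one of `n_trials` lands in it is 1 - (1 - good_fraction)^n_trials, regardless of how many parameters are being searched.

```python
def hit_probability(n_trials, good_fraction):
    """Probability that at least one of n_trials independent uniform draws
    lands in a region covering good_fraction of the search space."""
    return 1.0 - (1.0 - good_fraction) ** n_trials


# Chance of at least one trial landing in the best 5% of the space
for n in (10, 30, 60):
    print(n, round(hit_probability(n, 0.05), 3))
# 10 0.401
# 30 0.785
# 60 0.954
```

Under these assumptions, about 60 random trials give a roughly 95% chance of hitting the best 5% of the space, which is far fewer than the number of combinations in a typical grid.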
Reference book: Data Analysis Techniques to Win Kaggle (Gijutsu-Hyoron)