max_depth is a parameter that represents the maximum depth of the tree that the model learns during training.
When max_depth is not set, the tree keeps splitting the data until the training data is almost perfectly classified. This results in a less general model that places excessive trust in the training data. Likewise, if the value is set too large, tree growth still stops once the classification is complete, so the outcome is the same as leaving it unset.
Setting max_depth to limit the height of the tree in this way is called pruning the decision tree.
random_state is a parameter directly related to the learning process of decision trees.
When a decision tree splits, the algorithm searches for the feature value that best explains the classification of the data at that split. Because there are many candidate values, random_state seeds the random number generator used to choose among those candidates.
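As a minimal sketch (the Iris dataset and the value max_depth=3 are illustrative assumptions, not taken from the text), both parameters can be passed to scikit-learn's DecisionTreeClassifier:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data: the Iris dataset stands in for any classification task
X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

# max_depth=3 prunes the tree; random_state fixes the random choices made during splitting
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(train_X, train_y)
print(tree.score(test_X, test_y))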
One of the characteristics of Random Forest is that the result is determined by a majority vote using multiple simple decision trees.
The parameter that determines the number of these simple decision trees is n_estimators.
Since a random forest builds multiple simple decision trees, the parameters related to decision trees can also be set. max_depth is a parameter that prevents the decision trees from overfitting; in a random forest it is usually set to a smaller value than for a single decision tree.
Because the algorithm takes a majority vote over the classifications of many simple decision trees, rather than having each tree perform a strict classification, it narrows down the features each tree considers and analyzes the data from a bird's-eye view, which keeps learning efficient while maintaining high accuracy.
random_state is also an important parameter in Random Forest.
As the name Random Forest suggests, random numbers contribute in many places, such as splitting the data for each decision tree and choosing which features to use, so random_state not only fixes the result, but the analysis result can also differ greatly depending on this parameter.
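As a minimal sketch (the dataset and the specific values are illustrative assumptions), the three parameters discussed above map directly onto scikit-learn's RandomForestClassifier:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

# n_estimators simple trees, each kept shallower than a single decision tree would be,
# with random_state fixing the randomness used for data sampling and feature choice
forest = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=42)
forest.fit(train_X, train_y)
print(forest.score(test_X, test_y))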
k-NN
n_neighbors is the value of k in k-NN.
In other words, it is the parameter that determines how many similar data points are used when predicting a result.
If n_neighbors is too large, the data points selected as neighbors vary widely in similarity, and categories with a narrow classification range are no longer classified well.
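As a minimal sketch (the Iris dataset and k=5 are illustrative assumptions), n_neighbors is set when creating scikit-learn's KNeighborsClassifier:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

# Each prediction is a majority vote among the k=5 most similar training points
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(train_X, train_y)
print(knn.score(test_X, test_y))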
Changing every parameter by hand and checking the result each time takes too much time and effort.
Instead, we specify a range for each parameter and have the computer find the parameter set that gives the best result. There are two main methods: grid search and random search.
Grid search explicitly lists multiple candidate values for each hyperparameter you want to tune, builds a parameter set from every combination, and repeatedly evaluates the model on each set to find the optimal one.
Because the candidate values are specified explicitly, it is well suited to parameters that take values that are not mathematically continuous, such as strings, integers, or True/False. However, since a parameter set is created for every combination of candidates, it is not suitable for tuning many parameters at the same time.
The code looks like this. Please note that the program takes some time to run.
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
data = load_digits()
train_X, test_X, train_y, test_y = train_test_split(data.data, data.target, random_state=42)
#Set parameter value candidates
model_param_set_grid = {SVC(): {"kernel": ["linear", "poly", "rbf", "sigmoid"],
                                "C": [10 ** i for i in range(-5, 5)],
                                "decision_function_shape": ["ovr", "ovo"],
                                "random_state": [42]}}
max_score = 0
best_param = None
#Parameter search with grid search
for model, param in model_param_set_grid.items():
    clf = GridSearchCV(model, param)
    clf.fit(train_X, train_y)
    pred_y = clf.predict(test_X)
    score = f1_score(test_y, pred_y, average="micro")
    if max_score < score:
        max_score = score
        best_model = model.__class__.__name__  # name of the best-scoring model
        best_param = clf.best_params_
print("parameter:{}".format(best_param))
print("Best score:", max_score)
svm = SVC()
svm.fit(train_X, train_y)
print()
print('No adjustment')
print(svm.score(test_X, test_y))
Grid search tuned the parameters by evaluating explicitly specified candidate values.
Random search, by contrast, specifies the range of values each parameter can take, and finds the best parameter set by repeatedly evaluating the model with parameter sets drawn at random from those ranges.
Specifying a range of values means specifying a probability distribution for the parameter.
The probability distributions of the scipy.stats module are often used for this.
The code is below.
import scipy.stats
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
data = load_digits()
train_X, test_X, train_y, test_y = train_test_split(data.data, data.target, random_state=42)
#Set parameter value candidates
model_param_set_random = {SVC(): {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": scipy.stats.uniform(0.00001, 1000),
    "decision_function_shape": ["ovr", "ovo"],
    "random_state": scipy.stats.randint(0, 100)
}}
max_score = 0
best_param = None
#Parameter search with random search
for model, param in model_param_set_random.items():
    clf = RandomizedSearchCV(model, param)
    clf.fit(train_X, train_y)
    pred_y = clf.predict(test_X)
    score = f1_score(test_y, pred_y, average="micro")
    if max_score < score:
        max_score = score
        best_param = clf.best_params_
print("parameter:{}".format(best_param))
print("Best score:", max_score)
svm = SVC()
svm.fit(train_X, train_y)
print()
print('No adjustment')
print(svm.score(test_X, test_y))
Neural networks are basically trained with a method called the gradient method (steepest descent), which gradually moves the parameters in the direction that reduces the loss function. However, the loss surfaces of neural networks usually have many saddle points (pseudo-solutions); if training gets stuck at a saddle point, the gradient becomes almost 0 and the parameters stop moving, so the true solution is never reached.
For this reason, various improved versions of the gradient method have been developed. There is no single optimization method that works best for every loss function (the no-free-lunch theorem), and the loss function also changes with the task and the data, so optimization should be tried in practice rather than decided by theory alone.
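As a minimal sketch of the idea (the quadratic loss, learning rate, and step count are illustrative assumptions, not from the text), plain gradient descent repeatedly steps in the direction that reduces the loss:

# Illustrative loss: loss(w) = (w - 3)**2, whose gradient is 2 * (w - 3)
def gradient(w):
    return 2 * (w - 3)

w = 0.0   # initial parameter value
lr = 0.1  # learning rate (assumed for demonstration)

for step in range(100):
    w -= lr * gradient(w)  # move against the gradient to reduce the loss

print(w)  # approaches the minimum at w = 3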