Hyperparameter tuning

Introduction

I tried to build a CNN model for handwritten digit recognition, but I didn't know how to set the hyperparameters to get high accuracy, so I searched for good values using grid search. I am a beginner in machine learning, so I would appreciate advice from more experienced readers on other ways to improve accuracy.

Premise

Google Colaboratory offers a GPU for free, so we will implement everything there. Grid search evaluates every combination of the specified hyperparameter values by brute force, so it is computationally expensive and slow; using a GPU is recommended. Random search can save time, but this time we will use grid search.

This time, we will build a CNN with two convolutional layers followed by three fully connected hidden layers. The fully connected layers are given 256, 128, and 64 units, chosen without tuning.

The effect of network depth and of the number of units in the fully connected layers on accuracy should also be investigated, but that is out of scope here.

Postscript: I examined the number of CNN filters in a follow-up article, Hyperparameter tuning 2.

Grid search

Library import

from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D
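from tensorflow.keras.utils import to_categorical
# Note: this wrapper module is deprecated and absent from newer TensorFlow releases; SciKeras is the suggested replacement
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor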
from sklearn.model_selection import GridSearchCV
import numpy as np

Data preparation

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize pixel values to the range 0-1
X_train = X_train / 255.0
X_test = X_test / 255.0

X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
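
To make the preprocessing concrete, here is a minimal plain-Python sketch of the two transformations applied above: scaling 8-bit pixel values into the range 0 to 1, and one-hot encoding the digit labels, which is what to_categorical does for integer class labels. The helper names scale_pixels and one_hot are hypothetical and exist only for illustration; the real code operates on NumPy arrays.

```python
def scale_pixels(pixels):
    """Map 8-bit pixel values (0-255) into the range 0.0-1.0."""
    return [p / 255.0 for p in pixels]


def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


scaled = scale_pixels([0, 51, 255])  # hypothetical pixel values
encoded = one_hot(3)                 # the digit 3 as a 10-element vector
print(scaled)
print(encoded)
```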

Checking the shapes

Let's check the array shapes before moving on.

print(np.shape(X_train)) # (60000, 28, 28, 1)
print(np.shape(y_train)) # (60000, 10)
print(np.shape(X_test)) # (10000, 28, 28, 1)
print(np.shape(y_test)) # (10000, 10)

Creating a model

def build_model(activation, optimizer):
    """Build and compile the CNN; activation and optimizer are the tuned hyperparameters."""
    model = Sequential()

    # Convolutional part: two Conv2D + MaxPooling2D blocks (pooling uses stride 1)
    model.add(Conv2D(input_shape=(28, 28, 1), filters=32, kernel_size=(3, 3), strides=(1, 1), padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='same'))
    model.add(Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(1, 1)))

    # Fully connected part: 256 -> 128 -> 64 units, then a 10-class softmax output
    # (Dense infers its input size from Flatten, so input_dim is not needed)
    model.add(Flatten())
    model.add(Dense(256, activation=activation))
    model.add(Dense(128, activation=activation))
    model.add(Dense(64, activation=activation))
    model.add(Dense(10, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

    return model
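
It can help to trace how the feature map size changes through the convolution and pooling layers above. The sketch below applies the standard output-size rules for 'same' padding (ceil(n / stride)) and 'valid' padding (floor((n - window) / stride) + 1). The helper out_size is a hypothetical name used for illustration, not a Keras function.

```python
import math


def out_size(n, window, stride, padding):
    """Spatial output size of a conv/pool layer along one dimension."""
    if padding == 'same':
        return math.ceil(n / stride)
    if padding == 'valid':
        return (n - window) // stride + 1
    raise ValueError(f"unknown padding: {padding}")


size = 28                             # MNIST images are 28x28
size = out_size(size, 3, 1, 'same')   # Conv2D 3x3, stride 1, padding='same'
size = out_size(size, 2, 1, 'same')   # MaxPooling2D 2x2, stride 1, padding='same'
size = out_size(size, 3, 1, 'same')   # Conv2D 3x3, stride 1, padding='same'
size = out_size(size, 2, 1, 'valid')  # MaxPooling2D 2x2, stride 1, default padding
flattened = size * size * 32          # 32 filters in the last Conv2D

print(size, flattened)  # 27 23328
```

Because both pooling layers use a stride of 1, the feature maps are barely reduced, so Flatten passes 23,328 values to the first Dense layer. A pooling stride of 2 would shrink this considerably, but that would be a different model from the one evaluated in this article.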

Run

The search takes a long time. List the candidate values for each hyperparameter you want to check as Python lists.
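
To get a feel for the cost, the following sketch counts the combinations a grid search over the candidate values searched in this article would visit. It uses only the standard library. The figure of 5 cross-validation folds is an assumption: it is the default in recent scikit-learn versions, and the article does not set cv explicitly.

```python
from itertools import product

# Candidate values searched in this article
candidates = {
    'activation': ['relu', 'sigmoid'],
    'optimizer': ['adam', 'sgd'],
    'epochs': [10, 20],
    'batch_size': [64, 128, 256],
}

# Every combination of one value per hyperparameter
combinations = [dict(zip(candidates, values)) for values in product(*candidates.values())]

cv_folds = 5  # assumption: scikit-learn's default number of folds
total_fits = len(combinations) * cv_folds

print(len(combinations))  # 24
print(total_fits)         # 120
```

Each of those fits trains the CNN from scratch, which is why the run is slow even on a GPU.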

# Candidate values for each hyperparameter
activation = ['relu', 'sigmoid']
optimizer = ['adam', 'sgd']
epochs = [10, 20]  # fit() takes epochs; nb_epoch is the obsolete Keras 1 name
batch_size = [64, 128, 256]

# Collect the candidates into a dictionary for the grid search
param_grid = dict(activation=activation, optimizer=optimizer, epochs=epochs, batch_size=batch_size)

# Wrap the Keras model so scikit-learn can use it as an estimator
# (KerasRegressor scores each candidate by negative loss, so lower loss wins)
model = KerasRegressor(build_fn=build_model, verbose=False)

# Run the grid search
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid_result = grid.fit(X_train, y_train)

Check the result

print(grid_result.best_params_)
# {'activation': 'relu', 'batch_size': 64, 'epochs': 20, 'optimizer': 'adam'}

Now we know the best hyperparameters among the candidates. Finally, let's build a model with them and check its accuracy.

Check the generalization performance with the obtained hyperparameters

Set the hyperparameters found above, train the handwriting recognition model on the full training set, and evaluate it on the test set.

# Build the model with the best activation and optimizer
model = build_model(activation='relu', optimizer='adam')

# Train with the best number of epochs and batch size
history = model.fit(X_train, y_train, epochs=20, batch_size=64)

# Evaluate on the test set; returns [loss, accuracy]
model.evaluate(X_test, y_test) # [0.06611067801713943, 0.9872000217437744]

From the above, we were able to create a model with an accuracy of approximately 98.7% on the test data.
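
The two numbers returned by model.evaluate are the loss and the accuracy. As a rough illustration of what they measure, the sketch below computes mean categorical cross-entropy and accuracy in plain Python for three made-up predictions. The helper names and the data are hypothetical, and Keras computes these values with its own implementations.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Mean of -sum(t * log(p)) over samples, for one-hot y_true."""
    losses = [-sum(t * math.log(p) for t, p in zip(ts, ps) if t > 0)
              for ts, ps in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


def accuracy(y_true, y_pred):
    """Fraction of samples whose highest-probability class matches the label."""
    hits = sum(ts.index(max(ts)) == ps.index(max(ps)) for ts, ps in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical 3-class example: one-hot labels and predicted probabilities
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2]]

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

In this toy example two of the three predictions are correct, and the confident wrong prediction on the third sample dominates the loss.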

Random search

I will only post the code. The n_iter argument specifies how many randomly chosen combinations to evaluate. Random search saves time, but because it does not try every combination, better hyperparameters may exist among the ones it skipped, so choosing n_iter well is important.

from sklearn.model_selection import RandomizedSearchCV

# Candidate values for each hyperparameter
activation = ['relu', 'sigmoid']
optimizer = ['adam', 'sgd']
epochs = [10, 20]  # fit() takes epochs; nb_epoch is the obsolete Keras 1 name
batch_size = [64, 128, 256]

# Collect the candidates into a dictionary for the random search
param_grid = dict(activation=activation, optimizer=optimizer, epochs=epochs, batch_size=batch_size)

# Wrap the Keras model so scikit-learn can use it as an estimator
model = KerasRegressor(build_fn=build_model, verbose=False)

# Run the random search over 16 randomly chosen combinations
rand = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=16)
rand_result = rand.fit(X_train, y_train)
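
What n_iter controls can be sketched with the standard library: instead of visiting all combinations, a random search draws n_iter of them. This is a simplified illustration of the idea, not RandomizedSearchCV's actual implementation; the fixed seed is an assumption added so the example is reproducible.

```python
import random
from itertools import product

# Candidate values searched in this article
candidates = {
    'activation': ['relu', 'sigmoid'],
    'optimizer': ['adam', 'sgd'],
    'epochs': [10, 20],
    'batch_size': [64, 128, 256],
}
all_combinations = [dict(zip(candidates, values)) for values in product(*candidates.values())]

n_iter = 16
rng = random.Random(0)                          # fixed seed (assumption, for reproducibility)
sampled = rng.sample(all_combinations, n_iter)  # draw without replacement

coverage = n_iter / len(all_combinations)
print(len(sampled), f"{coverage:.0%}")  # 16 67%
```

With n_iter=16 the search covers two thirds of the 24 combinations, so a third of the candidates, possibly including the best one, is never evaluated.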

In conclusion

Thank you for reading to the end. I hope this helps someone's learning.
