This is a memo for myself as I read "Introduction to Natural Language Processing Application Development in 15 Steps". This time, I note my own key points from Chapter 4, Step 14.

Preparation

- Personal Mac: macOS Mojave version 10.14.6
- Docker: version 19.03.2 (both client and server)

Chapter overview

In this chapter, the goal is to find good values for the parameters that are given to the machine learning system from outside (as opposed to the parameters it learns).

- Grid search
- Hyperopt
  - Probability distributions

14.1 Hyperparameters

Hyperparameters are meta parameters one level above the model: they are set by designers and programmers before training, rather than being adjusted and acquired through training.

They appear, for example, in:

- Feature extractor
- Classifier
  - NN
    - Layer type
    - Number of layers
    - Number of units per layer
    - Whether to use dropout, and its coefficient
    - Optimizer type and its various arguments
    - Learning rate
    - etc.
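To make the distinction concrete, here is a minimal pure-Python sketch on made-up toy data (not from the book): `learning_rate` and `epochs` are hyperparameters fixed from outside before training, while the weight `w` is a parameter that the training loop itself acquires.

```python
# Toy data following y = 2x exactly (made up for illustration)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def train(learning_rate, epochs):
    """Fit y = w * x by gradient descent on the mean squared error.

    learning_rate and epochs are hyperparameters: chosen before training.
    w is a learned parameter: adjusted by the training loop itself.
    """
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


print(train(learning_rate=0.01, epochs=200))  # converges very close to 2.0
print(train(learning_rate=0.2, epochs=200))   # step too large: blows up
```

The same training code gives a good or a useless model depending only on the hyperparameters, which is why they have to be searched.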

14.2 Grid search

Grid search enumerates the parameters to search and the candidate values for each, tries every combination, and picks the best one. scikit-learn provides `sklearn.model_selection.GridSearchCV` for this.

- Use a classifier class that follows the scikit-learn API
  - For Keras, use the scikit-learn API wrapper
- The optimal parameters can be obtained with `<gridsearch instance>.best_params_`.
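Stripped of cross-validation, the core loop of a grid search is just "try every combination, keep the best". The sketch below is my own pure-Python illustration of that idea, not how `GridSearchCV` is implemented; `toy_score` is a made-up stand-in for "train a model with these parameters and evaluate it".

```python
from itertools import product


def grid_search(score, param_grid):
    """Try every combination in param_grid; return (best_params, best_score).

    score: a function that takes a dict of parameters and returns a number
    (higher is better), standing in for "train and evaluate a model".
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # product() yields one tuple per combination of candidate values
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


# Hypothetical scoring function that peaks at n_estimators=300, max_features='log2'
def toy_score(params):
    bonus = {"sqrt": 0.1, "log2": 0.2, None: 0.0}[params["max_features"]]
    return bonus - abs(params["n_estimators"] - 300) / 1000


grid = {
    "n_estimators": [10, 20, 30, 40, 50, 100, 200, 300, 400, 500],
    "max_features": ("sqrt", "log2", None),
}
print(grid_search(toy_score, grid))
# ({'n_estimators': 300, 'max_features': 'log2'}, 0.2)
```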

```python
# Without a pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV

# tokenize, train_texts, train_labels and test_texts come from the book's code

## train
vectorizer = TfidfVectorizer(tokenizer=tokenize, ngram_range=(1, 2))
train_vectors = vectorizer.fit_transform(train_texts)
parameters = {  # <1>
    'n_estimators': [10, 20, 30, 40, 50, 100, 200, 300, 400, 500],
    'max_features': ('sqrt', 'log2', None),
}
classifier = RandomForestClassifier()
gridsearch = GridSearchCV(classifier, parameters)
gridsearch.fit(train_vectors, train_labels)

## predict
test_vectors = vectorizer.transform(test_texts)
predictions = gridsearch.predict(test_vectors)
```


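```python
# With a pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

## train
pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer(tokenizer=tokenize, ngram_range=(1, 2))),
    ('classifier', RandomForestClassifier()),
])
parameters = {
    # Parameters of a pipeline step are addressed as <step name>__<parameter>
    'vectorizer__ngram_range': [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)],
    'classifier__n_estimators': [10, 20, 30, 40, 50, 100, 200, 300, 400, 500],
    'classifier__max_features': ('sqrt', 'log2', None),
}
gridsearch = GridSearchCV(pipeline, parameters)
gridsearch.fit(train_texts, train_labels)

## predict
predictions = gridsearch.predict(test_texts)
```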

Options passed to `GridSearchCV`:

```python
# verbose=1: print progress, since otherwise I could not tell how far the grid search had gotten
# n_jobs=-1: run in parallel on as many CPU cores as possible
gridsearch = GridSearchCV(pipeline, parameters, verbose=1, n_jobs=-1)
```

Execution result

In `evaluate_dialogue_agent.py`, switch the import to the grid-search version of the agent:

```python
# before
from dialogue_agent import DialogueAgent  # <1>
# after
from dialogue_agent_pipeline_gridsearch import DialogueAgent  # <1>
```

```
$ docker run -it -v $(pwd):/usr/src/app/ 15step:latest python evaluate_dialogue_agent.py
# It took about 20 minutes to finish
Fitting 3 folds for each of 180 candidates, totalling 540 fits

[Parallel(n_jobs=-1)]: Done  46 tasks      | elapsed:    5.4s
[Parallel(n_jobs=-1)]: Done 196 tasks      | elapsed:  2.2min
[Parallel(n_jobs=-1)]: Done 446 tasks      | elapsed:  5.5min
[Parallel(n_jobs=-1)]: Done 540 out of 540 | elapsed: 19.9min finished

0.7021276595744681
{'classifier__max_features': 'log2', 'classifier__n_estimators': 300, 'vectorizer__ngram_range': (1, 1)}
```
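The "180 candidates, totalling 540 fits" line in the log can be checked by hand: the number of candidates is the product of the number of values for each parameter in the pipeline grid, and each candidate is fitted once per fold. A small sketch of that arithmetic:

```python
from math import prod

# The grid from the pipeline example above
parameters = {
    'vectorizer__ngram_range': [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)],
    'classifier__n_estimators': [10, 20, 30, 40, 50, 100, 200, 300, 400, 500],
    'classifier__max_features': ('sqrt', 'log2', None),
}
n_folds = 3  # cross-validation folds used in the run above

candidates = prod(len(values) for values in parameters.values())  # 6 * 10 * 3
fits = candidates * n_folds
print(candidates, fits)  # 180 540
```

Adding just one more parameter with 5 candidates would multiply this to 900 candidates and 2700 fits, which is why the search time explodes as parameters are added.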

- As the number of parameters to search increases, the time a grid search takes grows exponentially, because the number of combinations is the product of the number of candidates for each parameter.
- Inside `GridSearchCV.fit`, performance is evaluated every time one set of parameters is tried.
  - The evaluation uses cross-validation (the default number of folds is 3 in the scikit-learn version used here; it has been 5 since scikit-learn 0.22).
    - Cross-validation: the training data is divided into K pieces; one piece is used as validation data and the rest as training data, and training plus evaluation is repeated K times, changing which piece serves as the validation data.
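The K-fold split described above can be sketched in a few lines of plain Python. This is my own minimal version without shuffling; note that for classifiers scikit-learn's `GridSearchCV` actually uses stratified folds (keeping class ratios per fold), which this sketch does not do.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for K-fold cross-validation.

    The samples are split into k contiguous folds (no shuffling). Each fold
    is used once as validation data while the other k-1 folds are used for
    training. If n_samples is not divisible by k, the first folds get one
    extra sample each.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


for train_idx, val_idx in k_fold_indices(7, 3):
    print(train_idx, val_idx)
# [3, 4, 5, 6] [0, 1, 2]
# [0, 1, 2, 5, 6] [3, 4]
# [0, 1, 2, 3, 4] [5, 6]
```

Every sample is used for validation exactly once, so the K scores together cover the whole training set.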

14.3 Using Hyperopt / 14.4 Probability distributions

Hyperopt is a tool for searching hyperparameters more efficiently than grid search. You give it a parameter space and an objective function, and it returns the best hyperparameters it found.
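The contract "give a space and an objective, get back the best parameters" can be sketched with plain random search. This is deliberately simpler than Hyperopt's TPE (`tpe.suggest`), which uses the results so far to choose the next candidate; Hyperopt also offers random search as `rand.suggest`. The `space` format and the made-up accuracy in `objective` below are my own assumptions for the sketch.

```python
import random


def random_search(objective, space, max_evals, seed=0):
    """Minimize objective over space by random sampling.

    space maps each parameter name to a function that draws one candidate
    value from a random.Random instance. Returns the best params found.
    """
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(max_evals):
        params = {name: draw(rng) for name, draw in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params


# Rough stand-ins for hp.quniform('n_estimators', 10, 500, 10) and hp.choice
space = {
    "n_estimators": lambda rng: round(rng.uniform(10, 500) / 10) * 10,
    "max_features": lambda rng: rng.choice(["sqrt", "log2", None]),
}


def objective(params):
    # Made-up "accuracy"; return its negative because we minimize
    accuracy = 0.7 - abs(params["n_estimators"] - 300) / 10000
    return -accuracy


best = random_search(objective, space, max_evals=30)
print(best)
```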

- Parameter space: the parameters to search and the candidates for each value
  - Uniform distribution: every value in the range is equally likely to appear
  - Log-uniform distribution: the logarithm of the value follows a uniform distribution, so large values are sampled sparsely and small values densely. Learning rates are best searched on a logarithmic scale.
    - When specifying the lower and upper bounds in Hyperopt, pass the logarithms of the values, e.g. `math.log(...)`.
- Objective function: a function that receives one set of parameter values and returns a value to be minimized
  - To maximize accuracy, return its negative (minimizing the negative is the same as maximizing).
- The optimal parameters can be obtained from the return value of `hyperopt.fmin`.
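Here is a stdlib-only sketch of log-uniform sampling, to show why it suits learning rates. The helper takes the bounds as plain values and converts them with `math.log` internally; with Hyperopt itself you would write something like `hp.loguniform('lr', math.log(1e-5), math.log(1e-1))`, passing the logarithms. The range 1e-5 to 1e-1 is just an example.

```python
import math
import random


def loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [loguniform(rng, 1e-5, 1e-1) for _ in range(10000)]

# Count samples per decade: the counts come out roughly equal (about 2500
# each), so small values are covered as densely as large ones
for lo_exp in range(-5, -1):
    lo, hi = 10.0 ** lo_exp, 10.0 ** (lo_exp + 1)
    count = sum(lo <= s < hi for s in samples)
    print(f"[1e{lo_exp}, 1e{lo_exp + 1}): {count}")
```

With a plain `rng.uniform(1e-5, 1e-1)` instead, about 90% of the samples would land in the top decade [1e-2, 1e-1), and small learning rates would hardly ever be tried.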

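```python
from hyperopt import fmin, hp, tpe
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Parameter search
vectorizer = TfidfVectorizer(tokenizer=tokenize, ngram_range=(1, 2))
train_vectors = vectorizer.fit_transform(train_texts)

# Hold out part of the training data for validation
# (the 80/20 split and random_state are my own choices)
tr_vectors, val_vectors, tr_labels, val_labels = train_test_split(
    train_vectors, train_labels, test_size=0.2, random_state=0)

## Objective function
def objective(args):
    classifier = RandomForestClassifier(n_estimators=int(args['n_estimators']),
                                        max_features=args['max_features'])
    classifier.fit(tr_vectors, tr_labels)
    val_predictions = classifier.predict(val_vectors)
    accuracy = accuracy_score(val_labels, val_predictions)
    # fmin minimizes, so return the negative accuracy to maximize accuracy
    return -accuracy

## Parameter space
max_features_choices = ('sqrt', 'log2', None)
space = {
    # quniform returns floats (10.0, 20.0, ...), hence int() above
    'n_estimators': hp.quniform('n_estimators', 10, 500, 10),
    'max_features': hp.choice('max_features', max_features_choices),
}
best = fmin(objective, space, algo=tpe.suggest, max_evals=30)

# train
# For hp.choice, fmin returns the index of the chosen option, not the value
best_classifier = RandomForestClassifier(
    n_estimators=int(best['n_estimators']),
    max_features=max_features_choices[best['max_features']])
best_classifier.fit(train_vectors, train_labels)

# predict
test_vectors = vectorizer.transform(test_texts)
predictions = best_classifier.predict(test_vectors)
```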

14.5 Application to Keras

I omit the details because the run does not finish in a reasonable time.

- Clearing the session
  - Building models in memory again and again during the search eats up GPU memory, so memory is released after every trial: `if keras.backend.backend() == 'tensorflow': keras.backend.clear_session()`
- Parameters
  - If the items to search depend on an option, nest the parameter space so that each option gets its own search items.
    - Example with the optimizer:
      - SGD: search the learning rate and the momentum
      - Adagrad: search only the learning rate
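In Hyperopt, a nested space like this is written as an `hp.choice` over a list of dicts, each with its own `hp.*` expressions. Since I could not run the Keras version, the sketch below only mimics that structure with tiny hypothetical helpers (`choice`, `uniform`, `loguniform`, `sample` are my own, not the Hyperopt API) to show how choosing SGD or Adagrad decides which parameters exist. The learning-rate and momentum ranges are assumptions.

```python
import math
import random


# Hypothetical stand-ins for hp.choice / hp.uniform / hp.loguniform
def choice(options):
    return ("choice", options)


def uniform(low, high):
    return ("uniform", low, high)


def loguniform(log_low, log_high):
    return ("loguniform", log_low, log_high)


def sample(node, rng):
    """Recursively draw one concrete configuration from a nested space."""
    if isinstance(node, dict):
        return {key: sample(value, rng) for key, value in node.items()}
    if isinstance(node, tuple) and node and node[0] == "choice":
        return sample(rng.choice(node[1]), rng)
    if isinstance(node, tuple) and node and node[0] == "uniform":
        return rng.uniform(node[1], node[2])
    if isinstance(node, tuple) and node and node[0] == "loguniform":
        return math.exp(rng.uniform(node[1], node[2]))
    return node  # constants such as the optimizer name


# SGD searches lr and momentum; Adagrad searches only lr
space = {
    "optimizer": choice([
        {"type": "sgd",
         "lr": loguniform(math.log(1e-4), math.log(1e-1)),
         "momentum": uniform(0.0, 0.9)},
        {"type": "adagrad",
         "lr": loguniform(math.log(1e-4), math.log(1e-1))},
    ]),
}

rng = random.Random(0)
for _ in range(3):
    print(sample(space, rng))
```

Because the momentum only exists inside the SGD branch, the search never wastes trials varying a parameter that the chosen optimizer ignores.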

Evaluation

The run is not progressing well (it takes a long time on a CPU; perhaps it cannot realistically be run on a CPU at all), so I will update this later if I find the time.

Trying the book "Introduction to Natural Language Processing Application Development in 15 Steps": Chapter 4 Step 14 memo, "Hyperparameter Search"
