Try the book "Introduction to Natural Language Processing Application Development in 15 Steps" --Chapter 4 Step 14 Memo "Hyperparameter Search"

Contents

This is a memo for myself as I read "Introduction to Natural Language Processing Application Development in 15 Steps". This time I note my own takeaways from Chapter 4, Step 14.

Preparation

--Personal Mac: macOS Mojave version 10.14.6
--Docker: version 19.03.2 for both Client and Server

Chapter overview

In this chapter, we aim to find appropriate values for the parameters that must be given to the machine learning system from outside (as opposed to the parameters it learns).

--Grid search

14.1 Hyperparameters

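Hyperparameters are parameters one level above ("meta" to) the learned ones: the designer or programmer sets them before training, whereas ordinary parameters are adjusted and acquired through training. They appear in both of these components:

--Feature extractor
--Classifier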
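To make the distinction concrete, here is a minimal sketch in plain Python. It is not from the book: the function name fit_slope and the toy data are assumptions made for illustration. The weight w is a learned parameter, while learning_rate and n_steps are hyperparameters chosen before training starts.

```python
def fit_slope(xs, ys, learning_rate, n_steps):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # learned parameter: updated during training
    n = len(xs)
    for _ in range(n_steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# Toy data generated by y = 2x (hypothetical, for illustration only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Same data, same model, different hyperparameters.
w_good = fit_slope(xs, ys, learning_rate=0.05, n_steps=200)
w_bad = fit_slope(xs, ys, learning_rate=0.0001, n_steps=200)
print(w_good, w_bad)
```

With the larger learning rate the slope converges to 2; with the very small one it is still far from 2 after the same number of steps. Training never changes the hyperparameters, which is why they have to be searched from the outside.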

14.2 Grid search

This is a method of enumerating the parameters to be searched and the candidates for each value, and trying all the combinations to find the best one. Scikit-learn provides sklearn.model_selection.GridSearchCV.

--Use a classifier class that follows the scikit-learn API
--For Keras, use the scikit-learn API wrapper
--The best parameters are available as <gridsearch instance>.best_params_
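The enumeration described above can be sketched in plain Python. This is only an illustration of the idea, not how GridSearchCV is implemented: grid_search and toy_score are hypothetical names, and toy_score stands in for "train a model and measure its accuracy".

```python
from itertools import product


def grid_search(score_fn, grid):
    """Try every combination in grid and return the best one."""
    names = list(grid)
    best_score, best_params = float("-inf"), None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score


def toy_score(n_estimators, max_features):
    # Hypothetical score: peaks at n_estimators=300 with max_features='log2'.
    bonus = {"sqrt": 0.02, "log2": 0.05, None: 0.0}[max_features]
    return 1.0 - abs(n_estimators - 300) / 1000 + bonus


grid = {
    "n_estimators": [10, 100, 300, 500],
    "max_features": ["sqrt", "log2", None],
}
best_params, best_score = grid_search(toy_score, grid)
print(best_params)
```

The dictionary has the same shape as the parameters argument passed to GridSearchCV: one key per hyperparameter, one list of candidates per key.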

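# Without a pipeline

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV

## train
# tokenize, train_texts and train_labels are assumed to be defined elsewhere
vectorizer = TfidfVectorizer(tokenizer=tokenize, ngram_range=(1, 2))
train_vectors = vectorizer.fit_transform(train_texts)
parameters = {  # candidate values for each hyperparameter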
    'n_estimators': [10, 20, 30, 40, 50, 100, 200, 300, 400, 500],
    'max_features': ('sqrt', 'log2', None),
}
classifier = RandomForestClassifier()
gridsearch = GridSearchCV(classifier, parameters)
gridsearch.fit(train_vectors, train_labels)

## predict
test_vectors = vectorizer.transform(test_texts)
predictions = gridsearch.predict(test_vectors)


# With a pipeline

from sklearn.pipeline import Pipeline

## train
pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer(tokenizer=tokenize, ngram_range=(1, 2))),
    ('classifier', RandomForestClassifier()),
])
parameters = {
    'vectorizer__ngram_range':[(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)],
    'classifier__n_estimators':[10, 20, 30, 40, 50, 100, 200, 300, 400, 500],
    'classifier__max_features':('sqrt', 'log2', None),
}
gridsearch = GridSearchCV(pipeline, parameters)
gridsearch.fit(texts, labels)

## predict
gridsearch.predict(texts)

GridSearchCV


# verbose=1: print progress, because otherwise the state of the grid search is not visible
# n_jobs=-1: run in parallel on as many CPU cores as possible
clf = GridSearchCV(pipeline, parameters, verbose=1, n_jobs=-1)

Execution result


Before running the evaluation script, switch its import to the grid search version of the agent:

from dialogue_agent import DialogueAgent
↓
from dialogue_agent_pipeline_gridsearch import DialogueAgent

$ docker run -it -v $(pwd):/usr/src/app/ 15step:latest python evaluate_dialogue_agent.py
#It took about 20 minutes to complete the execution
Fitting 3 folds for each of 180 candidates, totalling 540 fits

[Parallel(n_jobs=-1)]: Done  46 tasks      | elapsed:    5.4s
[Parallel(n_jobs=-1)]: Done 196 tasks      | elapsed:  2.2min
[Parallel(n_jobs=-1)]: Done 446 tasks      | elapsed:  5.5min
[Parallel(n_jobs=-1)]: Done 540 out of 540 | elapsed: 19.9min finished

0.7021276595744681
{'classifier__max_features': 'log2', 'classifier__n_estimators': 300, 'vectorizer__ngram_range': (1, 1)}

--As the number of parameters to search grows, the number of combinations, and with it the grid search time, grows exponentially.
--Inside GridSearchCV.fit, performance is evaluated every time one set of parameter values is tried.
--The evaluation uses cross validation (3 folds by default in the version used here, as the log above shows; scikit-learn 0.22 and later default to 5).
  --Cross validation: the training data is split into K parts; one part is held out as validation data and the rest is used for training. Training and evaluation are repeated K times, changing the held-out part each time.
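The numbers in the log above can be reproduced by hand, and the K-fold idea fits in a few lines. The following is a rough sketch: k_fold_indices is a hypothetical helper that only illustrates the splitting scheme, and scikit-learn's own splitter additionally stratifies by class label when the estimator is a classifier.

```python
from math import prod

grid = {
    "vectorizer__ngram_range": [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)],
    "classifier__n_estimators": [10, 20, 30, 40, 50, 100, 200, 300, 400, 500],
    "classifier__max_features": ["sqrt", "log2", None],
}

# One candidate per combination: 6 * 10 * 3
n_candidates = prod(len(values) for values in grid.values())  # 180
n_folds = 3
n_fits = n_candidates * n_folds  # 540
print(n_candidates, n_fits)


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(7, n_folds))
for train_idx, val_idx in folds:
    print(train_idx, val_idx)
```

Adding one more parameter with m candidates multiplies the number of fits by m, which is where the exponential growth comes from.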

14.3 Using Hyperopt 14.4 Probability distribution

Hyperopt is a tool that searches hyperparameters more efficiently than grid search. Given a parameter space and an objective function, it returns the best hyperparameters it finds.

--Parameter space: the parameters to search and the candidates (or the distribution) for each value
  --Uniform distribution: every value is equally likely to be drawn
  --Log-uniform distribution: the logarithm of the value follows a uniform distribution, so small values are sampled densely and large values sparsely. This suits parameters such as the learning rate, which are best searched at logarithmic intervals.
    --When giving the lower and upper limits to Hyperopt, pass the logarithm of each value: math.log(..)
--Objective function: a function that receives one set of parameter values and returns the value to be minimized
  --To maximize accuracy, negate it and minimize that (minimizing the negative is the same as maximizing).
--The best parameters are the return value of hyperopt.fmin. For an hp.choice parameter, the returned value is the index of the chosen option.
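The two ideas in the list above, log-uniform sampling and a negated objective, can be tried without Hyperopt. This is a sketch under assumptions: sample_loguniform and toy_accuracy are hypothetical, and the search at the end is plain random search, not Hyperopt's TPE algorithm. Note that this helper takes the actual bounds and applies math.log itself, whereas hp.loguniform expects the bounds already passed through math.log.

```python
import math
import random


def sample_loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_loguniform(rng, 1e-5, 1e-1) for _ in range(10000)]

# Each decade receives roughly the same share of the samples.
decades = [(1e-5, 1e-4), (1e-4, 1e-3), (1e-3, 1e-2), (1e-2, 1e-1)]
counts = [sum(lo <= s < hi for s in samples) for lo, hi in decades]
print(counts)


def toy_accuracy(learning_rate):
    # Hypothetical stand-in for "train and evaluate"; best near 1e-3.
    return 1.0 - abs(math.log10(learning_rate) + 3.0) / 4.0


def objective(params):
    # The search minimizes, so return the negated accuracy.
    return -toy_accuracy(params["learning_rate"])


trials = [{"learning_rate": s} for s in samples[:30]]
best = min(trials, key=objective)
print(best)
```

A plain uniform draw over the same range would put about 90% of the samples between 1e-2 and 1e-1, leaving the small learning rates almost unexplored.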

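# Parameter search
from hyperopt import fmin, hp, tpe
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

vectorizer = TfidfVectorizer(tokenizer=tokenize, ngram_range=(1, 2))
train_vectors = vectorizer.fit_transform(train_texts)

# Assumption: this memo does not show where tr_* / val_* come from;
# here the training data is split into a training part and a validation part.
tr_vectors, val_vectors, tr_labels, val_labels = train_test_split(
    train_vectors, train_labels, random_state=42)

## Objective function
def objective(args):
    # hp.quniform yields floats, so cast n_estimators to int
    classifier = RandomForestClassifier(n_estimators=int(args['n_estimators']),
                                        max_features=args['max_features'])
    classifier.fit(tr_vectors, tr_labels)
    val_predictions = classifier.predict(val_vectors)
    accuracy = accuracy_score(val_labels, val_predictions)
    # fmin minimizes, so return the negated accuracy
    return -accuracy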

##Parameter space
max_features_choices = ('sqrt', 'log2', None)
space = {
    'n_estimators': hp.quniform('n_estimators', 10, 500, 10),
    'max_features': hp.choice('max_features', max_features_choices),
}
best = fmin(objective, space, algo=tpe.suggest, max_evals=30)

# train
best_classifier = RandomForestClassifier(
    n_estimators=int(best['n_estimators']),
    max_features=max_features_choices[best['max_features']])
best_classifier.fit(train_vectors, train_labels)

# predict
test_vectors = vectorizer.transform(test_texts)
predictions = best_classifier.predict(test_vectors)

14.5 Application to Keras

Details are omitted because the run did not finish in a reasonable time.

--Clearing the session
  --Building the model in memory many times during the search eats up GPU memory, so a release step is inserted on every trial.
    - if keras.backend.backend() == 'tensorflow':
    -     keras.backend.clear_session()
--Parameters
  --When the items to search differ depending on an option, nest the parameter space to specify the detailed items for each option.
  --Example: optimizer
    --SGD: search the learning rate and momentum
    --Adagrad: search only the learning rate
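The nesting idea can be illustrated without Keras or Hyperopt. The sketch below is an assumption-laden stand-in: sample_space is a hypothetical function, and the value ranges are made up. It only shows that the set of sub-parameters depends on which optimizer was drawn; with Hyperopt the same shape is typically expressed by passing a list of dictionaries to hp.choice.

```python
import math
import random


def sample_space(rng):
    """Draw one configuration from a nested, optimizer-dependent space."""
    optimizer = rng.choice(["sgd", "adagrad"])
    # Learning rate is searched on a log scale (range is an assumption).
    learning_rate = math.exp(rng.uniform(math.log(1e-4), math.log(1e-1)))
    if optimizer == "sgd":
        # SGD: search both the learning rate and the momentum.
        return {
            "type": "sgd",
            "learning_rate": learning_rate,
            "momentum": rng.uniform(0.0, 0.99),
        }
    # Adagrad: search only the learning rate.
    return {"type": "adagrad", "learning_rate": learning_rate}


rng = random.Random(1)
draws = [sample_space(rng) for _ in range(200)]
for config in draws[:3]:
    print(config)
```

Because momentum only exists in the SGD branch, no trial is wasted on varying a value the chosen optimizer ignores.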

Evaluation

The run is not making progress (training on a CPU is slow; perhaps it is not practical without a GPU?), so I will update this section later if time allows.
