Tech-Circle Let's start application development using machine learning (self-study)

This article is an updated, self-study version of the materials for the hands-on study session Tech-Circle Let's Start Application Development Using Machine Learning. The session content is a customized version of the tutorial presented at PyCon 2015.

Goal

Advance preparation

The following article summarizes how to build the environment. Set it up according to your own environment (OS).

Building a machine learning application development environment with Python

Machine learning overview

First, here is an overview of machine learning.

Machine Learning Bootstrap

The hands-on part starts from p. 37. The slides contain the explanations and this article contains the steps to carry out, so read the explanation on the slides first, then try the corresponding procedure here.

Hands-on procedure

**The hands-on is based on Python 3**

# 0 Preparation

# 0-1 Download source code

**Fork** the following GitHub repository, then download (clone) the code from your fork: icoxfog417/number_recognizer

# 0-2 Application operation check

Activate the virtual environment

(The following assumes that you completed the advance preparation, i.e. created a virtual environment named ml_env with conda. If you used a different name, substitute it accordingly.)

Windows

activate ml_env

Mac/Linux

# To avoid a conflict with pyenv, look up the path of the virtual environment and run its activate script directly
conda info -e
# conda environments:
#
ml_env                   /usr/local/pyenv/versions/miniconda3-3.4.2/envs/ml_env
root                  *  /usr/local/pyenv/versions/miniconda3-3.4.2

source /usr/local/pyenv/versions/miniconda3-3.4.2/envs/ml_env/bin/activate ml_env

Operation check

Application (runs at localhost:3000)

Run run_application.py from the top level of the number_recognition folder.

python run_application.py

IPython notebook for building the machine learning model (runs at localhost:8888)

In number_recognition/machines/number_recognizer, run the following.

ipython notebook

At first, the application's predictions are quite poor. We will make it smarter.

# 1 Experience the process of creating a machine learning model

Open the notebook in IPython Notebook. It walks through each step of building a machine learning model in order. Because the code in the notebook can be executed as-is, read each explanation and run the cells in order from the top (see here for detailed usage).


Once you have run all the way through the final save step, the model file (number_recognition/machines/number_recognizer/machine.pkl) should be updated.
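For readers without the notebook at hand, the overall flow of step #1 (load data, train, evaluate, save) can be sketched roughly as follows. This is a minimal sketch, assuming the model is scikit-learn's SGDClassifier trained on the bundled digits dataset and saved with pickle; the notebook's actual model, features, and serialization may differ, and the file name machine_sketch.pkl is hypothetical.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier


def build_and_save(path):
    # 1. load the data: 8x8 digit images flattened to 64 features
    digits = load_digits()
    # 2. train the model (SGDClassifier is an assumption, not necessarily the notebook's model)
    model = SGDClassifier(random_state=0)
    model.fit(digits.data, digits.target)
    # 3. evaluate: mean accuracy on the same data it was trained on
    print("training accuracy: {0:.3f}".format(model.score(digits.data, digits.target)))
    # 4. save the trained model so that an application can load it later
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model


def load_model(path):
    # restore a model saved by build_and_save
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "machine_sketch.pkl")  # hypothetical file name
    trained = build_and_save(path)
    restored = load_model(path)
    digits = load_digits()
    # the restored model should predict exactly like the one that was saved
    print((restored.predict(digits.data) == trained.predict(digits.data)).all())
```

Saving the trained model to a file is what lets the application reuse it without retraining on every start.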

# 2 Divide the training data


Hands-on #2 explanation

Here, we do two things: split the data into a training set and a test set, then train on the training set and measure accuracy on the test set.

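Use train_test_split (sklearn.cross_validation.train_test_split in the scikit-learn version used here; sklearn.model_selection.train_test_split in 0.18 and later) to split the data. Insert the following code before the Training the Model section.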

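def split_dataset(dataset, test_size=0.3):
    # train_test_split moved to sklearn.model_selection in scikit-learn 0.18
    # (sklearn.cross_validation was removed in 0.20); fall back for older versions
    try:
        from sklearn.model_selection import train_test_split
    except ImportError:
        from sklearn.cross_validation import train_test_split
    from collections import namedtuple

    # same shape as the original dataset: .data (features) and .target (labels)
    DataSet = namedtuple("DataSet", ["data", "target"])
    train_d, test_d, train_t, test_t = train_test_split(dataset.data, dataset.target, test_size=test_size, random_state=0)

    left = DataSet(train_d, train_t)
    right = DataSet(test_d, test_t)

    return left, right

# use 30% of data to test the model
training_set, test_set = split_dataset(digits, 0.3)
print("dataset is split to train/test = {0} -> {1}, {2}".format(
        len(digits.data), len(training_set.data), len(test_set.data))
     )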

Since the data is now split into training_set and test_set, modify the Training the Model section as follows.

classifier.fit(training_set.data, training_set.target)

Training is now complete. Because we split the data, 30% of it is held out for evaluation, which lets us measure accuracy on data the model did not see during training.

Modify the accuracy calculation in the Evaluate the Model section as follows, so that it reports accuracy on both the training set and the test set.

print(calculate_accuracy(classifier, training_set))
print(calculate_accuracy(classifier, test_set))
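calculate_accuracy comes from the notebook and is not shown here. As a rough stand-alone illustration of why the held-out 30% matters, the sketch below compares accuracy on the training and test sets. It assumes scikit-learn 0.18 or later (for sklearn.model_selection), uses SGDClassifier as a stand-in for the notebook's model, and uses accuracy_score in place of calculate_accuracy.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_compare(test_size=0.3):
    digits = load_digits()
    # hold out test_size of the samples; the model never sees them during fit
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=test_size, random_state=0)

    model = SGDClassifier(random_state=0)  # stand-in for the notebook's make_model()
    model.fit(X_train, y_train)

    # accuracy on data the model was trained on vs. data it has never seen
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    return train_acc, test_acc


if __name__ == "__main__":
    train_acc, test_acc = train_and_compare()
    print("train: {0:.3f}, test: {1:.3f}".format(train_acc, test_acc))
```

The test accuracy is typically somewhat lower than the training accuracy; a large gap between the two is a sign of overfitting.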

# 3 Evaluate the model


Hands-on #3 explanation

Checking accuracy on the training and evaluation data

Use the following script to check how accuracy on the training and evaluation data changes as the amount of training data grows. A graph with the number of training samples on the horizontal axis and accuracy on the vertical axis is called a learning curve. In scikit-learn, you can draw one easily with learning_curve (sklearn.learning_curve.learning_curve; sklearn.model_selection.learning_curve in 0.18 and later).

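def plot_learning_curve(model_func, dataset):
    # learning_curve moved to sklearn.model_selection in scikit-learn 0.18
    try:
        from sklearn.model_selection import learning_curve
    except ImportError:
        from sklearn.learning_curve import learning_curve
    import matplotlib.pyplot as plt
    import numpy as np

    # use 10%, 20%, ..., 100% of each training fold
    sizes = [i / 10 for i in range(1, 11)]
    train_sizes, train_scores, valid_scores = learning_curve(model_func(), dataset.data, dataset.target, train_sizes=sizes, cv=5)

    # average the scores over the 5 folds and plot them against the absolute number of samples
    take_means = lambda s: np.mean(s, axis=1)
    plt.plot(train_sizes, take_means(train_scores), label="training")
    plt.plot(train_sizes, take_means(valid_scores), label="test")
    plt.ylim(0, 1.1)
    plt.xlabel("number of training samples")
    plt.ylabel("accuracy")
    plt.title("learning curve")
    plt.legend(loc="lower right")
    plt.show()

plot_learning_curve(make_model, digits)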

When you have finished adding the code, run it. The learning curve should be plotted.
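If you are running without a display (for example on a remote server), you can inspect the same learning curve numerically instead of plotting it. This sketch assumes scikit-learn 0.18 or later, where learning_curve lives in sklearn.model_selection, and uses SGDClassifier as a stand-in for make_model().

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import learning_curve


def learning_curve_table(cv=5):
    digits = load_digits()
    # fractions of each training fold to use: 10%, 20%, ..., 100%
    fractions = np.linspace(0.1, 1.0, 10)
    train_sizes, train_scores, valid_scores = learning_curve(
        SGDClassifier(random_state=0), digits.data, digits.target,
        train_sizes=fractions, cv=cv)
    # each row of *_scores holds one score per CV fold; average across the folds
    return train_sizes, train_scores.mean(axis=1), valid_scores.mean(axis=1)


if __name__ == "__main__":
    for n, tr, va in zip(*learning_curve_table()):
        print("{0:5d} samples  train={1:.3f}  validation={2:.3f}".format(n, tr, va))
```

Reading the table row by row gives the same information as the plot: if the validation score is still rising at the largest size, more training data would probably help.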

Checking precision and recall

In scikit-learn, you can check precision and recall for each label easily with the classification_report function. A confusion matrix shows, for each true label (#0 to #9), how many samples were predicted as each label, so you can see which digits are confused with which. You can compute it with sklearn.metrics.confusion_matrix.

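def show_confusion_matrix(model, dataset):
    from sklearn.metrics import classification_report, confusion_matrix

    predicted = model.predict(dataset.data)
    target_names = ["#{0}".format(i) for i in range(0, 10)]

    # precision, recall and F1 score for each label
    print(classification_report(dataset.target, predicted, target_names=target_names))
    # rows: true label, columns: predicted label
    print(confusion_matrix(dataset.target, predicted))

show_confusion_matrix(classifier, digits)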

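To connect the confusion matrix with the numbers in classification_report: with rows as true labels and columns as predicted labels (the convention of sklearn.metrics.confusion_matrix), the precision of a label is its diagonal entry divided by its column sum, and its recall is the diagonal entry divided by its row sum. The sketch below checks this on a held-out split; it assumes scikit-learn 0.18 or later and uses SGDClassifier as a stand-in model.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split


def precision_recall_from_matrix(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)  # cm[i, j]: samples of true label i predicted as label j
    correct = np.diag(cm).astype(float)
    predicted_counts = cm.sum(axis=0)  # column sums: how often each label was predicted
    actual_counts = cm.sum(axis=1)     # row sums: how often each label actually occurs
    # a label with an empty column/row gets 0 instead of a division by zero
    precision = np.divide(correct, predicted_counts, out=np.zeros_like(correct), where=predicted_counts > 0)
    recall = np.divide(correct, actual_counts, out=np.zeros_like(correct), where=actual_counts > 0)
    return cm, precision, recall


if __name__ == "__main__":
    digits = load_digits()
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=0.3, random_state=0)
    model = SGDClassifier(random_state=0).fit(X_train, y_train)
    predicted = model.predict(X_test)

    cm, precision, recall = precision_recall_from_matrix(y_test, predicted)
    print(cm)
    # these should agree with scikit-learn's per-label scores
    print(np.allclose(precision, precision_score(y_test, predicted, average=None)))
    print(np.allclose(recall, recall_score(y_test, predicted, average=None)))
```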

Hands-on advanced

Deploy to Heroku

Try pressing the Heroku button.


By using conda-buildpack, you can build the application environment with conda on Heroku. This makes it easy to run machine learning applications on Heroku. See here for details.

Model tuning

Grid search tries every combination of the model parameters you specify and finds the combination that gives the highest score. In scikit-learn, you can run this search with GridSearchCV.

Try tuning the model by inserting the following code before the Evaluate the Model section.

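def tuning_model(model_func, dataset):
    # GridSearchCV moved to sklearn.model_selection in scikit-learn 0.18
    # (sklearn.grid_search was removed in 0.20)
    try:
        from sklearn.model_selection import GridSearchCV
    except ImportError:
        from sklearn.grid_search import GridSearchCV

    # parameter combinations to try; on scikit-learn 1.1+ the "log" loss is named "log_loss"
    candidates = [
        {"loss": ["hinge", "log"],
         "alpha": [1e-5, 1e-4, 1e-3]
        }]

    searcher = GridSearchCV(model_func(), candidates, cv=5, scoring="f1_weighted")
    searcher.fit(dataset.data, dataset.target)

    if hasattr(searcher, "cv_results_"):
        # newer scikit-learn: per-combination results are in the cv_results_ dict
        res = searcher.cv_results_
        rows = zip(res["mean_test_score"], res["std_test_score"], res["params"])
    else:
        # older scikit-learn: grid_scores_ holds (params, mean_score, scores) tuples
        rows = [(mean_score, scores.std(), params) for params, mean_score, scores in searcher.grid_scores_]

    # print the combinations from best to worst mean score
    for mean_score, std, params in sorted(rows, key=lambda r: r[0], reverse=True):
        print("%0.3f (+/-%0.03f) for %r" % (mean_score, std / 2, params))

    return searcher.best_estimator_

classifier = tuning_model(make_model, digits)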
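GridSearchCV automates a simple idea: try every combination in the parameter grid, score each one with cross-validation, and keep the best. The loop below spells this out by hand for illustration only. It assumes scikit-learn 0.18 or later and SGDClassifier as the model, and it uses "modified_huber" instead of "log" because the name of the logistic loss changed ("log_loss") in newer scikit-learn versions.

```python
from itertools import product

from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score


def manual_grid_search(cv=5):
    digits = load_digits()
    grid = {"loss": ["hinge", "modified_huber"], "alpha": [1e-5, 1e-4, 1e-3]}

    results = []
    # every combination of the listed values, as GridSearchCV would enumerate them
    for loss, alpha in product(grid["loss"], grid["alpha"]):
        model = SGDClassifier(loss=loss, alpha=alpha, random_state=0)
        scores = cross_val_score(model, digits.data, digits.target, cv=cv, scoring="f1_weighted")
        results.append((scores.mean(), {"loss": loss, "alpha": alpha}))

    # best mean score first
    results.sort(key=lambda r: r[0], reverse=True)
    return results


if __name__ == "__main__":
    for mean_score, params in manual_grid_search():
        print("%0.3f for %r" % (mean_score, params))
```

GridSearchCV additionally refits the best combination on the whole dataset by default and exposes it as best_estimator_, which this loop does not do.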

Online machine learning

When the predicted number is wrong, the application lets you enter the correct answer. These corrections are stored as feedback.txt in the data folder and are used to train the model (https://github.com/icoxfog417/number_recognizer/blob/master/application/server.py#L43).


Try it out and check how the model's learning changes as feedback accumulates.
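The linked server.py is the reference for how the application actually uses feedback.txt. As a general illustration of online learning in scikit-learn, estimators such as SGDClassifier provide partial_fit, which updates an existing model with a new batch of samples instead of retraining from scratch. In this sketch the "feedback" is simply a slice of the digits dataset fed in small batches; the format of feedback.txt and the application's actual update strategy are not assumed.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier


def online_learning_demo(n_initial=500, batch_size=100, n_eval=300):
    digits = load_digits()
    X, y = digits.data, digits.target
    classes = np.unique(y)  # partial_fit needs the full label set on its first call

    model = SGDClassifier(random_state=0)
    # initial training on a first chunk of the data
    model.partial_fit(X[:n_initial], y[:n_initial], classes=classes)

    # evaluate on the last n_eval samples, which are never used for training
    X_eval, y_eval = X[-n_eval:], y[-n_eval:]
    history = [model.score(X_eval, y_eval)]

    # treat the following samples as feedback arriving in small batches
    stop = len(X) - n_eval
    for start in range(n_initial, stop, batch_size):
        end = min(start + batch_size, stop)
        model.partial_fit(X[start:end], y[start:end])  # update, do not retrain
        history.append(model.score(X_eval, y_eval))
    return history


if __name__ == "__main__":
    for step, acc in enumerate(online_learning_demo()):
        print("after update {0}: accuracy={1:.3f}".format(step, acc))
```

Each partial_fit call makes one pass over the new batch only, so accuracy does not necessarily rise at every step; printing the history is a simple way to watch how learning changes.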

※Caution

Other
