Perform handwriting recognition using Pylearn2

This post uses Pylearn2 to recognize handwritten digits. Installing Pylearn2 is not covered here. To display images, set the environment variable PYLEARN2_VIEWER_COMMAND to an image viewer command.
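One way to set the viewer variable from inside a script is sketched below. The viewer commands are assumptions based on common setups (Eye of GNOME on Linux, the open command on macOS), and default_viewer_command is a hypothetical helper name; substitute whatever viewer is installed on your machine.

```python
import os
import sys


def default_viewer_command(platform=sys.platform):
    """Return a guess at an image viewer command for the given platform."""
    # Assumption: these viewers are installed; adjust to your own setup.
    if platform.startswith("darwin"):
        return "open -Wn"           # macOS default image viewer
    return "eog --new-instance"     # Eye of GNOME, common on Linux


# Only set the variable if it has not been configured already.
os.environ.setdefault("PYLEARN2_VIEWER_COMMAND", default_viewer_command())
```

The variable has to be set before the script that displays the images runs, so this would go at the top of that script or in the shell profile.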

The source code used in this post is available on GitHub: https://github.com/dsanno/pylearn2_mnist

Download data

The data comes from the MNIST database of handwritten digits. It contains 60,000 training images and 10,000 test images, each 28 x 28 pixels.

Pylearn2 includes scripts that download and preprocess several datasets. To download the MNIST database, run the following script included in Pylearn2. The downloaded data is placed in $PYLEARN2_DATA_PATH/mnist.

pylearn2/scripts/datasets/download_mnist.py
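After the download, a quick check that the files are in place can save a confusing error later. The sketch below assumes the download script leaves the four uncompressed MNIST files under their standard names in the mnist directory; missing_mnist_files is a hypothetical helper, not part of Pylearn2.

```python
import os

# Standard names of the uncompressed MNIST files (assumed to be what the
# download script leaves in the mnist directory).
MNIST_FILES = [
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
]


def missing_mnist_files(data_path):
    """Return the MNIST files that are absent from <data_path>/mnist."""
    mnist_dir = os.path.join(data_path, "mnist")
    return [name for name in MNIST_FILES
            if not os.path.isfile(os.path.join(mnist_dir, name))]


if __name__ == "__main__":
    data_path = os.environ.get("PYLEARN2_DATA_PATH", ".")
    missing = missing_mnist_files(data_path)
    print("missing files: %s" % (", ".join(missing) or "none"))
```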

Check the data

Let's check what kind of data is included.

First, create a yaml file that defines the dataset.

dataset.yaml


!obj:pylearn2.datasets.mnist.MNIST {
    which_set: 'train'
}

Then use show_examples.py in pylearn2 to display a sample of the data. Create and execute the following file.

show_samples.py


import pylearn2.scripts.show_examples as show_examples

show_examples.show_examples('dataset.yaml', 20, 20)

Alternatively, the same sample can be displayed with the following command.

pylearn2/scripts/show_examples.py dataset.yaml

The following image will be displayed.

dataset_examples.png

Define a model

Define a model for training. The stacked_autoencoders directory of the tutorials contains a model for training on MNIST data, so we modify that model and use it.

pylearn2/scripts/tutorials/stacked_autoencoders/

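This time, we define the following model: three denoising autoencoder layers, each trained on the output of the previous one, followed by a 10-unit output layer.

First layer: takes 28 x 28 = 784 input values.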

dae_l1.yaml


!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: %(train_stop)i
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : 784,
        nhid : %(nhid)i,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .2,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : %(batch_size)i,
        monitoring_batches : %(monitoring_batches)i,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: %(max_epochs)i,
        },
    },
    save_path: "%(save_path)s/dae_l1.pkl",
    save_freq: 1
}
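The corruptor in the first layer is what makes the autoencoder "denoising": the network sees a damaged copy of each image and is trained to reconstruct the original. A minimal sketch of that corruption step in plain Python follows, assuming that each input is set to 0 independently with probability corruption_level. binomial_corrupt is an illustrative stand-in, not the Pylearn2 implementation.

```python
import random


def binomial_corrupt(values, corruption_level, rng):
    """Zero each input independently with probability corruption_level."""
    return [0.0 if rng.random() < corruption_level else v for v in values]


rng = random.Random(0)        # fixed seed so that runs are repeatable
image = [1.0] * 784           # stand-in for one flattened 28 x 28 image
noisy = binomial_corrupt(image, 0.2, rng)
kept = sum(1 for v in noisy if v != 0.0)
print("kept %d of %d pixels" % (kept, len(noisy)))
```

With corruption_level 0.2, roughly 80% of the pixels survive in each corrupted copy.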

Second layer: the dataset is the training data transformed by the first layer.

dae_l2.yaml


!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.transformer_dataset.TransformerDataset {
        raw: !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            start: 0,
            stop: %(train_stop)i
        },
        transformer: !pkl: "%(save_path)s/dae_l1.pkl"
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : %(nvis)i,
        nhid : %(nhid)i,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .3,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : %(batch_size)i,
        monitoring_batches : %(monitoring_batches)i,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: %(max_epochs)i,
        },
    },
    save_path: "%(save_path)s/dae_l2.pkl",
    save_freq: 1
}

Third layer: the dataset is the training data transformed by the first and second layers in sequence.

dae_l3.yaml

!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.transformer_dataset.TransformerDataset {
        raw: !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            start: 0,
            stop: %(train_stop)i
        },
        transformer: !obj:pylearn2.blocks.StackedBlocks {
            layers: [!pkl: "%(save_path)s/dae_l1.pkl", !pkl: "%(save_path)s/dae_l2.pkl"]
        }
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : %(nvis)i,
        nhid : %(nhid)i,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .3,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : %(batch_size)i,
        monitoring_batches : %(monitoring_batches)i,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: %(max_epochs)i,
        },
    },
    save_path: "%(save_path)s/dae_l3.pkl",
    save_freq: 1
}
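The way the third layer receives its input can be sketched in plain Python: each pretrained layer encodes the output of the layer before it. The weights below are random stand-ins rather than trained values, the encoder is assumed to compute tanh(xW + b), and make_layer, encode and stacked_encode are hypothetical names used only for this illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng, irange=0.05):
    """Create (weights, biases) with weights uniform in [-irange, irange]."""
    weights = [[rng.uniform(-irange, irange) for _ in range(n_out)]
               for _ in range(n_in)]
    return weights, [0.0] * n_out


def encode(layer, x):
    """Hidden representation tanh(xW + b) of one input vector."""
    weights, biases = layer
    return [math.tanh(sum(x[i] * weights[i][j] for i in range(len(x)))
                      + biases[j])
            for j in range(len(biases))]


def stacked_encode(layers, x):
    """Apply each layer's encoder in turn, like a stack of blocks."""
    for layer in layers:
        x = encode(layer, x)
    return x


rng = random.Random(0)
layer1 = make_layer(784, 100, rng)    # plays the role of dae_l1.pkl
layer2 = make_layer(100, 100, rng)    # plays the role of dae_l2.pkl
pixel_vector = [rng.random() for _ in range(784)]
layer3_input = stacked_encode([layer1, layer2], pixel_vector)
print(len(layer3_input))
```

The length of the encoded vector matches the nvis value that the third layer is trained with.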

Finally, define a model that stacks the pretrained layers for fine-tuning. The output layer has 10 units, and each output value can be interpreted as the probability that the input is the corresponding digit from 0 to 9.

dae_mlp.yaml


!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: %(train_stop)i
    },
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: %(batch_size)i,
        layers: [
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h1',
                     layer_content: !pkl: "%(save_path)s/dae_l1.pkl"
                 },
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h2',
                     layer_content: !pkl: "%(save_path)s/dae_l2.pkl"
                 },
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h3',
                     layer_content: !pkl: "%(save_path)s/dae_l3.pkl"
                 },
                 !obj:pylearn2.models.mlp.Softmax {
                     max_col_norm: 1.9365,
                     layer_name: 'y',
                     n_classes: 10,
                     irange: .005
                 }
                ],
        nvis: 784
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate: .05,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: .5,
        },
        monitoring_dataset:
            {
                'valid' : !obj:pylearn2.datasets.mnist.MNIST {
                              which_set: 'train',
                              start: 0,
                              stop: %(valid_stop)i
                          },
            },
        cost: !obj:pylearn2.costs.mlp.Default {},
        termination_criterion: !obj:pylearn2.termination_criteria.And {
            criteria: [
                !obj:pylearn2.termination_criteria.MonitorBased {
                    channel_name: "valid_y_misclass",
                    prop_decrease: 0.,
                    N: 100
                },
                !obj:pylearn2.termination_criteria.EpochCounter {
                    max_epochs: %(max_epochs)i
                }
            ]
        },
        update_callbacks: !obj:pylearn2.training_algorithms.sgd.ExponentialDecay {
            decay_factor: 1.00004,
            min_lr: .000001
        }
    },
    extensions: [
        !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
            start: 1,
            saturate: 250,
            final_momentum: .7
        }
    ],
    save_path: "%(save_path)s/dae_mlp.pkl",
    save_freq: 1
}
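The update_callbacks entry shrinks the learning rate as training proceeds. A minimal sketch of the schedule follows, assuming the callback divides the initial learning rate by decay_factor raised to the number of mini-batch updates so far and never goes below min_lr. decayed_learning_rate is a hypothetical helper; it illustrates the arithmetic and is not the Pylearn2 code itself.

```python
def decayed_learning_rate(base_lr, decay_factor, min_lr, n_updates):
    """Learning rate after n_updates mini-batch updates, floored at min_lr."""
    return max(base_lr / (decay_factor ** n_updates), min_lr)


# Values from dae_mlp.yaml. 600 updates per epoch assumes 60000 training
# examples processed in mini-batches of 100.
for epoch in (0, 1, 10, 100):
    lr = decayed_learning_rate(0.05, 1.00004, 0.000001, 600 * epoch)
    print("epoch %3d: learning rate %.5f" % (epoch, lr))
```

Under these assumptions the rate falls by about an order of magnitude over 100 epochs, which is far from the min_lr floor.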

Train the model

The training script below is a modified version of the one included in the stacked_autoencoders tutorial.

When executed, the script outputs one file per model: dae_l1.pkl, dae_l2.pkl, dae_l3.pkl, and dae_mlp.pkl. Training took about 20 minutes in my environment (Core i7-3770).

test_dae.py


import os

from pylearn2.testing import skip
from pylearn2.testing import no_debug_mode
from pylearn2.config import yaml_parse


@no_debug_mode
def train_yaml(yaml_file):

    train = yaml_parse.load(yaml_file)
    train.main_loop()


def train_layer1(yaml_file_path, save_path):

    with open("{0}/dae_l1.yaml".format(yaml_file_path), 'r') as f:
        yaml = f.read()
    hyper_params = {'train_stop': 60000,
                    'batch_size': 100,
                    'monitoring_batches': 1,
                    'nhid': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def train_layer2(yaml_file_path, save_path):

    with open("{0}/dae_l2.yaml".format(yaml_file_path), 'r') as f:
        yaml = f.read()
    hyper_params = {'train_stop': 60000,
                    'batch_size': 100,
                    'monitoring_batches': 1,
                    'nvis': 100,
                    'nhid': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def train_layer3(yaml_file_path, save_path):

    with open("{0}/dae_l3.yaml".format(yaml_file_path), 'r') as f:
        yaml = f.read()
    hyper_params = {'train_stop': 60000,
                    'batch_size': 100,
                    'monitoring_batches': 1,
                    'nvis': 100,
                    'nhid': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def train_mlp(yaml_file_path, save_path):

    with open("{0}/dae_mlp.yaml".format(yaml_file_path), 'r') as f:
        yaml = f.read()
    hyper_params = {'train_stop': 60000,
                    'valid_stop': 60000,
                    'batch_size': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def test_sda():

    skip.skip_if_no_data()

    yaml_file_path = '.'
    save_path = '.'

    train_layer1(yaml_file_path, save_path)
    train_layer2(yaml_file_path, save_path)
    train_layer3(yaml_file_path, save_path)
    train_mlp(yaml_file_path, save_path)

if __name__ == '__main__':
    test_sda()
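The placeholders such as %(nhid)i in the yaml files are filled in by Python's % string formatting before the text is parsed. The sketch below uses a shortened, hypothetical template rather than one of the real files to show the substitution on its own.

```python
# A shortened, hypothetical template in the style of dae_l1.yaml.
template = """model: {
    nvis : 784,
    nhid : %(nhid)i,
    save_path: "%(save_path)s/dae_l1.pkl"
}"""

hyper_params = {'nhid': 100, 'save_path': '.'}

# %(name)i inserts an integer and %(name)s a string, looked up by key.
filled = template % hyper_params
print(filled)
```

Because the whole file goes through % formatting, a literal percent sign in a yaml file would have to be written as %%.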

Perform character recognition using test data

Run character recognition on the test data to measure the recognition rate. The test data is loaded with pylearn2.datasets.mnist.MNIST(which_set='test'), and the model's fprop computes the output layer values. The digit corresponding to the output unit with the largest value is taken as the prediction. In my environment, 9814 out of 10000 test images were recognized correctly.

test_result.py


import numpy as np
import pickle
import theano
import pylearn2.datasets.mnist as mnist


def simulate(inputs, model):
    return model.fprop(theano.shared(inputs)).eval()

def countCorrectResults(outputs, labels):
    correct = 0
    for output, label in zip(outputs, labels):
        if np.argmax(output) == label:
            correct += 1
    return correct
 
def score(dataset, model):
    outputs = simulate(dataset.X, model)
    correct = countCorrectResults(outputs, dataset.y)

    return {
        'correct': correct,
        'total': len(dataset.X)
    }

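# Pickle files are binary, so open the model in 'rb' mode.
with open('dae_mlp.pkl', 'rb') as f:
    model = pickle.load(f)
test_data = mnist.MNIST(which_set='test')
# The parenthesized form works as a print statement and as a print function.
print('%(correct)d / %(total)d' % score(test_data, model))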
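The prediction rule used above can be tried without a trained model. The sketch below uses made-up scores for a single image: it turns them into probabilities with a softmax, picks the largest one as the predicted digit, and converts the reported count of 9814 out of 10000 into a percentage. softmax, predict and recognition_rate are illustrative helpers written for this example.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)            # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(probabilities):
    """Index of the largest probability, i.e. the predicted digit."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def recognition_rate(correct, total):
    """Share of correctly recognized samples as a percentage."""
    return 100.0 * correct / total


# Made-up scores for one image, one score per digit from 0 to 9.
probs = softmax([0.1, 2.0, -1.0, 0.3, 0.0, 0.5, 4.0, 0.2, -0.5, 1.0])
print("predicted digit: %d" % predict(probs))
print("recognition rate: %.2f%%" % recognition_rate(9814, 10000))
```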

References

I referred to the following sites.

http://tanopy.blog79.fc2.com/blog-entry-118.html
http://www.slideshare.net/yurieoka37/ss-28152060
