Use Pylearn2 for handwriting recognition. I will omit the installation method of Pylearn2. Set the environment variable PYLEARN2_VIEWER_COMMAND to display the image.

The source code used this time has been uploaded to Github. https://github.com/dsanno/pylearn2_mnist

Download data

For data, use MNIST database. It has the following data set.

One data is a 28 x 28 pixel black and white image with one of the numbers from 0 to 9 drawn.
60,000 learning data
10000 test data

pylearn2 contains scripts that download and process some datasets. To download the MNIST database, run the following file included in pylearn2. The downloaded data will be placed in $ PYLEARN2_DATA_PATH / mnist. pylearn2/scripts/datasets/download_mnist.py

Check the data

Let's check what kind of data is included.

First, create a yaml file that defines the dataset.

`dataset.yaml`


!obj:pylearn2.datasets.mnist.MNIST {
    which_set: 'train'
}

Then use show_examples.py in pylearn2 to display a sample of the data. Create and execute the following file.

`show_samples.py`


import pylearn2.scripts.show_examples as show_examples

show_examples.show_examples('dataset.yaml', 20, 20)

Alternatively, it can be displayed with the following command. pylearn2/scripts/show_examples.py dataset.yaml

The following image will be displayed.

Define a model

Define a model for training. There is a model for training MNIST data in the stacked_autoencoders directory of the tutorial, so modify it and use it. pylearn2/scripts/tutorals/stacked_autoencoders/

This time, we will define the following model.

Input is 28 x 28 = 784 units
Output is 10 units
3 hidden layers
100 units each for hidden layers

1st layer: 28 x 28 = Takes 784 input values.

`dae_l1.yaml`


!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: %(train_stop)i
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : 784,
        nhid : %(nhid)i,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .2,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : %(batch_size)i,
        monitoring_batches : %(monitoring_batches)i,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: %(max_epochs)i,
        },
    },
    save_path: "%(save_path)s/dae_l1.pkl",
    save_freq: 1
}

Second layer: For the dataset, use the training data converted by the first layer.

`dae_l2.yaml`


!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.transformer_dataset.TransformerDataset {
        raw: !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            start: 0,
            stop: %(train_stop)i
        },
        transformer: !pkl: "%(save_path)s/dae_l1.pkl"
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : %(nvis)i,
        nhid : %(nhid)i,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .3,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : %(batch_size)i,
        monitoring_batches : %(monitoring_batches)i,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: %(max_epochs)i,
        },
    },
    save_path: "%(save_path)s/dae_l2.pkl",
    save_freq: 1
}

Third layer: For the dataset, use the training data converted by the first and second layers.

!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.transformer_dataset.TransformerDataset {
        raw: !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            start: 0,
            stop: %(train_stop)i
        },
        transformer: !obj:pylearn2.blocks.StackedBlocks {
            layers: [!pkl: "dae_l1.pkl", !pkl: "dae_l2.pkl"]
        }
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : %(nvis)i,
        nhid : %(nhid)i,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .3,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : %(batch_size)i,
        monitoring_batches : %(monitoring_batches)i,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: %(max_epochs)i,
        },
    },
    save_path: "%(save_path)s/dae_l3.pkl",
    save_freq: 1
}

Finally, define a model in which each layer is concatenated for fine tuning. There are 10 units in the output layer, and each value can be regarded as the probability of which character from 0 to 9.

`dae_mlp.yaml`


!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: %(train_stop)i
    },
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: %(batch_size)i,
        layers: [
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h1',
                     layer_content: !pkl: "%(save_path)s/dae_l1.pkl"
                 },
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h2',
                     layer_content: !pkl: "%(save_path)s/dae_l2.pkl"
                 },
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h3',
                     layer_content: !pkl: "%(save_path)s/dae_l3.pkl"
                 },
                 !obj:pylearn2.models.mlp.Softmax {
                     max_col_norm: 1.9365,
                     layer_name: 'y',
                     n_classes: 10,
                     irange: .005
                 }
                ],
        nvis: 784
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate: .05,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: .5,
        },
        monitoring_dataset:
            {
                'valid' : !obj:pylearn2.datasets.mnist.MNIST {
                              which_set: 'train',
                              start: 0,
                              stop: %(valid_stop)i
                          },
            },
        cost: !obj:pylearn2.costs.mlp.Default {},
        termination_criterion: !obj:pylearn2.termination_criteria.And {
            criteria: [
                !obj:pylearn2.termination_criteria.MonitorBased {
                    channel_name: "valid_y_misclass",
                    prop_decrease: 0.,
                    N: 100
                },
                !obj:pylearn2.termination_criteria.EpochCounter {
                    max_epochs: %(max_epochs)i
                }
            ]
        },
        update_callbacks: !obj:pylearn2.training_algorithms.sgd.ExponentialDecay {
            decay_factor: 1.00004,
            min_lr: .000001
        }
    },
    extensions: [
        !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
            start: 1,
            saturate: 250,
            final_momentum: .7
        }
    ],
    save_path: "%(save_path)s/dae_mlp.pkl",
    save_freq: 1
}

Do learning

Modify and use the learning script included in the stacked_autoencoders tutorial. The following modifications have been made.

In the tutorial, the hidden layer was changed from 2 layers to 3 layers.
Fixed that the number of training data (train_stop) and the maximum number of trainings (max_epochs) are small, probably to shorten the learning time. All 60,000 learning data were used, and the maximum number of learning times was 100.

When executed, the files corresponding to each model, dae_l1.pkl, dae_l2.pkl, dae_l3.pkl, and dae_mlp.pkl, are output. Regarding the execution time, it took about 20 minutes in my environment (Core i7-3770).

`test_dae.py`


import os

from pylearn2.testing import skip
from pylearn2.testing import no_debug_mode
from pylearn2.config import yaml_parse


@no_debug_mode
def train_yaml(yaml_file):

    train = yaml_parse.load(yaml_file)
    train.main_loop()


def train_layer1(yaml_file_path, save_path):

    yaml = open("{0}/dae_l1.yaml".format(yaml_file_path), 'r').read()
    hyper_params = {'train_stop': 60000,
                    'batch_size': 100,
                    'monitoring_batches': 1,
                    'nhid': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def train_layer2(yaml_file_path, save_path):

    yaml = open("{0}/dae_l2.yaml".format(yaml_file_path), 'r').read()
    hyper_params = {'train_stop': 60000,
                    'batch_size': 100,
                    'monitoring_batches': 1,
                    'nvis': 100,
                    'nhid': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def train_layer3(yaml_file_path, save_path):

    yaml = open("{0}/dae_l3.yaml".format(yaml_file_path), 'r').read()
    hyper_params = {'train_stop': 60000,
                    'batch_size': 100,
                    'monitoring_batches': 1,
                    'nvis': 100,
                    'nhid': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def train_mlp(yaml_file_path, save_path):

    yaml = open("{0}/dae_mlp.yaml".format(yaml_file_path), 'r').read()
    hyper_params = {'train_stop': 60000,
                    'valid_stop': 60000,
                    'batch_size': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def test_sda():

    skip.skip_if_no_data()

    yaml_file_path = '.';
    save_path = '.'

    train_layer1(yaml_file_path, save_path)
    train_layer2(yaml_file_path, save_path)
    train_layer3(yaml_file_path, save_path)
    train_mlp(yaml_file_path, save_path)

if __name__ == '__main__':
    test_sda()

Perform character recognition using test data

Character recognition is performed using test data to obtain the recognition rate. I get the test data with pylearn2.datasets.mnist.MNIST (which_set ='test') and use the model's fprop to find the output layer value. The character corresponding to the output unit with the largest value is used as the predicted value. In my environment, 9814 out of 10000 were correct.

`test_result.py`


import numpy as np
import pickle
import theano
import pylearn2.datasets.mnist as mnist


def simulate(inputs, model):
    return model.fprop(theano.shared(inputs)).eval()

def countCorrectResults(outputs, labels):
    correct = 0;
    for output, label in zip(outputs, labels):
        if np.argmax(output) == label:
            correct += 1
    return correct
 
def score(dataset, model):
    outputs = simulate(dataset.X, model)
    correct = countCorrectResults(outputs, dataset.y)

    return {
        'correct': correct,
        'total': len(dataset.X)
    }

model = pickle.load(open('dae_mlp.pkl'))
test_data = mnist.MNIST(which_set='test')
print '%(correct)d / %(total)d' % score(test_data, model)

References

I referred to the following site. http://tanopy.blog79.fc2.com/blog-entry-118.html http://www.slideshare.net/yurieoka37/ss-28152060

Perform handwriting recognition using Pylearn2

Download data

Check the data

dataset.yaml

show_samples.py

Define a model

dae_l1.yaml

dae_l2.yaml

dae_mlp.yaml