This article uses Pylearn2 for handwritten digit recognition. I will skip how to install Pylearn2. To display images, set the environment variable PYLEARN2_VIEWER_COMMAND to an image viewer command.
The source code used here has been uploaded to GitHub: https://github.com/dsanno/pylearn2_mnist
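The data comes from the MNIST database of handwritten digits. It contains 60,000 training images and 10,000 test images, each 28 x 28 pixels, together with their labels.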
Pylearn2 includes scripts that download and preprocess several datasets. To download the MNIST database, run the following script included in Pylearn2. The downloaded data is placed in $PYLEARN2_DATA_PATH/mnist.
pylearn2/scripts/datasets/download_mnist.py
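Before moving on, it can help to confirm that the download succeeded. The sketch below uses a hypothetical helper, `missing_mnist_files` (not part of Pylearn2), that looks for the four standard MNIST IDX files under a data root. It assumes the download script stores them uncompressed under these names, which may differ between Pylearn2 versions.

```python
import os

# Standard MNIST IDX file names; assumed to match what download_mnist.py writes.
MNIST_FILES = (
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
)


def missing_mnist_files(data_root):
    """Return the MNIST files that are absent from <data_root>/mnist."""
    mnist_dir = os.path.join(data_root, "mnist")
    return [name for name in MNIST_FILES
            if not os.path.isfile(os.path.join(mnist_dir, name))]


if __name__ == "__main__":
    root = os.environ.get("PYLEARN2_DATA_PATH", ".")
    missing = missing_mnist_files(root)
    print("missing files: %s" % (", ".join(missing) or "none"))
```

If the list is not empty, rerunning the download script is probably the first thing to try.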
Let's check what kind of data is included.
First, create a yaml file that defines the dataset.
dataset.yaml
!obj:pylearn2.datasets.mnist.MNIST {
    which_set: 'train'
}
Then use show_examples.py in pylearn2 to display a sample of the data. Create and execute the following file.
show_samples.py
import pylearn2.scripts.show_examples as show_examples
show_examples.show_examples('dataset.yaml', 20, 20)
Alternatively, the samples can be displayed with the following command.
pylearn2/scripts/show_examples.py dataset.yaml
A grid of sample digit images is displayed.
Next, define the model to train. The stacked_autoencoders directory of the tutorials contains a model for training on MNIST data, so I modified it for this article.
pylearn2/scripts/tutorials/stacked_autoencoders/
This time, we will define the following model.
First layer: takes 28 x 28 = 784 input values.
dae_l1.yaml
!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: %(train_stop)i
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : 784,
        nhid : %(nhid)i,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .2,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : %(batch_size)i,
        monitoring_batches : %(monitoring_batches)i,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: %(max_epochs)i,
        },
    },
    save_path: "%(save_path)s/dae_l1.pkl",
    save_freq: 1
}
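To make the configuration above more concrete, here is a minimal NumPy sketch of what a denoising autoencoder computes for one batch: masking corruption, a tanh encoder, a linear decoder, and a squared reconstruction error against the clean input. It is an illustration under assumptions, not Pylearn2's implementation: the function name, the untied decoder weights, and the exact reduction of the error (sum over pixels, mean over examples) are my own choices, and the inputs are random stand-ins for MNIST images.

```python
import numpy as np


def dae_reconstruction_error(x, w_enc, b_enc, w_dec, b_dec, corruption_level, rng):
    """Corrupt x, encode with tanh, decode linearly, return the squared error."""
    # Masking corruption: zero each input with probability corruption_level.
    mask = rng.binomial(1, 1.0 - corruption_level, size=x.shape)
    hidden = np.tanh(np.dot(x * mask, w_enc) + b_enc)   # act_enc: "tanh"
    recon = np.dot(hidden, w_dec) + b_dec               # act_dec: null (linear)
    # Compare the reconstruction with the clean input, not the corrupted one.
    return np.mean(np.sum((recon - x) ** 2, axis=1))


rng = np.random.RandomState(0)
nvis, nhid, irange = 784, 100, 0.05
x = rng.uniform(0.0, 1.0, size=(5, nvis))               # stand-in for 5 MNIST images
w_enc = rng.uniform(-irange, irange, size=(nvis, nhid))
w_dec = rng.uniform(-irange, irange, size=(nhid, nvis))
error = dae_reconstruction_error(x, w_enc, np.zeros(nhid), w_dec, np.zeros(nvis), 0.2, rng)
print(error)
```

Training lowers this error by adjusting the weights with SGD, which forces the 100 hidden units to capture structure that survives the corruption.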
Second layer: For the dataset, use the training data converted by the first layer.
dae_l2.yaml
!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.transformer_dataset.TransformerDataset {
        raw: !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            start: 0,
            stop: %(train_stop)i
        },
        transformer: !pkl: "%(save_path)s/dae_l1.pkl"
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : %(nvis)i,
        nhid : %(nhid)i,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .3,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : %(batch_size)i,
        monitoring_batches : %(monitoring_batches)i,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: %(max_epochs)i,
        },
    },
    save_path: "%(save_path)s/dae_l2.pkl",
    save_freq: 1
}
Third layer: For the dataset, use the training data transformed by the first and second layers.
dae_l3.yaml
!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.transformer_dataset.TransformerDataset {
        raw: !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            start: 0,
            stop: %(train_stop)i
        },
        transformer: !obj:pylearn2.blocks.StackedBlocks {
            layers: [!pkl: "%(save_path)s/dae_l1.pkl", !pkl: "%(save_path)s/dae_l2.pkl"]
        }
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : %(nvis)i,
        nhid : %(nhid)i,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .3,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : %(batch_size)i,
        monitoring_batches : %(monitoring_batches)i,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: %(max_epochs)i,
        },
    },
    save_path: "%(save_path)s/dae_l3.pkl",
    save_freq: 1
}
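The second and third layers never see raw pixels: each one is trained on the output of the already trained encoders below it. The following NumPy sketch illustrates that greedy layer-wise data flow. The helper names `encode` and `stacked_encode` are hypothetical, and the weights are random stand-ins for the parameters stored in dae_l1.pkl and dae_l2.pkl, so this shows only the shapes and the order of operations.

```python
import numpy as np


def encode(x, weights, bias):
    """One pretrained encoder: affine map followed by tanh."""
    return np.tanh(np.dot(x, weights) + bias)


def stacked_encode(x, layers):
    """Apply the encoders in order, as a stack of blocks would."""
    for weights, bias in layers:
        x = encode(x, weights, bias)
    return x


rng = np.random.RandomState(0)
# Random stand-ins for the parameters stored in dae_l1.pkl and dae_l2.pkl.
layer1 = (rng.uniform(-0.05, 0.05, size=(784, 100)), np.zeros(100))
layer2 = (rng.uniform(-0.05, 0.05, size=(100, 100)), np.zeros(100))

images = rng.uniform(0.0, 1.0, size=(3, 784))
features_for_l2 = stacked_encode(images, [layer1])          # input of the second layer
features_for_l3 = stacked_encode(images, [layer1, layer2])  # input of the third layer
print(features_for_l2.shape, features_for_l3.shape)
```

This is why the second and third layers are given nvis = 100 in the training script: their input is the 100-dimensional hidden representation of the layer below.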
Finally, define a model that stacks the three pretrained layers for fine-tuning. The output layer has 10 units, and each value can be interpreted as the probability that the input is the corresponding digit from 0 to 9.
dae_mlp.yaml
!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: %(train_stop)i
    },
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: %(batch_size)i,
        layers: [
            !obj:pylearn2.models.mlp.PretrainedLayer {
                layer_name: 'h1',
                layer_content: !pkl: "%(save_path)s/dae_l1.pkl"
            },
            !obj:pylearn2.models.mlp.PretrainedLayer {
                layer_name: 'h2',
                layer_content: !pkl: "%(save_path)s/dae_l2.pkl"
            },
            !obj:pylearn2.models.mlp.PretrainedLayer {
                layer_name: 'h3',
                layer_content: !pkl: "%(save_path)s/dae_l3.pkl"
            },
            !obj:pylearn2.models.mlp.Softmax {
                max_col_norm: 1.9365,
                layer_name: 'y',
                n_classes: 10,
                irange: .005
            }
        ],
        nvis: 784
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate: .05,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: .5,
        },
        monitoring_dataset:
            {
                'valid' : !obj:pylearn2.datasets.mnist.MNIST {
                    which_set: 'train',
                    start: 0,
                    stop: %(valid_stop)i
                },
            },
        cost: !obj:pylearn2.costs.mlp.Default {},
        termination_criterion: !obj:pylearn2.termination_criteria.And {
            criteria: [
                !obj:pylearn2.termination_criteria.MonitorBased {
                    channel_name: "valid_y_misclass",
                    prop_decrease: 0.,
                    N: 100
                },
                !obj:pylearn2.termination_criteria.EpochCounter {
                    max_epochs: %(max_epochs)i
                }
            ]
        },
        update_callbacks: !obj:pylearn2.training_algorithms.sgd.ExponentialDecay {
            decay_factor: 1.00004,
            min_lr: .000001
        }
    },
    extensions: [
        !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
            start: 1,
            saturate: 250,
            final_momentum: .7
        }
    ],
    save_path: "%(save_path)s/dae_mlp.pkl",
    save_freq: 1
}
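The fine-tuning configuration changes both the learning rate and the momentum during training. The sketch below shows how I understand the two schedules, as an assumption rather than Pylearn2's exact code: ExponentialDecay divides the learning rate by decay_factor after every minibatch update and floors it at min_lr, and MomentumAdjustor moves the momentum linearly from init_momentum to final_momentum between epochs start and saturate. The function names are hypothetical.

```python
def decayed_learning_rate(base_lr, decay_factor, min_lr, num_updates):
    """Learning rate after num_updates minibatch updates, floored at min_lr."""
    return max(base_lr / (decay_factor ** num_updates), min_lr)


def adjusted_momentum(init_momentum, final_momentum, start, saturate, epoch):
    """Linear ramp from init_momentum (at start) to final_momentum (at saturate)."""
    alpha = float(epoch - start) / float(saturate - start)
    alpha = min(max(alpha, 0.0), 1.0)
    return init_momentum * (1.0 - alpha) + final_momentum * alpha


# 60000 examples / batch size 100 = 600 updates per epoch (values from the training script).
updates_per_epoch = 60000 // 100
for epoch in (1, 50, 100):
    lr = decayed_learning_rate(0.05, 1.00004, 0.000001, epoch * updates_per_epoch)
    momentum = adjusted_momentum(0.5, 0.7, 1, 250, epoch)
    print("epoch %3d: learning rate %.5f, momentum %.3f" % (epoch, lr, momentum))
```

Under these assumptions, a run limited to 100 epochs stops well before epoch 250, so the momentum would not reach final_momentum.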
For training, I used the training script included in the stacked_autoencoders tutorial, with some modifications.
Running it writes one file per model: dae_l1.pkl, dae_l2.pkl, dae_l3.pkl, and dae_mlp.pkl. Training took about 20 minutes in my environment (Core i7-3770).
test_dae.py
import os

from pylearn2.testing import skip
from pylearn2.testing import no_debug_mode
from pylearn2.config import yaml_parse


@no_debug_mode
def train_yaml(yaml_file):
    # Build the Train object from the YAML string and run the training loop.
    train = yaml_parse.load(yaml_file)
    train.main_loop()


def train_layer1(yaml_file_path, save_path):
    yaml = open("{0}/dae_l1.yaml".format(yaml_file_path), 'r').read()
    hyper_params = {'train_stop': 60000,
                    'batch_size': 100,
                    'monitoring_batches': 1,
                    'nhid': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    # Fill the %(name)i / %(name)s placeholders in the YAML template.
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def train_layer2(yaml_file_path, save_path):
    yaml = open("{0}/dae_l2.yaml".format(yaml_file_path), 'r').read()
    hyper_params = {'train_stop': 60000,
                    'batch_size': 100,
                    'monitoring_batches': 1,
                    'nvis': 100,
                    'nhid': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def train_layer3(yaml_file_path, save_path):
    yaml = open("{0}/dae_l3.yaml".format(yaml_file_path), 'r').read()
    hyper_params = {'train_stop': 60000,
                    'batch_size': 100,
                    'monitoring_batches': 1,
                    'nvis': 100,
                    'nhid': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def train_mlp(yaml_file_path, save_path):
    yaml = open("{0}/dae_mlp.yaml".format(yaml_file_path), 'r').read()
    hyper_params = {'train_stop': 60000,
                    'valid_stop': 60000,
                    'batch_size': 100,
                    'max_epochs': 100,
                    'save_path': save_path}
    yaml = yaml % (hyper_params)
    train_yaml(yaml)


def test_sda():
    skip.skip_if_no_data()
    yaml_file_path = '.'
    save_path = '.'
    # Pretrain the three layers in order, then fine-tune the whole network.
    train_layer1(yaml_file_path, save_path)
    train_layer2(yaml_file_path, save_path)
    train_layer3(yaml_file_path, save_path)
    train_mlp(yaml_file_path, save_path)


if __name__ == '__main__':
    test_sda()
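The training script relies on Python's `%` string formatting to turn each YAML file into a concrete configuration. The sketch below isolates that step with a shortened, hypothetical template (not one of the actual YAML files) and a hypothetical helper `fill_template`, so it runs without Pylearn2.

```python
# A shortened, hypothetical template in the style of dae_l1.yaml.
TEMPLATE = """!obj:pylearn2.datasets.mnist.MNIST {
    which_set: 'train',
    start: 0,
    stop: %(train_stop)i
}
# saved to "%(save_path)s/dae_l1.pkl"
"""


def fill_template(template, hyper_params):
    """Substitute %(name)i / %(name)s placeholders using Python's % operator."""
    return template % hyper_params


filled = fill_template(TEMPLATE, {'train_stop': 60000, 'save_path': '.'})
print(filled)

# A placeholder without a matching key raises KeyError.
try:
    fill_template(TEMPLATE, {'train_stop': 60000})
except KeyError as error:
    print("missing hyperparameter: %s" % error)
```

One consequence of this approach is that a literal percent sign in a YAML template has to be written as `%%`.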
Finally, I run digit recognition on the test data to measure the recognition rate. The test data is obtained with pylearn2.datasets.mnist.MNIST(which_set='test'), and the model's fprop computes the output-layer values. The digit corresponding to the output unit with the largest value is taken as the prediction. In my environment, 9814 out of 10000 were correct (98.14%).
test_result.py
import pickle

import numpy as np
import theano
import pylearn2.datasets.mnist as mnist


def simulate(inputs, model):
    # Forward-propagate the inputs through the model and evaluate the result.
    return model.fprop(theano.shared(inputs)).eval()


def countCorrectResults(outputs, labels):
    correct = 0
    for output, label in zip(outputs, labels):
        # The predicted digit is the index of the largest output unit.
        if np.argmax(output) == label:
            correct += 1
    return correct


def score(dataset, model):
    outputs = simulate(dataset.X, model)
    correct = countCorrectResults(outputs, dataset.y)
    return {
        'correct': correct,
        'total': len(dataset.X)
    }


# Pickle files must be opened in binary mode.
with open('dae_mlp.pkl', 'rb') as f:
    model = pickle.load(f)
test_data = mnist.MNIST(which_set='test')
print('%(correct)d / %(total)d' % score(test_data, model))
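The counting loop can also be written as a single vectorized NumPy expression. The sketch below is an alternative, not the code used for the result above. The helper `accuracy` is hypothetical, and the output rows are made-up values rather than real model output.

```python
import numpy as np


def accuracy(outputs, labels):
    """Fraction of rows whose largest output matches the integer label."""
    predictions = np.argmax(outputs, axis=1)
    # ravel() accepts labels stored as shape (N,) or (N, 1).
    return float(np.mean(predictions == np.asarray(labels).ravel()))


# Three made-up output rows over 10 classes; not real model output.
outputs = np.zeros((3, 10))
outputs[0, 7] = 0.9   # predicts 7
outputs[1, 2] = 0.8   # predicts 2
outputs[2, 1] = 0.6   # predicts 1
labels = np.array([[7], [2], [3]])
print(accuracy(outputs, labels))
```

In this toy example two of the three predictions match their labels. Applied to the result reported above, 9814 correct out of 10000 corresponds to an accuracy of 0.9814.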
I referred to the following sites.
http://tanopy.blog79.fc2.com/blog-entry-118.html
http://www.slideshare.net/yurieoka37/ss-28152060