Writing code from scratch with the machine learning framework Chainer (MNIST edition)

Introduction

There is plenty of material on the internet about the basics and mechanisms of machine learning, but when it comes to actually writing the code, I think it is hard to know where to start. Chainer and TensorFlow in particular are convenient, but installing them does not tell you how to use them, and some people probably try to run the examples and give up before getting them to work. The ImageNet image recognition example that ships with Chainer can also be hard to follow even when you read the code. So I decided to write this article with the goal of **implementing MNIST handwritten digit recognition, the simplest example, from scratch with Chainer, and understanding both the mechanism and how to write the code**. I only recently started studying Chainer as a hobby and I am an amateur, so another purpose is to check my own understanding by writing it up. This article does not cover the mechanism of neural networks themselves; a dedicated introduction is a better place to study that.

Difficulty of chainer

Before I touched Chainer, I usually wrote everything from scratch in plain C, and I was always conscious that the program I was writing "takes ○ as input and produces × as output". So I struggled with sample code in which the data loading, the data layout, and the storage are all hidden, and the only output is information about the trained model (laughs). Given how I had learned to program, I wanted to see some explicit input and output. This is what I get for having put off reading the library.

Advance preparation

Set up an environment where Python can run. In my environment, I installed Python 3.5 using pyenv. Install Chainer with pip or similar. For reference, I am using

chainer==1.13.0

First, you have to get the input data and read it in as vectors. The code for this is in data.py under chainer/example/mnist/, so copy it into your development directory. As the usual boilerplate, also add the following imports:

chainer_test.py


import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions

import data

Write this at the top of your source file. The rest of the code in this article is added below it in the same file.

Implementation of MNIST

MNIST NN configuration

The network has four layers (input, hidden layer 1, hidden layer 2, output), with 784, 100, 100, and 10 units respectively. The input is 784-dimensional because each image is 28 * 28 px. The output uses 10 dimensions to represent the digits 0-9.
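To get a feel for the size of this network, the arithmetic can be sketched in plain Python. This assumes every layer is fully connected with a bias term, which is the default for L.Linear; the helper name `count_parameters` is made up for this illustration.

```python
# Layer sizes from the text: input, hidden layer 1, hidden layer 2, output.
layer_sizes = [784, 100, 100, 10]


def count_parameters(sizes):
    """Count weights and biases of a fully connected network (hypothetical helper)."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out  # one weight per (input unit, output unit) pair
        total += n_out         # one bias per output unit
    return total


n_params = count_parameters(layer_sizes)
print(28 * 28)   # 784, the input dimension
print(n_params)  # 89610
```

Most of the parameters sit in the first layer (784 * 100 weights), because that is where the image is compressed down to 100 values.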

chainer_test.py


class MLP(Chain):
	def __init__(self):
		super(MLP, self).__init__(
			l1=L.Linear(784, 100),
			l2=L.Linear(100, 100),
			l3=L.Linear(100, 10),
		)

	def __call__(self, x):
		h1 = F.relu(self.l1(x))
		h2 = F.relu(self.l2(h1))
		y = self.l3(h2)
		return y

The network is written as above. First, __init__ defines the layer structure: the vector dimension changes as 784 -> 100, 100 -> 100, 100 -> 10. Next, __call__ describes how values propagate through the layers, that is, the forward computation. By convention, the output of each layer in a neural network is passed through an activation function. Chainer provides these in advance, and F.relu() is used here. Others such as tanh are of course available too.
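What the forward computation does with the shapes can be sketched in NumPy alone. This is not how Chainer implements it internally: the random weights, the (input, output) weight layout, and the names `relu` and `forward` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.RandomState(0)


def relu(a):
    """Rectified linear unit: negative values become 0."""
    return np.maximum(a, 0)


# Hypothetical random parameters; in Chainer, L.Linear creates and stores its own.
W1, b1 = rng.normal(scale=0.1, size=(784, 100)), np.zeros(100)
W2, b2 = rng.normal(scale=0.1, size=(100, 100)), np.zeros(100)
W3, b3 = rng.normal(scale=0.1, size=(100, 10)), np.zeros(10)


def forward(x):
    """Same structure as MLP.__call__: two ReLU layers, then a linear output."""
    h1 = relu(x.dot(W1) + b1)   # (N, 784) -> (N, 100)
    h2 = relu(h1.dot(W2) + b2)  # (N, 100) -> (N, 100)
    return h2.dot(W3) + b3      # (N, 100) -> (N, 10), raw scores


x = rng.rand(5, 784)  # 5 fake "images"
y = forward(x)
print(y.shape)  # (5, 10)
```

The output layer has no activation here, matching the MLP class above, where `l3` is returned as-is.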

In addition, I will write a function called predict, which matters a lot to me personally. Training and watching the loss is nice, but when you want to use the trained network yourself, it is hard to tell what the API for that actually is. There may be better ways, but I will implement predict myself.

chainer_test.py


def predict(model, x_data):
	x = Variable(x_data.astype(np.float32))
	y = model.predictor(x)
	return np.argmax(y.data, axis = 1)

The function takes the trained model (described later) and the input vectors (N vectors of 784 dimensions) as arguments. It converts the input to float32, feeds it to the predictor, and gets the output (10 dimensions per input). It then returns, for each input, the index of the largest output value. This gives output that is easy to interpret: the predicted digit.
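The last step of predict, taking the index of the largest value in each row, can be tried on its own. The scores below are made up and use 3 classes instead of 10 to keep the example small.

```python
import numpy as np

# Made-up output scores for 2 inputs over 3 classes.
scores = np.array([[0.1, 2.5, -1.0],
                   [3.0, 0.2, 0.4]])

# axis=1 means "search along each row", giving one index per input.
labels = np.argmax(scores, axis=1)
print(labels)  # [1 0]
```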

Data preparation

Prepare the data.

chainer_test.py


batchsize = 100
datasize = 60000
N = 10000


mnist = data.load_mnist_data()
x_all = mnist['data'].astype(np.float32) / 255
y_all = mnist['target'].astype(np.int32)
x_train, x_test = np.split(x_all, [datasize])
y_train, y_test = np.split(y_all, [datasize])

batchsize and N are explained later. data.load_mnist_data() loads the data. mnist['data'] and mnist['target'] then give the inputs and their class labels; the pixel values are divided by 255 to scale them to the range 0 to 1. Rather than using all of the prepared data for training, it is common to train on part of it and test on the rest. Here too, the data is split in two, x_train and x_test (and likewise y_train and y_test).
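How np.split with a one-element list cuts an array, and what the division by 255 does, can be checked on toy data. The arrays here are stand-ins, not MNIST.

```python
import numpy as np

# np.split(a, [k]) cuts the array into a[:k] and a[k:].
toy = np.arange(7)
train_part, test_part = np.split(toy, [5])
print(train_part)  # [0 1 2 3 4]
print(test_part)   # [5 6]

# Pixel values 0..255 become floats in the range 0..1.
pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = pixels.astype(np.float32) / 255
print(scaled.min(), scaled.max())  # 0.0 1.0
```

With datasize = 60000, the same call puts the first 60000 samples into the training set and the remainder into the test set.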

Preparing for learning

Create a model in preparation for learning.

chainer_test.py



model = L.Classifier(MLP())
optimizer = optimizers.Adam()
optimizer.setup(model)

For a classification problem like this one, Chainer provides L.Classifier(), which implements the loss computation and accuracy reporting, so wrap the class defined above in it and use the result as the model. The optimizer updates the model's parameters during training based on a mathematical optimization method; Adam is used here.
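By default, L.Classifier uses softmax cross entropy as the loss and also computes the accuracy. A rough NumPy sketch of these two quantities follows; the function names and the scores are made up for illustration, and this is not Chainer's own implementation.

```python
import numpy as np


def softmax_cross_entropy(scores, labels):
    """Mean negative log-probability of the correct class (illustrative sketch)."""
    # Subtract the row maximum first for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def accuracy(scores, labels):
    """Fraction of rows whose largest score is at the correct label."""
    return float((scores.argmax(axis=1) == labels).mean())


# Made-up scores for 3 samples over 3 classes, with their correct labels.
scores = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.0, 1.5, 0.3]])
labels = np.array([0, 2, 0])

loss = softmax_cross_entropy(scores, labels)
acc = accuracy(scores, labels)
print(loss, acc)  # the third sample is misclassified, so acc is 2/3
```

The loss shrinks as the score of the correct class grows relative to the others, which is the quantity the optimizer tries to reduce.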

About learning and testing

Now let's actually start training.

chainer_test.py


for epoch in range(20):
	print('epoch %d' % epoch)
	indexes = np.random.permutation(datasize)
	sum_loss, sum_accuracy = 0, 0
	for i in range(0, datasize, batchsize):
		x = Variable(np.asarray(x_train[indexes[i : i + batchsize]]))
		t = Variable(np.asarray(y_train[indexes[i : i + batchsize]]))
		optimizer.update(model, x, t)
		sum_loss += float(model.loss.data) * batchsize
		sum_accuracy += float(model.accuracy.data) * batchsize
	print('train mean loss={}, accuracy={}'.format(sum_loss / datasize, sum_accuracy / datasize))


	sum_loss, sum_accuracy = 0, 0
	for i in range(0, N, batchsize):
		x = Variable(np.asarray(x_test[i : i + batchsize]),volatile='on')
		t = Variable(np.asarray(y_test[i : i + batchsize]),volatile='on')
		loss = model(x, t)
		sum_loss += float(loss.data) * batchsize
		sum_accuracy += float(model.accuracy.data) * batchsize
	print('test mean loss={}, accuracy={}'.format(sum_loss / N, sum_accuracy / N))

An epoch is one pass over the training data, and range(20) sets how many times training is repeated. Here we train and then test in every epoch. During training, the sample order is shuffled randomly, and mini-batches of batchsize samples are processed datasize / batchsize times. Each mini-batch of (input, correct label) = (x, t) is built from the shuffled indices i to i + batchsize and used for one training step with optimizer.update(model, x, t). This is repeated, and the mean loss and accuracy are printed. The test is almost the same, except that it runs over the N test samples without updating the parameters.
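The shuffling and slicing part of the loop can be isolated from Chainer. The sketch below uses a made-up helper `iterate_minibatches` and made-up loss values; unlike the code above, it weights each batch by its actual length, which matters only when the data size is not a multiple of the batch size.

```python
import numpy as np


def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays that cover all samples once, in random order."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


rng = np.random.RandomState(0)
batches = list(iterate_minibatches(n_samples=10, batch_size=4, rng=rng))
print([len(b) for b in batches])  # [4, 4, 2]

# Made-up mean loss per batch; weight by batch length to get the overall mean.
batch_losses = [0.5, 0.3, 0.1]
sum_loss = sum(loss * len(b) for loss, b in zip(batch_losses, batches))
mean_loss = sum_loss / 10
print(mean_loss)  # about 0.34
```

In the article's code, 60000 and 10000 are both multiples of 100, so multiplying by the fixed batchsize gives the same result.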

About prediction & saving models

Give some vector to the trained model and see if the answer is correct.

chainer_test.py


p_test = np.empty((0, 784), float)
p_test = np.append(p_test, np.array([x_test[0]]), axis=0)
p_test = np.append(p_test, np.array([x_test[1]]), axis=0)


print(p_test)
print(predict(model, p_test))
print(y_test)

serializers.save_hdf5('myMLP.model',model)

p_test holds the input vectors you want to try. Preparing my own data was too much trouble, so I used the 0th and 1st vectors of the test data. You may want to play around with some of the values. When I run it, the return value of predict matches the first two entries of y_test, so this means the training worked (I think).
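As a side note, building p_test with repeated np.append calls gives the same values as simply slicing the first two rows. The small array below is a stand-in for the real x_test, with 4 columns instead of 784.

```python
import numpy as np

# Stand-in for x_test: 3 samples with 4 features each.
x_test = np.arange(12, dtype=np.float32).reshape(3, 4)

# The approach used in the article: start empty and append row by row.
p_append = np.empty((0, 4), float)
p_append = np.append(p_append, np.array([x_test[0]]), axis=0)
p_append = np.append(p_append, np.array([x_test[1]]), axis=0)

# Equivalent values with a slice.
p_slice = x_test[:2]

print(np.array_equal(p_append, p_slice))  # True
print(p_append.dtype, p_slice.dtype)      # float64 float32
```

The dtypes differ, but predict converts its input to float32 anyway, so either form can be passed to it. The append form is handy when the vectors come from different places.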

The last line writes the model to a file, so it can be reused later.

Conclusion

I implemented MNIST with Chainer from scratch. The Japanese translation site was very helpful this time. Personally, I am surprised at how easy it is to write a class with Chainer. In the future, when I have spare time, I would like to write an explanation of the library's own source code and an implementation of ImageNet from scratch.
