There is plenty of material on the internet about the basics and theory of machine learning, but when it comes to actually writing code, I think it is hard to know where to start. Frameworks like chainer and tensorflow are useful, but many people install them, can't make sense of them, try to run the examples, and quit when nothing works. The imagenet example that ships with chainer, for instance, can be hard to follow even if you try to read the code. So I decided to write this article with the goal of **implementing the simplest example, handwritten digit recognition on mnist, from scratch with chainer, and understanding how the mechanism works and how to write the code**. I recently started studying chainer as a hobby, but I am an amateur, so my purpose here is also to check my own understanding by writing it up. As for the mechanism and theory of neural networks themselves, I think here will be a helpful reference.
Until I touched chainer, I mostly wrote everything from scratch in plain C, and I was always conscious that the program I was writing "takes this input and produces that output". So I struggled with sample code in which the code that reads the data, and how the data is ordered and stored, are hidden away, and the only output is information about the model after training (laugh). Having gone to the trouble of training something, I wanted to actually give it input and get output back. I suppose it is my punishment for balking at reading library code.
First, set up an environment where python can run. In my environment I installed python 3.5 with pyenv. Then install chainer with pip. For reference, I am using
chainer==1.13.0
First, you need to get the input data and read it in as vectors. The code for this can be found in data.py under chainer/examples/mnist/, so copy it into your development directory. Then, as boilerplate, write the following:
chainer_test.py
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
import data
at the top of the source file. All of the code below will be added to this same file.
The network has four layers (input, hidden layer 1, hidden layer 2, output), with dimensions 784, 100, 100, and 10 respectively. The input is 784-dimensional because each mnist image is 28 * 28 px, and the output is 10-dimensional so it can represent the digits 0-9.
chainer_test.py
class MLP(Chain):
    def __init__(self):
        super(MLP, self).__init__(
            l1=L.Linear(784, 100),
            l2=L.Linear(100, 100),
            l3=L.Linear(100, 10),
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y
That is the whole network definition. First, __init__ defines the layer structure: the vector dimension changes as 784 -> 100, 100 -> 100, 100 -> 10. Next, __call__ describes how values propagate through the layers, that is, the forward computation. As is standard in neural networks, an activation function is applied to the output of each layer; chainer provides these ready-made, and F.relu() is used here. Of course, others such as tanh are also available.
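For example, to try tanh instead, only __call__ needs to change. A minimal sketch (the rest of this article sticks with F.relu):

def __call__(self, x):
    # hypothetical variant: tanh activations instead of ReLU
    h1 = F.tanh(self.l1(x))
    h2 = F.tanh(self.l2(h1))
    return self.l3(h2)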
In addition, I will create a function called predict, which matters to me personally. The reason is that training and watching the loss go down is nice, but when you want to actually use the trained network yourself, it is hard to tell which API to call. There may be better ways, but I will implement predict myself.
chainer_test.py
def predict(model, x_data):
    x = Variable(x_data.astype(np.float32))
    y = model.predictor(x)
    return np.argmax(y.data, axis=1)
This takes the trained model (described later) and an input array (N vectors of 784 dimensions each) as arguments, casts the input to float32, feeds it to the predictor, and for each 10-dimensional output returns the index holding the largest value. This gives you output that is easy to interpret: the predicted digit for each input.
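To see what np.argmax(..., axis=1) is doing here, a toy illustration (made-up scores, not real network output):

scores = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.0,  3.2]])
print(np.argmax(scores, axis=1))  # [1 2] -- index of the largest score in each row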
Prepare the data.
chainer_test.py
batchsize = 100   # minibatch size for training
datasize = 60000  # number of samples used for training
N = 10000         # number of samples used for testing
mnist = data.load_mnist_data()
x_all = mnist['data'].astype(np.float32) / 255  # scale pixel values to [0, 1]
y_all = mnist['target'].astype(np.int32)
x_train, x_test = np.split(x_all, [datasize])
y_train, y_test = np.split(y_all, [datasize])
batchsize and N will be used later, in the training loop. data.load_mnist_data() fetches and loads the data, and mnist['data'] and mnist['target'] hold the inputs and their labels. As usual, instead of using all of the prepared data for training, it is common to train on most of it and test on the rest; here too we split into x_train and x_test, with the first 60,000 samples for training and the remaining 10,000 for testing.
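A quick sanity check of the split (just printing shapes; the labels are integers 0-9):

print(x_train.shape, x_test.shape)  # (60000, 784) (10000, 784)
print(y_train.shape, y_test.shape)  # (60000,) (10000,)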
Create a model in preparation for learning.
chainer_test.py
model = L.Classifier(MLP())
optimizer = optimizers.Adam()
optimizer.setup(model)
For a classification problem like this one, chainer provides L.Classifier(), which wraps a predictor and adds the loss computation and accuracy reporting, so we hand it the class defined above and use the result as our model. The optimizer updates the parameters for us; Adam is a method that adapts the step size for each parameter automatically, so it tends to work well without manual tuning.
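Roughly speaking, when you later call model(x, t) in the training loop, Classifier does something like the following internally (a simplified sketch of the default behavior, not the actual implementation):

y = model.predictor(x)                # forward pass through our MLP
loss = F.softmax_cross_entropy(y, t)  # default loss function for classification
accuracy = F.accuracy(y, t)           # stored so we can read model.accuracy later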
Now let's actually run the training.
chainer_test.py
for epoch in range(20):
    print('epoch %d' % epoch)
    indexes = np.random.permutation(datasize)
    sum_loss, sum_accuracy = 0, 0
    for i in range(0, datasize, batchsize):
        x = Variable(np.asarray(x_train[indexes[i : i + batchsize]]))
        t = Variable(np.asarray(y_train[indexes[i : i + batchsize]]))
        optimizer.update(model, x, t)
        sum_loss += float(model.loss.data) * batchsize
        sum_accuracy += float(model.accuracy.data) * batchsize
    print('train mean loss={}, accuracy={}'.format(sum_loss / datasize, sum_accuracy / datasize))

    sum_loss, sum_accuracy = 0, 0
    for i in range(0, N, batchsize):
        x = Variable(np.asarray(x_test[i : i + batchsize]), volatile='on')
        t = Variable(np.asarray(y_test[i : i + batchsize]), volatile='on')
        loss = model(x, t)
        sum_loss += float(loss.data) * batchsize
        sum_accuracy += float(model.accuracy.data) * batchsize
    print('test mean loss={}, accuracy={}'.format(sum_loss / N, sum_accuracy / N))
epoch is the number of times training is repeated over the whole dataset; here we train and then test once per epoch. During training, the indices are shuffled with np.random.permutation, and the inner loop runs datasize / batchsize times, each time slicing out a minibatch (input, label) pair (x, t) from index i to i + batchsize and updating the parameters with optimizer.update(model, x, t). The accumulated loss and accuracy are then averaged and printed. The test loop is almost the same, except the data is not shuffled and no update is performed.
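To make the shuffled minibatch indexing concrete, here it is on a toy scale (10 samples, batches of 5; illustration only):

idx = np.random.permutation(10)  # e.g. [7 2 9 0 4 1 8 3 6 5]
for i in range(0, 10, 5):
    print(idx[i : i + 5])        # two disjoint batches that together cover all 10 indices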
Let's give some vectors to the trained model and see whether the answers are correct.
chainer_test.py
p_test = np.empty((0, 784), float)
p_test = np.append(p_test, np.array([x_test[0]]), axis=0)
p_test = np.append(p_test, np.array([x_test[1]]), axis=0)
print(p_test)
print(predict(model, p_test))
print(y_test)
serializers.save_hdf5('myMLP.model',model)
p_test holds the input vectors you want to try. This time, preparing my own data would have been a hassle, so I reused the 0th and 1st vectors of the test set as the input data; you may want to tweak some of the values and play around. When I actually run this, predict returns the same digits as the first two entries of y_test, so the learning seems to have succeeded (I think).
The last line writes the model out to a file so that it can be reused later.
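To reuse the saved model later (for example, in another session), rebuild the same architecture and load the weights into it, along the lines of this sketch:

model = L.Classifier(MLP())                  # same architecture as when it was saved
serializers.load_hdf5('myMLP.model', model)  # restore the trained weights
print(predict(model, p_test))                # should reproduce the predictions above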
That's a from-scratch implementation of mnist with chainer. The Japanese translation site was a great help for this experiment. Personally, I was surprised at how concisely chainer classes can be written. In the future, I would like to use my spare time to write an explanation of the library's source code itself and a from-scratch implementation of imagenet.