When you start dabbling in machine learning, the first practical problem is preparing a data set for training. You test with MNIST or the like, maybe teach the network logical operations, and then wonder what to do next. Classifying images with a downloaded data set isn't much fun, and if you prepare your own images, labeling them is a serious pain.
So, as a non-trivial classification task whose data I can generate myself as needed, I try to distinguish the baker's map from pseudo-random numbers.
The source is here. https://github.com/kaityo256/chainer_bakermap
The version of Chainer is 2.0.1.
Note: This article was written by a machine learning amateur.
What I want to do is the following.
The data is given as a one-dimensional sequence $\{v_n\}$. One kind is generated by Python's standard `random.random()`. The other kind uses `random.random()` only for the initial value and is then generated by iterating

```python
v = 3.0 * v - int(3.0*v)
```

that is, $v_{n+1} = 3 v_n - \lfloor 3 v_n \rfloor$. This is the so-called baker's map: at first glance it looks like random numbers, but plotting $(v_n, v_{n+1})$ reveals the difference.
With standard random numbers, the plot shows no particular structure. With the baker's map, on the other hand, the structure is immediately obvious: the points fall on three parallel line segments.
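As a minimal sketch of how such a plot can be drawn (my own illustration using matplotlib; the file names are arbitrary):

```python
import random
import matplotlib.pyplot as plt

def plot_pairs(seq, filename):
    # scatter plot of (v_n, v_{n+1})
    plt.figure()
    plt.scatter(seq[:-1], seq[1:], s=1)
    plt.xlabel("$v_n$")
    plt.ylabel("$v_{n+1}$")
    plt.savefig(filename)
    plt.close()

n = 10000
rand = [random.random() for _ in range(n)]

baker = []
v = random.random()
for _ in range(n):
    v = 3.0 * v - int(3.0 * v)
    baker.append(v)

plot_pairs(rand, "random.png")  # fills the unit square with no structure
plot_pairs(baker, "baker.png")  # three line segments of slope 3
```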
The problem, then, is whether a neural network can learn to tell the two apart. With this setup it is easy to create teacher data, and adjusting the data size is also trivial. Strictly speaking, standard pseudo-random numbers also have a period and some structure, but over a window of 200 values none of that should be visible, so they should look random in that range.
For the time being, let's train with this setting: each input is a sequence of 200 values, and the output is one of two classes (baker's map or random numbers).
All the concrete numbers were chosen more or less arbitrarily.
I think Chainer's first hurdle (though not a big one) is data preparation. For details see the separate article, but in short:

- give the inputs as a Numpy array of `numpy.float32` (one Numpy object holding all input rows), and
- give the labels as a Numpy array of `numpy.int32` (one Numpy object).

With the inputs and labels named `x` and `y` respectively,

```python
dataset = chainer.datasets.TupleDataset(x,y)
```

turns them into a dataset format Chainer can consume.
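As a toy illustration of that format (the values here are made up, just to show the shapes and dtypes):

```python
import numpy as np
import chainer

# two samples of length 3, with labels 0 and 1
x = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]], dtype=np.float32)
y = np.array([0, 1], dtype=np.int32)

dataset = chainer.datasets.TupleDataset(x, y)
print(dataset[0])  # prints the first (input, label) pair
```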
It's not very long, so I'll post a module that creates data.
data.py
```python
import random
import numpy as np
import chainer

def make_baker(n):
    # baker's map sequence of length n, starting from a random initial value
    a = []
    x = random.random()
    for i in range(n):
        x = x * 3.0
        x = x - int(x)
        a.append(x)
    return a

def make_random(n):
    # sequence of n standard pseudo-random numbers
    a = []
    for i in range(n):
        a.append(random.random())
    return a

def make_data(ndata, units):
    # list of (input sequence, label) pairs: label 0 = baker's map, 1 = random
    data = []
    for i in range(ndata):
        a = make_baker(units)
        data.append([a, 0])
    for i in range(ndata):
        a = make_random(units)
        data.append([a, 1])
    return data

def make_dataset(ndata, units):
    data = make_data(ndata, units)
    random.shuffle(data)
    n = len(data)
    xn = len(data[0][0])
    x = np.empty((n, xn), dtype=np.float32)
    y = np.empty(n, dtype=np.int32)
    for i in range(n):
        x[i] = np.asarray(data[i][0])
        y[i] = data[i][1]
    return chainer.datasets.TupleDataset(x, y)

def main():
    dataset = make_dataset(2, 3)
    print(dataset)

if __name__ == '__main__':
    random.seed(1)
    np.random.seed(1)
    main()
```
The contents shouldn't be hard to follow: first create `data`, a list of (input, label) pairs, convert it to Numpy format, and turn it into a dataset. After that,
```python
import data

units = 200
ndata = 10000
dataset = data.make_dataset(ndata,units)
```
gives you a dataset that Chainer can consume. If you just rewrite the `make_data` function appropriately, you should be able to handle any kind of data.
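For example, a hypothetical variant (not in the repository) that classifies the logistic map instead of the baker's map only needs a different generator:

```python
import random
from data import make_random  # reuse the random-sequence generator above

def make_logistic(n):
    # logistic map x_{k+1} = 4 x_k (1 - x_k); it also looks random at a glance
    a = []
    x = random.random()
    for i in range(n):
        x = 4.0 * x * (1.0 - x)
        a.append(x)
    return a

def make_data(ndata, units):
    # label 0 = logistic map, label 1 = standard random numbers
    data = []
    for i in range(ndata):
        data.append([make_logistic(units), 0])
    for i in range(ndata):
        data.append([make_random(units), 1])
    return data
```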
First, I made a class that wraps Chainer's model appropriately. Like this.
model.py
```python
import chainer
import chainer.functions as F
import chainer.links as L
import collections
import struct
from chainer import training
from chainer.training import extensions

class MLP(chainer.Chain):
    def __init__(self, n_units, n_out):
        super(MLP, self).__init__(
            l1=L.Linear(None, n_units),
            l2=L.Linear(None, n_out)
        )

    def __call__(self, x):
        return self.l2(F.relu(self.l1(x)))

class Model:
    def __init__(self, n_unit):
        self.unit = n_unit
        self.model = L.Classifier(MLP(n_unit, 2))

    def load(self, filename):
        chainer.serializers.load_npz(filename, self.model)

    def save(self, filename):
        chainer.serializers.save_npz(filename, self.model)

    def predictor(self, x):
        return self.model.predictor(x)

    def get_model(self):
        return self.model

    def export(self, filename):
        # dump l1.W, l1.b, l2.W, l2.b as raw float32 values, in this order
        p = self.model.predictor
        l1W = p.l1.W.data
        l1b = p.l1.b.data
        l2W = p.l2.W.data
        l2b = p.l2.b.data
        d = bytearray()
        for v in l1W.reshape(l1W.size):
            d += struct.pack('f', v)
        for v in l1b:
            d += struct.pack('f', v)
        for v in l2W.reshape(l2W.size):
            d += struct.pack('f', v)
        for v in l2b:
            d += struct.pack('f', v)
        open(filename, 'wb').write(d)  # binary mode is needed to write the bytearray
```
It's written rather crudely, but it is used like this:

```python
m = Model(units)       # create the model wrapper
model = m.get_model()  # get the model object (used for training)
m.save("baker.model")  # save (serialize) the model
m.load("baker.model")  # load (deserialize) the model
m.export("baker.dat")  # export for C++
```
For training, once you receive the model from the Model class, the rest is the standard Chainer sample as it is, so I don't think there is any particular difficulty. If you look at train.py you will see it is essentially unchanged from the sample; the only difference is that the model is serialized after training.
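I won't paste train.py here, but a standard Chainer 2 training script along those lines would look roughly like the following sketch (the batch size, epoch count, and train/test split are placeholders of mine, not necessarily what the repository uses):

```python
import chainer
from chainer import training
from chainer.training import extensions

import data
from model import Model

def main():
    units = 200
    ndata = 10000
    m = Model(units)
    model = m.get_model()

    dataset = data.make_dataset(ndata, units)
    train, test = chainer.datasets.split_dataset_random(
        dataset, int(len(dataset) * 0.9))

    train_iter = chainer.iterators.SerialIterator(train, 100)
    test_iter = chainer.iterators.SerialIterator(
        test, 100, repeat=False, shuffle=False)

    optimizer = chainer.optimizers.Adam()
    optimizer.setup(model)

    updater = training.StandardUpdater(train_iter, optimizer)
    trainer = training.Trainer(updater, (20, 'epoch'), out='result')
    trainer.extend(extensions.Evaluator(test_iter, model))
    trainer.extend(extensions.LogReport())
    trainer.extend(extensions.PrintReport(
        ['epoch', 'main/loss', 'main/accuracy',
         'validation/main/loss', 'validation/main/accuracy']))
    trainer.run()

    # serialize the trained model so that test.py / export.py can use it
    m.save("baker.model")

if __name__ == '__main__':
    main()
```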
test.py, which tests the model after training, looks like this.
test.py
```python
from model import Model
import numpy as np
import random
import data
import math

def main():
    ndata = 1000
    unit = 200
    model = Model(unit)
    model.load("baker.model")
    d = data.make_data(ndata, unit)
    x = np.array([v[0] for v in d], dtype=np.float32)
    y = model.predictor(x).data
    r = [np.argmax(v) for v in y]
    bs = sum(r[:ndata])  # baker's-map samples misclassified as random
    rs = sum(r[ndata:])  # random samples correctly classified as random
    print("Check Baker")
    print("Success/Fail", ndata - bs, "/", bs)
    print("Check Random")
    print("Success/Fail", rs, "/", ndata - rs)

def test():
    # feed 200 copies of the constant 0.5 and show the raw output scores
    unit = 200
    model = Model(unit)
    model.load("baker.model")
    a = []
    for i in range(unit):
        a.append(0.5)
    x = np.array([a], dtype=np.float32)
    y = model.predictor(x).data
    print(y)

if __name__ == '__main__':
    random.seed(2)
    np.random.seed(2)
    test()
    main()
```
I'm just creating an instance of the Model class, deserializing it, and testing it [^1]. The execution result looks like this.
[^1]: Looking back now, the function names main and test are not great, and I should have passed an instance of the Model class into each of them in the first place...
```
$ python test.py
[[-0.84465003 0.10021734]]
Check Baker
Success/Fail 929 / 71
Check Random
Success/Fail 913 / 87
```
The first line,

```
[[-0.84465003 0.10021734]]
```

is the output when the input "200 values, all 0.5" is fed in. It means the score for class 0, i.e. the baker's map, is -0.84465003, and the score for class 1, random numbers, is 0.10021734. In other words, a constant sequence is classified as random [^2]. This will be used later to check that the model loaded into C++ works correctly.
[^2]: Presumably a sequence is recognized as the baker's map when a high fraction of adjacent values are related by the tripling map, so it seems fine that a constant sequence is judged not to be the baker's map.
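Incidentally, these values are the raw output scores of the predictor, not probabilities. If you want probabilities, one way (just an aside, not something the repository does) is to push the scores through a softmax:

```python
import numpy as np
import chainer.functions as F
from model import Model

model = Model(200)
model.load("baker.model")
x = np.full((1, 200), 0.5, dtype=np.float32)  # "200 values, all 0.5"
y = model.predictor(x)                        # raw scores
p = F.softmax(y).data                         # probabilities for (baker's map, random)
print(p)
```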
Next,

```
Check Baker
Success/Fail 929 / 71
```

means that out of 1000 baker's-map sequences, 929 were correctly recognized as the baker's map and 71 were mistakenly classified as random. Likewise,

```
Check Random
Success/Fail 913 / 87
```

means that out of 1000 random sequences, 913 were correctly recognized as random numbers and 87 were misclassified.
See the separate article for the details of exporting to and importing from C++. The export is delegated to the wrapper class, so it's easy.
export.py
```python
from model import Model

def main():
    unit = 200
    model = Model(unit)
    model.load("baker.model")
    model.export("baker.dat")

if __name__ == '__main__':
    main()
```
It reads baker.model and writes out baker.dat.
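As a sanity check of the binary layout (my own addition, not part of the repository), you can read baker.dat back from Python with Numpy and redo the forward pass for the all-0.5 input:

```python
import numpy as np

n_in, n_units, n_out = 200, 200, 2

# baker.dat is a flat sequence of float32 values: l1.W, l1.b, l2.W, l2.b
raw = np.fromfile("baker.dat", dtype=np.float32)
i = 0
l1W = raw[i:i + n_units * n_in].reshape(n_units, n_in); i += n_units * n_in
l1b = raw[i:i + n_units]; i += n_units
l2W = raw[i:i + n_out * n_units].reshape(n_out, n_units); i += n_out * n_units
l2b = raw[i:i + n_out]

x = np.full(n_in, 0.5, dtype=np.float32)
h = np.maximum(l1W.dot(x) + l1b, 0.0)  # ReLU
y = l2W.dot(h) + l2b
print(y)  # should match the scores printed by test.py
```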
Importing on the C++ side is also easy, but for later convenience let's organize it into classes. Like this.
model.hpp
```cpp
#pragma once
#include <iostream>
#include <fstream>
#include <vector>
#include <math.h>
#include <algorithm>
//------------------------------------------------------------------------
typedef std::vector<float> vf;
//------------------------------------------------------------------------
class Link {
private:
  vf W;
  vf b;
  float relu(float x) {
    return (x > 0) ? x : 0;
  }
  const int n_in, n_out;

public:
  Link(int in, int out) : n_in(in), n_out(out) {
    W.resize(n_in * n_out);
    b.resize(n_out);
  }
  // read the raw float32 weights and biases exported from Python
  void read(std::ifstream &ifs) {
    ifs.read((char*)W.data(), sizeof(float) * n_in * n_out);
    ifs.read((char*)b.data(), sizeof(float) * n_out);
  }
  // y = W x + b
  vf get(vf x) {
    vf y(n_out);
    for (int i = 0; i < n_out; i++) {
      y[i] = 0.0;
      for (int j = 0; j < n_in; j++) {
        y[i] += W[i * n_in + j] * x[j];
      }
      y[i] += b[i];
    }
    return y;
  }
  vf get_relu(vf x) {
    vf y = get(x);
    for (int i = 0; i < n_out; i++) {
      y[i] = relu(y[i]);
    }
    return y;
  }
};
//------------------------------------------------------------------------
class Model {
private:
  Link l1, l2;

public:
  const int n_in, n_out;
  Model(int in, int n_units, int out) :
    n_in(in), n_out(out),
    l1(in, n_units), l2(n_units, out) {
  }
  void load(const char* filename) {
    // open in binary mode since baker.dat contains raw floats
    std::ifstream ifs(filename, std::ios::binary);
    l1.read(ifs);
    l2.read(ifs);
  }
  vf predict(vf &x) {
    return l2.get(l1.get_relu(x));
  }
  int argmax(vf &x) {
    vf y = predict(x);
    auto it = std::max_element(y.begin(), y.end());
    auto index = std::distance(y.begin(), it);
    return index;
  }
};
//------------------------------------------------------------------------
```
With this,

```cpp
#include "model.hpp"
int
main(void) {
  const int n_in = 200;
  const int n_units = 200;
  const int n_out = 2;
  Model model(n_in, n_units, n_out);
  model.load("baker.dat");
}
```

you can load the model.
First, let's check that feeding the same input produces exactly the same output. Write code like this:
```cpp
void
test(Model &model) {
  vf x;
  for (int i = 0; i < model.n_in; i++) {
    x.push_back(0.5);
  }
  vf y = model.predict(x);
  printf("%f %f\n", y[0], y[1]);
}
```
Here `vf` is the `typedef std::vector<float> vf;` defined in model.hpp. The execution result is

```
-0.844650 0.100217
```

which matches the Python result.
In addition, let's check the classification accuracy when baker's-map sequences and random sequences are fed in.
```cpp
// needs #include <random> for std::mt19937 / std::uniform_real_distribution
int
test_baker(Model &model) {
  static std::mt19937 mt;
  std::uniform_real_distribution<float> ud(0.0, 1.0);
  vf x;
  float v = ud(mt);
  for (int i = 0; i < model.n_in; i++) {
    x.push_back(v);
    v = v * 3.0;
    v = v - int(v);
  }
  return model.argmax(x);
}
//------------------------------------------------------------------------
int
test_random(Model &model) {
  static std::mt19937 mt;
  std::uniform_real_distribution<float> ud(0.0, 1.0);
  vf x;
  for (int i = 0; i < model.n_in; i++) {
    x.push_back(ud(mt));
  }
  return model.argmax(x);
}
//------------------------------------------------------------------------
int
main(void) {
  const int n_in = 200;
  const int n_units = 200;
  const int n_out = 2;
  Model model(n_in, n_units, n_out);
  model.load("baker.dat");
  test(model);
  const int TOTAL = 1000;
  int bn = 0;  // baker's-map samples classified as random (failures)
  for (int i = 0; i < TOTAL; i++) {
    bn += test_baker(model);
  }
  std::cout << "Check Baker" << std::endl;
  std::cout << "Success/Fail:" << (TOTAL - bn) << "/" << bn << std::endl;
  int rn = 0;  // random samples classified as random (successes)
  for (int i = 0; i < TOTAL; i++) {
    rn += test_random(model);
  }
  std::cout << "Check Random" << std::endl;
  std::cout << "Success/Fail:" << rn << "/" << (TOTAL - rn) << std::endl;
}
```
The result of running this looks like this:

```
Check Baker
Success/Fail:940/60
Check Random
Success/Fail:923/77
```

The accuracy is roughly the same as on the Python side (the numbers differ slightly because different random sequences are generated).
Using Chainer, I tried to distinguish sequences produced by the baker's map from standard pseudo-random numbers. I expected them to be easier to tell apart; with 3 layers and 200 units per layer, perhaps this is about what you get. In any case, I was able to set up the whole flow of training in Python and then using the model from C++, so I'd like to apply it to various things.
Apologies for the rambling article.