Seeing through the baker's map with Chainer

Introduction

Once you start playing with machine learning, the first problem you run into is preparing a training dataset. You try MNIST or the like first, then at best learn logical operations, and then wonder what to do next. Even image classification is not much fun with a canned dataset, and if you prepare your own images, labeling them is a deadly chore.

So, as a non-trivial classification task whose data I can generate myself in any quantity, I will try to distinguish the baker's map from pseudo-random numbers.

The source code is here: https://github.com/kaityo256/chainer_bakermap

The version of Chainer is 2.0.1.

Note: This article was written by a machine learning amateur.

Purpose

The goals are as follows.

- Prepare the training data by myself
- Train a model with Chainer to distinguish the baker's map from pseudo-random numbers
- Export the trained model and use it from C++

Dataset

The data is given as a one-dimensional sequence $\{v_n\}$. One class is generated by Python's standard random.random(); the other takes only its initial value from random.random() and is thereafter updated by

v = 3.0 * v - int(3.0*v)

This is the so-called baker's map. At first glance the sequence looks like random numbers, but the difference is clear when you plot $(v_n, v_{n+1})$.

First, with standard random numbers, there is no particular structure.

(Figure: rand.png — scatter plot of $(v_n, v_{n+1})$ for standard random numbers)

On the other hand, in the case of the baker's map, the structure is immediately obvious.

(Figure: baker.png — scatter plot of $(v_n, v_{n+1})$ for the baker's map)
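
If you want to reproduce these plots yourself, here is a minimal sketch (my own addition, not from the repository; it assumes matplotlib, and the point count is arbitrary):

import random
import matplotlib.pyplot as plt

def plot_pairs(seq, filename):
    # scatter plot of (v_n, v_{n+1})
    plt.figure()
    plt.scatter(seq[:-1], seq[1:], s=1)
    plt.xlabel("$v_n$")
    plt.ylabel("$v_{n+1}$")
    plt.savefig(filename)

n = 10000
rand_seq = [random.random() for _ in range(n)]
v = random.random()
baker_seq = []
for _ in range(n):
    baker_seq.append(v)
    v = 3.0 * v - int(3.0 * v)  # the baker's map
plot_pairs(rand_seq, "rand.png")
plot_pairs(baker_seq, "baker.png")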

Can a neural network learn to tell these two apart? That is the problem here. Teacher data is easy to generate, and adjusting the data size is just as easy. Strictly speaking, standard random numbers also have a period and some structure, but these should be completely invisible within a window of 200 values, so the sequence should look random in that range.

Setting

For the time being, let's train with the following settings (these are the numbers that appear in the code):

- Input: a sequence of 200 values
- Hidden layer: 200 units (one hidden layer, ReLU)
- Output: 2 classes (baker's map / random)
- Training data: 10000 samples per class

All the numbers were chosen more or less arbitrarily.

Data preparation

I think the first hurdle in Chainer (though not a big one) is data preparation. The details are in a separate article, but in short: prepare the inputs as x and the labels as y, and then

dataset = chainer.datasets.TupleDataset(x,y)

gives you a dataset in a format Chainer can consume.
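
As a small illustration of what TupleDataset gives you (my own example, not from the article): indexing it returns an (input, label) tuple.

import numpy as np
import chainer

x = np.arange(6, dtype=np.float32).reshape(2, 3)  # two inputs of length 3
y = np.array([0, 1], dtype=np.int32)              # their labels
dataset = chainer.datasets.TupleDataset(x, y)
print(len(dataset))  # 2
print(dataset[0])    # (array([ 0.,  1.,  2.], dtype=float32), 0)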

It's not very long, so here is the entire module that creates the data.

data.py


import random
import numpy as np
import chainer

def make_baker(n):
    # generate n values with the baker's map: v <- 3v mod 1
    a = []
    x = random.random()
    for i in range(n):
        x = x * 3.0
        x = x - int(x) 
        a.append(x)
    return a

def make_random(n):
    a = []
    for i in range(n):
        a.append(random.random())
    return a

def make_data(ndata,units):
    # list of (sequence, label) pairs: label 0 = baker's map, label 1 = random
    data = []
    for i in range(ndata):
        a = make_baker(units)
        data.append([a,0])
    for i in range(ndata):
        a = make_random(units)
        data.append([a,1])
    return data

def make_dataset(ndata,units):
    # shuffle, convert to NumPy arrays, and wrap in a TupleDataset
    data = make_data(ndata,units)
    random.shuffle(data)
    n = len(data)
    xn = len(data[0][0])
    x = np.empty((n,xn),dtype=np.float32)
    y = np.empty(n,dtype=np.int32)
    for i in range(n):
        x[i] = np.asarray(data[i][0])
        y[i] = data[i][1]
    return chainer.datasets.TupleDataset(x,y)

def main():
    dataset = make_dataset(2,3)
    print(dataset)

if __name__ == '__main__':
    random.seed(1)
    np.random.seed(1)
    main()

I don't think the contents are hard to follow. First build a list of (input, label) pairs, convert it to NumPy format, and turn that into a dataset. After that,

import data

units = 200
ndata = 10000
dataset = data.make_dataset(ndata,units)

gives you a dataset that Chainer can consume. If you rewrite just the make_data function appropriately, you should be able to handle any kind of data.

Model settings

First, I made a class that loosely wraps Chainer's model. Like this.

model.py


import chainer
import chainer.functions as F
import chainer.links as L
import struct

class MLP(chainer.Chain):
    def __init__(self, n_units, n_out):
        super(MLP, self).__init__(
            l1 = L.Linear(None, n_units),
            l2 = L.Linear(None, n_out)
        )
    def __call__(self, x):
        return self.l2(F.relu(self.l1(x)))

class Model:
    def __init__(self,n_unit):
        self.unit = n_unit
        self.model = L.Classifier(MLP(n_unit, 2))
    def load(self,filename):
        chainer.serializers.load_npz(filename, self.model)
    def save(self,filename):
        chainer.serializers.save_npz(filename, self.model)
    def predictor(self, x):
        return self.model.predictor(x)
    def get_model(self):
        return self.model
    def export(self,filename):
        p = self.model.predictor
        l1W = p.l1.W.data
        l1b = p.l1.b.data
        l2W = p.l2.W.data
        l2b = p.l2.b.data
        d = bytearray()
        for v in l1W.reshape(l1W.size):
            d += struct.pack('f',v)
        for v in l1b:
            d += struct.pack('f',v)
        for v in l2W.reshape(l2W.size):
            d += struct.pack('f',v)
        for v in l2b:
            d += struct.pack('f',v)
        open(filename,'wb').write(d)  # write the bytearray in binary mode

It's written somewhat crudely, but it is used like this:

python


m = Model(units)      # create the model wrapper
model = m.get_model() # get the model object (used for training)
m.save("baker.model") # save (serialize) the model
m.load("baker.model") # load (deserialize) the model
m.export("baker.dat") # export for use from C++


Learning

For training, once you get the model object from the Model class, the rest is the stock Chainer sample, so there should be no particular problem. A look at train.py will confirm that it follows the sample as-is, except that the model is serialized after training.
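
For reference, here is a minimal sketch of such a training script (my own guess at what train.py does; the optimizer, batch size, and epoch count here are assumptions, not necessarily what the repository uses):

import chainer
from chainer import training
from chainer.training import extensions
import data
from model import Model

def main():
    units = 200
    ndata = 10000
    m = Model(units)
    dataset = data.make_dataset(ndata, units)
    # hold out 10% for validation
    n_train = int(len(dataset) * 0.9)
    train, test = chainer.datasets.split_dataset_random(dataset, n_train)
    train_iter = chainer.iterators.SerialIterator(train, 100)
    test_iter = chainer.iterators.SerialIterator(test, 100,
                                                 repeat=False, shuffle=False)
    optimizer = chainer.optimizers.Adam()
    optimizer.setup(m.get_model())
    updater = training.StandardUpdater(train_iter, optimizer)
    trainer = training.Trainer(updater, (20, 'epoch'), out='result')
    trainer.extend(extensions.Evaluator(test_iter, m.get_model()))
    trainer.extend(extensions.LogReport())
    trainer.extend(extensions.PrintReport(
        ['epoch', 'main/loss', 'main/accuracy',
         'validation/main/loss', 'validation/main/accuracy']))
    trainer.run()
    m.save("baker.model")  # serialize the trained model

if __name__ == '__main__':
    main()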

Post-learning test

test.py, which evaluates the model after training, looks like this.

test.py


from model import Model
import numpy as np
import random
import data
import math

def main():
    ndata = 1000
    unit = 200
    model = Model(unit)
    model.load("baker.model")
    d = data.make_data(ndata,unit)
    x = np.array([v[0] for v in d], dtype=np.float32)
    y = model.predictor(x).data
    r = [np.argmax(v) for v in y]
    bs = sum(r[:ndata])  # baker samples misclassified as random
    rs = sum(r[ndata:])  # random samples correctly classified as random
    print("Check Baker")
    print("Success/Fail", ndata - bs, "/", bs)
    print("Check Random")
    print("Success/Fail", rs, "/", ndata - rs)

def test():
    unit = 200
    model = Model(unit)
    model.load("baker.model")
    a = []
    for i in range(unit):
        a.append(0.5)
    x = np.array([a], dtype=np.float32)
    y = model.predictor(x).data
    print(y)

if __name__ == '__main__':
    random.seed(2)
    np.random.seed(2)
    test()
    main()

All I do is create an instance of the Model class, deserialize the saved model, and run the tests [^1]. The execution result looks like this.

[^1]: Looking back now, the function names main and test are poor choices, and I should have passed a Model instance to each of them in the first place...

$ python test.py
[[-0.84465003  0.10021734]]
Check Baker
Success/Fail 929 / 71
Check Random
Success/Fail 913 / 87

The first line,

[[-0.84465003  0.10021734]]

is the output when the network is fed 200 values that are all 0.5. The score for class 0, i.e. the baker's map, is -0.84465003, and the score for class 1, i.e. random, is 0.10021734. In other words, a constant sequence is recognized as random [^2]. This will be used later to check that the model loaded in C++ works properly.
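
For reference, the two scores can be turned into probabilities with a softmax (my own illustration, not something test.py does):

import numpy as np

y = np.array([-0.84465003, 0.10021734])
p = np.exp(y) / np.exp(y).sum()  # softmax over the two class scores
print(p)  # -> [ 0.28  0.72], so class 1 (random) wins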

[^2]: Presumably the network recognizes a sequence as the baker's map when a high proportion of adjacent values are triples of their predecessors, so it seems fine that a constant sequence is not recognized as the baker's map.
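
One way to quantify the structure that separates the two classes (my own check, reusing the data.py module shown above): measure the fraction of adjacent pairs satisfying $v_{n+1} = 3 v_n \bmod 1$.

import data  # the data.py module shown above

def triple_fraction(seq, eps=1e-6):
    # fraction of adjacent pairs with v_{n+1} = 3 v_n mod 1
    hits = sum(1 for a, b in zip(seq, seq[1:])
               if abs((3.0 * a) % 1.0 - b) < eps)
    return hits / float(len(seq) - 1)

print(triple_fraction(data.make_baker(200)))   # -> 1.0
print(triple_fraction(data.make_random(200)))  # -> ~0.0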

Next,

Check Baker
Success/Fail 929 / 71

means that of 1000 baker's-map sequences fed to the network, 929 were recognized as the baker's map and 71 were mistaken for random numbers.

Similarly,

Check Random
Success/Fail 913 / 87

means that of 1000 random sequences, 913 were correctly recognized as random numbers.

Export + import to C++

See a separate article for the details of exporting to and importing from C++. Exporting is delegated to the wrapper class, so it's easy.

export.py


from model import Model

def main():
    unit = 200
    model = Model(unit)
    model.load("baker.model")
    model.export("baker.dat")
    
if __name__ == '__main__':
    main()

It reads baker.model and writes out baker.dat.
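
The layout of baker.dat follows export() above: l1.W (row-major), l1.b, l2.W, l2.b, all as native-endian float32. A quick NumPy sanity check of the file (my own addition) that should reproduce the values test.py prints:

import numpy as np

n_in, n_units, n_out = 200, 200, 2
raw = np.fromfile("baker.dat", dtype=np.float32)
assert raw.size == n_units * n_in + n_units + n_out * n_units + n_out

l1W = raw[:n_units * n_in].reshape(n_units, n_in)
rest = raw[n_units * n_in:]
l1b, rest = rest[:n_units], rest[n_units:]
l2W = rest[:n_out * n_units].reshape(n_out, n_units)
l2b = rest[n_out * n_units:]

# forward pass for 200 values of 0.5, as in the test above
x = np.full(n_in, 0.5, dtype=np.float32)
h = np.maximum(np.dot(l1W, x) + l1b, 0.0)  # ReLU
print(np.dot(l2W, h) + l2b)  # e.g. [-0.84465  0.100217]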

Importing is also not hard, but for later convenience let's wrap it in a class. Like this.

model.hpp


#pragma once
#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
//------------------------------------------------------------------------
typedef std::vector<float> vf;
//------------------------------------------------------------------------
class Link {
private:
  vf W;
  vf b;
  float relu(float x) {
    return (x > 0) ? x : 0;
  }
  const int n_in, n_out;
public:
  Link(int in, int out) : n_in(in), n_out(out) {
    W.resize(n_in * n_out);
    b.resize(n_out);
  }
  void read(std::ifstream &ifs) {
    ifs.read((char*)W.data(), sizeof(float)*n_in * n_out);
    ifs.read((char*)b.data(), sizeof(float)*n_out);
  }

  vf get(const vf &x) {
    vf y(n_out);
    for (int i = 0; i < n_out; i++) {
      y[i] = 0.0;
      for (int j = 0; j < n_in; j++) {
        y[i] += W[i * n_in + j] * x[j];
      }
      y[i] += b[i];
    }
    return y;
  }

  vf get_relu(const vf &x) {
    vf y = get(x);
    for (int i = 0; i < n_out; i++) {
      y[i] = relu(y[i]);
    }
    return y;
  }
};
//------------------------------------------------------------------------
class Model {
private:
  Link l1, l2;
public:
  const int n_in, n_out;
  Model(int in, int n_units, int out):
    n_in(in), n_out(out),
    l1(in, n_units), l2(n_units, out) {
  }
  void load(const char* filename) {
    std::ifstream ifs(filename, std::ios::binary);  // binary mode: the file holds raw floats
    l1.read(ifs);
    l2.read(ifs);
  }
  vf predict(vf &x) {
    return l2.get(l1.get_relu(x));
  }
  int argmax(vf &x) {
    vf y = predict(x);
    auto it = std::max_element(y.begin(), y.end());
    auto index = std::distance(y.begin(), it);
    return index;
  }
};
//------------------------------------------------------------------------

With this, you can load the model like this:

#include "model.hpp"
int
main(void){
  const int n_in = 200;
  const int n_units = 200;
  const int n_out = 2;
  Model model(n_in, n_units, n_out);
  model.load("baker.dat");
}


Import test

First, let's check that feeding it the same input produces exactly the same output. Write the following code:

void
test(Model &model) {
  vf x;
  for (int i = 0; i < model.n_in; i++) {
    x.push_back(0.5);
  }
  vf y = model.predict(x);
  printf("%f %f\n", y[0], y[1]);
}

Here, vf is the typedef from model.hpp:

typedef std::vector<float> vf;

The execution result is

-0.844650 0.100217

which matches the Python result exactly.

Next, let's check the accuracy when baker's-map and random sequences are fed.

// note: the full test program also needs #include "model.hpp", <random>, and <cstdio>
int
test_baker(Model &model) {
  static std::mt19937 mt;
  std::uniform_real_distribution<float> ud(0.0, 1.0);
  vf x;
  float v = ud(mt);
  for (int i = 0; i < model.n_in; i++) {
    x.push_back(v);
    v = v * 3.0;
    v = v - int(v);
  }
  return model.argmax(x);
}
//------------------------------------------------------------------------
int
test_random(Model &model) {
  static std::mt19937 mt;
  std::uniform_real_distribution<float> ud(0.0, 1.0);
  vf x;
  for (int i = 0; i < model.n_in; i++) {
    x.push_back(ud(mt));
  }
  return model.argmax(x);
}
//------------------------------------------------------------------------
int
main(void) {
  const int n_in = 200;
  const int n_units = 200;
  const int n_out = 2;
  Model model(n_in, n_units, n_out);
  model.load("baker.dat");
  test(model);
  const int TOTAL = 1000;
  int bn = 0;
  for (int i = 0; i < TOTAL; i++) {
    bn += test_baker(model);
  }
  std::cout << "Check Baker" << std::endl;
  std::cout << "Success/Fail:" << (TOTAL - bn) << "/" << bn << std::endl;
  int rn = 0;
  for (int i = 0; i < TOTAL; i++) {
    rn += test_random(model);
  }
  std::cout << "Check Random" << std::endl;
  std::cout << "Success/Fail:" << rn << "/" << (TOTAL - rn) << std::endl;
}

The execution result looks like this:

Check Baker
Success/Fail:940/60
Check Random
Success/Fail:923/77

So the accuracy is about the same as on the Python side.

Summary

Using Chainer, I tried to distinguish sequences generated by the baker's map from standard random numbers. I expected them to be easier to tell apart; maybe this is about what you get with three layers and 200 units per layer. In any case, I was able to build the whole flow of training in Python and then using the model from C++, so I would like to apply it in various ways.

References

The "separate article" links above point to my own articles, I'm afraid.
