Zundokokiyoshi with TensorFlow

(Update) Not with TensorFlow, but I found this solved more elegantly with Chainer: Learn Zundokokiyoshi using LSTM


I feel like I've missed the boom, but since I've been playing with TensorFlow lately, I tried solving Zundokokiyoshi with a __convolutional neural network (CNN, ConvNet)__. You no longer need to spend human intelligence on solving Zundokokiyoshi.

The entire code is here.

Problem setting

First, we have to think about what kind of Zundokokiyoshi problem a neural network can be made to solve.

For example, we could cut five elements out of a zundoko list (a list in which "Zun" and "Doko" are stored at random) and have the network judge whether they are "Zun, Zun, Zun, Zun, Doko". But that problem is too easy to be interesting. There is no need to bring in a neural network: with only $2^5 = 32$ patterns, you could simply memorize them all rather than do machine learning.

That said, I couldn't think of a good way to feed an infinitely long zundoko list to a neural network. So let's pass a zundoko list of 100 elements and return the index at which the first "Zun" of the first "Zun, Zun, Zun, Zun, Doko" appears. For example, for the zundoko list "Zun, Doko, Zun, Zun, Zun, Zun, Doko, ...", the correct answer is 2. Since there are 100 elements, the answer can range from 0 to 95. If "Zun, Zun, Zun, Zun, Doko" never appears, let's say 96 is returned. This makes it a 97-class classification problem over classes 0 through 96. There are $2^{100}$ possible inputs, so we cannot train on every pattern; the decision logic has to be learned by machine learning.

To make the problem easier for a neural network to handle, the input zundoko list represents "Zun" as -1.0 and "Doko" as 1.0, and the output is represented as a one-hot vector, as in the TensorFlow tutorial (https://www.tensorflow.org/versions/0.6.0/tutorials/mnist/beginners/). In other words, for the zundoko list "Zun, Doko, Zun, Zun, Zun, Zun, Doko, ...", [-1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, ...] is used as the input and [0.0, 0.0, 1.0, 0.0, ...] as the output during training.
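For instance, the encoding can be sketched like this (my own illustration; the helper encode is not part of the original code):

# Illustrative only: a hypothetical helper showing the encoding above.
def encode(words):
    # 'Zun' -> -1.0, 'Doko' -> 1.0
    return [-1.0 if word == 'Zun' else 1.0 for word in words]

print(encode(['Zun', 'Doko', 'Zun', 'Zun', 'Zun', 'Zun', 'Doko']))
# => [-1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0]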

Training data

Since we know the relationship between a zundoko list and its answer, training data can be generated endlessly. How nice that collecting training data, one of the hardest parts of machine learning, is already taken care of!

First, let's create a function make_zundoko_list that randomly generates a zundoko list, and a function solve_zundoko_list that solves it.

from random import choice

zun = -1.0
doko = 1.0
zun_length = 4
zundoko_length = zun_length + 1

def make_zundoko_list(length):
    return [choice([zun, doko]) for _ in range(length)]

def solve_zundoko_list(zundoko_list, index=0, zun_count=0):
    if len(zundoko_list) == 0:
        return index - zun_length  # not found: len(list) - 4, i.e. 96 for 100 elements
    elif zun_count >= zun_length and zundoko_list[0] == doko:
        return index - zun_length  # index of the first "Zun" of the detected pattern
    else:
        return solve_zundoko_list(zundoko_list[1:], index + 1, zun_count + 1 if zundoko_list[0] == zun else 0)
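As a quick sanity check (my addition, not in the original post), the example from the problem setting should solve to 2:

print(solve_zundoko_list([zun, doko, zun, zun, zun, zun, doko]))  # => 2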

However, if we used randomly generated zundoko lists as training data, the outputs would be biased, which is not good. We want to train evenly on answers from 0 to 96, so let's also create a function make_solved_zundoko_list that takes an answer and generates a zundoko list with that answer.

def make_solved_zundoko_list(length, answer):
    zundoko_list = make_zundoko_list(length)
    while True:
        solved_answer = solve_zundoko_list(zundoko_list)
        if solved_answer >= answer:
            break
        zundoko_list[solved_answer] = doko
    if answer + zundoko_length <= length:
        zundoko_list[answer:answer + zundoko_length] = [zun for _ in range(zun_length)] + [doko]
    return zundoko_list
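A quick check that the generator behaves as intended (my addition):

# The generated list's first "Zun, Zun, Zun, Zun, Doko" should start at the requested index.
assert solve_zundoko_list(make_solved_zundoko_list(100, 10)) == 10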

As mentioned above, the solution must be converted to a one-hot vector, so let's also create a function dense_to_one_hot for that.

def dense_to_one_hot(index, num_classes):
    return [(1.0 if i == index else 0.0) for i in range(num_classes)]
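For example (my addition):

print(dense_to_one_hot(2, 5))  # => [0.0, 0.0, 1.0, 0.0, 0.0]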

Use these to prepare the training and test data. Let's prepare 100,000 training samples and 1,000 test samples.

Note that we use make_solved_zundoko_list for training data and make_zundoko_list for test data.

list_length = 100
num_classes = list_length - zundoko_length + 1 + 1

zundoko_lists = [make_solved_zundoko_list(list_length, i % num_classes) for i in range(100000)]
zundoko_answers = [dense_to_one_hot(solve_zundoko_list(z), num_classes) for z in zundoko_lists]

test_zundoko_lists = [make_zundoko_list(list_length) for _ in range(1000)]
test_zundoko_answers = [dense_to_one_hot(solve_zundoko_list(z), num_classes) for z in test_zundoko_lists]

Convolutional neural network

Now for the main part. Up to this point we've just been fiddling with plain Python.

Since Zundokokiyoshi amounts to detecting an adjacent pattern ("Zun, Zun, Zun, Zun, Doko") in a list, a __convolutional neural network (CNN)__ is a good fit. TensorFlow's second tutorial is a CNN example, so let's write ours with it as a reference.

CNNs are often used for image recognition. Because images are two-dimensional, image recognition uses two-dimensional convolution. The zundoko list, however, is a one-dimensional sequence, so we need a one-dimensional convolution. The pattern we want to detect is "Zun, Zun, Zun, Zun, Doko", so convolving with a kernel of size 5 seems right.

However, searching TensorFlow's API Reference (https://www.tensorflow.org/versions/r0.7/api_docs/index.html), I found no function for one-dimensional convolution[^1]. Since there is no way around it, we reshape the zundoko list to 100×1 and the kernel to 5×1 and treat them as two-dimensional. We can then use the 2D convolution conv2d to achieve a pseudo one-dimensional convolution.

The layer that convolves the input is as follows.

x = tf.placeholder(tf.float32, [None, list_length])
x_reshaped = tf.reshape(x, [-1, list_length, 1, 1])
W1 = tf.Variable(tf.truncated_normal([5, 1, 1, 1], stddev=0.1))
b1 = tf.Variable(tf.truncated_normal([1], stddev=0.1))
h1_reshaped = tf.nn.relu(tf.nn.conv2d(x_reshaped, W1, strides=[1, 1, 1, 1], padding='SAME') + b1)
h1 = tf.reshape(h1_reshaped, [-1, list_length])

This convolution is expected to detect the "Zun, Zun, Zun, Zun, Doko" pattern. A CNN usually pools after convolution, but here we skip pooling, because the convolution's output directly indicates whether "Zun, Zun, Zun, Zun, Doko" was detected at each position.
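To convince ourselves that a size-5 kernel can express this detector at all, here is a hand-made kernel (my own NumPy illustration, not the weights the network actually learns). With "Zun" = -1 and "Doko" = 1, a window scores 5 only on an exact match, so with a bias of -4 the ReLU outputs 1 exactly at match positions:

import numpy as np

# Hand-made detector, for illustration only (not learned weights).
kernel = np.array([-1.0, -1.0, -1.0, -1.0, 1.0])  # matches Zun, Zun, Zun, Zun, Doko
bias = -4.0
x = np.array([-1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, -1.0])  # Zun, Doko, Zun x4, Doko, Zun
scores = np.array([x[i:i + 5] @ kernel + bias for i in range(len(x) - 4)])
print(np.maximum(scores, 0.0))  # [0. 0. 1. 0.] -- fires only at index 2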

By the way, even if convolution can detect the "Zun, Zun, Zun, Zun, Doko" pattern, that alone is not enough. For example, if the input "Zun, Doko, Zun, Zun, Zun, Zun, Doko, Zun, Zun, Zun, Zun, Doko, ..." is passed, the pattern will be detected at 7 in addition to 2. Detecting only the 2 is mechanically difficult with convolution alone. So let's add another layer to get rid of values like the 7.

Since such a computation is hard to express with convolution, the next layer is fully connected. But the computation of removing the 7 while keeping the 2 seems complicated. Does a fully connected layer (a mere matrix multiplication) have that much expressive power? If the computation we want cannot be expressed by the formula, it cannot be learned.

Given a vector with 2 and 7 detected ([0, 0, 1, 0, 0, 0, 0, 1, 0, ...]), what matrix should we apply (from the right) to get a result where only 2 is detected ([0, 0, 1, 0, 0, 0, 0, 0, ...])? For example, applying the following matrix sends everything except the first detected "Zun, Zun, Zun, Zun, Doko" to zero or below. Once the value we want is separated from the values we don't, the activation function takes care of the rest.

\left(\begin{matrix}
   1.0 &   -1.0 &   -1.0 & \cdots \\
   0.0 &    1.0 &   -1.0 & \cdots \\
   0.0 &    0.0 &    1.0 & \cdots \\
\vdots & \vdots & \vdots & \ddots \\
\end{matrix}\right)
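Checking this concretely (my own NumPy illustration, again not learned weights): applying the matrix above from the right to a vector with detections at 2 and 7, then taking ReLU, leaves only the first detection:

import numpy as np

# M[i][j] = 1 if i == j, -1 if i < j, 0 otherwise (the matrix above).
n = 10
M = np.eye(n) + np.triu(-np.ones((n, n)), k=1)
h = np.zeros(n)
h[2] = h[7] = 1.0              # detections at indices 2 and 7
print(np.maximum(h @ M, 0.0))  # only index 2 remains at 1.0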

So a plain matrix multiplication is expressive enough to remove values like the 7. Expressive power alone doesn't guarantee that training will succeed, but let's map to the output by applying a 100×97 matrix (plus a bias and softmax). Dropout is omitted because it seems to be a poor fit for the nature of this problem (or for this network configuration?).

W2 = tf.Variable(tf.truncated_normal([list_length, num_classes], stddev=0.1))
b2 = tf.Variable(tf.truncated_normal([num_classes], stddev=0.1))
y = tf.nn.softmax(tf.matmul(h1, W2) + b2)

Learning

Then, as in the tutorial, minimize the cross entropy to find W1, b1, W2, and b2.

y_ = tf.placeholder(tf.float32, [None, num_classes])

cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))

train_step = tf.train.GradientDescentOptimizer(1e-5).minimize(cross_entropy)

answer = tf.argmax(y, 1)

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)

for i in range(1000):
    sess.run(train_step, feed_dict={x: zundoko_lists, y_: zundoko_answers})

Some details differ from the tutorial, such as training on all of the training data at every step and using GradientDescentOptimizer. For cross_entropy, I modified it with reference to this answer on Stack Overflow.
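The post doesn't show it, but the test accuracy discussed below can be measured with a few extra lines in the same style as the tutorial (a minimal sketch, assuming the session above):

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: test_zundoko_lists, y_: test_zundoko_answers}))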

Result display

Finally, have the trained neural network perform Zundokokiyoshi and we're done.

zundoko_list = make_zundoko_list(list_length)
zundoko_answer = sess.run(answer, feed_dict={x: [zundoko_list]})[0]
zundoko_string_list = ['Zun' if zundoko == zun else 'Doko' for zundoko in zundoko_list]
zundoko_string_list = zundoko_string_list[:min(zundoko_answer + zundoko_length, len(zundoko_string_list))]
for zundoko_string in zundoko_string_list:
    print(zundoko_string)
if zundoko_answer + zundoko_length == len(zundoko_string_list):
    print('Ki yo shi!')
Zun
Doko
Zun
Doko
Doko
Zun
Doko
Doko
Doko
Zun
Zun
Doko
Zun
Doko
Doko
Doko
Doko
Zun
Zun
Zun
Zun
Doko
Ki yo shi!

Oh!! It printed correctly!

However, even after several tries with slightly different parameters, the accuracy on the test data topped out at 98%. Before starting I assumed such simple logic could easily reach 100%, but it turns out to be quite hard. With the current configuration, I suspect it struggles unless every combination (the answer index together with the later indices that must be ignored) appears in the training data; adding layers so that more abstract features can be expressed might have worked better.


[^1]: I only did a quick search for "conv", so I may have simply missed it.
