Introduction

I want to do deep learning with pylearn2, but just running the image classification from the tutorial does not feel like enough. So I am going to make an AI for tic-tac-toe (the so-called 〇× game). I really wanted to make an AI for Othello, but since this is my first time using pylearn2, I chose a simpler game. By the way, I am a beginner at both machine learning and pylearn2, so please point out any mistakes. I skip the steps for installing the required packages such as pylearn2 and numpy.

Neural network design

Since this is a tic-tac-toe AI, first number the squares of the 3x3 board so that the state of the board can be described. The squares are numbered 0 to 8, row by row (0, 1, 2 in the first row, and so on).

The input is a state of the 3x3 board, and the output is the position of the next move. Therefore, consider the following network.

The numbers next to the input and output circles are the square numbers on the board. There are 9 inputs, one for the state of each square. There is one hidden layer, whose activation function is the sigmoid function. Since the output is divided into 9 classes, the output layer uses the softmax function. Here, "divide into 9 classes" means classifying a given input (a board state) by the square on which the next move is played. In other words, if the input is classified as class 0, the next move is played on square 0.
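To make the shape of this network concrete, here is a minimal pure-Python sketch of one forward pass: 9 inputs, one sigmoid hidden layer, and a 9-way softmax output. The weights are random placeholders (an assumption for illustration, not trained values), and the helper names `dense`, `forward`, and `random_params` are hypothetical. The actual model in this article is built with pylearn2, not with this code.

```python
import math
import random

N_CELLS = 9  # one input and one output class per square


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    # weights[j][i] connects input i to unit j.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(board, params):
    hidden = [sigmoid(z) for z in dense(board, params["W_h"], params["b_h"])]
    return softmax(dense(hidden, params["W_o"], params["b_o"]))


def random_params(n_hidden=9, seed=0):
    # Placeholder weights only (assumption); a real model would learn these.
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "W_h": matrix(n_hidden, N_CELLS), "b_h": [1.0] * n_hidden,
        "W_o": matrix(N_CELLS, n_hidden), "b_o": [0.0] * N_CELLS,
    }


# An illustrative board vector with a single stone on square 2.
probs = forward([0, 0, 1, 0, 0, 0, 0, 0, 0], random_params())
next_square = probs.index(max(probs))  # class with the highest probability
print(next_square)
```

The output is a list of 9 probabilities that sum to 1, and the index of the largest one is read as the square for the next move.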

Creating the training data

The AI created this time does not pursue strength. If it can carry a game forward without deviating from the training data, I will call that good enough. Therefore, instead of collecting only records of well-played games, I prepare a large number of games that are simply played according to the rules. To generate these games in bulk, I wrote a Ruby script in which both the first and the second player just place a stone at random on any empty square. You may ask why I did not write it in Python; I was simply tired of writing nothing but Python lately, and there is no deeper reason. The first player is 〇 and the second player is ×. Run the script with the number of games as its argument, and it prints one game record per game. These records are used as training data. Each number is the square on which a stone was placed, and the win or lose at the end indicates whether the first player won or lost. A record with neither marker is a draw.

`tic_tac_toe.rb`


#!/usr/bin/env ruby
#
# tic tac toe
#
# 0 : O (first)
# 1 : X (second)
# 2 : none
#

def show_board(array)
  p array[0][0].to_s + "," + array[0][1].to_s + "," + array[0][2].to_s
  p array[1][0].to_s + "," + array[1][1].to_s + "," + array[1][2].to_s
  p array[2][0].to_s + "," + array[2][1].to_s + "," + array[2][2].to_s
  p ""
end

def judge(array)
  ret = 2
  for stone in [0, 1] do
    for i in [0 ,1, 2] do
      if (array[i][0]==stone && array[i][1]==stone && array[i][2]==stone) ||
         (array[0][i]==stone && array[1][i]==stone && array[2][i]==stone)   then
        ret = stone
      end
    end
   
    if (array[0][0]==stone && array[1][1]==stone && array[2][2]==stone) ||
       (array[0][2]==stone && array[1][1]==stone && array[2][0]==stone)   then
      ret = stone
    end
  end
  return ret
end

loop_max = ARGV[0].to_i
#p "loop max=" + loop_max.to_s
cell_array = [] 
stone_array = [] 

loop_cnt = 0
until loop_cnt >= loop_max do
  cell_array = Array.new(9)
  stone_array = Array.new(3).map { Array.new(3, 2) }
  9.times do |num|
    cell_array[num] = num
  end

  i = 9               # number of squares that are still empty
  history = []
  9.times do |num|
    #p i
    rnd = rand(i)     # uniform index into the remaining cells (0..i-1)
    if num % 2 == 0
      stone_array[cell_array[rnd].divmod(3)[0]][cell_array[rnd].divmod(3)[1]] = 0
    else
      stone_array[cell_array[rnd].divmod(3)[0]][cell_array[rnd].divmod(3)[1]] = 1
    end
    history.push(cell_array[rnd])
    #show_board(stone_array)
    ret = judge(stone_array)
    if ret == 0 then
      history.push("win")    # "O" is winner.
      break
    elsif ret == 1 then
      history.push("lose")   # "O" is loser.
      break
    end
    
    cell_array.delete_at(rnd)

    i -= 1
  end
  p history.join(",")
  loop_cnt += 1
end

$ ruby tic_tac_toe.rb 500 | tee tic-tac-toe_records.log
"6,8,3,4,2,0,lose"
"7,1,4,5,0,2,8,win"
"6,2,3,0,5,8,1,7,4,win"
"3,8,4,2,6,5,lose"
"1,8,2,7,3,0,6,4,lose"
"8,0,3,6,7,4,1,2,lose"
"6,8,3,5,2,7,4,win"
"2,1,7,4,3,5,8,6,0"
"4,8,6,7,1,3,2,win"
"6,1,3,0,8,7,5,4,lose"
"8,2,7,1,4,3,0,win"
"8,6,1,2,7,0,3,4,lose"
"4,3,8,1,2,6,7,0,lose"
"8,6,3,4,1,5,7,2,lose"
"1,2,0,4,7,8,5,6,lose"
"0,5,2,3,6,7,8,4,lose"
"7,1,2,6,4,5,0,3,8,win"
"2,1,0,8,3,5,7,4,6,win"
"2,0,8,5,6,7,4,win"
...

Process these game records into CSV so that they can be used easily as input. This time, in order to make an AI that plays the second player's moves, only the games won by the second player (the records ending in lose) are extracted.

$ awk '{gsub("\"","");print $0;}' tic-tac-toe_records.log | grep lose | tee tic-tac-toe_records_lose.csv

It will be like this after processing.

`tic-tac-toe_records_lose.csv`


6,8,3,4,2,0,lose
3,8,4,2,6,5,lose
1,8,2,7,3,0,6,4,lose
8,0,3,6,7,4,1,2,lose
...
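The same filtering step could also be done in Python. The following is a minimal sketch, assuming each log line has the quoted form printed by the Ruby script; the function name `second_player_wins` is hypothetical and the sample lines are taken from the log shown earlier.

```python
def second_player_wins(lines):
    """Keep only records that end in "lose" (the first player lost)."""
    kept = []
    for line in lines:
        record = line.strip().strip('"')  # drop the quotes added by Ruby's `p`
        if record.split(",")[-1] == "lose":
            kept.append(record)
    return kept


sample = [
    '"6,8,3,4,2,0,lose"',
    '"7,1,4,5,0,2,8,win"',
    '"2,1,7,4,3,5,8,6,0"',   # a draw: no result marker at the end
    '"3,8,4,2,6,5,lose"',
]
for record in second_player_wins(sample):
    print(record)
```

Checking the last field rather than searching the whole line for "lose" keeps the filter tied to the result marker only.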

Now you are ready to go.

MLP model using pylearn2

First, the source code is shown.

`tic_tac_toe.py`


#!/usr/bin/env python
# -*- coding: utf-8 -*-

import theano
from pylearn2.models import mlp
from pylearn2.training_algorithms import sgd
from pylearn2.termination_criteria import EpochCounter
from pylearn2.datasets.dense_design_matrix import DenseDesignMatrix
import numpy as np
import csv

class TicTacToe(DenseDesignMatrix):
    def __init__(self):
        X = []
        y = []
        X_temp = [0,0,0,0,0,0,0,0,0]  # 3x3 board
        y_temp = [0,0,0,0,0,0,0,0,0]  # 3x3 board

        # (1)
        self.class_names = ['0', '3']
        f = open("tic-tac-toe_records_lose.csv", "r")
        reader = csv.reader(f)

        # (2)
        for row in reader:
            for i, cell_index in enumerate(row):
                if cell_index == "win" or cell_index == "lose":
                    X_temp = [0,0,0,0,0,0,0,0,0]
                elif i % 2 == 0:
                    temp = []
                    X_temp[int(cell_index)] = 1
                    for x in X_temp:
                        temp.append(x)

                    #print "  temp = " + str(temp)
                    X.append(temp)
                else:
                    X_temp[int(cell_index)] = 2
                    y_temp[int(cell_index)] = 3
                    #print "y_temp = " + str(y_temp)
                    y.append(y_temp)
                    y_temp = [0,0,0,0,0,0,0,0,0]

        X = np.array(X)
        y = np.array(y)
        super(TicTacToe, self).__init__(X=X, y=y)

# (3)
data_set = TicTacToe()
h0 = mlp.Sigmoid(layer_name='h0', dim=9, irange=.1, init_bias=1.)
out = mlp.Softmax(layer_name='out', n_classes=9, irange=0.)
trainer = sgd.SGD(learning_rate=.05, batch_size=200, termination_criterion=EpochCounter(5000))
layers = [h0, out]

ann = mlp.MLP(layers, nvis=9)
trainer.setup(ann, data_set)

# (4)
while True:
    trainer.train(dataset=data_set)
    ann.monitor.report_epoch()
    ann.monitor()
    if trainer.continue_learning(ann) == False:
        break

# (5)-1
next_move = [0,0,0,0,0,0,0,0,0]
inputs = np.array([[0,0,1,0,0,0,0,0,0]])
output = ann.fprop(theano.shared(inputs, name='inputs')).eval()
print output[0]
for i in range(0,9):
    if max(output[0]) == output[0][i]:
        next_move[i] = 3

print next_move

# (5)-2
next_move = [0,0,0,0,0,0,0,0,0]
inputs = np.array([[1,0,2,1,0,0,0,1,2]])
output = ann.fprop(theano.shared(inputs, name='inputs')).eval()
print output[0]
for i in range(0,9):
    if max(output[0]) == output[0][i]:
        next_move[i] = 3

print next_move

Let me explain a little. Read this alongside the markers (1), (2), ... in the code. As a premise, 〇 is "1", × is "2", and an empty square is "0". (This differs from the Ruby code used to create the training data. Sorry for the confusion.)

(1) Setting the output value
This is the value used to mark the output. Since the input uses 0, 1, and 2, I decided to use 3 for the output.

(2) Conversion of game records
This maps each game record that was read to inputs and outputs of the neural network. For example, the game record "6,8,3,4,2,0,lose" is expanded as follows.

Game record: 6,8,3,4,2,0,lose
X[n]   = [0,0,0,0,0,0,1,0,0]  : 〇 is at position "6", so enter "1" there
y[n]   = [0,0,0,0,0,0,0,0,3]  : the next move is × at position "8", so output "3" at position "8"
X[n+1] = [0,0,0,1,0,0,1,0,2]  : × is at position "8", so enter "2" there; next, 〇 is at position "3", so enter "1" there
y[n+1] = [0,0,0,0,3,0,0,0,0]  : the next move is × at position "4", so output "3" at position "4"
X[n+2] = [0,0,1,1,2,0,1,0,2]  : × is at position "4", so enter "2" there; next, 〇 is at position "2", so enter "1" there
y[n+2] = [3,0,0,0,0,0,0,0,0]  : the next move is × at position "0", so output "3" at position "0"
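The expansion above can be sketched as a small standalone function. This is a simplified illustration rather than the code in tic_tac_toe.py: the name `expand_record` is hypothetical, and it only emits a board when a reply by × follows it, which is an assumption that holds for records ending in lose.

```python
def expand_record(record):
    """Turn one game record into (board, target) training pairs.

    Board encoding: 0 = empty, 1 = O (first player), 2 = X (second player).
    Target encoding: 3 at the square X plays next, 0 elsewhere.
    """
    board = [0] * 9
    pairs = []
    snapshot = None
    moves = [m for m in record.split(",") if m not in ("win", "lose")]
    for turn, move in enumerate(moves):
        square = int(move)
        if turn % 2 == 0:
            board[square] = 1          # O moves; this board is an input
            snapshot = list(board)     # copy, so later moves do not alter it
        else:
            target = [0] * 9
            target[square] = 3         # X's reply is the expected output
            pairs.append((snapshot, target))
            board[square] = 2          # X's stone is part of the next input
    return pairs


pairs = expand_record("6,8,3,4,2,0,lose")
for x, y in pairs:
    print(x, y)
```

For the record used in the example, this yields three pairs that match X[n], y[n] through X[n+2], y[n+2] above.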

(3) MLP structure description
The hidden layer h0 is created with dim (dimension) = 9 (the number of squares), using the sigmoid function as the activation function. irange and init_bias were picked without much tuning. The output layer out is created with n_classes (number of classes) = 9 (the number of squares), using the softmax function as the activation function. Its irange was also picked without much tuning. The network is trained with [Stochastic Gradient Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). learning_rate and batch_size were picked the same way. termination_criterion specifies when to end training.

ann = mlp.MLP(layers, nvis=9)

The line above specifies the structure of the MLP. nvis is the input dimension = 9 (the number of squares).

(4) Training
This is the training loop. The monitor also reports the progress.

(5) Test
Test what the output looks like for some inputs. The class with the highest probability among the 9 classes (output[0]) gives the next move (next_move).
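The selection of the next move in step (5) can be sketched without pylearn2. The function names below are hypothetical and the probabilities are made-up numbers for illustration. The optional skipping of occupied squares is an addition that the script in this article does not perform.

```python
def pick_next_move(probs, board=None):
    """Return the index of the most probable square.

    If a board is given, occupied squares (non-zero) are skipped. That
    masking is an addition for illustration, not part of the script above.
    """
    candidates = range(len(probs))
    if board is not None:
        candidates = [i for i in candidates if board[i] == 0]
    return max(candidates, key=lambda i: probs[i])


def one_hot_move(square, size=9, marker=3):
    # Same output convention as the article: 3 marks the chosen square.
    move = [0] * size
    move[square] = marker
    return move


# Made-up probabilities (not model output), highest on square 2.
probs = [0.05, 0.10, 0.40, 0.05, 0.05, 0.05, 0.20, 0.05, 0.05]
board = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # square 2 is already taken by O

print(one_hot_move(pick_next_move(probs)))
print(one_hot_move(pick_next_move(probs, board)))
```

Without the board, the function picks square 2 even though it is occupied; with the board, it falls back to the best empty square.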

The output when the program is executed is as follows.

$ python tic_tac_toe.py
Parameter and initial learning rate summary:
	h0_W: 0.05
	h0_b: 0.05
	softmax_b: 0.05
	softmax_W: 0.05
Compiling sgd_update...
Compiling sgd_update done. Time elapsed: 0.203131 seconds
compiling begin_record_entry...
compiling begin_record_entry done. Time elapsed: 0.003911 seconds
Monitored channels: 
Compiling accum...
Compiling accum done. Time elapsed: 0.000039 seconds
Monitoring step:
	Epochs seen: 1
	Batches seen: 3
	Examples seen: 542
Monitoring step:
	Epochs seen: 2
	Batches seen: 6
	Examples seen: 1084

...

Monitoring step:
	Epochs seen: 5000
	Batches seen: 15000
	Examples seen: 2710000
Monitoring step:
	Epochs seen: 5001
	Batches seen: 15003
	Examples seen: 2710542
[ 0.07985083  0.10700001  0.00255253  0.15781951  0.08504663  0.16470689
  0.11459433  0.12293593  0.16549335]
[0, 0, 0, 0, 0, 0, 0, 0, 3]
[  2.56981722e-03   1.25923571e-01   2.05250923e-04   6.14268028e-04
   1.85819252e-02   2.43921569e-02   8.27328217e-01   3.84348076e-04
   4.45556802e-07]
[0, 0, 0, 0, 0, 0, 3, 0, 0]

From the output above, if the input is [0,0,1,0,0,0,0,0,0], the next move is [0,0,0,0,0,0,0,0,3]. In other words, the AI plays as shown in the figure below.

If the input is [1,0,2,1,0,0,0,1,2], the next move is [0,0,0,0,0,0,3,0,0]. In other words, the AI plays as shown in the figure below.

Well, playing × on square "5" would have won the game on the spot, but the AI chose to block 〇's two in a row instead. Considering that it learned from random game records, I suppose that is good enough?
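The observation about square "5" can be checked with a small helper. This is a sketch for illustration only: `WIN_LINES` and `winning_moves` are hypothetical names, and the board uses the encoding from tic_tac_toe.py (1 = 〇, 2 = ×, 0 = empty).

```python
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]


def winning_moves(board, player):
    """Squares where `player` would complete three in a row right now."""
    moves = []
    for line in WIN_LINES:
        values = [board[i] for i in line]
        if values.count(player) == 2 and values.count(0) == 1:
            moves.append(line[values.index(0)])
    return sorted(set(moves))


board = [1, 0, 2, 1, 0, 0, 0, 1, 2]   # the second test input
print(winning_moves(board, 2))  # squares that win immediately for X
print(winning_moves(board, 1))  # squares where O threatens to win
```

For this board, × can win on square 5, while 〇 threatens square 6, which is the square the network actually chose.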

It is a bit of a hassle because I have to edit the source code every time, but I can now get the next move for an arbitrary input.

Next, I will build this into a tic-tac-toe game so that, from the result of a single training run, I can easily get the next move for any input. First of all, I have to write the tic-tac-toe game program itself. **I made it. Please see the following article:** Create a tic-tac-toe AI with Pylearn2 - Save and load models -

References

http://www.arngarden.com/2013/07/29/neural-network-example-using-pylearn2/
http://sinhrks.hatenablog.com/entry/2014/11/30/085119
https://www.safaribooksonline.com/blog/2014/02/10/pylearn2-regression-3rd-party-data/

Let's make a tic-tac-toe AI with Pylearn 2