Previously, I wrote an article called "Let's make a tic-tac-toe AI with Pylearn 2". Back then I simplified the problem down to tic-tac-toe, but this time I take on the original goal: building an Othello AI with a neural network. Please bear with me, as this article involves a fair amount of trial and error. For the deep learning library, I will use Chainer this time.
The article consists of two parts: this first part builds and trains the model from game records, and the second part implements it as the AI in an actual Othello game. If that does not work well, I may have to go back and rethink the configuration of the neural network... ** → Operation confirmed; conclusion reached. **
I borrowed the game record data from here: a large collection of game records stored in GGF format, which can be read as plain text.
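For reference, a game line in the GGF file looks roughly like this (an illustrative fragment reconstructed from what the parser below expects, not copied from the actual file): the initial board follows a `BO[8` tag as eight rows of '-' (empty), '*' (black) and 'O' (white), and the moves then appear as B[..]/W[..] tags, with 'pa' meaning a pass, up to a closing ';'.

```text
(;GM[Othello] ... BO[8 -------- -------- -------- ---O*--- ---*O--- -------- -------- -------- *]B[F5//1.2]W[D6//2.3] ... B[pa] ... ;)
```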
First, let's decide on the input and output. Since we are building an Othello AI, the input is the state of the board and the output is the next move, as illustrated in the figure below.
Concretely, this maps onto a neural network as follows. The input is the state of the board, an 8 x 8 array whose elements are '0': empty, '1': black, and '2': white. The output is a classification of where the next stone is placed: 64 classes for the board squares plus one for a pass, 65 in total. Board positions are numbered 0 to 63 and a pass is class 64.
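As a quick check of this encoding (a tiny sketch that mirrors the conversion used in the script below, not extra functionality): the flat class index is simply row * 8 + column.

```python
# Sketch: move encoding used for the output classes (row * 8 + column, pass = 64).
COLS = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H')
ROWS = ('1', '2', '3', '4', '5', '6', '7', '8')
PASS_CLASS = 64

def move_to_class(pos_str):
    # e.g. 'F5' -> row index 4, column index 5 -> 4 * 8 + 5 = 37
    return ROWS.index(pos_str[1]) * 8 + COLS.index(pos_str[0])

print(move_to_class('A1'))  # -> 0
print(move_to_class('F5'))  # -> 37
```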
What remains is the design of the hidden layers. For now, I simply set the number of neurons in both hidden layers to 100 (is there a way to choose this theoretically...?). The resulting structure is shown in the figure below.
Here is the Python program that does all of this:
build_mlp.py
#!/usr/bin/env python
# coding=utf-8
#
# 0 : none
# 1 : black
# 2 : white
#
import sys
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
gVec = [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]
gCol = ('A','B','C','D','E','F','G','H')
gRow = ('1','2','3','4','5','6','7','8')
# (1)
class MLP(Chain):
def __init__(self):
super(MLP, self).__init__(
l1=L.Linear(64, 100),
l2=L.Linear(100, 100),
l3=L.Linear(100, 65),
)
def __call__(self, x):
h1 = F.relu(self.l1(x))
h2 = F.relu(self.l2(h1))
y = self.l3(h2)
return y
class Classifier(Chain):
def __init__(self, predictor):
super(Classifier, self).__init__(predictor=predictor)
def __call__(self, x, t):
y = self.predictor(x)
loss = F.softmax_cross_entropy(y, t)
accuracy = F.accuracy(y, t)
report({'loss': loss, 'accuracy': accuracy}, self)
return loss
def print_board(board):
for i in range(8):
print board[i]
print ""
def update_board(board, pos_str, clr):
assert clr!=0, "stone color is not black or white."
updated_board = [[0 for col in range(8)] for row in range(8)]
rev_list = []
pos = pos_str2pos_index(pos_str)
for v in gVec:
temp_list = []
for i in range(1, 8):
# out of board
if pos[0]+v[0]*(i+1) > 7 or pos[1]+v[1]*(i+1) > 7 or\
pos[0]+v[0]*(i+1) < 0 or pos[1]+v[1]*(i+1) < 0:
continue
if board[pos[0]+v[0]*i][pos[1]+v[1]*i] == (clr % 2 + 1):
temp_list.append([pos[0]+v[0]*i, pos[1]+v[1]*i])
if board[pos[0]+v[0]*(i+1)][pos[1]+v[1]*(i+1)] == clr:
for j in temp_list:
rev_list.append(j)
break
else:
break
rev_list.append(pos) # put stone at pos
assert board[pos[0]][pos[1]] == 0, "put position is not empty."
print "rev_list = " + str(rev_list)
for i in range(0, 8):
for j in range(0, 8):
if [i, j] in rev_list:
updated_board[i][j] = clr
else:
updated_board[i][j] = board[i][j]
return updated_board
def who_is_winner(board):
# ret : 0 draw
# 1 black win
# 2 white win
ret = 0
score_b = 0
score_w = 0
for i in range(0, 8):
for j in range(0, 8):
if board[i][j] == 1:
score_b += 1
elif board[i][j] == 2:
score_w += 1
if score_b > score_w:
ret = 1
elif score_b < score_w:
ret = 2
print "Black vs White : " + str(score_b) + " vs " + str(score_w)
return ret
def pos_str2pos_index(pos_str):
pos_index = []
for i, c in enumerate(gRow):
if pos_str[1] == c:
pos_index.append(i)
for i, c in enumerate(gCol):
if pos_str[0] == c:
pos_index.append(i)
return pos_index
def pos_str2pos_index_flat(pos_str):
pos_index = pos_str2pos_index(pos_str)
index = pos_index[0] * 8 + pos_index[1]
return index
#==== Main ====#
record_X = [] # MLP input (board list)
record_y = [] # MLP output(class(0-64) list)
temp_X = []
temp_y = []
temp2_X = []
temp2_y = []
board = []
row = []
argv = sys.argv
argc = len(argv)
if argc != 3:
print 'Usage'
print ' python ' + str(argv[0]) + ' <record_filename> <type>'
print ' type : black'
print ' black_win'
print ' white'
print ' white_win'
quit()
# check type
build_type = ''
for t in ['black', 'black_win', 'white', 'white_win']:
if argv[2] == t:
build_type = t
if build_type == '':
print 'record type is illegal.'
quit()
#(2)-- load record --#
f = open(argv[1], "r")
line_cnt = 1
for line in f:
print 'Line Count = ' + str(line_cnt)
idx = line.find("BO[8")
if idx == -1:
continue
idx += 5
# make board initial state
for i in range(idx, idx+9*8):
if line[i] == '-':
row.append(0)
elif line[i] == 'O':
row.append(2)
elif line[i] == '*':
row.append(1)
if (i-idx)%9 == 8:
board.append(row)
row = []
if len(board) == 8:
break
row = []
print_board(board)
# record progress of game
i = idx+9*8+2
while line[i] != ';':
if (line[i] == 'B' or line[i] == 'W') and line[i+1] == '[':
temp_X.append(board)
pos_str = line[i+2] + line[i+3]
if pos_str == "pa": # pass
temp_y.append(64)
# board state is not change
print_board(board)
else:
if line[i] == 'B':
clr = 1
elif line[i] == 'W':
clr = 2
else:
clr = 0
assert False, "Stone Color is illegal."
pos_index_flat = pos_str2pos_index_flat(pos_str)
temp_y.append(pos_index_flat)
board = update_board(board, pos_str, clr)
if (line[i] == 'B' and (build_type == 'black' or build_type == 'black_win')) or \
(line[i] == 'W' and (build_type == 'white' or build_type == 'white_win')):
temp2_X.append(temp_X[0])
temp2_y.append(temp_y[0])
print 'X = '
print_board(temp_X[0])
print 'y = ' + str(temp_y[0]) + ' (' + \
str(pos_str2pos_index(pos_str)) + ') ' + \
'(' + pos_str + ')'
print ''
temp_X = []
temp_y = []
i += 1
print "End of game"
print_board(board)
winner = who_is_winner(board)
if (winner == 1 and build_type == 'black_win') or \
(winner == 2 and build_type == 'white_win') or \
build_type == 'black' or build_type == 'white':
record_X.extend(temp2_X)
record_y.extend(temp2_y)
board = []
temp2_X = []
temp2_y = []
line_cnt += 1
#(3)-- MLP model and Training --#
X = np.array(record_X, dtype=np.float32)
y = np.array(record_y, dtype=np.int32)
train = datasets.TupleDataset(X, y)
train_iter = iterators.SerialIterator(train, batch_size=100)
model = Classifier(MLP())
optimizer = optimizers.SGD()
optimizer.setup(model)
updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (1000, 'epoch'), out='result')
trainer.extend(extensions.ProgressBar())
trainer.run()
#(4)-- save model --#
serializers.save_npz('reversi_model.npz', model)
#(5)-- prediction example --#
X1_ = [[[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,2,1,0,0,0],\
[0,0,0,1,2,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0]]]
X1 = np.array(X1_, dtype=np.float32)
y1 = F.softmax(model.predictor(X1))
print "X1 = "
print_board(X1[0])
print "y1 = " + str(y1.data.argmax(1)) + '\n'
X2_ = [[[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,2,2,2,0,0,0],\
[0,0,2,1,1,1,0,0],\
[0,2,2,2,1,1,0,0],\
[0,0,2,1,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0]]]
X2 = np.array(X2_, dtype=np.float32)
y2 = F.softmax(model.predictor(X2))
print "X2 = "
print_board(X2[0])
print "y2 = " + str(y2.data.argmax(1)) + '\n'
Let me explain briefly; please follow along with the markers (1) to (5) in the code.

(1) Defining the MLP in Chainer
The MLP class defines the structure of the network. As described above, the input is 8 x 8 = 64, the output is 65, and there are two hidden layers of 100 neurons each. The layer connections and activation functions are defined in `__call__`. The Classifier class handles the classification with softmax: it computes the softmax cross-entropy loss and the accuracy.
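Incidentally, Chainer also ships a built-in `L.Classifier` link that wraps a predictor with softmax cross-entropy loss and accuracy reporting, so the hand-written Classifier above could probably be replaced with something like this (a sketch, not what build_mlp.py actually does):

```python
# Sketch: the same classification wrapper using Chainer's built-in link.
# (Not used in build_mlp.py above; shown only as an equivalent alternative.)
import chainer.links as L

model = L.Classifier(MLP())  # softmax cross-entropy loss + accuracy reporting by default
```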
(2) Reading and converting the game records
This is a long loop, but all it does is read each game record and convert it into inputs (8 x 8 boards) and outputs (classes 0 to 64). This was by far the hardest part to program...
(3) Training the MLP model
The converted game records are fed into the model for training. Both inputs and outputs are wrapped as NumPy arrays with datasets.TupleDataset; be careful about the dtypes here (float32 for the inputs, int32 for the labels). The rest is as in the source code: set up the optimizer, updater, and trainer, and training is executed with trainer.run.
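If you also want to watch the loss and accuracy per epoch, Chainer's standard reporting extensions can be added before trainer.run(); a sketch (not included in the script above) would be:

```python
# Optional sketch: per-epoch logging, added before trainer.run().
# Assumes the trainer object and the extensions import from build_mlp.py.
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'main/accuracy']))
```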
(4) Saving the model
The trained model is saved in NPZ format. It will be loaded when the AI is embedded into the Othello game in the second part.
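Loading it back in the second part should look roughly like this (a sketch; the actual loading code will appear in the next article):

```python
# Sketch: restoring the trained model (assumes the same MLP/Classifier
# class definitions are available where this runs).
model = Classifier(MLP())
serializers.load_npz('reversi_model.npz', model)
```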
(5) A simple test
Finally, check whether the trained model derives a sensible next move.
An example run is shown below. 'Othello.01e4.ggf' is used as the training data, and the AI is built only from black's moves in games that black went on to win.
$ python build_mlp.py Othello.01e4.ggf black_win
...
X1 =
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 2. 1. 0. 0. 0.]
[ 0. 0. 0. 1. 2. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
y1 = [37]
X2 =
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 2. 2. 2. 0. 0. 0.]
[ 0. 0. 2. 1. 1. 1. 0. 0.]
[ 0. 2. 2. 2. 1. 1. 0. 0.]
[ 0. 0. 2. 1. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
y2 = [25]
The code is written so that the GPU is not used for training; on a modest PC it takes about an hour. Chainer's ProgressBar extension displays the progress, so check in your own environment whether training will finish in a reasonable time. Also, the example uses Othello.01e4.ggf, but two records (lines 85 and 2699) deviated from the expected format, so I deleted them before running.
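If you do have a GPU available, the usual Chainer changes would be roughly the following (a sketch I have not tried with this script; it needs a CUDA setup):

```python
# Sketch: switching training to the GPU (untested here; requires CUDA/cupy).
gpu_id = 0
model.to_gpu(gpu_id)  # move the model parameters to GPU 0
updater = training.StandardUpdater(train_iter, optimizer, device=gpu_id)
trainer = training.Trainer(updater, (1000, 'epoch'), out='result')
```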
Looking at the prediction examples in the output, the AI answers y1 = 37 for position X1 and y2 = 25 for position X2. Decoding these with the indexing scheme above, 37 corresponds to F5, a standard opening move for black from the initial position, and 25 corresponds to B4, a legal move for black in the second position.
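For reference, turning a predicted class back into a square name (the inverse of the encoding used for training) can be done with a small helper like this (a sketch, not part of build_mlp.py):

```python
# Sketch: decode a predicted class index (0-64) back into a square name.
def class_to_move(idx):
    if idx == 64:
        return 'pass'
    row, col = divmod(idx, 8)
    return ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H')[col] + str(row + 1)

print(class_to_move(37))  # -> 'F5'
print(class_to_move(25))  # -> 'B4'
```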
These are only two cases, but this simple test does derive a plausible next move. In Part 2, this trained model will actually be embedded as the AI in an Othello game so we can check how it plays. ** → Operation confirmed; the conclusion is reached there. **