Previously, I wrote an article called "Let's make a tic-tac-toe AI with Pylearn 2". Back then I simplified the problem down to tic-tac-toe, but this time I take on the original goal: building an Othello AI with a neural network. Please bear with me, as this article involves a fair amount of trial and error. For the deep learning library, I will use Chainer this time.
The article consists of two parts: this first part builds and trains the model from game records, and the second part implements it as the AI in an actual Othello game. If that does not work well, I may have to go back and rethink the configuration of the neural network... ** → Operation confirmed; conclusion reached. **
I borrowed the game record data from here: a large collection of game records stored in GGF format, which can be read as plain text.
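For reference, a game line in the GGF file looks roughly like this (an illustrative fragment reconstructed from what the parser below expects, not copied from the actual file): the initial board follows a `BO[8` tag as eight rows of '-' (empty), '*' (black) and 'O' (white), and the moves then appear as B[..]/W[..] tags, with 'pa' meaning a pass, up to a closing ';'.

```text
(;GM[Othello] ... BO[8 -------- -------- -------- ---O*--- ---*O--- -------- -------- -------- *]B[F5//1.2]W[D6//2.3] ... B[pa] ... ;)
```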
First, let's decide on the input and output. Since we are building an Othello AI, the input is the state of the board and the output is the next move, as illustrated in the figure below.
Concretely, this maps onto a neural network as follows. The input is the state of the board, an 8 x 8 array whose elements are '0': empty, '1': black, and '2': white. The output is a classification of where the next stone is placed: 64 classes for the board squares plus one for a pass, 65 in total. Board positions are numbered 0 to 63 and a pass is class 64.
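As a quick check of this encoding (a tiny sketch that mirrors the conversion used in the script below, not extra functionality): the flat class index is simply row * 8 + column.

```python
# Sketch: move encoding used for the output classes (row * 8 + column, pass = 64).
COLS = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H')
ROWS = ('1', '2', '3', '4', '5', '6', '7', '8')
PASS_CLASS = 64

def move_to_class(pos_str):
    # e.g. 'F5' -> row index 4, column index 5 -> 4 * 8 + 5 = 37
    return ROWS.index(pos_str[1]) * 8 + COLS.index(pos_str[0])

print(move_to_class('A1'))  # -> 0
print(move_to_class('F5'))  # -> 37
```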
What remains is the design of the hidden layers. For now, I simply set the number of neurons in both hidden layers to 100 (is there a way to choose this theoretically...?). The resulting structure is shown in the figure below.
Here is the Python program that does all of this:
build_mlp.py
#!/usr/bin/env python
# coding=utf-8
#
# 0 : none
# 1 : black
# 2 : white
#
import sys
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
gVec = [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]
gCol = ('A','B','C','D','E','F','G','H')
gRow = ('1','2','3','4','5','6','7','8')
# (1)
class MLP(Chain):
def __init__(self):
super(MLP, self).__init__(
l1=L.Linear(64, 100),
l2=L.Linear(100, 100),
l3=L.Linear(100, 65),
)
def __call__(self, x):
h1 = F.relu(self.l1(x))
h2 = F.relu(self.l2(h1))
y = self.l3(h2)
return y
class Classifier(Chain):
def __init__(self, predictor):
super(Classifier, self).__init__(predictor=predictor)
def __call__(self, x, t):
y = self.predictor(x)
loss = F.softmax_cross_entropy(y, t)
accuracy = F.accuracy(y, t)
report({'loss': loss, 'accuracy': accuracy}, self)
return loss
def print_board(board):
for i in range(8):
print board[i]
print ""
def update_board(board, pos_str, clr):
assert clr!=0, "stone color is not black or white."
updated_board = [[0 for col in range(8)] for row in range(8)]
rev_list = []
pos = pos_str2pos_index(pos_str)
for v in gVec:
temp_list = []
for i in range(1, 8):
# out of board
if pos[0]+v[0]*(i+1) > 7 or pos[1]+v[1]*(i+1) > 7 or\
pos[0]+v[0]*(i+1) < 0 or pos[1]+v[1]*(i+1) < 0:
continue
if board[pos[0]+v[0]*i][pos[1]+v[1]*i] == (clr % 2 + 1):
temp_list.append([pos[0]+v[0]*i, pos[1]+v[1]*i])
if board[pos[0]+v[0]*(i+1)][pos[1]+v[1]*(i+1)] == clr:
for j in temp_list:
rev_list.append(j)
break
else:
break
rev_list.append(pos) # put stone at pos
assert board[pos[0]][pos[1]] == 0, "put position is not empty."
print "rev_list = " + str(rev_list)
for i in range(0, 8):
for j in range(0, 8):
if [i, j] in rev_list:
updated_board[i][j] = clr
else:
updated_board[i][j] = board[i][j]
return updated_board
def who_is_winner(board):
# ret : 0 draw
# 1 black win
# 2 white win
ret = 0
score_b = 0
score_w = 0
for i in range(0, 8):
for j in range(0, 8):
if board[i][j] == 1:
score_b += 1
elif board[i][j] == 2:
score_w += 1
if score_b > score_w:
ret = 1
elif score_b < score_w:
ret = 2
print "Black vs White : " + str(score_b) + " vs " + str(score_w)
return ret
def pos_str2pos_index(pos_str):
pos_index = []
for i, c in enumerate(gRow):
if pos_str[1] == c:
pos_index.append(i)
for i, c in enumerate(gCol):
if pos_str[0] == c:
pos_index.append(i)
return pos_index
def pos_str2pos_index_flat(pos_str):
pos_index = pos_str2pos_index(pos_str)
index = pos_index[0] * 8 + pos_index[1]
return index
#==== Main ====#
record_X = [] # MLP input (board list)
record_y = [] # MLP output(class(0-64) list)
temp_X = []
temp_y = []
temp2_X = []
temp2_y = []
board = []
row = []
argv = sys.argv
argc = len(argv)
if argc != 3:
print 'Usage'
print ' python ' + str(argv[0]) + ' <record_filename> <type>'
print ' type : black'
print ' black_win'
print ' white'
print ' white_win'
quit()
# check type
build_type = ''
for t in ['black', 'black_win', 'white', 'white_win']:
if argv[2] == t:
build_type = t
if build_type == '':
print 'record type is illegal.'
quit()
#(2)-- load record --#
f = open(argv[1], "r")
line_cnt = 1
for line in f:
print 'Line Count = ' + str(line_cnt)
idx = line.find("BO[8")
if idx == -1:
continue
idx += 5
# make board initial state
for i in range(idx, idx+9*8):
if line[i] == '-':
row.append(0)
elif line[i] == 'O':
row.append(2)
elif line[i] == '*':
row.append(1)
if (i-idx)%9 == 8:
board.append(row)
row = []
if len(board) == 8:
break
row = []
print_board(board)
# record progress of game
i = idx+9*8+2
while line[i] != ';':
if (line[i] == 'B' or line[i] == 'W') and line[i+1] == '[':
temp_X.append(board)
pos_str = line[i+2] + line[i+3]
if pos_str == "pa": # pass
temp_y.append(64)
# board state is not change
print_board(board)
else:
if line[i] == 'B':
clr = 1
elif line[i] == 'W':
clr = 2
else:
clr = 0
assert False, "Stone Color is illegal."
pos_index_flat = pos_str2pos_index_flat(pos_str)
temp_y.append(pos_index_flat)
board = update_board(board, pos_str, clr)
if (line[i] == 'B' and (build_type == 'black' or build_type == 'black_win')) or \
(line[i] == 'W' and (build_type == 'white' or build_type == 'white_win')):
temp2_X.append(temp_X[0])
temp2_y.append(temp_y[0])
print 'X = '
print_board(temp_X[0])
print 'y = ' + str(temp_y[0]) + ' (' + \
str(pos_str2pos_index(pos_str)) + ') ' + \
'(' + pos_str + ')'
print ''
temp_X = []
temp_y = []
i += 1
print "End of game"
print_board(board)
winner = who_is_winner(board)
if (winner == 1 and build_type == 'black_win') or \
(winner == 2 and build_type == 'white_win') or \
build_type == 'black' or build_type == 'white':
record_X.extend(temp2_X)
record_y.extend(temp2_y)
board = []
temp2_X = []
temp2_y = []
line_cnt += 1
#(3)-- MLP model and Training --#
X = np.array(record_X, dtype=np.float32)
y = np.array(record_y, dtype=np.int32)
train = datasets.TupleDataset(X, y)
train_iter = iterators.SerialIterator(train, batch_size=100)
model = Classifier(MLP())
optimizer = optimizers.SGD()
optimizer.setup(model)
updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (1000, 'epoch'), out='result')
trainer.extend(extensions.ProgressBar())
trainer.run()
#(4)-- save model --#
serializers.save_npz('reversi_model.npz', model)
#(5)-- prediction example --#
X1_ = [[[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,2,1,0,0,0],\
[0,0,0,1,2,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0]]]
X1 = np.array(X1_, dtype=np.float32)
y1 = F.softmax(model.predictor(X1))
print "X1 = "
print_board(X1[0])
print "y1 = " + str(y1.data.argmax(1)) + '\n'
X2_ = [[[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,2,2,2,0,0,0],\
[0,0,2,1,1,1,0,0],\
[0,2,2,2,1,1,0,0],\
[0,0,2,1,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0]]]
X2 = np.array(X2_, dtype=np.float32)
y2 = F.softmax(model.predictor(X2))
print "X2 = "
print_board(X2[0])
print "y2 = " + str(y2.data.argmax(1)) + '\n'
Let me explain briefly; please follow along with the markers (1) to (5) in the code.

(1) Defining the MLP in Chainer
The MLP class defines the structure of the network. As described above, the input is 8 x 8 = 64, the output is 65, and there are two hidden layers of 100 neurons each. The layer connections and activation functions are defined in `__call__`. The Classifier class handles the classification with softmax: it computes the softmax cross-entropy loss and the accuracy.
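Incidentally, Chainer also ships a built-in `L.Classifier` link that wraps a predictor with softmax cross-entropy loss and accuracy reporting, so the hand-written Classifier above could probably be replaced with something like this (a sketch, not what build_mlp.py actually does):

```python
# Sketch: the same classification wrapper using Chainer's built-in link.
# (Not used in build_mlp.py above; shown only as an equivalent alternative.)
import chainer.links as L

model = L.Classifier(MLP())  # softmax cross-entropy loss + accuracy reporting by default
```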
(2) Reading and converting the game records
This is a long loop, but all it does is read each game record and convert it into inputs (8 x 8 boards) and outputs (classes 0 to 64). This was by far the hardest part to program...
(3) Training the MLP model
The converted game records are fed into the model for training. Both inputs and outputs are wrapped as NumPy arrays with datasets.TupleDataset; be careful about the dtypes here (float32 for the inputs, int32 for the labels). The rest is as in the source code: set up the optimizer, updater, and trainer, and training is executed with trainer.run.
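If you also want to watch the loss and accuracy per epoch, Chainer's standard reporting extensions can be added before trainer.run(); a sketch (not included in the script above) would be:

```python
# Optional sketch: per-epoch logging, added before trainer.run().
# Assumes the trainer object and the extensions import from build_mlp.py.
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'main/accuracy']))
```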
(4) Saving the model
The trained model is saved in NPZ format. It will be loaded when the AI is embedded into the Othello game in the second part.
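Loading it back in the second part should look roughly like this (a sketch; the actual loading code will appear in the next article):

```python
# Sketch: restoring the trained model (assumes the same MLP/Classifier
# class definitions are available where this runs).
model = Classifier(MLP())
serializers.load_npz('reversi_model.npz', model)
```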
(5) A simple test
Finally, check whether the trained model derives a sensible next move.
An example run is shown below. 'Othello.01e4.ggf' is used as the training data, and the AI is built only from black's moves in games that black went on to win.
$ python build_mlp.py Othello.01e4.ggf black_win
...
X1 =
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 2. 1. 0. 0. 0.]
[ 0. 0. 0. 1. 2. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
y1 = [37]
X2 =
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 2. 2. 2. 0. 0. 0.]
[ 0. 0. 2. 1. 1. 1. 0. 0.]
[ 0. 2. 2. 2. 1. 1. 0. 0.]
[ 0. 0. 2. 1. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0.]
y2 = [25]
The code is written so that the GPU is not used for training; on a modest PC it takes about an hour. Chainer's ProgressBar extension displays the progress, so check in your own environment whether training will finish in a reasonable time. Also, the example uses Othello.01e4.ggf, but two records (lines 85 and 2699) deviated from the expected format, so I deleted them before running.
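If you do have a GPU available, the usual Chainer changes would be roughly the following (a sketch I have not tried with this script; it needs a CUDA setup):

```python
# Sketch: switching training to the GPU (untested here; requires CUDA/cupy).
gpu_id = 0
model.to_gpu(gpu_id)  # move the model parameters to GPU 0
updater = training.StandardUpdater(train_iter, optimizer, device=gpu_id)
trainer = training.Trainer(updater, (1000, 'epoch'), out='result')
```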
Looking at the prediction examples in the output, the AI answers y1 = 37 for position X1 and y2 = 25 for position X2. Decoding these with the indexing scheme above, 37 corresponds to F5, a standard opening move for black from the initial position, and 25 corresponds to B4, a legal move for black in the second position.
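For reference, turning a predicted class back into a square name (the inverse of the encoding used for training) can be done with a small helper like this (a sketch, not part of build_mlp.py):

```python
# Sketch: decode a predicted class index (0-64) back into a square name.
def class_to_move(idx):
    if idx == 64:
        return 'pass'
    row, col = divmod(idx, 8)
    return ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H')[col] + str(row + 1)

print(class_to_move(37))  # -> 'F5'
print(class_to_move(25))  # -> 'B4'
```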
These are only two cases, but this simple test does derive a plausible next move. In Part 2, this trained model will actually be embedded as the AI in an Othello game so we can check how it plays. ** → Operation confirmed; the conclusion is reached there. **