This article is the sequel to "Creating Othello AI with Chainer - Part 1". Please read Part 1 before reading this one.
Part 1:
- Conversion of teacher data
- MLP design
- Model training and storage

Part 2 (this article):
- Implementation in the Othello game
- Checking whether it is playable (can a game be played without breaking the rules?)
- If it turns out not to be playable, going back to building the MLP model
So, using the trained models created in Part 1, I implemented an MLP-based AI inside an Othello game. The game app is the one built in "Let's make Othello with wxPython". The source code of the app and the trained models are posted on GitHub, so please get them from there.
The app is built with wxPython, so wxPython must be installed for it to run; see here for installation instructions. Start it as follows.
$ python reversi.py
A quick tour of the screen. Under "MLP model setting" on the far right, specify the models used by the MLP AI: "for black" is the model for the first player (black) and "for white" is the model for the second player (white). Similarly, select the computer AI under "Computer AI setting" on the far right (the AI types are described later). As the game progresses, the game record is displayed in the blank area in the center. Choose the game type under "Game mode" at the bottom center and start the game with the "START" button.
"SCORE" at the bottom center is the current number of stones on the play (black) and play (white). Enter the number of Loops in the text box at the bottom center and press the "Comp vs Comp Loop" button to play Computer A and Computer B a specified number of times in a row. (At this time, do not use the "START" button) At the end of the Loop, the number of wins for Computer A and Computer B will be displayed. DRAW is the number of draws.
In other words, this Othello app supports not only ordinary matches but also the features above, such as repeated computer-vs-computer matches with game-record logging.
Now let's move on to verifying whether the AI is playable. I'd like a strong AI if possible, so I set up the MLP AI with the model trained only on game records where the first player (black) won (*model_black_win.npz*) and the model trained only on game records where the second player (white) won (*model_white_win.npz*), and played against a human (me). Hmmmm... Mumu... **An "Illegal move!" came out...** This isn't as easy as Tic-tac-toe. Still, it's within expectations. First, let's examine what kinds of illegal moves occur and how often. Playing against it myself would take all day, so let the computers play each other: MLP vs Random. "Illegal move!" occurred 7027 times in 1000 games. The breakdown by type is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 93 |
Cannot 'PASS' this turn but AI selected it. | 51 |
Cannot put stone at AI selected position. | 6883 |
total | 7027 |
A rough calculation: assuming no passes occur and a game only ends once the board is full, a game takes 60 moves, i.e. 30 moves per player, so 1000 games amount to 30,000 moves for the MLP side. At 7027/30000, an estimated 23.4% of the AI's moves are illegal.
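The same back-of-the-envelope estimate as a few lines of Python:

```python
illegal = 7027
games = 1000
moves_per_side_per_game = 30   # ~60 moves per game, split between the two players
total_moves = games * moves_per_side_per_game
print("illegal-move rate: {:.1%}".format(illegal / total_moves))  # -> 23.4%
```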
Roughly 80% of moves do follow the rules, so I'm reluctant to change the MLP configuration just yet... First, I'll try everything I can think of without changing the MLP configuration.
The try & error festival has started.
To start, I'll set aside making the AI strong and give priority to having it follow the rules. To expose it to a wider variety of patterns, I'll train it not only on games that were won but also on games that were lost. As the MLP AI, set *model_black.npz* for the first player (black) and *model_white.npz* for the second player (white). These models were created with the following commands.
$ python build_mlp.py Othello.01e4.ggf black
$ mv reversi_model.npz model_black.npz
$ python build_mlp.py Othello.01e4.ggf white
$ mv reversi_model.npz model_white.npz
Now let the computers play each other as before: an MLP vs Random rematch. "Illegal move!" occurred 5720 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 65 |
Cannot 'PASS' this turn but AI selected it. | 123 |
Cannot put stone at AI selected position. | 5532 |
total | 5720 |
At 5720/30000, 19.1% of moves are illegal. There is clearly still a long way to go, but it's lower than before; a good trend.
It's a rather brute-force idea, but perhaps the model just needs more training. I modified *build_mlp.py* so that the batch size and maximum epoch count can be given as command-line arguments, and recreated the models with batch_size=100 and max_epoch=3000.
$ python build_mlp.py Othello.01e4.ggf black 100 3000
$ mv reversi_model.npz model_epoch-3000_black.npz
$ python build_mlp.py Othello.01e4.ggf white 100 3000
$ mv reversi_model.npz model_epoch-3000_white.npz
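For reference, a minimal sketch of how those extra arguments might be read in *build_mlp.py* (hypothetical; the article does not show the author's actual argument handling):

```python
import sys

# e.g. "python build_mlp.py Othello.01e4.ggf black 100 3000"
ggf_file = sys.argv[1]                                      # game-record file
color = sys.argv[2]                                         # 'black' or 'white'
batch_size = int(sys.argv[3]) if len(sys.argv) > 3 else 100
max_epoch = int(sys.argv[4]) if len(sys.argv) > 4 else 1000
```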
Incidentally, in my environment this training took 5 hours + 5 hours = 10 hours... Sadly, all I have is a Linux environment on VirtualBox...
Let the computers compete using these models: MLP vs Random, 1000 games.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 35 |
Cannot 'PASS' this turn but AI selected it. | 257 |
Cannot put stone at AI selected position. | 5677 |
total | 5966 |
At 5966/30000, 19.9% of moves are illegal. Hardly any change. Training does not seem to improve any further once the epoch count goes past about 1000. In fact, you don't even need to run 3000 epochs to find this out: set aside about 1000 test samples similar to the teacher data, plot accuracy against epoch (the learning curve) while running *build_mlp.py*, and you can see around which epoch learning stops making progress.
Tweaking the teacher data alone isn't working, so let's try a different approach. Set up the MLP AI with *model_black.npz* and *model_white.npz*, play MLP vs Random (10000 games this time), and retrain on the resulting game record. The intent is that, because the Fail Safe function substitutes a correct move whenever the MLP AI picks an illegal one, the recorded games contain correct answers for exactly the patterns the MLP AI is weak at, so retraining on them should teach those patterns. First, play 10000 MLP vs Random games. A game-record file named *record.log* is saved; rename it to *mlp_vs_random_10000_01.log*, then read this log file and recreate the models.
$ python build_mlp.py mlp_vs_random_10000_01.log black 100 1000
$ mv reversi_model.npz model_mlp_vs_random_black_01.npz
$ python build_mlp.py mlp_vs_random_10000_01.log white 100 1000
$ mv reversi_model.npz model_mlp_vs_random_white_01.npz
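For clarity, here is a rough sketch of the Fail Safe idea this retry relies on; the function and variable names are hypothetical, and the real *reversi.py* may implement it differently:

```python
import random

def failsafe_move(mlp_move, legal_moves):
    """Return a move that is guaranteed to be legal (sketch only).

    If the MLP picks an illegal move, substitute a random legal one; the
    substituted move is what ends up in the recorded game log.
    """
    if not legal_moves:              # nothing can be placed: the only legal action is PASS
        return 'PASS'
    if mlp_move in legal_moves:
        return mlp_move              # the MLP's choice is legal, keep it
    return random.choice(legal_moves)
```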
Rematch MLP vs Random with the new models (*model_mlp_vs_random_black_01.npz*, *model_mlp_vs_random_white_01.npz*). "Illegal move!" occurred 2794 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 84 |
Cannot 'PASS' this turn but AI selected it. | 57 |
Cannot put stone at AI selected position. | 2653 |
total | 2794 |
At 2794/30000, 9.3% of moves are illegal. Looking good! That's roughly half! Let's keep pushing from here.
This method seems to work, so repeat the same steps: use *model_mlp_vs_random_black_01.npz* and *model_mlp_vs_random_white_01.npz* to play 10000 MLP vs Random games, rename *record.log* to *mlp_vs_random_10000_02.log*, and recreate the models from this log.
$ python build_mlp.py mlp_vs_random_10000_02.log black 100 1000
$ mv reversi_model.npz model_mlp_vs_random_black_02.npz
$ python build_mlp.py mlp_vs_random_10000_02.log white 100 1000
$ mv reversi_model.npz model_mlp_vs_random_white_02.npz
Rematch MLP vs Random with the new models (*model_mlp_vs_random_black_02.npz*, *model_mlp_vs_random_white_02.npz*). "Illegal move!" occurred 2561 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 121 |
Cannot 'PASS' this turn but AI selected it. | 41 |
Cannot put stone at AI selected position. | 2399 |
total | 2561 |
At 2561/30000, 8.5% of moves are illegal. Hmm... should this be seen as only a slight improvement? I'll stop iterating on this method here for now.
Next, let's train on the existing game record *Othello.01e4.ggf* combined with the game record created in the previous step, *mlp_vs_random_10000_01.log*. My thinking was that the combined record is more varied and also contains corrections for moves that would have been illegal, so performance should improve.

$ cat Othello.01e4.ggf mlp_vs_random_10000_01.log > 01e4_mlp_randomA_10000_01.ggf
$ python build_mlp.py 01e4_mlp_randomA_10000_01.ggf black 100 1000
$ mv reversi_model.npz model_01e4_mlp_vs_random_black.npz
$ python build_mlp.py 01e4_mlp_randomA_10000_01.ggf white 100 1000
$ mv reversi_model.npz model_01e4_mlp_vs_random_white.npz
Rematch MLP vs Random with the new models (*model_01e4_mlp_vs_random_black.npz*, *model_01e4_mlp_vs_random_white.npz*). "Illegal move!" occurred 3325 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 48 |
Cannot 'PASS' this turn but AI selected it. | 90 |
Cannot put stone at AI selected position. | 3187 |
total | 3325 |
At 3325/30000, 11.1% of moves are illegal. Well, not much different from Retry 3. Disappointing.
I'll wrap up here for this week.
So that's roughly where things stand. At this point the probability of selecting an illegal move has been brought down to 8.5%, so the interim result is an MLP AI backed by the Fail Safe fallback. This may well end up being the final result, but I'm a little disappointed, so from next week onward I'll fiddle with the MLP model configuration. I'll keep adding the further attempts to this article; if you're interested in how it develops, please follow along.
In this project, the big bottleneck in my environment is that retraining a model from a game record takes several hours each time... With a GPU this would presumably be considerably easier, so it would be interesting to rewrite *build_mlp.py* to support the GPU and experiment more freely.
Updated "* bulb_mlp.py *" to get a learning curve. The end of the data 1000 samples are reserved as test (Validation) data (this 1000 samples are not used for training), and the correct answer rate (main / accuracy) in the training data at each Epoch and the test data unknown to the model The correct answer rate (validation / main / accuracy) is displayed. Now you can draw a learning curve. Well, it's a story that you should finally do it from the beginning ...
First, here is the learning curve for the initial model.
main/accuracy saturates at around 0.45 by 1000 epochs, so it's fair to judge that raising the epoch count from 1000 to 3000, as tried in retry 2, was unlikely to help. By the way, you might think an accuracy of 0.45 = 45% sounds low, but because actual game records are used, there are several plausible answers for a given board position, so the accuracy never gets very high.
Next, here is the learning curve when the number of neurons in the hidden layers (h1, h2) is increased from 100 to 200.
It likewise converges at roughly 1000 epochs. Increasing the number of neurons raises the accuracy on the training data (main/accuracy), but validation/main/accuracy, the accuracy on unseen input, stays about the same as with 100 neurons. Which means... changing from 100 to 200 neurons probably won't make the AI follow the rules any better... Still, I'll check it anyway as retry 5.
The idea was that more neurons would let the model fit the game records better and therefore follow the rules better. Change the definition of the MLP class in *build_mlp.py*.
...
class MLP(Chain):
    def __init__(self):
        super(MLP, self).__init__(
            l1=L.Linear(64, 200),
            l2=L.Linear(200, 200),
            l3=L.Linear(200, 65),
        )
...
After making the changes, create a trained model.
$ python build_mlp.py Othello.01e4.ggf black 100 1000
$ mv reversi_model.npz model_neuron-200_black.npz
$ python build_mlp.py Othello.01e4.ggf white 100 1000
$ mv reversi_model.npz model_neuron-200_white.npz
Rematch MLP vs Random with the new models (*model_neuron-200_black.npz*, *model_neuron-200_white.npz*). (The MLP class definition in *reversi.py* also needs the same change.) "Illegal move!" occurred 10997 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 99 |
Cannot 'PASS' this turn but AI selected it. | 120 |
Cannot put stone at AI selected position. | 10778 |
total | 10997 |
At 10997/30000, 36.7% of moves are illegal. It got worse... the model is probably overfitting the training data.
Next, concatenate not only Othello.01e4.ggf but also Othello.02e4.ggf and Othello.03e4.ggf as the game record, roughly tripling the amount of data, and see whether the wider variety of patterns helps.
$ cat Othello.01e4.ggf Othello.02e4.ggf Othello.03e4.ggf > Othello.01-03e4.ggf
$ python build_mlp.py Othello.01-03e4.ggf black 100 1000
$ mv reversi_model.npz model_01-03e4_black.npz
$ python build_mlp.py Othello.01-03e4.ggf white 100 1000
$ mv reversi_model.npz model_01-03e4_white.npz
The learning curve is as follows. Compared with using only Othello.01e4.ggf, validation/main/accuracy improves from about 0.3 to about 0.35. Maybe there is something to hope for?
Rematch MLP vs Random with the new models (*model_01-03e4_black.npz*, *model_01-03e4_white.npz*). "Illegal move!" occurred 5284 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 40 |
Cannot 'PASS' this turn but AI selected it. | 228 |
Cannot put stone at AI selected position. | 5016 |
total | 5284 |
At 5284/30000, 17.6% of moves are illegal. Not much different from using only Othello.01e4.ggf (19.1%). This is getting pretty discouraging...
This time, let's change tack and modify the MLP input itself. Specifically, in addition to '0': empty, '1': black, and '2': white, the board given as input also marks with '3' the squares where the player to move can legally place a stone. For example, X1 below is the state X0, on black's turn, with the placeable squares added.
X0 = [[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,2,1,0,0,0],\
[0,0,1,2,2,2,2,0],\
[0,0,0,1,2,2,0,0],\
[0,0,0,0,1,2,0,0],\
[0,0,0,0,0,0,0,0]]
X1 = [[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,3,3,0,0,0],\
[0,0,3,2,1,3,0,3],\
[0,0,1,2,2,2,2,3],\
[0,0,3,1,2,2,3,0],\
[0,0,0,0,1,2,3,0],\
[0,0,0,0,0,0,0,0]]
My thought was that this would yield an AI that follows the rules. You could object, "you're practically telling it the rules!", but the premise, the board state as input and the next move as output, is unchanged. (A fairly lame excuse...) I changed *build_mlp.py* so that you can specify whether to add the placeable squares: if True is appended to the command, the placeable squares are marked with '3' on the input board.
$ python build_mlp.py Othello.01e4.ggf black 100 1000 True
$ mv reversi_model.npz model_black_puttable_mark.npz
$ python build_mlp.py Othello.01e4.ggf white 100 1000 True
$ mv reversi_model.npz model_white_puttable_mark.npz
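A sketch of how the '3' marks might be added to an 8x8 board before it is flattened into the 64-element input vector (a hypothetical helper for illustration; the actual changes inside *build_mlp.py* and *reversi.py* may differ):

```python
def add_puttable_marks(board, legal_positions):
    """Return a copy of the 8x8 board with '3' at every square where the
    player to move can legally place a stone (sketch; names are hypothetical)."""
    marked = [row[:] for row in board]      # copy each row of the 8x8 list
    for row, col in legal_positions:
        marked[row][col] = 3                # 0: empty, 1: black, 2: white, 3: placeable
    return marked
```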
The learning curve is as follows.
It shows the highest validation/main/accuracy so far, converging at around 0.4.
Rematch MLP vs Random with the new models (*model_black_puttable_mark.npz*, *model_white_puttable_mark.npz*). I fixed *reversi.py* to support these models: if the specified MLP model name contains "puttable_mark", the input board is fed with the placeable squares marked as '3'.
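That model-name check might look roughly like this (an assumption about *reversi.py*'s logic, reusing the hypothetical helper sketched above):

```python
def make_model_input(board, legal_positions, model_path):
    # Sketch: models whose file name contains "puttable_mark" expect the
    # '3'-marked board; all earlier models expect the plain 0/1/2 board.
    if 'puttable_mark' in model_path:
        return add_puttable_marks(board, legal_positions)
    return board
```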
"Illegal move!" occurred 207 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 16 |
Cannot 'PASS' this turn but AI selected it. | 17 |
Cannot put stone at AI selected position. | 174 |
total | 207 |
At 207/30000, only 0.7% of moves are illegal. Under 1%! A dramatic improvement.
I will summarize the above.
So, as an AI that behaves properly, the pick is the model whose board input includes the '3' placeable marks, combined with the Fail Safe fallback. It may be a bit of a cheat, but I managed to make a reasonably well-behaved AI.
At the moment, the strength of the AI depends entirely on the game records. After building a basic AI by the method above, the next step would be to strengthen it with something like reinforcement learning. I'm completely ignorant about reinforcement learning, so I'll try that after studying it.
Thank you for reading.