This article is the sequel to "Creating Othello AI with Chainer - Part 1". Please read Part 1 before reading this one.
Part 1:
- Conversion of teacher data
- MLP design
- Model training and storage

Part 2 (this article):
- Implementation in the Othello game
- Checking whether it is playable (can a game be played without breaking the rules?)
- If it turns out not to be playable, going back to building the MLP model
So, using the trained models created in Part 1, I implemented an MLP-based AI inside an Othello game. The game app is the one built in "Let's make Othello with wxPython". The source code of the app and the trained models are posted on GitHub, so please get them from there.
The app is built with wxPython, so wxPython must be installed for it to run; see here for installation instructions. Start it as follows.
$ python reversi.py
A quick tour of the screen. Under "MLP model setting" on the far right, specify the models used by the MLP AI: "for black" is the model for the first player (black) and "for white" is the model for the second player (white). Similarly, select the computer AI under "Computer AI setting" on the far right (the AI types are described later). As the game progresses, the game record is displayed in the blank area in the center. Choose the game type under "Game mode" at the bottom center and start the game with the "START" button.
"SCORE" at the bottom center is the current number of stones on the play (black) and play (white). Enter the number of Loops in the text box at the bottom center and press the "Comp vs Comp Loop" button to play Computer A and Computer B a specified number of times in a row. (At this time, do not use the "START" button) At the end of the Loop, the number of wins for Computer A and Computer B will be displayed. DRAW is the number of draws.
In other words, this Othello app supports not only ordinary matches but also the features above, such as repeated computer-vs-computer matches with game-record logging.
Now let's move on to verifying whether the AI is playable. I'd like a strong AI if possible, so I set up the MLP AI with the model trained only on game records where the first player (black) won (*model_black_win.npz*) and the model trained only on game records where the second player (white) won (*model_white_win.npz*), and played against a human (me). Hmmmm... Mumu... **An "Illegal move!" came out...** This isn't as easy as Tic-tac-toe. Still, it's within expectations. First, let's examine what kinds of illegal moves occur and how often. Playing against it myself would take all day, so let the computers play each other: MLP vs Random. "Illegal move!" occurred 7027 times in 1000 games. The breakdown by type is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 93 |
Cannot 'PASS' this turn but AI selected it. | 51 |
Cannot put stone at AI selected position. | 6883 |
total | 7027 |
A rough calculation: assuming no passes occur and a game only ends once the board is full, a game takes 60 moves, i.e. 30 moves per player, so 1000 games amount to 30,000 moves for the MLP side. At 7027/30000, an estimated 23.4% of the AI's moves are illegal.
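The same back-of-the-envelope estimate as a few lines of Python:

```python
illegal = 7027
games = 1000
moves_per_side_per_game = 30   # ~60 moves per game, split between the two players
total_moves = games * moves_per_side_per_game
print("illegal-move rate: {:.1%}".format(illegal / total_moves))  # -> 23.4%
```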
Roughly 80% of moves do follow the rules, so I'm reluctant to change the MLP configuration just yet... First, I'll try everything I can think of without changing the MLP configuration.
The try & error festival has started.
To start, I'll set aside making the AI strong and give priority to having it follow the rules. To expose it to a wider variety of patterns, I'll train it not only on games that were won but also on games that were lost. As the MLP AI, set *model_black.npz* for the first player (black) and *model_white.npz* for the second player (white). These models were created with the following commands.
$ python build_mlp.py Othello.01e4.ggf black
$ mv reversi_model.npz model_black.npz
$ python build_mlp.py Othello.01e4.ggf white
$ mv reversi_model.npz model_white.npz
Now let the computers play each other as before: an MLP vs Random rematch. "Illegal move!" occurred 5720 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 65 |
Cannot 'PASS' this turn but AI selected it. | 123 |
Cannot put stone at AI selected position. | 5532 |
total | 5720 |
At 5720/30000, 19.1% of moves are illegal. There is clearly still a long way to go, but it's lower than before; a good trend.
It's a rather brute-force idea, but perhaps the model just needs more training. I modified *build_mlp.py* so that the batch size and maximum epoch count can be given as command-line arguments, and recreated the models with batch_size=100 and max_epoch=3000.
$ python build_mlp.py Othello.01e4.ggf black 100 3000
$ mv reversi_model.npz model_epoch-3000_black.npz
$ python build_mlp.py Othello.01e4.ggf white 100 3000
$ mv reversi_model.npz model_epoch-3000_white.npz
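For reference, a minimal sketch of how those extra arguments might be read in *build_mlp.py* (hypothetical; the article does not show the author's actual argument handling):

```python
import sys

# e.g. "python build_mlp.py Othello.01e4.ggf black 100 3000"
ggf_file = sys.argv[1]                                      # game-record file
color = sys.argv[2]                                         # 'black' or 'white'
batch_size = int(sys.argv[3]) if len(sys.argv) > 3 else 100
max_epoch = int(sys.argv[4]) if len(sys.argv) > 4 else 1000
```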
Incidentally, in my environment this training took 5 hours + 5 hours = 10 hours... Sadly, all I have is a Linux environment on VirtualBox...
Let the computers compete using these models: MLP vs Random, 1000 games.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 35 |
Cannot 'PASS' this turn but AI selected it. | 257 |
Cannot put stone at AI selected position. | 5677 |
total | 5966 |
At 5966/30000, 19.9% of moves are illegal. Hardly any change. Training does not seem to improve any further once the epoch count goes past about 1000. In fact, you don't even need to run 3000 epochs to find this out: set aside about 1000 test samples similar to the teacher data, plot accuracy against epoch (the learning curve) while running *build_mlp.py*, and you can see around which epoch learning stops making progress.
Tweaking the teacher data alone isn't working, so let's try a different approach. Set up the MLP AI with *model_black.npz* and *model_white.npz*, play MLP vs Random (10000 games this time), and retrain on the resulting game record. The intent is that, because the Fail Safe function substitutes a correct move whenever the MLP AI picks an illegal one, the recorded games contain correct answers for exactly the patterns the MLP AI is weak at, so retraining on them should teach those patterns. First, play 10000 MLP vs Random games. A game-record file named *record.log* is saved; rename it to *mlp_vs_random_10000_01.log*, then read this log file and recreate the models.
$ python build_mlp.py mlp_vs_random_10000_01.log black 100 1000
$ mv reversi_model.npz model_mlp_vs_random_black_01.npz
$ python build_mlp.py mlp_vs_random_10000_01.log white 100 1000
$ mv reversi_model.npz model_mlp_vs_random_white_01.npz
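For clarity, here is a rough sketch of the Fail Safe idea this retry relies on; the function and variable names are hypothetical, and the real *reversi.py* may implement it differently:

```python
import random

def failsafe_move(mlp_move, legal_moves):
    """Return a move that is guaranteed to be legal (sketch only).

    If the MLP picks an illegal move, substitute a random legal one; the
    substituted move is what ends up in the recorded game log.
    """
    if not legal_moves:              # nothing can be placed: the only legal action is PASS
        return 'PASS'
    if mlp_move in legal_moves:
        return mlp_move              # the MLP's choice is legal, keep it
    return random.choice(legal_moves)
```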
Rematch MLP vs Random with the new models (*model_mlp_vs_random_black_01.npz*, *model_mlp_vs_random_white_01.npz*). "Illegal move!" occurred 2794 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 84 |
Cannot 'PASS' this turn but AI selected it. | 57 |
Cannot put stone at AI selected position. | 2653 |
total | 2794 |
At 2794/30000, 9.3% of moves are illegal. Looking good! That's roughly half! Let's keep pushing from here.
This method seems to work, so repeat the same steps: use *model_mlp_vs_random_black_01.npz* and *model_mlp_vs_random_white_01.npz* to play 10000 MLP vs Random games, rename *record.log* to *mlp_vs_random_10000_02.log*, and recreate the models from this log.
$ python build_mlp.py mlp_vs_random_10000_02.log black 100 1000
$ mv reversi_model.npz model_mlp_vs_random_black_02.npz
$ python build_mlp.py mlp_vs_random_10000_02.log white 100 1000
$ mv reversi_model.npz model_mlp_vs_random_white_02.npz
Rematch MLP vs Random with the new models (*model_mlp_vs_random_black_02.npz*, *model_mlp_vs_random_white_02.npz*). "Illegal move!" occurred 2561 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 121 |
Cannot 'PASS' this turn but AI selected it. | 41 |
Cannot put stone at AI selected position. | 2399 |
total | 2561 |
At 2561/30000, 8.5% of moves are illegal. Hmm... should this be seen as only a slight improvement? I'll stop iterating on this method here for now.
Next, let's train on the existing game record *Othello.01e4.ggf* combined with the game record created in the previous step, *mlp_vs_random_10000_01.log*. My thinking was that the combined record is more varied and also contains corrections for moves that would have been illegal, so performance should improve.

$ cat Othello.01e4.ggf mlp_vs_random_10000_01.log > 01e4_mlp_randomA_10000_01.ggf
$ python build_mlp.py 01e4_mlp_randomA_10000_01.ggf black 100 1000
$ mv reversi_model.npz model_01e4_mlp_vs_random_black.npz
$ python build_mlp.py 01e4_mlp_randomA_10000_01.ggf white 100 1000
$ mv reversi_model.npz model_01e4_mlp_vs_random_white.npz
Rematch MLP vs Random with the new models (*model_01e4_mlp_vs_random_black.npz*, *model_01e4_mlp_vs_random_white.npz*). "Illegal move!" occurred 3325 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 48 |
Cannot 'PASS' this turn but AI selected it. | 90 |
Cannot put stone at AI selected position. | 3187 |
total | 3325 |
At 3325/30000, 11.1% of moves are illegal. Well, not much different from Retry 3. Disappointing.
I'll wrap up here for this week.
So that's roughly where things stand. At this point the probability of selecting an illegal move has been brought down to 8.5%, so the interim result is an MLP AI backed by the Fail Safe fallback. This may well end up being the final result, but I'm a little disappointed, so from next week onward I'll fiddle with the MLP model configuration. I'll keep adding the further attempts to this article; if you're interested in how it develops, please follow along.
In this project, the big bottleneck in my environment is that retraining a model from a game record takes several hours each time... With a GPU this would presumably be considerably easier, so it would be interesting to rewrite *build_mlp.py* to support the GPU and experiment more freely.
Updated "* bulb_mlp.py *" to get a learning curve. The end of the data 1000 samples are reserved as test (Validation) data (this 1000 samples are not used for training), and the correct answer rate (main / accuracy) in the training data at each Epoch and the test data unknown to the model The correct answer rate (validation / main / accuracy) is displayed. Now you can draw a learning curve. Well, it's a story that you should finally do it from the beginning ...
First, here is the learning curve for the initial model.
main/accuracy saturates at around 0.45 by 1000 epochs, so it's fair to judge that raising the epoch count from 1000 to 3000, as tried in retry 2, was unlikely to help. By the way, you might think an accuracy of 0.45 = 45% sounds low, but because actual game records are used, there are several plausible answers for a given board position, so the accuracy never gets very high.
Next, here is the learning curve when the number of neurons in the hidden layers (h1, h2) is increased from 100 to 200.
It likewise converges at roughly 1000 epochs. Increasing the number of neurons raises the accuracy on the training data (main/accuracy), but validation/main/accuracy, the accuracy on unseen input, stays about the same as with 100 neurons. Which means... changing from 100 to 200 neurons probably won't make the AI follow the rules any better... Still, I'll check it anyway as retry 5.
The idea was that more neurons would let the model fit the game records better and therefore follow the rules better. Change the definition of the MLP class in *build_mlp.py*.
...
class MLP(Chain):
    def __init__(self):
        super(MLP, self).__init__(
            l1=L.Linear(64, 200),
            l2=L.Linear(200, 200),
            l3=L.Linear(200, 65),
        )
...
After making the changes, create a trained model.
$ python build_mlp.py Othello.01e4.ggf black 100 1000
$ mv reversi_model.npz model_neuron-200_black.npz
$ python build_mlp.py Othello.01e4.ggf white 100 1000
$ mv reversi_model.npz model_neuron-200_white.npz
Rematch MLP vs Random with the new models (*model_neuron-200_black.npz*, *model_neuron-200_white.npz*). (The MLP class definition in *reversi.py* also needs the same change.) "Illegal move!" occurred 10997 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 99 |
Cannot 'PASS' this turn but AI selected it. | 120 |
Cannot put stone at AI selected position. | 10778 |
total | 10997 |
At 10997/30000, 36.7% of moves are illegal. It got worse... the model is probably overfitting the training data.
Next, concatenate not only Othello.01e4.ggf but also Othello.02e4.ggf and Othello.03e4.ggf as the game record, roughly tripling the amount of data, and see whether the wider variety of patterns helps.
$ cat Othello.01e4.ggf Othello.02e4.ggf Othello.03e4.ggf > Othello.01-03e4.ggf
$ python build_mlp.py Othello.01-03e4.ggf black 100 1000
$ mv reversi_model.npz model_01-03e4_black.npz
$ python build_mlp.py Othello.01-03e4.ggf white 100 1000
$ mv reversi_model.npz model_01-03e4_white.npz
The learning curve is as follows. Compared with using only Othello.01e4.ggf, validation/main/accuracy improves from about 0.3 to about 0.35. Maybe there is something to hope for?
Rematch MLP vs Random with the new models (*model_01-03e4_black.npz*, *model_01-03e4_white.npz*). "Illegal move!" occurred 5284 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 40 |
Cannot 'PASS' this turn but AI selected it. | 228 |
Cannot put stone at AI selected position. | 5016 |
total | 5284 |
At 5284/30000, 17.6% of moves are illegal. Not much different from using only Othello.01e4.ggf (19.1%). This is getting pretty discouraging...
This time, let's change tack and modify the MLP input itself. Specifically, in addition to '0': empty, '1': black, and '2': white, the board given as input also marks with '3' the squares where the player to move can legally place a stone. For example, X1 below is the state X0, on black's turn, with the placeable squares added.
X0 = [[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,2,1,0,0,0],\
[0,0,1,2,2,2,2,0],\
[0,0,0,1,2,2,0,0],\
[0,0,0,0,1,2,0,0],\
[0,0,0,0,0,0,0,0]]
X1 = [[0,0,0,0,0,0,0,0],\
[0,0,0,0,0,0,0,0],\
[0,0,0,3,3,0,0,0],\
[0,0,3,2,1,3,0,3],\
[0,0,1,2,2,2,2,3],\
[0,0,3,1,2,2,3,0],\
[0,0,0,0,1,2,3,0],\
[0,0,0,0,0,0,0,0]]
My thought was that this would yield an AI that follows the rules. You could object, "you're practically telling it the rules!", but the premise, the board state as input and the next move as output, is unchanged. (A fairly lame excuse...) I changed *build_mlp.py* so that you can specify whether to add the placeable squares: if True is appended to the command, the placeable squares are marked with '3' on the input board.
$ python build_mlp.py Othello.01e4.ggf black 100 1000 True
$ mv reversi_model.npz model_black_puttable_mark.npz
$ python build_mlp.py Othello.01e4.ggf white 100 1000 True
$ mv reversi_model.npz model_white_puttable_mark.npz
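A sketch of how the '3' marks might be added to an 8x8 board before it is flattened into the 64-element input vector (a hypothetical helper for illustration; the actual changes inside *build_mlp.py* and *reversi.py* may differ):

```python
def add_puttable_marks(board, legal_positions):
    """Return a copy of the 8x8 board with '3' at every square where the
    player to move can legally place a stone (sketch; names are hypothetical)."""
    marked = [row[:] for row in board]      # copy each row of the 8x8 list
    for row, col in legal_positions:
        marked[row][col] = 3                # 0: empty, 1: black, 2: white, 3: placeable
    return marked
```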
The learning curve is as follows.
It shows the highest validation/main/accuracy so far, converging at around 0.4.
Rematch MLP vs Random with the new models (*model_black_puttable_mark.npz*, *model_white_puttable_mark.npz*). I fixed *reversi.py* to support these models: if the specified MLP model name contains "puttable_mark", the input board is fed with the placeable squares marked as '3'.
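That model-name check might look roughly like this (an assumption about *reversi.py*'s logic, reusing the hypothetical helper sketched above):

```python
def make_model_input(board, legal_positions, model_path):
    # Sketch: models whose file name contains "puttable_mark" expect the
    # '3'-marked board; all earlier models expect the plain 0/1/2 board.
    if 'puttable_mark' in model_path:
        return add_puttable_marks(board, legal_positions)
    return board
```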
"Illegal move!" occurred 207 times in 1000 games. The breakdown is as follows.
Detail | Number of occurrences |
---|---|
Cannot put stone but AI cannot select 'PASS'. | 16 |
Cannot 'PASS' this turn but AI selected it. | 17 |
Cannot put stone at AI selected position. | 174 |
total | 207 |
At 207/30000, only 0.7% of moves are illegal. Under 1%! A dramatic improvement.
I will summarize the above.
So, as an AI that behaves properly, the pick is the model whose board input includes the '3' placeable marks, combined with the Fail Safe fallback. It may be a bit of a cheat, but I managed to make a reasonably well-behaved AI.
At the moment, the strength of the AI depends entirely on the game records. After building a basic AI by the method above, the next step would be to strengthen it with something like reinforcement learning. I'm completely ignorant about reinforcement learning, so I'll try that after studying it.
Thank you for reading.