Previously, I wrote the article "Let's make Othello with wxPython". I didn't implement a decent computer AI back then, so this time I try to make it a little smarter. I introduce a game strategy I have just studied, the Min-Max strategy, implement it, and let the AIs play against each other.
I will modify the wxPython Othello created in the previous article.
All implemented source code can be found on the GitHub "alpha-beta" branch, so please get the latest version.
I prepared a switch called comp_ai and toggle it on every computer turn to switch between the two AIs. comp_ai is initialized randomly to either 1 or -1. In vs Man mode it is fixed at comp_ai = 0.
def doComputer(self, my_color):
    self.comp_ai *= -1
    ...

def decideComputerNext(self, pos_list, gain_list):
    ...
    # Insert the computer's AI here
    if self.comp_ai >= 0:  # comp_ai == 0 => vs Man mode
        ...  # Computer AI - A
    else:
        ...  # Computer AI - B
    return next_pos
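For reference, here is a minimal sketch of how the dispatch could be wired for one of the Comp vs Comp experiments below. The pairing of AI-A/AI-B with particular methods and the random initialization of comp_ai are my assumptions for illustration, not code taken from the repository.

```python
# Sketch only (assumed pairing): AI-A = 1-move look-ahead, AI-B = Min-Max.
# comp_ai is assumed to be initialized with random.choice([1, -1]) for Comp vs Comp,
# and left at 0 for vs Man.
def decideComputerNext(self, pos_list, gain_list):
    if self.comp_ai >= 0:  # comp_ai == 0 => vs Man mode
        next_pos = self.computerAi_1stGainMax(pos_list, gain_list)  # Computer AI - A
    else:
        next_pos = self.computerAi_MinMax_3(pos_list, gain_list)    # Computer AI - B
    return next_pos
```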
See the GUI for this; the controls are added at the bottom right. Select Computer vs Computer, enter the number of loops, and press "Comp vs Comp Loop" to run Comp vs Comp the specified number of times and display the number of wins for each AI. The numbers at the bottom are the number of AI-A wins, the number of AI-B wins, and the number of draws.
As an aside, as a countermeasure for [the GUI collapsing on Windows 10](http://qiita.com/kanlkan/items/5e6f2e63de406f46b3b1#%E5%8B%95%E4%BD%9C%E7%A2%BA%E8%AA%8D%E7%92%B0%E5%A2%83), I recreated the GUI with one panel and one component, without using Sizer. Layout is difficult without Sizer... Why doesn't Sizer work properly on Windows 10? Please let me know if you know.
No look-ahead: simply pick a random square from all the places where a stone can be placed.
def computerAi_Random(self, pos_list, gain_list):
    index = random.randint(0, len(pos_list)-1)
    return pos_list[index]
One-move look-ahead: place the stone on the square where the number of stones that can be captured is the largest. (In Othello opening theory, it actually seems better to play the move that captures the fewest stones in the early game... but never mind that here.)
def computerAi_1stGainMax(self, pos_list, gain_list):
    index_list = []
    max_gain = max(gain_list)
    for i, val in enumerate(gain_list):
        if max_gain == val:
            index_list.append(i)
    tgt = random.randint(0, len(index_list)-1)
    return pos_list[index_list[tgt]]
The Min-Max strategy examines all branches of the game tree up to a certain number of moves ahead, assuming that both players always choose the best move available to them. As an example, let's find the best move with Black as ourselves: the first move of the game, Black's turn, thinking three plies ahead. The squares where Black can play are shown as light blue circles, and the number inside each circle is the number of stones that would be flipped.

If Black chooses move σ00, the squares where White can play next, and the number of stones White can capture there, are as shown next. If White then chooses σ10, the squares where Black can play next, and the capture counts, follow in the same way. Writing out the game tree to the second ply like this gives the tree below. The evaluation value of each node at Black's second turn is the maximum number of stones Black can then capture (the maximum of σ20 through σ24); the lower number in each circle of the "Black 2" column is that evaluation value.

Next, the evaluation values are propagated up to the "White 1" column, White's first turn. White also chooses its best move, so it picks the move that minimizes the value in the "Black 2" column; in other words, the minimum evaluation value of "Black 2" is propagated to "White 1". Then the values are propagated from "White 1" to "Black 1". This time, Black chooses the move that maximizes its own evaluation value, so the maximum value of "White 1" is propagated up. Thinking this way, Black's first move can be decided. It is called the Min-Max strategy because the move is determined by alternately taking the minimum and the maximum of the evaluation values.

In this example the evaluation value is the same for every move from σ00 to σ03, so any of them can be played. This seems intuitively correct, since the squares available for the first move are point-symmetric. I would have liked a more interesting example, but later in the game the tree becomes very large, so I used the first move as the example.
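To make the min/max alternation concrete, here is a tiny self-contained sketch on a hand-made tree. It is illustrative only and unrelated to the Othello code: leaves hold evaluation values (number of stones captured), and the two players alternately take the maximum (Black) and the minimum (White).

```python
def minmax_toy(node, maximizing):
    if isinstance(node, int):  # leaf: the evaluation value itself
        return node
    child_values = [minmax_toy(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# Black to move at the root; each inner list is the set of White's replies.
tree = [[3, 1], [2, 4], [1, 1]]
print(minmax_toy(tree, True))  # -> 2: Black picks the branch whose worst case is best
```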
Now for the implementation. The reason max_depth of minMax is 2 even though we look three moves ahead is that the final ply's evaluation values are propagated back inside computerAi_MinMax_3 itself, so the argument is the look-ahead depth minus one.
def computerAi_MinMax_3(self, pos_list, gain_list):
    value = []
    update_pos_list = []
    self.log_on = False
    value = self.minMax(2, 2, pos_list, gain_list)
    for i, pos in enumerate(pos_list):
        if max(value) == value[i]:
            update_pos_list.append(pos)
    self.log_on = True
    tgt = random.randint(0, len(update_pos_list)-1)
    return update_pos_list[tgt]
def minMax(self, depth, max_depth, pos_list, gain_list):  # depth > 0
    value = []
    next_value = []
    next_pos_list = []
    next_gain_list = []
    self.backUpAllState(self.state_storage_list[depth])
    for pos in pos_list:
        ret = self.putComputerStone(pos, False)
        next_pos_list, next_gain_list = self.scanPuttableCell()
        if (depth > 1):
            next_value = self.minMax(depth-1, max_depth, next_pos_list, next_gain_list)
            if len(next_value) == 0:
                value.append(0)
            elif (max_depth - depth) % 2 == 0:
                value.append(min(next_value))
            else:
                value.append(max(next_value))
        else:
            if len(next_gain_list) == 0:
                value.append(0)
            elif (max_depth - depth) % 2 == 0:
                value.append(min(next_gain_list))
            else:
                value.append(max(next_gain_list))
        self.restoreAllState(self.state_storage_list[depth])
    return value
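backUpAllState and restoreAllState are what make the trial moves reversible: the state for each depth is snapshotted into state_storage_list[depth] before the loop and restored after every trial move. Their real implementation is in the repository; the sketch below only illustrates the assumed contract, and the attribute name cell_state and the dict-style storage are hypothetical names for illustration.

```python
import copy

def backUpAllState(self, storage):
    # Snapshot everything a trial move can change (attribute names assumed).
    storage["cells"] = copy.deepcopy(self.cell_state)
    storage["score"] = list(self.player_score)
    storage["color"] = self.now_color

def restoreAllState(self, storage):
    # Undo the trial move by restoring the snapshot.
    self.cell_state = copy.deepcopy(storage["cells"])
    self.player_score = list(storage["score"])
    self.now_color = storage["color"]
```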
Some redundancy remains, but I was able to implement it fairly neatly. It took quite a while to get this far, though...
Let's run 100 games between the AIs. Which side moves first is chosen at random for each game.
The result is as follows.
Hmm, the Min-Max strategy isn't very strong. I believe the implementation is correct. Well, at least it beats the no-look-ahead AI, which is some consolation. By the way, this Min-Max strategy is very time consuming: if you run Computer vs Computer yourself, 100 games take around 10 minutes, depending on your PC's performance. In a future article I will implement the αβ strategy, which speeds things up by narrowing the search range of the Min-Max strategy.
One possible reason the Min-Max strategy didn't work well is that every stone was valued at +1. So let's weight each square instead. I had a rough idea of which squares should have high evaluation values, but I got a bit stuck, so I referred to this site: http://uguisu.skr.jp/othello/5-1.html
So I weight the squares as follows and modify the code so that the gain is computed from these weights.
gGain = [( 30, -12,   0,  -1,  -1,   0, -12,  30), \
         (-12, -15,  -3,  -3,  -3,  -3, -15, -12), \
         (  0,  -3,   0,  -1,  -1,   0,  -3,   0), \
         ( -1,  -3,  -1,  -1,  -1,  -1,  -3,  -1), \
         ( -1,  -3,  -1,  -1,  -1,  -1,  -3,  -1), \
         (  0,  -3,   0,  -1,  -1,   0,  -3,   0), \
         (-12, -15,  -3,  -3,  -3,  -3, -15, -12), \
         ( 30, -12,   0,  -1,  -1,   0, -12,  30)]
...
def vecScan(self, pos, reverse_on):
    rev_list = []
    temp_list = []
    gain = 0
    is_hit = 0
    if reverse_on == 0 and self.getCellState(pos,(0,0)) != "green":
        return 0, gain
    if self.now_color == "black":
        rev_color = "white"
    else:
        rev_color = "black"
    for v in gVec:
        temp_list = []
        for i in range(1, 8):
            if self.getCellState(pos,(v[0]*i,v[1]*i)) == rev_color:
                temp_list.append(self.movePos(pos,(v[0]*i, v[1]*i)))
                if self.getCellState(pos,(v[0]*i+v[0], v[1]*i+v[1])) == self.now_color:
                    is_hit = 1
                    for j in temp_list:
                        rev_list.append(j)
                    break
            else:
                break
    if reverse_on == True:
        if self.log_on == True:
            self.log_textctrl.AppendText("put:" + str(pos) + ", " + str(rev_list) + " to " + str(self.now_color) + "\n")
        for rev_pos in rev_list:
            self.setCellState(rev_pos, (0,0), self.now_color)
            if self.now_color == "black":
                self.player_score[0] += 1
                self.player_score[1] -= 1
            else:
                self.player_score[1] += 1
                self.player_score[0] -= 1
        self.updateScoreLabel()
    gain = self.calcGain(pos, rev_list)
    return is_hit, gain
def calcGain(self, pos, rev_list):
    ret_gain = 0
    ret_gain += gGain[pos[0]][pos[1]]
    for rev_pos in rev_list:
        ret_gain += gGain[rev_pos[0]][rev_pos[1]]
    return ret_gain
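As a quick sanity check of the weighted gain (my own example, using the gGain table defined above): placing a stone on the corner (0, 0) and flipping (0, 1) and (0, 2) scores 30 + (-12) + 0 = 18.

```python
# Illustrative only: a corner capture that flips two adjacent edge stones.
pos = (0, 0)
rev_list = [(0, 1), (0, 2)]
gain = gGain[pos[0]][pos[1]] + sum(gGain[r][c] for r, c in rev_list)
print(gain)  # -> 30 + (-12) + 0 = 18
```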
Let's play again.
The result is as follows.
Hmm... Min-Max's winning percentage has actually dropped. The one-move look-ahead is quite strong. There is room to reconsider this. A disappointing result.