What the go() function in mcts_player.py is doing
Make the following modifications to mcts_player.py before actually running the program.
The model-loading call is fixed to PolicyValueResnet(blocks=5), because an error occurs if the blocks argument is omitted from PolicyValueResnet(). 5 is the same number of blocks that was used when this model was trained.
pydlshogi/player/mcts_player.py
def isready(self):
    # Load the model
    if self.model is None:
        self.model = PolicyValueResnet(blocks=5)
The hostname is used to determine which PC the program is running on. The flag gpu_en switches whether the GPU is used, and the flag env switches the model file path.
pydlshogi/player/mcts_player.py
# Environment settings
# -----------------------------
import socket

# Get the hostname
#   google colab : random
#   iMac         : xxxxxxxx
#   Lenovo       : yyyyyyyy
host = socket.gethostname()

# env
#   0: google colab
#   1: iMac (no GPU)
#   2: Lenovo (no GPU)
# gpu_en
#   0: disable
#   1: enable
if host == 'xxxxxxxx':
    env = 1
    gpu_en = 0
elif host == 'yyyyyyyy':
    env = 2
    gpu_en = 0
else:
    env = 0
    gpu_en = 1
# -----------------------------
At the imports
pydlshogi/player/mcts_player.py
if gpu_en == 1:
    from chainer import cuda, Variable
In __init__()
pydlshogi/player/mcts_player.py
# Model file path
if env == 0:
    self.modelfile = '/content/drive/My Drive/・ ・ ・/python-dlshogi/model/model_policy_value_resnet'
elif env == 1:
    self.modelfile = r'/Users/・ ・ ・/python-dlshogi/model/model_policy_value_resnet'
elif env == 2:
    self.modelfile = r"C:\Users\・ ・ ・\python-dlshogi\model\model_policy_value_resnet"
self.model = None  # model
In eval_node()
pydlshogi/player/mcts_player.py
if gpu_en == 1:
    x = Variable(cuda.to_gpu(np.array(eval_features, dtype=np.float32)))
elif gpu_en == 0:
    x = np.array(eval_features, dtype=np.float32)

with chainer.no_backprop_mode():
    y1, y2 = self.model(x)

    if gpu_en == 1:
        logits = cuda.to_cpu(y1.data)[0]
        value = cuda.to_cpu(F.sigmoid(y2).data)[0]
    elif gpu_en == 0:
        logits = y1.data[0]
        value = F.sigmoid(y2).data[0]
In isready()
pydlshogi/player/mcts_player.py
# Load the model
if self.model is None:
    self.model = PolicyValueResnet(blocks=5)
    if gpu_en == 1:
        self.model.to_gpu()
I ran the program from the initial position. As a result, the pawn move to 7f was visited 143 times and the pawn move to 2f was visited 77 times. It looks reasonable for the time being. I was also able to run it on Google Colab.
The fields printed in the USI info line are:
move_count: number of visits to that move's node
nnrate: move probability predicted by the policy network
win_rate: average win rate of the move (= total win rate / number of visits)
nps: nodes / time
time: time required for one go()
nodes: number of visits to the current node (current_node.move_count)
hashfull: occupancy of the node_hash and uct_node arrays; indicates how many of the 4096 elements are in use, expressed as 1000 when 100% occupied
score cp: evaluation value
pv: move coordinates (the principal variation)
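As a rough illustration of how these fields fit together, here is a minimal sketch of assembling a USI "info" line. This is not the actual code in mcts_player.py; the function name print_usi_info, its parameters, and the win-rate-to-centipawn conversion (logistic inverse with a scaling constant of 600) are assumptions for the example.

# A minimal sketch, not the actual implementation in mcts_player.py.
import math
import time

def print_usi_info(begin_time, node_count, hash_used, best_wp, pv_moves):
    elapsed_ms = int((time.time() - begin_time) * 1000)
    nps = int(node_count * 1000 / elapsed_ms) if elapsed_ms > 0 else 0
    # Convert a win rate in (0, 1) into a centipawn-style score;
    # the conversion formula and constant are assumptions.
    if best_wp <= 0.0:
        cp = -30000
    elif best_wp >= 1.0:
        cp = 30000
    else:
        cp = int(-math.log(1.0 / best_wp - 1.0) * 600)
    # hashfull: per-mille occupancy of the 4096-entry node_hash / uct_node arrays
    hashfull = hash_used * 1000 // 4096
    print('info nps {} time {} nodes {} hashfull {} score cp {} pv {}'.format(
        nps, elapsed_ms, node_count, hashfull, cp, ' '.join(pv_moves)))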
From the above results, the time required for one go() call was 3524 ms on the CPU and 2020 ms on the GPU. The GPU/CPU speed ratio is therefore 3524 / 2020 ≈ 1.7, so the GPU is close to twice as fast.
The number of visits to the current node (nodes) is 235 for both the GPU and the CPU runs, so the comparison simply shows that the GPU is faster per visit.
On each search iteration, a function called interrupt_check() decides whether to stop the search early. The number of playouts is set to 300, but if the runner-up move could not overtake the best move even if all the remaining playouts were spent on it, the search stops before reaching 300. Since the result above shows nodes = 235, the search was cut off after 235 iterations.
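The idea behind interrupt_check() can be sketched as follows. This is a simplified illustration, assuming the per-move visit counts at the root are available as a NumPy array; the names below are illustrative, not the exact code in mcts_player.py.

# Simplified sketch of the early-termination check (names are illustrative).
import numpy as np

def interrupt_check(child_move_count, playout_count, playout_limit=300):
    # Playouts still available within the budget
    rest = playout_limit - playout_count

    # Visit counts of the runner-up and the best move at the root
    second, first = np.sort(child_move_count)[-2:]

    # If the runner-up cannot catch up with the best move even if every
    # remaining playout were spent on it, further search cannot change the
    # chosen move, so the search can be stopped here.
    return first - second > rest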
For reference, I executed go() repeatedly and plotted the number of visits to the current node. The count does not grow as go() is called more times: because the visit counts of the child nodes accumulate across calls, the early-termination condition is met sooner.
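The experiment can be reproduced roughly as follows. This is a sketch, not the actual measurement script: the class name MCTSPlayer, the position()/go() call signatures, and the attribute used to read back the root visit count are assumptions about mcts_player.py.

# Rough sketch: call go() repeatedly and record the root node's visit count.
import matplotlib.pyplot as plt
from pydlshogi.player.mcts_player import MCTSPlayer

player = MCTSPlayer()
player.isready()
player.position(['startpos'])  # set up the initial position (signature assumed)

visits = []
for _ in range(30):
    player.go()                                     # one search from the same position
    visits.append(player.current_node.move_count)   # root visit count (attribute assumed)

plt.plot(range(1, len(visits) + 1), visits)
plt.xlabel('go() call number')
plt.ylabel('visits to the current node')
plt.show()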
I also checked how the number of visits to the current node changes with the number of playouts. Even when the playout limit is raised to 6000 or more, the count does not increase beyond 4057. This is just the result from the initial position; results will vary with the position.
Game video: https://youtu.be/H0jD76R2PAM Myself (amateur 1-dan) vs. the Chapter 12-5 AI (policy network + value network + Monte Carlo tree search). There is no parallelization. It was weaker than I had imagined and often played strange moves.
Final position