What the go() function in mcts_player.py is doing
Make the following modifications to mcts_player.py before actually running the program.
The model-loading call is fixed to PolicyValueResnet(blocks=5), because an error occurs if the blocks argument is omitted from PolicyValueResnet(). 5 is the same number of blocks that was used when this model was trained.
pydlshogi/player/mcts_player.py
def isready(self):
    # Load the model
    if self.model is None:
        self.model = PolicyValueResnet(blocks=5)
The hostname is used to determine which PC the program is running on. The flag gpu_en switches whether the GPU is used, and the flag env switches the model file path.
pydlshogi/player/mcts_player.py
# Environment settings
# -----------------------------
import socket

# Get the hostname
#   google colab : random
#   iMac         : xxxxxxxx
#   Lenovo       : yyyyyyyy
host = socket.gethostname()

# env
#   0: google colab
#   1: iMac (no GPU)
#   2: Lenovo (no GPU)
# gpu_en
#   0: disable
#   1: enable
if host == 'xxxxxxxx':
    env = 1
    gpu_en = 0
elif host == 'yyyyyyyy':
    env = 2
    gpu_en = 0
else:
    env = 0
    gpu_en = 1
# -----------------------------
At the imports
pydlshogi/player/mcts_player.py
if gpu_en == 1:
    from chainer import cuda, Variable
In __init__()
pydlshogi/player/mcts_player.py
# Model file path
if env == 0:
    self.modelfile = '/content/drive/My Drive/・ ・ ・/python-dlshogi/model/model_policy_value_resnet'
elif env == 1:
    self.modelfile = r'/Users/・ ・ ・/python-dlshogi/model/model_policy_value_resnet'
elif env == 2:
    self.modelfile = r"C:\Users\・ ・ ・\python-dlshogi\model\model_policy_value_resnet"
self.model = None  # model
In eval_node()
pydlshogi/player/mcts_player.py
if gpu_en == 1:
    x = Variable(cuda.to_gpu(np.array(eval_features, dtype=np.float32)))
elif gpu_en == 0:
    x = np.array(eval_features, dtype=np.float32)

with chainer.no_backprop_mode():
    y1, y2 = self.model(x)

    if gpu_en == 1:
        logits = cuda.to_cpu(y1.data)[0]
        value = cuda.to_cpu(F.sigmoid(y2).data)[0]
    elif gpu_en == 0:
        logits = y1.data[0]
        value = F.sigmoid(y2).data[0]
In isready()
pydlshogi/player/mcts_player.py
# Load the model
if self.model is None:
    self.model = PolicyValueResnet(blocks=5)
    if gpu_en == 1:
        self.model.to_gpu()
I ran the program from the initial position. As a result, the pawn move to 7f was visited 143 times and the pawn move to 2f was visited 77 times. It looks reasonable for the time being. I was also able to run it on Google Colab.
The fields printed in the USI info line are:
move_count: number of visits to that move's node
nnrate: move probability predicted by the policy network
win_rate: average win rate of the move (= total win rate / number of visits)
nps: nodes / time
time: time required for one go()
nodes: number of visits to the current node (current_node.move_count)
hashfull: occupancy of the node_hash and uct_node arrays; indicates how many of the 4096 elements are in use, expressed as 1000 when 100% occupied
score cp: evaluation value
pv: move coordinates (the principal variation)
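As a rough illustration of how these fields fit together, here is a minimal sketch of assembling a USI "info" line. This is not the actual code in mcts_player.py; the function name print_usi_info, its parameters, and the win-rate-to-centipawn conversion (logistic inverse with a scaling constant of 600) are assumptions for the example.

# A minimal sketch, not the actual implementation in mcts_player.py.
import math
import time

def print_usi_info(begin_time, node_count, hash_used, best_wp, pv_moves):
    elapsed_ms = int((time.time() - begin_time) * 1000)
    nps = int(node_count * 1000 / elapsed_ms) if elapsed_ms > 0 else 0
    # Convert a win rate in (0, 1) into a centipawn-style score;
    # the conversion formula and constant are assumptions.
    if best_wp <= 0.0:
        cp = -30000
    elif best_wp >= 1.0:
        cp = 30000
    else:
        cp = int(-math.log(1.0 / best_wp - 1.0) * 600)
    # hashfull: per-mille occupancy of the 4096-entry node_hash / uct_node arrays
    hashfull = hash_used * 1000 // 4096
    print('info nps {} time {} nodes {} hashfull {} score cp {} pv {}'.format(
        nps, elapsed_ms, node_count, hashfull, cp, ' '.join(pv_moves)))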
From the above results, the time required for one go() call was 3524 ms on the CPU and 2020 ms on the GPU. The GPU/CPU speed ratio is therefore 3524 / 2020 ≈ 1.7, so the GPU is close to twice as fast.
The number of visits to the current node (nodes) is 235 for both the GPU and the CPU runs, so the comparison simply shows that the GPU is faster per visit.
On each search iteration, a function called interrupt_check() decides whether to stop the search early. The number of playouts is set to 300, but if the runner-up move could not overtake the best move even if all the remaining playouts were spent on it, the search stops before reaching 300. Since the result above shows nodes = 235, the search was cut off after 235 iterations.
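The idea behind interrupt_check() can be sketched as follows. This is a simplified illustration, assuming the per-move visit counts at the root are available as a NumPy array; the names below are illustrative, not the exact code in mcts_player.py.

# Simplified sketch of the early-termination check (names are illustrative).
import numpy as np

def interrupt_check(child_move_count, playout_count, playout_limit=300):
    # Playouts still available within the budget
    rest = playout_limit - playout_count

    # Visit counts of the runner-up and the best move at the root
    second, first = np.sort(child_move_count)[-2:]

    # If the runner-up cannot catch up with the best move even if every
    # remaining playout were spent on it, further search cannot change the
    # chosen move, so the search can be stopped here.
    return first - second > rest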
For reference, I executed go() repeatedly and plotted the number of visits to the current node. The count does not grow as go() is called more times: because the visit counts of the child nodes accumulate across calls, the early-termination condition is met sooner.
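The experiment can be reproduced roughly as follows. This is a sketch, not the actual measurement script: the class name MCTSPlayer, the position()/go() call signatures, and the attribute used to read back the root visit count are assumptions about mcts_player.py.

# Rough sketch: call go() repeatedly and record the root node's visit count.
import matplotlib.pyplot as plt
from pydlshogi.player.mcts_player import MCTSPlayer

player = MCTSPlayer()
player.isready()
player.position(['startpos'])  # set up the initial position (signature assumed)

visits = []
for _ in range(30):
    player.go()                                     # one search from the same position
    visits.append(player.current_node.move_count)   # root visit count (attribute assumed)

plt.plot(range(1, len(visits) + 1), visits)
plt.xlabel('go() call number')
plt.ylabel('visits to the current node')
plt.show()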
I also checked how the number of visits to the current node changes with the number of playouts. Even when the playout limit is raised to 6000 or more, the count does not increase beyond 4057. This is just the result from the initial position; results will vary with the position.
Game video: https://youtu.be/H0jD76R2PAM Myself (amateur 1-dan) vs. the Chapter 12-5 AI (policy network + value network + Monte Carlo tree search). There is no parallelization. It was weaker than I had imagined and often played strange moves.
Final position