uct_search() is, roughly speaking, the method that repeatedly selects the node with the largest UCB value and accumulates win rates along the way. I found it quite hard to follow, so I worked through an example to understand it. Suppose the first three searches proceed as follows.
Each time a search is performed, the win rate from each node's own viewpoint is added to variables held by that node. Two of them matter here:
① Total win rate (variable name win): the sum of ② over all nodes. (In this example ① and ② hold the same value, but only because each node has searched just one node so far; ① and ② are separate variables.)
② Total win rate when a given node is selected (variable name child_win).
There are other variables, but ① and ② are the important ones. Note that the win rate predicted by the value network (variable name value_win) is a different variable from ①.
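To make the relationship between ① and ② concrete, here is a minimal sketch of the bookkeeping. It is not the original program's code: the class name Node and the method record are hypothetical, and only win and child_win follow the variable names above. I am also assuming that child_win holds one running total per child.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node holding only the two totals discussed above."""
    num_children: int
    win: float = 0.0                               # (1) total over all children
    child_win: list = field(default_factory=list)  # (2) one total per child

    def __post_init__(self):
        # Start every per-child total at zero.
        self.child_win = [0.0] * self.num_children

    def record(self, child_index, win_rate):
        """Add one search result, seen from this node's viewpoint."""
        self.child_win[child_index] += win_rate  # update (2) for the chosen child
        self.win += win_rate                     # update (1) by the same amount


node = Node(num_children=3)
node.record(0, 0.6)  # first search went through child 0
node.record(2, 0.3)  # second search went through child 2
node.record(0, 0.5)  # third search went through child 0 again
```

Because both totals are updated by the same amount on every search, win stays equal to the sum of child_win (up to floating-point rounding), while the individual child_win entries diverge as soon as more than one child has been searched.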
uct_search() also records the number of visits to each node. After it returns, go() selects the move with the highest number of visits as the final move. In addition, ② divided by the number of visits, that is, the average win rate, is treated as the win rate of that move. Keeping this in mind makes the rest easier to follow.
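The final selection step can be sketched roughly like this. This is my own illustration, not the body of go(): the function name select_final_move, the list name child_move_count, and the sample moves and numbers are all assumptions made up for the example.

```python
def select_final_move(moves, child_move_count, child_win):
    """Pick the most-visited move and report its average win rate."""
    # The final move is chosen by visit count, not by win rate.
    best = max(range(len(moves)), key=lambda i: child_move_count[i])
    visits = child_move_count[best]
    # Average win rate = accumulated win rate (2) / number of visits.
    average = child_win[best] / visits if visits > 0 else 0.0
    return moves[best], average


# Illustrative data only: three candidate moves with made-up statistics.
moves = ["7g7f", "2g2f", "5i6h"]
child_move_count = [5, 12, 3]
child_win = [2.4, 6.6, 1.2]

best_move, best_rate = select_final_move(moves, child_move_count, child_win)
print(best_move, round(best_rate, 2))
```

Here the second move wins because it was visited 12 times, and its reported win rate is 6.6 / 12, about 0.55.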
For this example, I traced how the contents of these variables change over time.
That is as far as my understanding goes for now. In short, it simply adds up your own win rate and your opponent's losing rate (which is the same thing as your own win rate).
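The idea of adding "your win rate" and "your opponent's losing rate" can be sketched as a backup pass over the path that was searched. This is a simplified illustration under my own assumptions: win rates lie between 0 and 1, the opponent's losing rate is computed as 1 minus the opponent's win rate, and the function name backup and the dict-based nodes are hypothetical.

```python
def backup(path, leaf_value):
    """Propagate a leaf evaluation back up the selected path.

    path: list of (node, child_index) pairs from the root down to the
          parent of the leaf. Each node is a dict with "win" and "child_win".
    leaf_value: win rate of the leaf, from the viewpoint of the side to
                move at the leaf.
    """
    value = leaf_value
    for node, child_index in reversed(path):
        # The side to move at the child is this node's opponent, so the
        # opponent's losing rate (1 - value) is this node's own win rate.
        value = 1.0 - value
        node["child_win"][child_index] += value
        node["win"] += value


root = {"win": 0.0, "child_win": [0.0, 0.0]}
child = {"win": 0.0, "child_win": [0.0, 0.0, 0.0]}

# One search: root -> its child 1 -> that node's child 2 (the leaf),
# where the leaf is evaluated at 0.7 for the side to move there.
backup([(root, 1), (child, 2)], leaf_value=0.7)
```

After this single search, the intermediate node records about 0.3 (it is the opponent of the leaf's side to move) and the root records about 0.7, so every node only ever accumulates win rates seen from its own side.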