Compared with the policy-network player, the legal-move filtering has moved to before the network is run. What is the aim? Reduced memory usage? Note that, because of this, some variables have a different meaning than in the policy-network player (legal_logits vs. logits, etc.).
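A rough sketch of the difference in ordering (helper names such as features_of and position_after are illustrative, not taken verbatim from pydlshogi): the policy player runs the network once on the current position and then picks out the legal moves from its output, while this value player enumerates the legal moves first and builds one input position per move.

# Policy-network player (sketch): one forward pass, filter afterwards
y = model(features_of(current_position))
legal_logits = [y[make_output_label(m, board.turn)] for m in board.legal_moves]

# Value-network player (sketch): enumerate legal moves first, one child position per move
features = [features_of(position_after(m)) for m in board.legal_moves]
y = model(np.array(features, dtype=np.float32))  # one value per legal move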
features.append(make_input_features_from_board(self.board)) — output of make_input_features_from_board: the positions of the first player's pieces, the first player's pieces in hand, the positions of the second player's pieces, and the second player's pieces in hand: [(9x9 matrix), (9x9 matrix), ... (18 + 4 + 4 + 4 + 4 + 2 + 2), (9x9 matrix), (9x9 matrix), ... (18 + 4 + 4 + 4 + 4 + 2 + 2)]. One such array is appended to features for each legal move: [[this array], [this array], ..., [this array]]. The number of elements equals the number of legal moves because this runs after the legal-move filtering. For example, for the initial position there are 30 elements.
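A minimal sketch of the resulting batch shape (the plane count N is left symbolic because it is not verified here against pydlshogi):

x = np.array(features, dtype=np.float32)
# x.shape -> (30, N, 9, 9) for the initial position: 30 legal moves, each described by N feature planes of 9x9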
An example of y.data:
[[-0.04460792]
[ 0.02167853]
[ 0.04610606]
・ ・ ・
[-0.09904062]]
y.data.reshape(-1)
[-0.04460792 0.02167853 0.04610606 -0.10492548 -0.22675163 -0.23193529
-0.06671577 0.02509898 -0.02109829 -0.05519588 -0.05578787 -0.03609923
-0.11021192 -0.10877373 -0.04065045 -0.01540023 -0.0336022 -0.03805592
-0.03325626 -0.02194545 -0.08399387 -0.13204134 -0.2106831 -0.24970257
-0.18735377 -0.08184412 -0.15573277 -0.00548664 -0.0353202 -0.09904062]
The number of elements equals the number of legal moves because this runs after the legal-move filtering. The above is an example printed at the initial position; the first move has 30 legal moves, so there are 30 elements.
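A small self-contained example of what reshape(-1) does (values made up for illustration):

import numpy as np
y_data = np.array([[-0.0446], [0.0217], [0.0461]])  # shape (3, 1): one value per legal move
flat = y_data.reshape(-1)                           # shape (3,): array([-0.0446,  0.0217,  0.0461])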
for i, move in enumerate(legal_moves): — enumerate returns index and value. The policy-network player used make_output_label to get the index: what it means is easier to follow, but the code is longer. The value-network player uses enumerate to get the index: what it means is harder to see, but the code is shorter. What the two are doing is effectively the same.
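A rough sketch of the two indexing styles (names such as all_probabilities are placeholders; make_output_label is the helper used in the policy player):

# Policy-network player style: map each move to its output label and index the full output with it
for move in board.legal_moves:
    label = make_output_label(move, board.turn)
    print(move.usi(), all_probabilities[label])

# Value-network player style: the output is already ordered like legal_moves, so the loop index is enough
for i, move in enumerate(legal_moves):
    print(move.usi(), probabilities[i])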
python-dlshogi\pydlshogi\player\search1_player.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Environment settings
# -----------------------------
import socket
host = socket.gethostname()
# Get the host name
# google colab : random
# iMac         : xxxxxxxx
# Lenovo       : yyyyyyyy
# env
#   0: google colab
#   1: iMac (no GPU)
#   2: Lenovo (no GPU)
# gpu_en
#   0: disable
#   1: enable
if host == 'xxxxxxxx':
    env = 1
    gpu_en = 0
elif host == 'yyyyyyyy':
    env = 2
    gpu_en = 0
else:
    env = 0
    gpu_en = 1

# strategy
#   'greedy'   : greedy strategy
#   'boltzmann': softmax strategy
algorithm = 'boltzmann'
# -----------------------------
import numpy as np
import chainer
from chainer import serializers
import chainer.functions as F
if gpu_en == 1:
    from chainer import cuda, Variable
import shogi
from pydlshogi.common import *
from pydlshogi.features import *
from pydlshogi.network.value import *
from pydlshogi.player.base_player import *
def greedy(logits):  # returns the index of the element with the maximum value in the given list
    # In a neural network, "logits" are the values before they pass through the activation function.
    return np.argmax(logits)
    # In the policy-network player this was logits.index(max(logits)). Same meaning; the code is being simplified little by little.
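    # Example with made-up values: np.argmax([-0.05, 0.02, -0.11]) returns 1, the index of the largest element,
    # the same result as [-0.05, 0.02, -0.11].index(max([-0.05, 0.02, -0.11])).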
def boltzmann(logits, temperature):
    logits /= temperature                 # a /= b means a = a / b
    logits -= logits.max()                # a -= b means a = a - b; every value becomes <= 0 and the maximum becomes 0
    probabilities = np.exp(logits)        # exp of values <= 0
    probabilities /= probabilities.sum()
    return np.random.choice(len(logits), p=probabilities)  # choice(n, p=b) randomly returns a number from 0 to n-1 with probabilities b
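# Worked example with made-up values: logits = [1.0, 0.0], temperature = 0.5.
# Dividing by the temperature gives [2.0, 0.0]; subtracting the max gives [0.0, -2.0];
# np.exp gives [1.0, 0.135]; normalizing gives probabilities of about [0.88, 0.12],
# so the first move would be chosen roughly 88% of the time.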
class Search1Player(BasePlayer):
    def __init__(self):
        super().__init__()
        if env == 0:
            self.modelfile = '/content/drive/My Drive/・ ・ ・/python-dlshogi/model/model_value'
        elif env == 1:
            self.modelfile = r'/Users/・ ・ ・/python-dlshogi/model/model_value'  # value-network model created by training
        elif env == 2:
            self.modelfile = r"C:\Users\・ ・ ・\python-dlshogi\model\model_value"
        self.model = None

    def usi(self):
        # GUI software side: sends the usi command after startup. USI engine side: returns id (and option) and usiok.
        print('id name search1_player')
        print('option name modelfile type string default ' + self.modelfile)
        print('usiok')

    def setoption(self, option):
        if option[1] == 'modelfile':
            self.modelfile = option[3]

    def isready(self):
        # GUI software side: sends the isready command before the game starts. USI engine side: initializes and returns readyok.
        if self.model is None:
            self.model = ValueNetwork()
            if gpu_en == 1:
                self.model.to_gpu()
        serializers.load_npz(self.modelfile, self.model)
        print('readyok')
    def go(self):
        if self.board.is_game_over():
            print('bestmove resign')
            return

        # For all legal moves.
        # Unlike the policy-network player, the legal-move filtering comes before the network is run.
        # The aim? Reduced memory usage?
        # Note that because of this the meaning of the variables differs from the policy-network player (legal_logits vs. logits).
        legal_moves = []
        features = []
        for move in self.board.legal_moves:
            legal_moves.append(move)

            self.board.push(move)  # play one move

            features.append(make_input_features_from_board(self.board))
            # Output of make_input_features_from_board: positions of the first player's pieces, the first player's
            # pieces in hand, positions of the second player's pieces, the second player's pieces in hand:
            # [(9x9 matrix),
            #  (9x9 matrix), ... (18 + 4 + 4 + 4 + 4 + 2 + 2),
            #  (9x9 matrix),
            #  (9x9 matrix), ... (18 + 4 + 4 + 4 + 4 + 2 + 2)]
            # One such array is appended to features for each legal move: [[this array], [this array], ..., [this array]].
            # The number of elements is the number of legal moves. For example, for the initial position it is 30.

            self.board.pop()  # undo one move
        if gpu_en == 1:
            x = Variable(cuda.to_gpu(np.array(features, dtype=np.float32)))
        elif gpu_en == 0:
            x = np.array(features, dtype=np.float32)

        # Invert the sign so the value is the win rate from the side to move's point of view
        with chainer.no_backprop_mode():
            y = -self.model(x)

            if gpu_en == 1:
                logits = cuda.to_cpu(y.data).reshape(-1)  # reshape(-1) flattens into a one-dimensional array
                probabilities = cuda.to_cpu(F.sigmoid(y).data).reshape(-1)
            elif gpu_en == 0:
                logits = y.data.reshape(-1)  # note: the number of elements of y.data is the number of legal moves, e.g. 30 for the initial position
                probabilities = F.sigmoid(y).data.reshape(-1)
        # An example of y.data:
        # [[-0.04460792]
        #  [ 0.02167853]
        #  [ 0.04610606]
        #  ...
        #  [-0.09904062]]
        #
        # y.data.reshape(-1):
        # [-0.04460792  0.02167853  0.04610606 -0.10492548 -0.22675163 -0.23193529
        #  -0.06671577  0.02509898 -0.02109829 -0.05519588 -0.05578787 -0.03609923
        #  -0.11021192 -0.10877373 -0.04065045 -0.01540023 -0.0336022  -0.03805592
        #  -0.03325626 -0.02194545 -0.08399387 -0.13204134 -0.2106831  -0.24970257
        #  -0.18735377 -0.08184412 -0.15573277 -0.00548664 -0.0353202  -0.09904062]
        # The number of elements is the number of legal moves. The above is the initial position,
        # which has 30 legal moves, so there are 30 elements.
        for i, move in enumerate(legal_moves):
            # enumerate returns index and value.
            # The policy-network player used make_output_label to get the index:
            # what it means is easier to follow, but the code is longer.
            # The value-network player uses enumerate to get the index:
            # what it means is harder to see, but the code is shorter. What the two do is effectively the same.

            # Show the probability
            print('info string {:5} : {:.5f}'.format(move.usi(), probabilities[i]))

        print(y.data)
        print(y.data.reshape(-1))
        if algorithm == 'greedy':
            # (1) Select the move with the highest probability (greedy strategy): simply return the element with the highest probability.
            selected_index = greedy(logits)
        elif algorithm == 'boltzmann':
            # (2) Select a move according to the probabilities (softmax strategy): randomly return an element, favoring higher probabilities.
            selected_index = boltzmann(np.array(logits, dtype=np.float32), 0.5)

        bestmove = legal_moves[selected_index]
        print('bestmove', bestmove.usi())
An AI that uses only the value network and searches only one move ahead. It is far too weak.
Game video: https://youtu.be/W3ZqlcDg_yE
Final position