Greedy algorithm. A method that simply selects the move with the highest neural network output value. Logits are the values before they pass through the activation function of the network's output stage.
import numpy as np

def greedy(logits):
    # Return the index of the largest element in logits.
    # In a neural network, logits are the values before the output activation function.
    return np.argmax(logits)
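Boltzmann algorithm. A method that converts the logits into probabilities and then picks a move at random according to them. It seems that the probabilities change depending on the temperature coefficient.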
def boltzmann(logits, temperature):
    logits = np.asarray(logits, dtype=float) / temperature  # divide a float copy, so the caller's array is not modified
    logits -= logits.max()  # a -= b means a = a - b; the maximum becomes 0 and all other values are negative
    probabilities = np.exp(logits)  # exp of values <= 0, so each result lies in (0, 1]
    probabilities /= probabilities.sum()  # normalize so that the probabilities sum to 1
    return np.random.choice(len(logits), p=probabilities)  # choice(i, p=b) returns a random integer from 0 to i-1, with the probabilities given by b
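To see what the final `np.random.choice` call does, here is a minimal sketch that draws many samples from a fixed probability list and compares the observed frequencies with the requested ones. The probability values, the sample count, and the seed are made up for illustration and do not come from the book.

```python
import numpy as np

np.random.seed(0)  # fixed seed (an arbitrary choice) so the run is repeatable

# Hypothetical probabilities for five moves; they must sum to 1
probabilities = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

# Draw 10000 indices from 0 to 4, index i with probability probabilities[i]
samples = np.random.choice(len(probabilities), size=10000, p=probabilities)

# Count how often each index was drawn and convert the counts to frequencies
counts = np.bincount(samples, minlength=len(probabilities))
frequencies = counts / counts.sum()
print(frequencies)
```

The printed frequencies should land close to the requested probabilities, so a move with a higher probability is chosen more often but never every time.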
Flow diagram
As a simple example, the processing up to the exp output is shown for the case of five outputs (output 1 is -0.2, output 2 is 0.3, output 3 is 0.5, output 4 is 0, and output 5 is -0.6). The temperature is 1.
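The same example can be traced step by step in code. This is a sketch that repeats the operations inside `boltzmann` on the five outputs above; the intermediate variable names (`scaled`, `shifted`, `exp_values`) are my own and are not from the book.

```python
import numpy as np

logits = np.array([-0.2, 0.3, 0.5, 0.0, -0.6])  # the five example outputs
temperature = 1.0

scaled = logits / temperature                   # unchanged when the temperature is 1
shifted = scaled - scaled.max()                 # subtract the maximum (0.5), so the largest value becomes 0
exp_values = np.exp(shifted)                    # every value now lies in (0, 1]
probabilities = exp_values / exp_values.sum()   # normalize so the values sum to 1

print(np.round(shifted, 3))
print(np.round(exp_values, 3))
print(np.round(probabilities, 3))
```

After the shift the values are -0.7, -0.2, 0, -0.5 and -1.1. The exp outputs are roughly 0.497, 0.819, 1, 0.607 and 0.333, which normalize to probabilities of about 0.153, 0.252, 0.307, 0.186 and 0.102.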
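When the temperature is changed, the larger the temperature, the closer the magnitudes of the outputs become, because every logit is divided by it. In other words, the higher the temperature, the more even the move probabilities become. Conversely, the smaller the temperature, the more the selection concentrates on the move with the highest output.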
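The effect of the temperature can be checked numerically. The sketch below computes the move probabilities for the five example outputs at three temperatures. `move_probabilities` is a hypothetical helper that follows the same steps as `boltzmann` but returns the probabilities instead of sampling, and the temperatures 0.1, 1 and 10 are arbitrary choices.

```python
import numpy as np

def move_probabilities(logits, temperature):
    # Hypothetical helper: the steps of boltzmann() without the final random choice
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()            # shift so the maximum is 0
    exp_values = np.exp(scaled)
    return exp_values / exp_values.sum()

logits = [-0.2, 0.3, 0.5, 0.0, -0.6]

# Compare a low, a medium, and a high temperature
for temperature in (0.1, 1.0, 10.0):
    print(temperature, np.round(move_probabilities(logits, temperature), 3))
```

At temperature 0.1 most of the probability goes to output 3, the largest logit. At temperature 10 all five probabilities sit close to 0.2, which is nearly uniform.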
In Chapter 8, the step that adds randomness is done at the end. Some randomness remains, but the higher a move's probability, the more likely it is to be selected. This step is not done in Chapter 12. I don't understand it well, but perhaps it is chosen depending on the intended use?
return np.random.choice(len(logits), p=probabilities)  # choice(i, p=b) returns a random integer from 0 to i-1, with the probabilities given by b