This time, I will summarize what I learned about cross entropy.
Let the sigmoid function be $\sigma$ and let $y = \sigma(W \cdot x + b)$. Then the probability that the neuron fires (outputs 1) can be written as follows.
$$P(C = 1 \mid x) = \sigma(W \cdot x + b)$$
Conversely, the probability that it does not fire can be written as follows.
$$P(C = 0 \mid x) = 1 - \sigma(W \cdot x + b)$$
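The two probabilities above can be computed directly. This is a minimal sketch; the helper names `sigmoid` and `firing_probabilities` and the values of `W`, `x` and `b` are hypothetical, chosen only so that $W \cdot x + b = 0$.

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def firing_probabilities(W, x, b):
    # P(C = 1 | x) = sigma(W . x + b); P(C = 0 | x) is its complement
    y = sigmoid(np.dot(W, x) + b)
    return y, 1.0 - y

# Hypothetical weights, input and bias, for illustration only
W = np.array([0.5, -0.25])
x = np.array([2.0, 4.0])
b = 0.0

p_fire, p_not_fire = firing_probabilities(W, x, b)
print(p_fire, p_not_fire)  # W . x + b = 0 here, so both probabilities are 0.5
```

Because the two values always sum to 1, only $\sigma(W \cdot x + b)$ needs to be computed; the other probability follows from it.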
Combining these two into a single equation, the firing probability of one neuron can be written as follows (where $t = 0$ or $t = 1$):
$$P(C = t \mid x) = y^t (1 - y)^{1 - t}$$
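To see that the single expression covers both cases, a small sketch can plug in $t = 1$ and $t = 0$. The function name `bernoulli_prob` is a hypothetical helper, and $y = 0.8$ is just an example value.

```python
def bernoulli_prob(y, t):
    # P(C = t | x) = y**t * (1 - y)**(1 - t), defined here only for t in {0, 1}
    if t not in (0, 1):
        raise ValueError("t must be 0 or 1")
    return y ** t * (1 - y) ** (1 - t)

y = 0.8  # example firing probability sigma(W . x + b)
print(bernoulli_prob(y, 1))  # exponent 1 - t is 0, so this reduces to y
print(bernoulli_prob(y, 0))  # exponent t is 0, so this reduces to 1 - y
```

Whichever exponent is zero turns its factor into 1, which is why one formula can stand in for both $P(C = 1 \mid x)$ and $P(C = 0 \mid x)$.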
The likelihood $L$ of the entire network is the product of these firing probabilities over all $N$ samples:
$$L = \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n}$$
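As a sketch of the product above, the likelihood can be evaluated for the labels and predicted probabilities used in the Python example later in this post. The function name `likelihood` is a hypothetical helper.

```python
import numpy as np

def likelihood(t, y):
    # L = prod_n y_n**t_n * (1 - y_n)**(1 - t_n)
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.prod(y ** t * (1 - y) ** (1 - t))

# Labels and predicted probabilities taken from this post's Python example
t = [1, 0, 0]
y = [0.8, 0.1, 0.1]
print(likelihood(t, y))  # 0.8 * 0.9 * 0.9, about 0.648
```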
The maximum-likelihood parameters are found by maximizing this expression, but optimization is usually framed as minimization, so we flip its sign. A product of many probabilities becomes smaller and smaller and is hard to handle numerically, so we take the logarithm, which turns the product into a sum. Finally, dividing by $N$ keeps the value comparable even when $N$ changes:
$$E = -\frac{1}{N} \sum_{n=1}^{N} \left\{ t_n \log y_n + (1 - t_n) \log (1 - y_n) \right\}$$
This is the cross-entropy formula.
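The derivation can be checked numerically: taking $-\log L$ and dividing by $N$ should give the same value as the cross-entropy formula, and the log form should survive sample counts where the raw product underflows. This is a minimal sketch; `likelihood`, `cross_entropy`, and the 2000-sample array are hypothetical choices for illustration.

```python
import numpy as np

def likelihood(t, y):
    # Product of per-sample probabilities y**t * (1 - y)**(1 - t)
    return np.prod(y ** t * (1 - y) ** (1 - t))

def cross_entropy(t, y):
    # E = -(1/N) * sum_n [t_n log y_n + (1 - t_n) log(1 - y_n)]
    return -np.mean(t * np.log(y) + (1 - t) * np.log(1 - y))

t = np.array([1.0, 0.0, 0.0])
y = np.array([0.8, 0.1, 0.1])
# -log(L) / N and the cross-entropy formula agree
print(-np.log(likelihood(t, y)) / len(t), cross_entropy(t, y))

# With many samples the raw product underflows, while the log form stays usable
t_big = np.ones(2000)
y_big = np.full(2000, 0.5)
print(likelihood(t_big, y_big))     # 0.5**2000 underflows to 0.0
print(cross_entropy(t_big, y_big))  # log(2), about 0.693
```

The second pair of prints is the practical reason for the logarithm: the product alone would report a likelihood of exactly zero and lose all information about the fit.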
Now, suppose that the correct labels $t_1$ to $t_3$ and the predicted probabilities $y_1$ to $y_3$ are as follows.

| $n$ | $t_n$ | $y_n$ |
|---|---|---|
| 1 | 1 | 0.8 |
| 2 | 0 | 0.1 |
| 3 | 0 | 0.1 |
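Substituting these values into the cross-entropy formula above (with the natural logarithm) gives:
$$E = -\frac{1}{3} \left\{ (1 \cdot \log 0.8 + 0 \cdot \log 0.2) + (0 \cdot \log 0.1 + 1 \cdot \log 0.9) + (0 \cdot \log 0.1 + 1 \cdot \log 0.9) \right\} = -\frac{1}{3} (\log 0.8 + 2 \log 0.9) \approx 0.1446$$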
Written in Python, the cross entropy looks like this:
```python
import numpy as np

def calc_cross_entropy(y_true, y_pred):
    # Binary cross entropy, averaged over the samples along axis 0
    loss = np.mean(-1 * (y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)), axis=0)
    return loss

# Correct labels t_1 to t_3 and predicted probabilities y_1 to y_3
y_true = np.array([[1], [0], [0]])
y_pred = np.array([[0.8], [0.1], [0.1]])

answer = calc_cross_entropy(y_true, y_pred)
print(answer)
# output
# [0.14462153]
```
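One practical caveat with the function above is that a prediction of exactly 0 or 1 makes `np.log` return `-inf`. A common workaround is to clip the predictions before taking the log. This is a hedged sketch: the name `calc_cross_entropy_clipped` and the value `eps=1e-7` are hypothetical choices, not part of the original code.

```python
import numpy as np

def calc_cross_entropy_clipped(y_true, y_pred, eps=1e-7):
    # Hypothetical variant: clip predictions into [eps, 1 - eps] (eps is an
    # assumed value) so np.log never receives exactly 0
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)), axis=0)

y_true = np.array([[1], [0], [0]])
y_pred = np.array([[0.8], [0.1], [0.1]])
print(calc_cross_entropy_clipped(y_true, y_pred))  # unaffected by clipping here

# A confident but wrong prediction of exactly 1.0 now gives a large finite loss
print(calc_cross_entropy_clipped(np.array([[0]]), np.array([[1.0]])))
```

For predictions well inside $(0, 1)$ the clipping changes nothing, so the result matches the unclipped version; it only matters at the extremes.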