This article is my attempt at an easy-to-understand summary of **Deep Learning from Scratch, Chapter 6: Backpropagation**. Even with a humanities background I was able to understand it, so I hope you will find it easy to read. I would also be delighted if you use it as a reference while studying the book.
In this post I implement the layer that connects the output layer to the loss function, the last piece needed to implement backpropagation in a neural network.
This time I implement the layer that combines the softmax function used for classification with the cross-entropy error. The implementation is almost identical when using the identity function and the sum-of-squares error for regression, so that case can be handled by referring to this one as well.
```python
import numpy as np

class SoftmaxWithLoss:  # Softmax function + cross-entropy error layer
    def __init__(self):
        self.loss = None  # Value of the loss function
        self.y = None     # Output of the softmax function
        self.t = None     # Teacher (correct-answer) data

    def forward(self, x, t):
        if t.ndim == 1:  # If the correct-answer data is not one-hot, convert it
            new_t = []
            for i in t:
                oh = list(np.zeros(10))  # Number of classification labels
                oh[i] = 1
                new_t.append(oh)
            t = np.array(new_t)
        self.t = t
        self.y = softmax_function(x)
        self.loss = cross_entropy_errors(self.t, self.y)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size  # Divide by the number of samples to handle batches
        return dx
```
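For reference, in case you do not have the earlier implementations at hand, below is a minimal sketch of what `softmax_function` and `cross_entropy_errors` might look like. The names and exact details here are this article's assumptions; your own versions from earlier posts may differ slightly.

```python
import numpy as np

def softmax_function(x):
    # Subtract the per-row maximum for numerical stability before exponentiating
    x = x - np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def cross_entropy_errors(t, y):
    # Cross-entropy error averaged over the batch; the small constant avoids log(0)
    batch_size = y.shape[0] if y.ndim > 1 else 1
    return -np.sum(t * np.log(y + 1e-7)) / batch_size
```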
In the forward propagation process, the loss calculation expects the correct-answer data in one-hot form, so if the labels `t` are not one-hot (i.e. `t.ndim == 1`), they are converted to one-hot first.
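As an aside, the same conversion can be written more compactly with NumPy indexing. A minimal sketch, assuming integer labels and 10 classes as in the code above:

```python
import numpy as np

labels = np.array([3, 0, 7])   # integer class labels (made-up example)
one_hot = np.eye(10)[labels]   # rows of the identity matrix give one-hot vectors
print(one_hot.shape)           # (3, 10)
```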
After that, the layer simply calls the softmax function and cross-entropy error implemented so far in this series to compute the output and the loss.
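To confirm the behavior, a minimal usage sketch might look like this (the scores and labels below are made up purely for illustration):

```python
import numpy as np

layer = SoftmaxWithLoss()

x = np.random.randn(2, 10)  # raw scores for 2 samples and 10 classes (made up)
t = np.array([3, 7])        # integer labels; forward() converts them to one-hot

loss = layer.forward(x, t)  # forward pass returns the averaged cross-entropy loss
dx = layer.backward()       # backward pass returns (y - t) / batch_size
print(loss, dx.shape)       # dx has the same shape as x: (2, 10)
```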
The backpropagation process is simple: subtract the correct-answer data from the predicted data to get the error, then divide by the batch size to get the per-sample average. To tell the truth, the reason the combinations of softmax function with cross-entropy error, and identity function with sum-of-squares error, yield such a simple backpropagation formula is that each loss function was designed precisely so that it would combine with its output function in this way.
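Written as a formula (the notation here is mine: $y_k$ is the softmax output, $t_k$ is the one-hot label, $a_k$ is the input to the softmax layer, and $N$ is the batch size), the gradient that `backward` returns is:

$$
\frac{\partial L}{\partial a_k} = \frac{y_k - t_k}{N}
$$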
Therefore, as I said at the beginning, the backpropagation process can be implemented in exactly the same way for the identity function and the sum-of-squares error.
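For example, a regression version might look like the sketch below. The class name `IdentityWithLoss` and the helper `sum_squared_error` are my own placeholders, not names from the book's code:

```python
import numpy as np

def sum_squared_error(t, y):
    # 0.5 * sum of squared differences, averaged over the batch
    batch_size = y.shape[0] if y.ndim > 1 else 1
    return 0.5 * np.sum((y - t) ** 2) / batch_size

class IdentityWithLoss:  # Identity function + sum-of-squares error layer
    def __init__(self):
        self.loss = None
        self.y = None
        self.t = None

    def forward(self, x, t):
        self.t = t
        self.y = x  # the identity function passes x through unchanged
        self.loss = sum_squared_error(self.t, self.y)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size  # same simple form as the softmax version
        return dx
```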