This article is my attempt at an easy-to-understand summary of **Deep Learning from Scratch, Chapter 6: Backpropagation**. Even with a humanities background I was able to understand it, so I hope you will find it easy to read. I would also be delighted if you use it as a reference while studying the book.
In this post I implement the layer that connects the output layer to the loss function, the last piece needed to implement backpropagation in a neural network.
This time I implement the layer that combines the softmax function used for classification with the cross-entropy error. The implementation is almost identical when using the identity function and the sum-of-squares error for regression, so that case can be handled by referring to this one as well.
```python
import numpy as np

class SoftmaxWithLoss:  # Softmax function + cross-entropy error layer
    def __init__(self):
        self.loss = None  # Value of the loss function
        self.y = None     # Output of the softmax function
        self.t = None     # Teacher (correct-answer) data

    def forward(self, x, t):
        if t.ndim == 1:  # If the correct-answer data is not one-hot, convert it
            new_t = []
            for i in t:
                oh = list(np.zeros(10))  # Number of classification labels
                oh[i] = 1
                new_t.append(oh)
            t = np.array(new_t)
        self.t = t
        self.y = softmax_function(x)
        self.loss = cross_entropy_errors(self.t, self.y)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size  # Divide by the number of samples to handle batches
        return dx
```
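For reference, in case you do not have the earlier implementations at hand, below is a minimal sketch of what `softmax_function` and `cross_entropy_errors` might look like. The names and exact details here are this article's assumptions; your own versions from earlier posts may differ slightly.

```python
import numpy as np

def softmax_function(x):
    # Subtract the per-row maximum for numerical stability before exponentiating
    x = x - np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def cross_entropy_errors(t, y):
    # Cross-entropy error averaged over the batch; the small constant avoids log(0)
    batch_size = y.shape[0] if y.ndim > 1 else 1
    return -np.sum(t * np.log(y + 1e-7)) / batch_size
```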
In the forward propagation process, the loss calculation expects the correct-answer data in one-hot form, so if the labels `t` are not one-hot (i.e. `t.ndim == 1`), they are converted to one-hot first.
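As an aside, the same conversion can be written more compactly with NumPy indexing. A minimal sketch, assuming integer labels and 10 classes as in the code above:

```python
import numpy as np

labels = np.array([3, 0, 7])   # integer class labels (made-up example)
one_hot = np.eye(10)[labels]   # rows of the identity matrix give one-hot vectors
print(one_hot.shape)           # (3, 10)
```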
After that, the layer simply calls the softmax function and cross-entropy error implemented so far in this series to compute the output and the loss.
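To confirm the behavior, a minimal usage sketch might look like this (the scores and labels below are made up purely for illustration):

```python
import numpy as np

layer = SoftmaxWithLoss()

x = np.random.randn(2, 10)  # raw scores for 2 samples and 10 classes (made up)
t = np.array([3, 7])        # integer labels; forward() converts them to one-hot

loss = layer.forward(x, t)  # forward pass returns the averaged cross-entropy loss
dx = layer.backward()       # backward pass returns (y - t) / batch_size
print(loss, dx.shape)       # dx has the same shape as x: (2, 10)
```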
The backpropagation process is simple: subtract the correct-answer data from the predicted data to get the error, then divide by the batch size to get the per-sample average. To tell the truth, the reason the combinations of softmax function with cross-entropy error, and identity function with sum-of-squares error, yield such a simple backpropagation formula is that each loss function was designed precisely so that it would combine with its output function in this way.
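Written as a formula (the notation here is mine: $y_k$ is the softmax output, $t_k$ is the one-hot label, $a_k$ is the input to the softmax layer, and $N$ is the batch size), the gradient that `backward` returns is:

$$
\frac{\partial L}{\partial a_k} = \frac{y_k - t_k}{N}
$$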
Therefore, as I said at the beginning, the backpropagation process can be implemented in exactly the same way for the identity function and the sum-of-squares error.
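For example, a regression version might look like the sketch below. The class name `IdentityWithLoss` and the helper `sum_squared_error` are my own placeholders, not names from the book's code:

```python
import numpy as np

def sum_squared_error(t, y):
    # 0.5 * sum of squared differences, averaged over the batch
    batch_size = y.shape[0] if y.ndim > 1 else 1
    return 0.5 * np.sum((y - t) ** 2) / batch_size

class IdentityWithLoss:  # Identity function + sum-of-squares error layer
    def __init__(self):
        self.loss = None
        self.y = None
        self.t = None

    def forward(self, x, t):
        self.t = t
        self.y = x  # the identity function passes x through unchanged
        self.loss = sum_squared_error(self.t, self.y)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size  # same simple form as the softmax version
        return dx
```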