An amateur stumbled in Deep Learning from scratch ❷ Note: Chapter 1

Introduction

"Deep Learning from scratch-The theory and implementation of deep learning learned from Python" has been read, so the sequel "Deep from scratch" Learning ❷ ー ー Natural Language Processing ”, but the pace of life changed due to the influence of the new Corona, so I was able to study well. Can no longer be made. Until now, it was a daily routine to stop by a coffee shop on the way home from work to study, but when it came to teleworking, I ended up with a beer life in 5 minutes after work: sweat_smile:

This won't do, so I will whip my around-fifty body into shape and resume studying. As with my notes on the previous volume, I will jot down the points where I stumbled in this book. If I have gotten anything wrong, I would be grateful if you could point it out.

My environment

As before, I will work in a virtual machine on a Mac. The OS of the virtual machine has been upgraded from Mojave to Catalina.

Host environment

- Hardware: Mac mini (Late 2012)
- OS: macOS Mojave version 10.14.6
- Virtualization: Parallels Desktop 15 for Mac Pro Edition Version 15.1.4 (47270)

Guest environment

- OS: macOS Catalina version 10.15.4
- Development environment: Anaconda 2019.10 (Python 3.7)
- Editor: Visual Studio Code Version 1.44.2

Doing deep learning in a virtual machine on an eight-year-old Mac may sound like a stretch, but it got me through the first volume, so I will keep the same setup. For details on my environment, please see my previous note, An amateur stumbled in Deep Learning from scratch Note: Chapter 1.

Chapter 1 Review of Neural Networks

In the "For whom" part of the "Preface" of this book, "Those who have knowledge of neural networks and Python are designed so that they can read this book without knowledge of the previous work. However, I think it is a difficult place to know how much prerequisite knowledge should be required.

Chapter 1 packs in more than half of the contents of the previous volume, so if you feel yourself getting frustrated already in chapter 1, I recommend reading the [first volume](https://www.oreilly.co.jp/books/9784873117584/) first.

Below, I list the points where I stumbled, including things I noticed because of differences from the previous volume.

1.1 Review of math and Python

- The Broadcasting page in reference [1], introduced in "1.1.3 Broadcasting", is in English. For an explanation in Japanese, nkmk's "NumPy broadcasting (automatic shape conversion)" is easy to understand.

- I learned only recently that np.dot(x, y), which appears in "1.1.4 Vector inner products and matrix products", can also be written as x @ y using the @ operator. I find @ more readable, but since @ only became usable in Python 3.5, its users may still be in the minority. I don't know.
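
As a quick sanity check (a minimal sketch of my own, not code from the book), the two notations produce the same matrix product:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# np.dot() and the @ operator (PEP 465, Python 3.5+) give the same result
print(np.dot(x, y))
print(x @ y)
print(np.array_equal(np.dot(x, y), x @ y))  # True
```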

1.2 Neural network inference

- In the implementation of "1.2.2 Implementation of classification and forward propagation as layers", all the weights are now collected into a single list. In the previous volume they were stored in a dictionary keyed per layer's weight and bias (such as "W1" and "b1"), so the implementation policy has changed. A rough sketch of the difference follows.
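
Here is my own rough illustration of the two styles (not the book's exact code):

```python
import numpy as np

# Previous volume: parameters in a dictionary, keyed per layer
params = {
    'W1': np.random.randn(2, 4), 'b1': np.zeros(4),
    'W2': np.random.randn(4, 3), 'b2': np.zeros(3),
}

# This volume: each layer holds its own params list,
# and the network concatenates them into one flat list
class Affine:
    def __init__(self, W, b):
        self.params = [W, b]

layers = [Affine(np.random.randn(2, 4), np.zeros(4)),
          Affine(np.random.randn(4, 3), np.zeros(3))]
params = []
for layer in layers:
    params += layer.params  # all weights and biases in one list
```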

1.3 Neural network learning

- In the explanation of "1.3.4 Computational graphs", the basic node types explained at the beginning have increased compared with the previous volume (I rushed into teleworking with the previous volume left at the office, so I am relying on memory here, but did the Repeat node, Sum node, and MatMul node appear as basic nodes there?). This makes the explanation of "1.3.5.2 Affine layer" easier to understand.

- In "1.3.4.5 MatMul node", there is an explanation that assigning to a NumPy array through the three-dot ellipsis (...) results in a deep copy. The book now seems to put more emphasis on memory efficiency and speed than the previous volume did. However, the claim that using the ellipsis as the assignment target produces a deep copy did not immediately make sense to me, so I investigated a little. Apparently, slicing an ndarray creates a view object, assigning to that view overwrites the original data, and the ellipsis is a convenient shorthand for such slices. For view objects, DeepAge's "Explanation of NumPy copy and view in an easy-to-understand manner" was helpful, and for the ellipsis, nkmk's "Specify the dimensions of a NumPy array ndarray by omitting them with Ellipsis (...)" was easy to understand. (See the sketch after this list.)

- The gradient implementation in "1.3.5 Gradient derivation and backpropagation implementation" also differs from the previous volume: the memory for grads allocated at the start is reused.
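
A minimal sketch (my own toy example) of why assignment through the ellipsis behaves differently from plain assignment:

```python
import numpy as np

grads = [np.zeros(3)]           # pre-allocated gradient storage
dW = np.array([1.0, 2.0, 3.0])  # newly computed gradient

g = grads[0]
g = dW           # rebinds the local name only; grads[0] is untouched
print(grads[0])  # [0. 0. 0.]

g = grads[0]
g[...] = dW      # writes through the view into the existing memory
print(grads[0])  # [1. 2. 3.]
```

Because `g[...] = dW` copies the values into the already-allocated array, the same memory block can be reused across iterations.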

1.4 Solving problems with neural networks

- The mini-batch data selection in "1.4.3 Source code for learning" has been changed to shuffle the data once per epoch with numpy.random.permutation(). In the first volume, each batch was drawn with numpy.random.choice(), so the same data could appear in multiple batches; this implementation eliminates that duplication, as the sketch below illustrates.
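
A minimal sketch of the two sampling styles (my own toy example, not the book's code):

```python
import numpy as np

data_size, batch_size = 10, 3
x = np.arange(data_size)

# First volume's style: each batch sampled independently, so items can repeat
batch = x[np.random.choice(data_size, batch_size)]

# This volume's style: one shuffle per epoch, then non-overlapping slices
idx = np.random.permutation(data_size)
for it in range(data_size // batch_size):
    batch = x[idx[batch_size * it:batch_size * (it + 1)]]
```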

- Also in "1.4.3 Source code for learning", running the book's source code visualizes the decision boundary. I wondered whether the boundary was detected by some special method and the region filled in, but looking at the source code, it simply runs inference over a grid of coordinates and plots the results. Obvious in hindsight, but there is something pleasingly brute-force about it. Handy functions such as numpy.meshgrid() make it easy. Being able to produce this kind of visualization in just a few lines is the great thing about Python (or NumPy? or Matplotlib?). A sketch of the idea follows the figure below.

(Figure: the visualized decision boundary)
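
A minimal sketch of the grid-inference idea, with a hypothetical predict() standing in for the trained model:

```python
import numpy as np
import matplotlib.pyplot as plt

def predict(points):
    # stand-in for the trained network's inference (hypothetical)
    return (points[:, 0] * points[:, 1] > 0).astype(int)

# build a grid covering the plot area
xx, yy = np.meshgrid(np.arange(-1, 1, 0.01), np.arange(-1, 1, 0.01))
grid = np.c_[xx.ravel(), yy.ravel()]  # shape (N, 2): all grid points

# classify every grid point and fill the regions by predicted class
zz = predict(grid).reshape(xx.shape)
plt.contourf(xx, yy, zz)
plt.show()
```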

- I don't think the Trainer class explained in "1.4.4 Trainer class" was in the first volume (my copy of the first volume is still at the office while I telework), but it has already been adopted in the [first volume's source code](https://github.com/oreilly-japan/deep-learning-from-scratch/blob/master/common/trainer.py).

1.5 Acceleration of calculation

- "1.5.1 Bit precision" was only briefly explained in the final chapter of the previous volume, but here the book starts actually using 16-bit floating-point numbers. Since this is still a transitional period, the policy is to apply them only when saving weights, to reduce file size. A sketch of the idea follows this list.

- The same goes for "1.5.2 GPU (CuPy)": the previous volume only touched on it lightly, but here the book starts using CuPy. The implementation remains basically CPU-based, though, so you are fine without a GPU.
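
A minimal sketch of the save-time down-casting idea as I understand it (the file name here is my own):

```python
import numpy as np

W = np.random.randn(100, 100)  # weights are float64 by default

# cast to 16-bit floats only when saving, to shrink the file
np.save('params.npy', W.astype(np.float16))

# cast back to 32-bit floats when loading for computation
W_loaded = np.load('params.npy').astype(np.float32)
```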

1.6 Summary

Since this chapter focused on reviewing the previous volume, there were no major stumbling blocks. It is nice to see that the source code has been improved in various ways.

That's all for this chapter. If you find any mistakes, I would be grateful if you could point them out.
