While working through the TensorFlow expert tutorial "Deep MNIST for Experts", I found the "pooling" step as confusing as "convolution". Like convolution, it is not a term most liberal arts graduates know, and I stumbled a little, but it turns out to be easy to understand. It is actually easier than convolution, because it does not use a filter. The overall processing of a convolutional neural network is covered in the article "[Explanation for beginners] TensorFlow tutorial Deep MNIST", and the article "[Explanation for beginners] Introduction to convolutional processing (Explanation in TensorFlow)" focuses specifically on convolution, so please refer to those. * Written with reference to the images output by TensorBoard (2017/7/27)
As explained in the article "[Explanation for beginners] TensorFlow tutorial Deep MNIST", the "pooling" step roughly summarizes the features of an image. It works not only on images but also on audio and other data, but images are visual and easier to grasp, so I will use images in this explanation.
Convolution extracts features with filters. It takes an image as input, applies each filter to it, and outputs one image per filter. With MNIST data, it looks as follows.
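To make "one output image per filter" concrete, here is a minimal pure-Python sketch. The 4x4 image, the two 2x2 filters and the function name `conv2d_valid` are hypothetical illustrations, not taken from the tutorial. For simplicity it uses no padding (unlike the `padding='SAME'` used in Deep MNIST) and, like TensorFlow's convolution, computes a cross-correlation without flipping the filter.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return one feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical 4x4 single-channel image containing a "\" diagonal bar.
image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

# Two hypothetical 2x2 filters: one matches "\" diagonals, the other "/" diagonals.
filters = [
    [[1, -1], [-1, 1]],
    [[-1, 1], [1, -1]],
]

# One feature map per filter: 2 filters -> 2 output images.
feature_maps = [conv2d_valid(image, k) for k in filters]
for k, fmap in zip(filters, feature_maps):
    print(k, "->", fmap)
```

The first filter responds strongly (value 2) along the bar, the second responds negatively. In Deep MNIST the first convolution layer has 32 filters, so a 28x28x1 image becomes a 28x28x32 stack of feature maps.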
Here are the actual input and output images of the pooling step, output from TensorBoard and arranged side by side. Unlike convolution, pooling is easy to understand because it only makes the image coarser.
The following part of the TensorFlow expert tutorial "Deep MNIST for Experts" performs the pooling. The same function is used for both the first and the second layer. With the TensorFlow API, pooling takes only this much code.
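```python
import tensorflow as tf

# Max Pooling: 2x2 windows, stride 2; x is [batch, height, width, channels]
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
```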
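To see what `ksize=[1, 2, 2, 1]`, `strides=[1, 2, 2, 1]` and `padding='SAME'` do to the image size, here is a small pure-Python sketch. The helper name `same_pool_output` is hypothetical; the formulas follow TensorFlow's 'SAME' padding rule (output size is ceil(input / stride), and any needed padding is split with the larger half at the end). The 1s for the batch and channel dimensions mean pooling never mixes images or channels, so the channel count is unchanged.

```python
import math

def same_pool_output(size, ksize=2, stride=2):
    """Output size and (before, after) padding for one spatial dimension
    under TensorFlow-style 'SAME' padding."""
    out = math.ceil(size / stride)
    pad_total = max((out - 1) * stride + ksize - size, 0)
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before
    return out, (pad_before, pad_after)

# Deep MNIST applies max_pool_2x2 twice: 28x28 -> 14x14 -> 7x7.
size = 28
for layer in (1, 2):
    out, pad = same_pool_output(size)
    print(f"layer {layer}: {size}x{size} -> {out}x{out}, padding {pad}")
    size = out

# An odd size needs one extra row/column of padding at the end.
print(same_pool_output(7))  # (4, (0, 1))
```

For the even sizes in Deep MNIST no padding is needed, so 'SAME' only matters if the input size were odd.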
As an example input, take an image of a diagonal bar that is 4 (height) x 4 (width) x 1 color (black only), and apply convolution to it before pooling. The result is as follows. To make it easy to read, the values are reduced to just two: "1" and "-1".
Max Pooling
Now for the main topic, pooling. The TensorFlow expert tutorial "Deep MNIST for Experts" uses Max Pooling as its pooling type. Max Pooling simply selects the maximum value in each range (here, each 2x2 block) and compresses the data that way. As shown in the figure, the diagonal-bar feature is compressed to half size, from 4 (height) x 4 (width) to 2 (height) x 2 (width). This is what I meant by "roughly summarizing the features". There is also average pooling, which takes the mean of each range instead; its result is shown in the figure below.
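Below is a minimal pure-Python sketch of both pooling types with 2x2 windows and stride 2, applied to a 4x4 feature map of "1" and "-1" for the diagonal bar. The exact values in the article's figure are not reproduced here, so the map and the function name `pool_2x2` are illustrative assumptions.

```python
def pool_2x2(fmap, reduce):
    """Apply `reduce` to each non-overlapping 2x2 block (stride 2).
    Assumes even height and width, so no padding is needed."""
    return [
        [
            reduce([fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1]])
            for j in range(0, len(fmap[0]), 2)
        ]
        for i in range(0, len(fmap), 2)
    ]

def mean(values):
    return sum(values) / len(values)

# Hypothetical convolution output for a diagonal bar: 1 on the bar, -1 elsewhere.
fmap = [
    [ 1, -1, -1, -1],
    [-1,  1, -1, -1],
    [-1, -1,  1, -1],
    [-1, -1, -1,  1],
]

print(pool_2x2(fmap, max))   # [[1, -1], [-1, 1]]
print(pool_2x2(fmap, mean))  # [[0.0, -1.0], [-1.0, 0.0]]
```

With this map, max pooling keeps the diagonal clearly visible at half the size, while average pooling mixes the bar cells with their neighbors and pulls them toward 0.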