Challenge image classification by TensorFlow2 + Keras 7-Understanding layer types and activation functions-

Introduction

This is the 7th study memo about image classification with TensorFlow2 + Keras (in a Google Colaboratory environment). The subject is the classification of handwritten digit images (MNIST), a standard task.

--Challenge image classification by TensorFlow2 + Keras series
  - 1. Move for the time being
  - 2. Take a closer look at the input data
  - 3. Visualize MNIST data
  - 4. Let's make a prediction with the trained model
  - 5. Observe images that fail to classify
  - 6. Try preprocessing and classifying images prepared by yourself
  - 7. Understanding layer types and activation functions
  - 8. Select optimization algorithm and loss function
  - 9. Try learning, saving and loading the model

Last time, I used the model introduced in "[Introduction to TensorFlow 2.0 for beginners](https://www.tensorflow.org/tutorials/quickstart/beginner?hl=ja)" to make predictions (classification) on images I had prepared myself.

This time, I studied the neural network model featured in the tutorial: the types of layers that make it up (Flatten, Dense, Dropout) and the activation functions.

SoftMax.png

How to write a model

The following code is a copy from "Introduction to TensorFlow 2.0 for Beginners".

Construction of NN model (Description 1)


model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

In the above code, the keyword argument `activation`, which specifies the **activation function**, is given as a string, but it can also be specified by passing the function object directly, as follows.

Construction of NN model (Description 2)


model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation=tf.nn.relu),   #Change
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)  #Change
])

Also, while the layer configuration of the neural network is given here as a list argument to `Sequential(...)`, you can instead **add the layers one at a time using `add(...)`**, as follows.

Construction of NN model (Description 3)


model = tf.keras.models.Sequential()                             # (0)
model.add( tf.keras.layers.Flatten(input_shape=(28, 28)) )       # (1)
model.add( tf.keras.layers.Dense(128, activation=tf.nn.relu) )   # (2)
model.add( tf.keras.layers.Dropout(0.2) )                        # (3)
model.add( tf.keras.layers.Dense(10, activation=tf.nn.softmax) ) # (4)

View model overview

You can get an overview of the NN model built above with `summary()`.

Model overview confirmation


model.summary()

Execution result


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 128)               100480    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________

From top to bottom, the table lists the **input layer**, the **intermediate (hidden) layers**, ..., and the **output layer**.

The leftmost column of the table is the layer name. It is assigned automatically if `name=` is omitted in `add()`, and the names are numbered like `flatten_1`, `flatten_2`, ... each time a model is built.
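
If you prefer fixed names, you can pass `name=` yourself; `tf.keras.backend.clear_session()` (which also appears in the experiment code later) resets the automatic numbering. A minimal sketch, with hypothetical layer names:

import tensorflow as tf

# A minimal sketch: explicit (hypothetical) layer names and a reset of the
# counters used for automatic numbering.
tf.keras.backend.clear_session()
model = tf.keras.models.Sequential()
model.add( tf.keras.layers.Flatten(input_shape=(28, 28), name='input_flatten') )
model.add( tf.keras.layers.Dense(128, activation='relu', name='hidden_dense') )
print([layer.name for layer in model.layers])  # ['input_flatten', 'hidden_dense']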

The value in parentheses, second from the left, is the layer type. There are three types here: Flatten, Dense, and Dropout; they are explained in the next section.

The second value of the tuple in the "Output Shape" column is **the number of neurons in the layer (= the number of outputs from the layer)**. For example, (None, 128) means that the layer has 128 neurons (nodes).

The "Param #" column gives the total number of **parameters** (the **weights** and **biases** associated with the layer's inputs).

For example, the $100480$ of the second layer "dense (Dense)" is the number of **weight** parameters, one for every combination of the $784$ outputs of the first layer and the $128$ nodes of the second layer, plus the **bias** of each of the $128$ nodes of the second layer: $784 \times 128 + 128 = 100480$. Training (learning) is the operation of finding the optimum values of these parameters.

At the end of the table are Total params (the total number of parameters), Trainable params (parameters updated by training), and Non-trainable params (parameters not updated by training).
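
As a quick check, the same per-layer counts can be obtained with `count_params()`. A minimal sketch, assuming the `model` built above is still in scope:

# A minimal sketch, assuming `model` is the Sequential model built above.
for layer in model.layers:
    print(f'{layer.name}: {layer.count_params()}')
# Expected: flatten: 0, dense: 100480 (= 784*128 + 128),
#           dropout: 0, dense_1: 1290 (= 128*10 + 10)

print(model.count_params())  # 101770 in total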

Roles, actions, and meanings of each layer

Flatten layer

One handwritten digit image is $28 \times 28$ pixels and is a two-dimensional `numpy.ndarray` of shape (28, 28). The Flatten layer **flattens** this into a one-dimensional array. Therefore, the number of outputs is $28 \times 28 = 784$, as confirmed with `model.summary()`.

The program adds a Flatten layer to the model as follows:

model.add( tf.keras.layers.Flatten(input_shape=(28, 28)) )   # (1)

The `input_shape` argument is set to (28, 28) to match `x_train[*].shape`. If you want to input a $32 \times 32$ pixel image, use `input_shape=(32, 32)`. The reference is here.
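
The effect of the layer can also be checked directly. A minimal sketch, assuming `x_train` has been loaded as in the tutorial:

import tensorflow as tf

# A minimal sketch, assuming x_train has been loaded as in the tutorial.
flatten = tf.keras.layers.Flatten(input_shape=(28, 28))
out = flatten(x_train[:1])           # one image of shape (28, 28)
print(x_train[:1].shape, out.shape)  # (1, 28, 28) -> (1, 784)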

Dense layer

This is a **fully connected layer**, in which every node is connected to every node of the previous layer. It is the standard layer that makes up a neural network.

The program adds a Dense layer to the model as follows:

Fully connected layer as an intermediate layer


model.add( tf.keras.layers.Dense(128, activation=tf.nn.relu) )   # (2)

Fully connected layer as output layer


model.add( tf.keras.layers.Dense(10, activation=tf.nn.softmax) ) # (4)

The first argument is the number of nodes (neurons) that make up the layer. As in (2) above, **how many nodes to put in a fully connected layer used as an intermediate layer** is a factor that affects the performance of the model (it is a hyperparameter that the user sets by trial and error). Note that a large number of nodes does not necessarily produce a high-performance model (at the very least, more nodes means more parameters, so the amount of computation increases and training takes longer).

On the other hand, for a multi-class classification problem, **the number of nodes in the fully connected layer used as the output layer must match the number of classes you want to classify**. MNIST is a classification of the digits 0 to 9, i.e. a **10-class classification problem**, so 10 must be set here.

The **activation function** is given via the `activation` argument. The **ReLU function** (`tf.nn.relu`) and the **SoftMax function** (`tf.nn.softmax`) are used here; their details are explained in a later section. If `activation=` is omitted, no activation function is applied and the computed value is output as is. The reference is here.
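
To illustrate that a Dense layer without `activation=` outputs just the weighted sum plus bias, here is a minimal sketch with hypothetical toy sizes (not the tutorial's 784/128):

import numpy as np
import tensorflow as tf

# A minimal sketch with hypothetical toy sizes: a Dense layer with no
# activation computes y = x @ W + b (a purely linear output).
dense = tf.keras.layers.Dense(3)            # 3 nodes, no activation
x = tf.constant([[1.0, 2.0, 4.0, 8.0]])     # one sample with 4 features
y = dense(x)                                # builds and applies the layer

W, b = dense.get_weights()                  # kernel of shape (4, 3), bias of shape (3,)
print(np.allclose(y.numpy(), x.numpy() @ W + b))  # True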

Dropout layer

**During training**, this layer blocks the output from the previous layer to the next layer with the specified probability, on a node-by-node basis (this is also described as deactivating, or dropping, the corresponding nodes of the previous layer). Adding this layer makes the model less likely to fall into **overfitting**.

The article "[Neural network] Dropout is summarized" explains this very clearly.

The program adds a Dropout layer to the model as follows:

model.add( tf.keras.layers.Dropout(0.2) )  # (3) 

The argument specifies the fraction of nodes to deactivate, in the range 0.0 to 1.0. Setting it to 0.0 is essentially the same as having no Dropout layer. Setting it to 1.0 would completely cut the network at the Dropout layer, so no learning could take place (in fact, you get the error `ValueError: rate must be a scalar tensor or a float in the range [0, 1), got 1`).

Note that the nodes to deactivate are selected **randomly** according to the specified rate. Therefore, with a Dropout layer in the model, **the trained model changes (slightly) with every training run**. When investigating the influence of other hyperparameters, such as the relationship between the number of nodes in the Dense layer and the accuracy, pass an argument like `seed=1` to fix the random seed (however, if training itself contains other random elements, the trained model will still vary from run to run even with the seed fixed here).

The reference is here.
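
The behavior described above can be observed directly. A minimal sketch with a hypothetical all-ones input; `seed=` makes the random mask reproducible across program runs:

import tensorflow as tf

# A minimal sketch with a hypothetical all-ones input.
drop = tf.keras.layers.Dropout(0.2, seed=1)
x = tf.ones((1, 10))

print(drop(x, training=False).numpy())  # inference: input passed through unchanged
print(drop(x, training=True).numpy())   # training: about 20% of values zeroed,
                                        # the rest scaled by 1/(1-0.2) = 1.25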

Evaluate the effectiveness of the Dropout layer against overfitting

I prepared models in which the Dropout layer parameter (the rate of nodes to deactivate) was varied from 0.0 to 0.8 in steps of 0.2, trained and evaluated each of them for 100 epochs, and **observed whether adding a Dropout layer is effective against overfitting**.

For each training epoch, the accuracy (accuracy) and loss (loss) on the training data x_train, and the accuracy (val_accuracy) and loss (val_loss) on the test data x_test, were recorded and plotted.

python


import numpy as np
import tensorflow as tf

# (1)Download handwritten digit image dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# (2)Data normalization
x_train, x_test = x_train / 255.0, x_test / 255.0

# (3)Build NN model
# ■■ Vary the Dropout rate from 0.0 to 0.8 ■■
epochs = 100
results = list()
for p in np.arange(0.0, 1.0, 0.2) :
  print(f'■ Dropout p={p:.1f}')
  tf.keras.backend.clear_session()  
  model = tf.keras.models.Sequential()
  model.add( tf.keras.layers.Flatten(input_shape=(28, 28)) )
  model.add( tf.keras.layers.Dense(128, activation=tf.nn.relu) )
  model.add( tf.keras.layers.Dropout(p) ) #See the effect of parameter p here
  model.add( tf.keras.layers.Dense(10, activation=tf.nn.softmax) )

  # (4)Compiling the model
  model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])

  # (5)Model training (learning / training)
  r = model.fit(x_train, y_train, validation_data=(x_test,y_test), epochs=epochs)
  print(r.history)
  results.append( dict( rate=p, hist=r.history ) )

#■■ Graph output ■■
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as tk

ylim = dict( )
ylim['accuracy']     = (0.90, 1.00)
ylim['val_accuracy'] = (0.95, 1.00)
ylim['loss']         = (0.00, 0.20)
ylim['val_loss']     = (0.05, 0.30)

xt_style = lambda x, pos=None : f'{x:.0f}'

for v in ['accuracy','loss','val_accuracy','val_loss'] :
  plt.figure(dpi=96)
  for r in results :
    plt.plot( range(1,epochs+1),r['hist'][v],label=f"rate={r['rate']:.1f}")
  plt.xlim(1,epochs)
  plt.ylim( *(ylim[v]) )
  plt.gca().xaxis.set_major_formatter(tk.FuncFormatter(xt_style))
  plt.tick_params(direction='in')
  plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left', borderaxespad=0)
  plt.xlabel('epoch')
  plt.ylabel(v)
  plt.show()

Experimental results

Accuracy on the training data (accuracy)

g1.png

In every model, the value improves as training progresses. In particular, at rate = 0.0, which is effectively equivalent to having no Dropout layer, the maximum score of 1.0 is reached. Broadly, the smaller the deactivation rate, the faster the learning and the better the final accuracy.

Loss on the training data (loss)

g2.png

The overall tendency is similar to that of the accuracy.

Accuracy on the test data (val_accuracy)

g3.png

From here on, these are the true evaluation metrics, since they reflect generalization performance.

Rates 0.2 and 0.4 converge quickly and their final values are good. On the other hand, 0.6 is clearly inferior to 0.2 and 0.4, and 0.0 is a little less stable than the others.

From the accuracy alone, no clear tendency toward overfitting can be seen.

Loss on the test data (val_loss)

g4.png

At rate = 0.0 (equivalent to having no Dropout layer), the value gradually worsens after epoch 8, clearly showing a tendency toward overfitting.

Looking at the slope of each curve after epoch 20, the larger the deactivation rate, the harder it is for the model to overfit. This confirms the effectiveness of the Dropout layer.

Overall, this confirms that the values used in the tutorial, rate = 0.2 and epochs = 5, are well-tuned parameters.

Activation function

If the activation function is **not applied**, the output $y_1$ of the first node in the second layer is calculated as follows ($x_i$ is the output of the $i$-th node in the previous layer, $w_{i1}$ is the weight, and $b_1$ is the bias).

y_1 = b_1 + \sum_{i=1}^{784} x_{i}w_{i1}

On the other hand, when the activation function $f$ is applied, $y_1$ becomes:

y_1 = f ( b_1 + \sum_{i=1}^{784} x_{i}w_{i1} )
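
As a concrete check, here is a minimal sketch that computes a single node's output with and without the ReLU activation, using hypothetical toy weights and a toy bias (not values from the trained model):

import numpy as np
import tensorflow as tf

# A minimal sketch with hypothetical toy values: one node receiving 3 inputs.
x = np.array([0.5, -1.0, 2.0])   # outputs x_i of the previous layer
w = np.array([0.2, 0.4, -0.3])   # weights w_i1 (toy values)
b = 0.1                          # bias b_1 (toy value)

z = b + np.sum(x * w)            # without activation: y_1 = b_1 + sum(x_i * w_i1)
y = tf.nn.relu(z)                # with activation:    y_1 = f(b_1 + sum(x_i * w_i1))

print(z, float(y))               # z = -0.8 (approx.), relu(z) = 0.0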

There are various activation functions used in neural networks, but they roughly divide into those **commonly used in the intermediate layers** and those **used in the output layer**, which depend on the type of problem.

In the intermediate layers, the **ReLU function** and the **sigmoid function** are used. Because of their properties, the output layer of a multi-class classification problem uses the **SoftMax function**, and that of a two-class classification problem uses the **sigmoid function**.
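
For reference, a minimal sketch of an output layer for a hypothetical two-class problem (not part of this MNIST tutorial):

import tensorflow as tf

# A minimal sketch (hypothetical two-class problem): the output layer is a
# single node with a sigmoid activation, giving a value in (0, 1) that can
# be read as the probability of the positive class.
binary_output = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)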

ReLU function

**This seems to be the most commonly used activation function in the intermediate layers**. If the input is less than 0, the output is 0; if the input is 0 or more, the input is output as is (i.e. $f(x) = \max(0, x)$). In TensorFlow, it is `tf.nn.relu()`. The reference is here.

ReLU.png

ReLU function


import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

xmin, xmax = -10,10
x = np.linspace(xmin, xmax,1000)
y = tf.nn.relu(x)  #ReLU function

#Check the shape on the graph
plt.figure(dpi=96)
plt.plot(x,y,lw=3)
plt.xlim(xmin, xmax)
plt.ylim(-1, 12)
plt.hlines([0],*(plt.xlim()),ls='--',lw=0.5)
plt.vlines([0],*(plt.ylim()),ls='--',lw=0.5)

There are also variants of the ReLU function, such as the "Leaky ReLU function" and the "Parametric ReLU function".

--Reference: About the activation function ReLU and the ReLU family
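
TensorFlow provides `tf.nn.leaky_relu()` for the Leaky ReLU. A minimal sketch comparing it with the plain ReLU on some sample inputs (the `alpha` value is just an example):

import tensorflow as tf

# A minimal sketch: Leaky ReLU passes a small fraction (alpha) of negative
# inputs through instead of clamping them to 0.
x = tf.constant([-10.0, -1.0, 0.0, 1.0, 10.0])
print(tf.nn.relu(x).numpy())                   # [ 0.   0.   0.   1.  10.]
print(tf.nn.leaky_relu(x, alpha=0.2).numpy())  # [-2.  -0.2  0.   1.  10.]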

Sigmoid function

One of the activation functions often used in intermediate layers, defined as $f(x) = \frac{1}{1 + e^{-x}}$. However, in **NN models with many layers**, the sigmoid function suffers from the vanishing gradient problem, so the ReLU function seems to have taken over its popularity. In TensorFlow, it is `tf.math.sigmoid()`. The reference is here.

sig.png

python


import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

xmin, xmax = -10,10
x = np.linspace(xmin, xmax,1000)
y = tf.math.sigmoid(x)  #Sigmoid function

#Check the shape on the graph
plt.figure(dpi=96)
plt.plot(x,y,lw=3)
plt.xlim(xmin, xmax)
plt.ylim(-0.2, 1.2)
plt.hlines([0,0.5,1],*(plt.xlim()),ls='--',lw=0.5)
plt.vlines([0],*(plt.ylim()),ls='--',lw=0.5)

SoftMax function

Commonly used in the **output layer** of multi-class classification problems. Regardless of the input, each output falls in the range 0.0 to 1.0, and the outputs sum to 1.0. In TensorFlow, it is `tf.nn.softmax()`. The reference is here.
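
As a reference definition (standard notation, not taken from the tutorial), the SoftMax of the $j$-th element of an input vector $x$ of length $K$ is:

\mathrm{SoftMax}(x)_j = \frac{e^{x_j}}{\sum_{k=1}^{K} e^{x_k}}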

For example, applying the SoftMax function to the input [2, -1, 1, 1] gives an output like [0.56, 0.03, 0.21, 0.21] (the elements sum to 1.0), as computed in the code below.

SoftMax.png

SoftMax function


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as pe
import tensorflow as tf

x = np.array( [2, -1, 1, 1], dtype=np.float64 )
fx = tf.nn.softmax(x)
fx = fx.numpy()  # get the contents as a numpy.ndarray

np.set_printoptions(precision=2)
print(f'fx = {fx}')
print(f'fx.sum() = {fx.sum():.2f}')

fig, ax = plt.subplots(nrows=2, ncols=1, dpi=96)
plt.subplots_adjust(hspace=0.12)
ep = (pe.Stroke(linewidth=3, foreground='white'),pe.Normal())
tp = dict(horizontalalignment='center',verticalalignment='center')

ax[0].bar( np.arange(0,len(x)), x, fc='tab:red' )
ax[1].bar( np.arange(0,len(fx)), fx )
ax[1].set_ylim(0,1)

for i, p in enumerate([x,fx]) :
  ax[i].tick_params(axis='x', which='both', bottom=False, labelbottom=False)
  ax[i].set_xlim(-0.5,len(p)-0.5)
  for j, v in enumerate(p):
    t = ax[i].text(j, v/2, f'{v:.2f}',**tp)
    t.set_path_effects(ep) 

ax[0].hlines([0],*(plt.xlim()),lw=1)

ax[0].set_ylabel('x')
ax[1].set_ylabel('SoftMax(x)')

Next time

Undecided
