This is the 7th study memo in a series about image classification with TensorFlow2 + Keras (in a Google Colaboratory environment). The subject is the classification of handwritten digit images (MNIST), a standard task.
"Challenge image classification by TensorFlow2 + Keras" series:
1. Move for the time being
2. Take a closer look at the input data
3. Visualize MNIST data
4. Let's make a prediction with the trained model
5. Observe images that fail to classify
6. Try preprocessing and classifying images prepared by yourself
7. Understanding layer types and activation functions
8. Select optimization algorithm and loss function
9. Try learning, saving and loading the model
Last time, I performed prediction (classification) on images I prepared myself, using the model introduced in "[Introduction to TensorFlow 2.0 for beginners](https://www.tensorflow.org/tutorials/quickstart/beginner?hl=ja)".
This time, I studied the neural network model featured in that tutorial: the types of layers that make it up (`Dense`, `Dropout`, `Flatten`) and the activation functions.
The following code is a copy from "Introduction to TensorFlow 2.0 for Beginners".
Construction of NN model (Description 1)
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='softmax')
])
In the above code, the keyword argument `activation`, which specifies the **activation function**, is given as a string, but it can also be specified by passing the function itself, as follows.
Construction of NN model (Description 2)
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation=tf.nn.relu), #Change
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation=tf.nn.softmax) #Change
])
Also, here the layer configuration of the neural network is passed as a list to the argument of `Sequential(...)`, but you can also **add layers one by one** using `add(...)`, as follows.
Construction of NN model (Description 3)
model = tf.keras.models.Sequential() # (0)
model.add( tf.keras.layers.Flatten(input_shape=(28, 28)) ) # (1)
model.add( tf.keras.layers.Dense(128, activation=tf.nn.relu) ) # (2)
model.add( tf.keras.layers.Dropout(0.2) ) # (3)
model.add( tf.keras.layers.Dense(10, activation=tf.nn.softmax) ) # (4)
You can get an overview of the NN model built above with `summary()`.
Model overview confirmation
model.summary()
Execution result
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
_________________________________________________________________
dense (Dense) (None, 128) 100480
_________________________________________________________________
dropout (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
From top to bottom, the table shows the **input layer**, the **intermediate (hidden) layers**, ..., and the **output layer**.
The leftmost value in the table is the layer name. It is assigned automatically if `name=` is omitted in `add(...)`, and it is numbered like `flatten_1`, `flatten_2`, ... every time the model is rebuilt.
The value in parentheses, second from the left, is the layer type. There are three types here: `Flatten`, `Dense`, and `Dropout`; each is explained in the following sections.
The second value in the tuple of the "Output Shape" column is **the number of neurons in that layer (= the number of outputs from that layer)**. If it is (None, 128), it means that there are 128 neurons (nodes) in that layer.
Next, the item "Param" is the total number of ** parameters ** (** weights ** and ** bias ** related to the input of the layer).
For example, the $100480$ of the second layer "dense (Dense)" is the total of the **weight** parameters, one for each combination of the $784$ outputs of the first layer and the $128$ nodes of the second layer, plus the **bias** parameters of the $128$ nodes of the second layer. That is, $784 \times 128 + 128 = 100480$. Training (learning) is the operation of finding the optimum values of these parameters.
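As a quick cross-check, the following small sketch recomputes that "Param #" value by hand (the numbers 784 and 128 come from the model above).

```python
# Parameter count of the "dense (Dense)" layer:
# one weight per (input, node) pair plus one bias per node
inputs = 784   # number of outputs from the Flatten layer
nodes = 128    # number of nodes in the Dense layer
print(inputs * nodes + nodes)   # 100480
```

The same number can also be obtained from the layer itself with `model.layers[1].count_params()`.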
At the end of the table are Total params (the total number of parameters), Trainable params (parameters updated by training), and Non-trainable params (parameters not updated by training).
One handwritten digit image is 28 $\times$ 28 pixels and is a two-dimensional numpy.ndarray of shape (28, 28). The Flatten layer **flattens** this into a one-dimensional array. Therefore, the number of outputs is $28 \times 28 = 784$, as confirmed with `model.summary()`.
The program adds a Flatten layer to the model as follows:
model.add( tf.keras.layers.Flatten(input_shape=(28, 28)) ) # (1)
In the `input_shape` argument, (28, 28) is specified to match `x_train[*].shape`. If you want to input 32 $\times$ 32 pixel images, use `input_shape=(32, 32)`. The reference is here.
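As a small illustration of what Flatten does, the following sketch passes a dummy array with the same shape as one MNIST image through a Flatten layer (the zero-filled array is just a stand-in for a real image).

```python
import numpy as np
import tensorflow as tf

# A dummy batch containing one 28x28 "image"
img = np.zeros((1, 28, 28), dtype=np.float32)
flat = tf.keras.layers.Flatten()(img)
print(flat.shape)   # (1, 784)
```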
The `Dense` layer is a **fully connected layer**: every node in the previous layer is connected (densely coupled) to every node in this layer. It is the standard layer that makes up a neural network.
The program adds a Dense layer to the model as follows:
Fully connected layer as an intermediate layer
model.add( tf.keras.layers.Dense(128, activation=tf.nn.relu) ) # (2)
Fully connected layer as output layer
model.add( tf.keras.layers.Dense(10, activation=tf.nn.softmax) ) # (4)
The first argument is the number of nodes (neurons) that make up the layer. As in (2) above, **how many nodes to use in a fully connected layer serving as an intermediate layer** is a factor that affects the performance of the model (it is a hyperparameter that the user sets by trial and error). Note that a large number of nodes does not necessarily mean a high-performance model (at the very least, as the number of nodes increases, the number of parameters increases, so the amount of computation grows and training takes longer).
On the other hand, in a multi-class classification problem, **the number of nodes in the fully connected layer used as the output layer must match the number of classes you want to classify**. MNIST is a classification of the digits 0 to 9, that is, a **10-class classification problem**, so 10 must be set here.
Also, the **activation function** is given via the `activation` argument. The **ReLU function** (`tf.nn.relu`) and the **SoftMax function** (`tf.nn.softmax`) are used here; their details are explained in a later section. If `activation=` is omitted, no activation function is applied and the computed value is output as is. The reference is here.
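The sketch below, with made-up input values, contrasts the two Dense layers used above with a Dense layer whose `activation=` is omitted (the names `hidden`, `output`, and `linear` are only for illustration).

```python
import numpy as np
import tensorflow as tf

hidden = tf.keras.layers.Dense(128, activation=tf.nn.relu)    # intermediate layer
output = tf.keras.layers.Dense(10, activation=tf.nn.softmax)  # output layer (10 classes)
linear = tf.keras.layers.Dense(10)                            # no activation: raw linear output

x = np.random.rand(1, 784).astype(np.float32)  # random stand-in for one flattened image
print(output(hidden(x)).numpy().sum())   # ~1.0, because SoftMax normalizes the outputs
print(linear(hidden(x)).numpy())         # unbounded values, no activation applied
```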
The `Dropout` layer works, **when training the model**, to block the output from the previous layer to the next layer with the specified probability, on a node-by-node basis (the corresponding node of the previous layer is said to be deactivated, or dropped, with that probability). Providing this layer makes it harder to fall into **overfitting**.
Regarding this, the explanation in "[Neural network] Dropout is summarized" was very easy to understand.
The program adds a Dropout layer to the model as follows:
model.add( tf.keras.layers.Dropout(0.2) ) # (3)
In the argument, specify the fraction of nodes that you want to deactivate, in the range 0.0 to 1.0. Setting this to 0.0 is essentially the same as having no Dropout layer. Setting it to 1.0 completely cuts the network at the Dropout layer, so no learning takes place (in fact, you get the error `ValueError: rate must be a scalar tensor or a float in the range [0, 1), got 1`).
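The following small sketch shows this deactivation directly (the values are made up; note that during training Keras also rescales the surviving outputs by 1/(1-rate) so that their expected sum stays the same).

```python
import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 8), dtype=np.float32)
print(drop(x, training=False).numpy())  # inference: passed through unchanged
print(drop(x, training=True).numpy())   # training: roughly half the values become 0,
                                        # the rest are scaled to 2.0 (= 1 / (1 - 0.5))
```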
Note that the nodes to be deactivated are selected **randomly** according to the specified probability. Therefore, if the model contains a Dropout layer, **the trained model will change (slightly) with each training run**. So when investigating the influence of other hyperparameters, such as the relationship between the number of nodes in the Dense layer and the accuracy, give an argument like `seed=1` to fix the random seed (however, if there are other random elements in training, the trained model will still differ from run to run even if this seed is fixed).
The reference is here.
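A minimal sketch of that seed fixing (the rate 0.2 follows the tutorial model; the seed value 1 is arbitrary):

```python
model.add( tf.keras.layers.Dropout(0.2, seed=1) )  # replaces step (3) of the model construction
```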
I prepared models in which the parameter of the Dropout layer (the fraction of nodes to be deactivated) was changed from 0.0 to 0.8 in 0.2 increments, trained and evaluated each of them for 100 epochs, and **observed whether adding a Dropout layer is effective against overfitting**.
For each training epoch, the accuracy (accuracy) and loss function value (loss) for the training data x_train, and the accuracy (val_accuracy) and loss function value (val_loss) for the test data x_test, were recorded and plotted.
Dropout rate experiment
import numpy as np
import tensorflow as tf
# (1)Download handwritten digit image dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# (2)Data normalization
x_train, x_test = x_train / 255.0, x_test / 255.0
# (3)Build NN model
# ■■ Change the Dropout rate from 0.0 to 0.8 ■■
epochs = 100
results = list()
for p in np.arange(0.0, 1.0, 0.2) :
    print(f'■ Dropout p={p:.1f}')
    tf.keras.backend.clear_session()
    model = tf.keras.models.Sequential()
    model.add( tf.keras.layers.Flatten(input_shape=(28, 28)) )
    model.add( tf.keras.layers.Dense(128, activation=tf.nn.relu) )
    model.add( tf.keras.layers.Dropout(p) )  # see the effect of parameter p here
    model.add( tf.keras.layers.Dense(10, activation=tf.nn.softmax) )
    # (4) Compile the model
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    # (5) Train the model
    r = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=epochs)
    print(r.history)
    results.append( dict( rate=p, hist=r.history ) )
#■■ Graph output ■■
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as tk
ylim = dict( )
ylim['accuracy'] = (0.90, 1.00)
ylim['val_accuracy'] = (0.95, 1.00)
ylim['loss'] = (0.00, 0.20)
ylim['val_loss'] = (0.05, 0.30)
xt_style = lambda x, pos=None : f'{x:.0f}'
for v in ['accuracy', 'loss', 'val_accuracy', 'val_loss'] :
    plt.figure(dpi=96)
    for r in results :
        plt.plot( range(1, epochs+1), r['hist'][v], label=f"rate={r['rate']:.1f}" )
    plt.xlim(1, epochs)
    plt.ylim( *(ylim[v]) )
    plt.gca().xaxis.set_major_formatter(tk.FuncFormatter(xt_style))
    plt.tick_params(direction='in')
    plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left', borderaxespad=0)
    plt.xlabel('epoch')
    plt.ylabel(v)
    plt.show()
In every model, the value improves as training progresses. In particular, at rate = 0.0, which is virtually equivalent to having no Dropout layer, the accuracy reaches the maximum value of 1.0. Basically, the smaller the deactivation rate, the faster the learning and the better the final accuracy.
For the loss on the training data, the tendency is basically similar to that of the accuracy.
The accuracy on the test data (val_accuracy) is, from here on, a true evaluation metric that includes generalization performance.
rate = 0.2 and 0.4 converge quickly and their final values are good. On the other hand, 0.6 is clearly inferior to 0.2 and 0.4, and 0.0 is a little less stable than the others.
As far as this accuracy is concerned, no tendency toward overfitting can be read.
For the loss on the test data (val_loss), at rate = 0.0 (equivalent to no Dropout layer) the value gradually worsens after epoch 8, so a tendency toward overfitting can be seen clearly.
By observing the slope of each curve after epoch 20, it can be seen that the larger the deactivation rate, the harder it is to overfit. This confirms the effectiveness of the Dropout layer.
Overall, this confirmed that the values set in the tutorial, rate = 0.2 and epochs = 5, are well-tuned parameters.
If the activation function is **not applied**, the output $y_1$ of the first node in the second layer is calculated as follows ($x_i$ is the output of the $i$-th node in the previous layer, $w_{i1}$ is the weight, and $b_1$ is the bias):

$$ y_1 = \sum_{i=1}^{784} w_{i1}\, x_i + b_1 $$
On the other hand, when the activation function $f$ is applied, $y_1$ becomes:

$$ y_1 = f\left( \sum_{i=1}^{784} w_{i1}\, x_i + b_1 \right) $$
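As a toy numerical check of these two formulas, the sketch below uses three inputs instead of 784, made-up weights and bias, and ReLU as $f$.

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # outputs x_i of the previous layer
w = np.array([0.2, 0.4, -0.3])   # weights w_i1 of the first node
b = 0.1                          # bias b_1

y_linear = np.dot(w, x) + b          # no activation function
y_relu = np.maximum(0.0, y_linear)   # with f = ReLU applied
print(y_linear, y_relu)              # approximately -0.8 and 0.0
```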
There are various activation functions used in neural networks, but they are roughly divided into those **commonly used in the intermediate layers** and those **used in the output layer**, which depend on the type of problem.
In the intermediate layers, the **ReLU function** and the **sigmoid function** are used. Also, due to its nature, the **SoftMax function** is used in the output layer of multi-class classification problems, and the **sigmoid function** in two-class classification problems.
The **ReLU function** seems to be the most commonly used activation function in the intermediate layers. If the input is less than 0, the output is 0; if the input is 0 or more, the input is output as is. In TensorFlow it is `tf.nn.relu()`. The reference is here.
ReLU function
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
xmin, xmax = -10,10
x = np.linspace(xmin, xmax,1000)
y = tf.nn.relu(x) #ReLU function
#Check the shape on the graph
plt.figure(dpi=96)
plt.plot(x,y,lw=3)
plt.xlim(xmin, xmax)
plt.ylim(-1, 12)
plt.hlines([0],*(plt.xlim()),ls='--',lw=0.5)
plt.vlines([0],*(plt.ylim()),ls='--',lw=0.5)
There also seem to be variants of the ReLU function, such as the "Leaky ReLU function" and the "Parametric ReLU function"; a small comparison follows below.
- Reference: About the activation function ReLU and the ReLU family
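A minimal sketch comparing ReLU with TensorFlow's built-in Leaky ReLU (the sample values and the alpha of 0.1 are arbitrary):

```python
import numpy as np
import tensorflow as tf

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(tf.nn.relu(x).numpy())                   # [0. 0. 0. 1. 2.]
print(tf.nn.leaky_relu(x, alpha=0.1).numpy())  # [-0.2 -0.1  0.  1.  2.] : small negative slope
```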
The **sigmoid function**, $\sigma(x) = 1 / (1 + e^{-x})$, is another activation function often used in the intermediate layers. However, in **NN models with many layers**, using the sigmoid function as the activation function leads to the vanishing gradient problem, so the ReLU function seems to have taken over its popularity. In TensorFlow it is `tf.math.sigmoid()`. The reference is here.
Sigmoid function
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
xmin, xmax = -10,10
x = np.linspace(xmin, xmax,1000)
y = tf.math.sigmoid(x) #Sigmoid function
#Check the shape on the graph
plt.figure(dpi=96)
plt.plot(x,y,lw=3)
plt.xlim(xmin, xmax)
plt.ylim(-0.2, 1.2)
plt.hlines([0,0.5,1],*(plt.xlim()),ls='--',lw=0.5)
plt.vlines([0],*(plt.ylim()),ls='--',lw=0.5)
The **SoftMax function** is commonly used in the **output layer** of multi-class classification problems. Regardless of the inputs, each output falls in the range 0.0 to 1.0, and the outputs sum to 1.0. In TensorFlow it is `tf.nn.softmax()`. The reference is here.
For example, if you apply the SoftMax function to the input [2, -1, 1, 1], you get an output such as [0.56, 0.03, 0.21, 0.21] (the sum of the elements is 1.0), as follows.
SoftMax function
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as pe
import tensorflow as tf
x = np.array( [2, -1, 1, 1], dtype=np.float64 )
fx = tf.nn.softmax(x)
fx = fx.numpy() # get the contents as a numpy.ndarray
np.set_printoptions(precision=2)
print(f'fx = {fx}')
print(f'fx.sum() = {fx.sum():.2f}')
fig, ax = plt.subplots(nrows=2, ncols=1, dpi=96)
plt.subplots_adjust(hspace=0.12)
ep = (pe.Stroke(linewidth=3, foreground='white'),pe.Normal())
tp = dict(horizontalalignment='center',verticalalignment='center')
ax[0].bar( np.arange(0,len(x)), x, fc='tab:red' )
ax[1].bar( np.arange(0,len(fx)), fx )
ax[1].set_ylim(0,1)
for i, p in enumerate([x, fx]) :
    ax[i].tick_params(axis='x', which='both', bottom=False, labelbottom=False)
    ax[i].set_xlim(-0.5, len(p)-0.5)
    for j, v in enumerate(p):
        t = ax[i].text(j, v/2, f'{v:.2f}', **tp)
        t.set_path_effects(ep)
ax[0].hlines([0],*(plt.xlim()),lw=1)
ax[0].set_ylabel('x')
ax[1].set_ylabel('SoftMax(x)')
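As a cross-check, the same values can be computed directly from the definition of the SoftMax function, $f(x_j) = e^{x_j} / \sum_k e^{x_k}$:

```python
import numpy as np

x = np.array([2.0, -1.0, 1.0, 1.0])
fx = np.exp(x) / np.exp(x).sum()   # SoftMax by its definition
print(np.round(fx, 2))             # [0.56 0.03 0.21 0.21]
```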
Next time: undecided.