Implementing image recognition with Keras turned out to be unexpectedly easy, so on a friend's recommendation I decided to try VGG16 to build a more accurate model. I'm a beginner, so I'll learn as I go. This time, I'll feed an image of an Orin apple to the model and see whether it can be used to identify the variety. This is just a memo.
VGG16 is a 16-layer CNN model trained on the large image dataset "ImageNet". It was announced in 2014 and is one of the well-known pre-trained models used in many studies. Other models trained on ImageNet include AlexNet, GoogLeNet, and ResNet. https://www.y-shinno.com/keras-vgg16/
The following is a reference for how VGG16 compares with AlexNet, GoogLeNet, and ResNet.
(Source: http://thunders1028.hatenablog.com/entry/2017/11/01/035609)
The network from Oxford University's VGG team, which finished second in the 2014 ILSVRC. It is a plain CNN consisting of convolution and pooling layers, essentially a deeper version of AlexNet, with 16 or 19 weight layers (convolution and fully connected layers). These are called VGG16 and VGG19, respectively.
It features a structure in which two to four convolution layers with small filters are stacked in succession, followed by a pooling layer that halves the feature map size. Convolving with several small filters (i.e. deepening the network) seems to extract features better than convolving the image once with a large filter. (I don't fully understand the reason, but perhaps passing through the activation function more times increases the expressiveness?) [2]
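As a rough illustration of this idea, here is a minimal Keras sketch (my own, not from the article) of one VGG-style block: two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but pass through the activation function twice, followed by a pooling layer that halves the size.

```python
# Illustrative sketch of a VGG-style block (not the actual VGG16 definition).
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

block = Sequential([
    # Two stacked 3x3 convolutions: same receptive field as one 5x5 convolution,
    # but with two non-linearities and fewer parameters.
    Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),  # halves the spatial size, as in VGG blocks
])
block.summary()
```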
GoogLeNet seems to be stronger, but I will try VGG with an emphasis on understandability. (The harder-looking parts will come in later posts.)
Let's get straight to the code. First, install Keras.
vgg16_fluits.py
!pip install keras
Next, import the required libraries. VGG16 is included in Keras. The weights are specified in the third line below.
#Import the model and display the summary
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
model = VGG16(include_top=True, weights='imagenet', input_tensor=None, input_shape=None)
model.summary()
The image evaluated this time is an apple (the Orin variety).
#Image loading
from PIL import Image
#import glob
url = '/content/drive/My Drive/Colab Notebooks/img'
files = url + "/apple_orin.jpg"
image = Image.open(files)
image = image.convert('RGB')
image = image.resize((224, 224))
#Convert the loaded PIL image to a NumPy array
data = np.asarray(image)
#Evaluation
#Add a batch dimension to make the array a four-dimensional tensor
data = np.expand_dims(data, axis=0)
#Predict and output the top 5 classes
preds = model.predict(preprocess_input(data))
results = decode_predictions(preds, top=5)[0]
for result in results:
    print(result)
('n07742313', 'Granny_Smith', 0.9861995)
('n02948072', 'candle', 0.0040857443)
('n07747607', 'orange', 0.001778649)
('n03887697', 'paper_towel', 0.0016588464)
('n07693725', 'bagel', 0.0012920648)
That was the result.
What is "Granny_Smith", the top prediction?
Granny Smith is an apple cultivar. It was developed in Australia in 1868 from a chance seedling by Maria Ann Smith, who gave the variety its name.
That said, the image itself is quite close, so the prediction seems reasonably accurate. ImageNet probably has no data for Orin.
The index, label, and class name for the 1000 ImageNet classes are summarized in the following JSON file, and Granny_Smith is listed there.
https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
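As a quick way to check this (my own sketch, not from the article), the JSON file can be downloaded and searched for the Granny_Smith entry; the keys are the class indices and the values are (WordNet ID, class name) pairs.

```python
# Sketch: look up Granny_Smith in the ImageNet class index JSON.
# Assumes the URL above is reachable; uses only the standard library.
import json
import urllib.request

URL = 'https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json'
with urllib.request.urlopen(URL) as f:
    class_index = json.load(f)  # {"0": ["n01440764", "tench"], ...}

for idx, (wnid, name) in class_index.items():
    if name == 'Granny_Smith':
        print(idx, wnid, name)  # class index, WordNet ID, class name
```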
Recognizing images well enough to identify the variety requires separate training, so I will tackle that from the next post onward.
This time the goal was just to try it out, so that's fine.
From next time, I will build a model that can identify the variety.
The key points when using the VGG16 model are as follows.
model = VGG16(include_top=True, weights='imagenet', input_tensor=None, input_shape=None)
Argument | Description |
---|---|
include_top | Whether to include the fully connected layers that classify into the 1000 classes.<br>True: included (use this for the original 1000-class classification)<br>False: not included (use this for customization) |
weights | Type of weights.<br>imagenet: weights trained on ImageNet<br>None: random initialization |
input_tensor | Optional tensor to use as the model's input image.<br>Any image data: it is used<br>None: not used |
input_shape | Shape of the input image.<br>Any shape: it is used<br>None: (224, 224, 3) is used |
Next time, I'll set include_top to False and use VGG16 as a feature extractor for fine-tuning.
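As a preview, here is a minimal sketch of that setup (my own, not the article's final code): load VGG16 without the top classifier, freeze the convolutional base, and add a small classification head. The number of classes (2) and the layer sizes are illustrative assumptions.

```python
# Sketch: VGG16 as a feature extractor with a custom classification head.
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

base = VGG16(include_top=False, weights='imagenet', input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the convolutional base for feature extraction

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
output = Dense(2, activation='softmax')(x)  # e.g. Orin vs. other varieties (assumed)

model = Model(inputs=base.input, outputs=output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
```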
Reference (what I'm aiming to do): http://aidiary.hatenablog.com/entry/20170131/1485864665