Introduction

This is the 6th in a series of study notes on image classification with TensorFlow2 + Keras in the Google Colaboratory environment. The subject is the standard one: classifying handwritten digit images (MNIST).

- Challenge image classification by TensorFlow2 + Keras series
  1. Move for the time being
  2. Take a closer look at the input data
  3. Visualize MNIST data
  4. Let's make a prediction with the trained model
  5. Observe images that fail to classify
  6. Try preprocessing and classifying images prepared by yourself
  7. Understanding layer types and activation functions
  8. Select optimization algorithm and loss function
  9. Try learning, saving and loading the model

Last time, I made predictions (classifications) on the handwritten digit images that ship with MNIST. This time, I would like to classify **an image I prepared myself** with the trained model. I will also explain the Python code (using the Pillow library) for the **preprocessing, such as resizing and trimming,** that this requires.

[Image: prediction result (予測.png)]

[Image: before and after preprocessing (前後.png)]

Creating handwritten numeric images

I drew a handwritten "**8**" in Paint at a size of **100** $\times$ **100** pixels and saved it as a color (RGB) PNG file named test-8.png.

[Image: file properties (プロパティ.png)]

Uploading the image file to Google Colab

You can upload the file by opening the Files tab in the Google Colab side menu and dragging and dropping it from your desktop, as shown below. Note that uploaded files are **deleted after a certain period of time**.

[Image: uploading a file by drag and drop (ファイルアップロード.png)]

Alternatively, you can write and run a code cell like the following to upload the file through a file selection dialog.

[Image: uploading a file from a code cell (ファイルアップロード2.png)]
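
The code cell itself appears only in the screenshot, so it is not reproduced in the text. Below is a minimal sketch of such a cell, assuming it relies on the `files.upload()` helper of the `google.colab` module, which exists only inside Colab. The wrapper name `upload_via_dialog` is hypothetical and was added here so that the snippet also runs harmlessly outside Colab.

```python
def upload_via_dialog():
    """Open Colab's file selection dialog and return the uploaded files.

    The result is a dict mapping each file name to its contents (bytes).
    Outside Colab the google.colab module is missing, so an empty dict
    is returned instead.
    """
    try:
        from google.colab import files  # only available inside Google Colab
    except ImportError:
        print('google.colab is not available; run this cell inside Colab.')
        return {}
    return files.upload()  # blocks until files are chosen in the dialog


uploaded = upload_via_dialog()
for name, data in uploaded.items():
    print(f'{name}: {len(data)} bytes')
```

Inside Colab, the selected files are also written to the current directory, so they can then be opened by name.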

The absolute path of the uploaded file (test-8.png) is /content/test-8.png. Since the current directory is /content, you can also access it simply as test-8.png.

You can also mount Google Drive and read the file from there. For more information, see "Google Colaboratory (from first use to reading files)" on Qiita.

Reading image files and checking their contents

Read the uploaded image file and display it to check its contents. Images are handled with Pillow (the PIL fork). It takes only three lines.

`python`


import PIL.Image as Image
img = Image.open('test-8.png')
display(img)
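
Besides displaying the image, it can help to confirm its mode and size before preprocessing. The following is a small sketch that uses an in-memory stand-in created with `Image.new` instead of reading test-8.png, so that it runs without the file. The stand-in image is an assumption made for illustration only.

```python
from PIL import Image

# Stand-in for the uploaded file: a 100x100 white RGB image
img = Image.new('RGB', (100, 100), (255, 255, 255))

print(img.mode)   # RGB
print(img.size)   # (100, 100)

# Converting to mode 'L' gives a single-channel 8-bit grayscale image
gray = img.convert('L')
print(gray.mode)              # L
print(gray.getpixel((0, 0)))  # 255 (white)
```

With the real file, `Image.open('test-8.png')` would take the place of `Image.new(...)`.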

Convert to a format that can be input to the trained model

The following **preprocessing** is required before the image can be fed into the trained model.

1. Convert the image to grayscale.
2. Resize it to 28 $\times$ 28 pixels.
3. Convert it to a two-dimensional array of type numpy.ndarray.
4. Scale the values so that white is "0.0" and black is "1.0".

The preprocessing above can be done with the following code. Note that in an ordinary 256-level grayscale image **white is "255" and black is "0"**, so the values need to be inverted.

`python`


import numpy as np
import PIL.Image as Image
import matplotlib.pyplot as plt

img = Image.open('test-8.png')

img = img.convert('L')             # 1. Convert to grayscale
img = img.resize((28,28))          # 2. Resize to 28x28
x_sample = np.array(img)           # 3. Convert to numpy.ndarray
x_sample = 1.0 - x_sample / 255.0  # 4. Invert and normalize
y_sample = 8  # Correct answer (label)

#Confirmation output
print(f'x_sample.type = {type(x_sample)}')
print(f'x_sample.shape = {x_sample.shape}')
plt.figure()
plt.imshow(x_sample,vmin=0.,vmax=1.,cmap='Greys')
plt.show()

The execution result is as follows.

Running the trained model on this x_sample and creating a prediction report with the program shown in part 4 gives the following.

The digit was predicted (classified) correctly.
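
To make step 4 (inversion and normalization) concrete, here is a small worked example on a hand-made array. The 2x3 array `gray` is a made-up stand-in for a grayscale image and is not data from the article.

```python
import numpy as np

# Tiny stand-in for a grayscale image (uint8: 0 = black, 255 = white)
gray = np.array([[255, 128,  0],
                 [  0, 255, 51]], dtype=np.uint8)

# Same formula as step 4: scale to the range 0..1, then invert
x = 1.0 - gray / 255.0

print(x.dtype)      # float64
print(x[0, 0])      # 0.0  (white pixel)
print(x[0, 2])      # 1.0  (black pixel)
print(x.round(3))   # intermediate grays land between 0 and 1
```

After this step, the ink of the digit has values near 1.0 and the paper has values near 0.0.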

Repost: prediction result report program

It is basically the same as the program shown in part 4, but rewritten on the assumption that x_sample holds a single input image, y_sample holds the correct answer, and the trained model is stored in model.

`Preparation for Japanese output in matplotlib`


!pip install japanize-matplotlib
import japanize_matplotlib

`python`


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as pe 
import matplotlib.transforms as ts 

s_sample = model.predict(np.array([x_sample]))[0] #Prediction (classification)

fig, ax = plt.subplots(nrows=2,figsize=(3,4.2), dpi=120, 
                       gridspec_kw={'height_ratios': [3, 1]})

plt.subplots_adjust(hspace=0.05)

#Display the image of handwritten numbers on the upper side
ax[0].imshow(x_sample,interpolation='nearest',vmin=0.,vmax=1.,cmap='Greys')
ax[0].tick_params(axis='both', which='both', left=False, 
                  labelleft=False, bottom=False, labelbottom=False)

#Correct answer value and predicted value are displayed in the upper left
t = ax[0].text(0.5, 0.5, f'Correct answer:{y_sample}',
               verticalalignment='top', fontsize=9, color='tab:red')
t.set_path_effects([pe.Stroke(linewidth=2, foreground='white'), pe.Normal()])
t = ax[0].text(0.5, 2.5, f'Prediction:{s_sample.argmax()}', 
               verticalalignment='top', fontsize=9, color='tab:red')
t.set_path_effects([pe.Stroke(linewidth=2, foreground='white'), pe.Normal()])

#Show the neural network's prediction output as a bar chart at the bottom
b = ax[1].bar(np.arange(0,10),s_sample,width=0.95)
b[s_sample.argmax()].set_facecolor('tab:red') #Make the maximum item red

#X-axis setting
ax[1].tick_params(axis='x',bottom=False)
ax[1].set_xticks(np.arange(0,10))
t = ax[1].set_xticklabels(np.arange(0,10),fontsize=11)
t[s_sample.argmax()].set_color('tab:red') #Make the maximum item red

offset = ts.ScaledTranslation(0, 0.03, plt.gcf().dpi_scale_trans)
for label in ax[1].xaxis.get_majorticklabels() :
    label.set_transform(label.get_transform() + offset)

#Y-axis setting
ax[1].tick_params(axis='y',direction='in')
ax[1].set_ylim(0,1)
ax[1].set_yticks(np.linspace(0,1,5))
ax[1].set_axisbelow(True)
ax[1].grid(axis='y')
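
The call `model.predict(np.array([x_sample]))[0]` wraps the single image in a batch of size one and then takes the first row of the output. The sketch below shows only that shape handling. The probability vector `fake_output` is invented for illustration, standing in for the output of a 10-class model. It is not a result from the article's model.

```python
import numpy as np

x_sample = np.zeros((28, 28))     # placeholder image with the same shape as above
batch = np.array([x_sample])      # add a leading batch axis
print(batch.shape)                # (1, 28, 28)

# Hypothetical model output for that batch: one row of 10 class probabilities
fake_output = np.array([[0.01, 0.01, 0.02, 0.05, 0.01,
                         0.02, 0.03, 0.01, 0.80, 0.04]])
s_sample = fake_output[0]         # drop the batch axis -> shape (10,)
print(s_sample.shape)             # (10,)
print(s_sample.argmax())          # 8 (index of the largest probability)
```

The index returned by `argmax()` is the predicted digit, which is why the report program uses it both for the label and for the red bar.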

Preprocessing: handling digits that are off-center and images with dirt

If you prepare an image of a handwritten digit yourself, the **digit may not be located in the center of the image**, as shown below.

If you apply prediction (classification) to such an image as it is, you get **terrible results**, as follows.

For this reason, before making a prediction, the image needs to be preprocessed so that the character is moved to the center and the character itself takes up about 90% of the image. It is also necessary to remove **dirt** and **dust** other than the character.

Here, I would like to apply the following (automated) preprocessing.

[Image: preprocessing flow (フロー.png)]

`Preprocessing`


import numpy as np
from PIL import Image, ImageChops,ImageFilter, ImageOps, ImageDraw
import matplotlib.pyplot as plt

#Add margins (white) of the specified width to the top, bottom, left, and right of the figure
def add_margin(img, margin):
    w, h = img.size
    w2 = w + 2 * margin
    h2 = h + 2 * margin
    result = Image.new('L', (w2, h2), 255)
    result.paste(img, (margin, margin))
    return result

# Compute a square centered on the given rectangle whose side is
# the rectangle's long side, enlarged slightly (x1.3)
def to_square(rect):
    x1, y1, x2, y2 = rect      # (x1, y1) is the upper-left corner, (x2, y2) the lower-right
    s = max(x2 - x1, y2 - y1)  # Length of the long side
    s = int(s * 1.3)           # Make it a little bigger
    cx = (x1 + x2) / 2         # Center of the rectangle
    cy = (y1 + y2) / 2
    return (cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2)

img = Image.open('test-2x.png')

img  = img.convert('L')
#display(img)

#Add white margins to the top, bottom, left, and right of the image
img  = add_margin(img,int(max(img.size)*0.2))
#display(img)

#Create inverted image
img2 = ImageOps.invert(img)

#Blur
img2 = img2.filter(ImageFilter.GaussianBlur(1.5))
#display(img2)

#Binarization
img2 = img2.point(lambda p: p > 150 and 255)  
#display(img2)

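# Get the bounding box of the non-black (non-zero) region
rect = img2.getbbox()
if rect is None:  # Nothing survived binarization (e.g. a blank image)
    raise ValueError('No digit found in the image')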
# tmp = img2.convert('RGB')
# ImageDraw.Draw(tmp).rectangle(rect, fill=None, outline='red')
# display(tmp)

#Convert a rectangle to a square that fits the long side
sqr = to_square(rect)
# tmp = img2.convert('RGB')
# ImageDraw.Draw(tmp).rectangle(sqr, fill=None, outline='red')
# display(tmp)

#Trimmed with a square
img = img.crop(sqr)
#display(img)

# From here on, the same as before
img = img.convert('L')             # 1. Convert to grayscale
img = img.resize((28,28))          # 2. Resize to 28x28
x_sample = np.array(img)           # 3. Convert to numpy.ndarray
x_sample = 1.0 - x_sample / 255.0  # 4. Invert and normalize
y_sample = 2  # Correct answer (label)

#Confirmation output
print(f'x_sample.type = {type(x_sample)}')
print(f'x_sample.shape = {x_sample.shape}')
plt.figure()
plt.imshow(x_sample,vmin=0.,vmax=1.,cmap='Greys')
plt.show()
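
As a worked example of the arithmetic in `to_square`, the sketch below traces one made-up bounding box through the function. The function is repeated here so that the snippet is self-contained. The `scale` parameter and the sample rectangle are assumptions added for illustration.

```python
def to_square(rect, scale=1.3):
    """Same arithmetic as to_square above; scale is exposed only for illustration."""
    x1, y1, x2, y2 = rect
    s = int(max(x2 - x1, y2 - y1) * scale)   # enlarged long side
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2    # center of the rectangle
    return (cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2)


rect = (10, 20, 50, 100)   # hypothetical bounding box: 40 wide, 80 tall
sqr = to_square(rect)
print(sqr)                 # (-22.0, 8.0, 82.0, 112.0)
print(sqr[2] - sqr[0])     # 104.0 = int(80 * 1.3)
```

The long side (80) is scaled to 104, and the square is centered on (30, 60). With a factor of 1.3, the long side of the digit occupies roughly 1/1.3, or about 77%, of the square. The left edge comes out negative in this example, which is presumably why the preprocessing code adds a white margin before computing the square.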

This compares the result of **prediction without preprocessing** with the result of **prediction after preprocessing**. It reminded me once again that preprocessing matters before any trial and error on the prediction model itself.

[Image: before and after preprocessing (前後.png)]
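
The blur and binarization steps in the preprocessing code appear to be what keeps small specks out of the bounding box. The sketch below checks this on a synthetic image, with a filled rectangle standing in for a handwritten stroke and a 2x2 dot standing in for dust. The image contents are invented for illustration. The blur radius (1.5) and the threshold (150) are the ones used in the code above.

```python
from PIL import Image, ImageDraw, ImageFilter, ImageOps

# Synthetic 100x100 white canvas with a thick black stroke and a tiny speck
img = Image.new('L', (100, 100), 255)
draw = ImageDraw.Draw(img)
draw.rectangle((60, 20, 79, 69), fill=0)   # stand-in for a handwritten stroke
draw.rectangle((5, 5, 6, 6), fill=0)       # 2x2 speck of "dust"

inverted = ImageOps.invert(img)            # ink becomes 255, background 0

# Bounding box without cleanup: the speck drags it out toward the corner
bbox_raw = inverted.getbbox()

# Blur, then binarize: the speck is smeared below the threshold and vanishes
blurred = inverted.filter(ImageFilter.GaussianBlur(1.5))
binary = blurred.point(lambda p: 255 if p > 150 else 0)
bbox_clean = binary.getbbox()

print('without cleanup:', bbox_raw)
print('with cleanup   :', bbox_clean)
```

Without the cleanup, the box starts at the speck in the upper-left corner. After blurring and thresholding, it covers only the stroke, so the crop is centered on the digit rather than on the dirt.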

Next time

Now that the groundwork has been laid, next time I would finally like to study **model construction** for neural networks.
