This is my sixth study memo on image classification with TensorFlow2 + Keras in the Google Colaboratory environment. The subject is the classification of handwritten digit images (MNIST), a standard first task.
- Series: Challenging image classification with TensorFlow2 + Keras
  1. Getting it running for now
  2. Taking a closer look at the input data
  3. Visualizing the MNIST data
  4. Making predictions with the trained model
  5. Observing images that fail to be classified
  6. Preprocessing and classifying images prepared by yourself
  7. Understanding layer types and activation functions
  8. Selecting the optimization algorithm and loss function
  9. Training, saving, and loading the model
Last time, I made predictions (classifications) using the handwritten digit images that MNIST provides in advance. This time, I would like to classify **an image I prepared myself** with the trained model. I will also explain the Python program (using the Pillow library) for the **preprocessing, such as resizing and trimming,** that this requires.
I drew a handwritten "**8**" at a size of **100** $\times$ **100** pixels in Paint and saved it as a color (RGB) PNG file named test-8.png.
You can upload the file by opening the Files tab in the side menu of Google Colab and dragging and dropping it from your desktop. Note that uploaded files are **deleted after a certain period of time**.
Alternatively, you can upload the file in the same way through a file-selection dialog by running a code cell.
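A minimal sketch of such a cell, assuming it runs inside a Google Colab runtime, where the `google.colab.files` module provides `upload()`. The helper name `upload_via_dialog` is hypothetical; outside Colab the import fails, so the example call is guarded.

```python
def upload_via_dialog():
    """Open Colab's file-selection dialog and return the uploaded file names.

    Hypothetical helper: google.colab exists only inside a Colab runtime.
    files.upload() saves each chosen file to the current directory (/content)
    and returns a dict mapping file name -> file contents (bytes).
    """
    from google.colab import files  # imported lazily so this module loads anywhere

    uploaded = files.upload()
    for name, data in uploaded.items():
        print(f'{name}: {len(data)} bytes')
    return list(uploaded)


if __name__ == '__main__':
    try:
        upload_via_dialog()
    except ImportError:
        print('Not running in Google Colab; use the Files tab instead.')
```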
The absolute path of the uploaded file (test-8.png) is /content/test-8.png. Since the current directory is /content, you can also access it simply as test-8.png.
You can also mount Google Drive and read the file from there. For details, see "Google Colaboratory (from first use to reading files)" on Qiita.
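Mounting Drive is also done from a code cell. Here is a minimal sketch, assuming a Colab runtime. The wrapper name `mount_google_drive` is hypothetical; the call inside it is `google.colab.drive.mount()`.

```python
def mount_google_drive(mount_point='/content/drive'):
    """Mount Google Drive at mount_point (Colab asks for authorization).

    Hypothetical wrapper; google.colab is only importable inside Colab.
    """
    from google.colab import drive  # imported lazily: Colab-only module

    drive.mount(mount_point)
    return mount_point
```

After `mount_google_drive()` runs, your Drive files usually appear under `/content/drive/MyDrive/`, so an image stored there could be opened with a path such as `/content/drive/MyDrive/test-8.png` (a hypothetical location).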
To check its contents, read the uploaded image file and display it. Images are handled with Pillow (a fork of PIL), and it takes only three lines:
```python
import PIL.Image as Image
img = Image.open('test-8.png')
display(img)  # display() is predefined in Colab/Jupyter notebooks
```
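The following **preprocessing** is required before the image can be fed to the trained model:

1. Convert to grayscale
2. Resize to 28x28 pixels
3. Convert to a numpy.ndarray
4. Invert and normalize the values to the range 0.0 to 1.0

The code below performs this preprocessing. Note that in an ordinary 256-level grayscale image **white is 255 and black is 0**, whereas the MNIST training data has a background of 0 and bright strokes, so the values must be inverted.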
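Before the full program, here is a quick NumPy check of what step 4 does. The pixel values are made up for illustration and stand in for what `np.array(img)` returns after `convert('L')`.

```python
import numpy as np

# Made-up 8-bit grayscale pixels: 255 = white paper, 0 = black ink,
# 128 = a mid-gray, anti-aliased stroke edge.
pixels = np.array([[255, 128, 0]], dtype=np.uint8)

# Step 4: scale to 0.0-1.0 and invert, so ink -> 1.0 and paper -> 0.0,
# which matches MNIST (background 0, strokes bright).
x = 1.0 - pixels / 255.0

print(x)
```

Dividing by the float `255.0` also converts the `uint8` array to `float64`, so there is no integer overflow or truncation in the subtraction.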
```python
import numpy as np
import PIL.Image as Image
import matplotlib.pyplot as plt

img = Image.open('test-8.png')
img = img.convert('L')              # 1. Convert to grayscale
img = img.resize((28, 28))          # 2. Resize to 28x28
x_sample = np.array(img)            # 3. Convert to numpy.ndarray
x_sample = 1.0 - x_sample / 255.0   # 4. Invert and normalize to 0.0-1.0
y_sample = 8                        # Correct answer (label)

# Check the result
print(f'x_sample.type = {type(x_sample)}')
print(f'x_sample.shape = {x_sample.shape}')
plt.figure()
plt.imshow(x_sample, vmin=0., vmax=1., cmap='Greys')
plt.show()
```
The execution result is as follows.
Making a prediction for this x_sample with the trained model and creating a prediction report with the program shown in Part 4 gives the following result.
I was able to make a good prediction (classification).
The program is basically the same as the one shown in Part 4. I rewrote it on the assumption that x_sample holds the single input image, y_sample holds the correct label, and the trained model is stored in model.
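Before the report code, one detail is worth seeing in isolation: Keras models predict on batches, so a single image needs a leading batch axis, and the returned row of class scores is reduced to a digit with `argmax()`. The sketch below uses only NumPy. `x_sample` is a blank stand-in, and `s_sample` is a hypothetical output vector for illustration, assuming the 10-class softmax output used in this series, not a real model result.

```python
import numpy as np

# Stand-in for the preprocessed 28x28 image (values in 0.0-1.0).
x_sample = np.zeros((28, 28))

# A single image becomes a batch of one image.
batch = np.array([x_sample])          # what the report code does
batch_alt = x_sample[np.newaxis]      # an equivalent alternative
print(batch.shape, batch_alt.shape)   # (1, 28, 28) (1, 28, 28)

# model.predict(batch) would return shape (1, 10); [0] takes the single row.
# Hypothetical softmax output, for illustration only:
s_sample = np.array([0.01, 0.0, 0.02, 0.01, 0.0, 0.03, 0.01, 0.0, 0.9, 0.02])
print(s_sample.argmax())              # → 8
```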
Preparation for Japanese text output in matplotlib (run in a Colab cell)

```python
# Only needed if the plot labels contain Japanese text
!pip install japanize-matplotlib
import japanize_matplotlib
```
```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as pe
import matplotlib.transforms as ts

# Predict (classify); predict() expects a batch, so wrap the single sample
s_sample = model.predict(np.array([x_sample]))[0]

fig, ax = plt.subplots(nrows=2, figsize=(3, 4.2), dpi=120,
                       gridspec_kw={'height_ratios': [3, 1]})
plt.subplots_adjust(hspace=0.05)

# Upper panel: the handwritten digit image
ax[0].imshow(x_sample, interpolation='nearest', vmin=0., vmax=1., cmap='Greys')
ax[0].tick_params(axis='both', which='both', left=False,
                  labelleft=False, bottom=False, labelbottom=False)

# Correct answer and predicted value in the upper left
t = ax[0].text(0.5, 0.5, f'Correct answer:{y_sample}',
               verticalalignment='top', fontsize=9, color='tab:red')
t.set_path_effects([pe.Stroke(linewidth=2, foreground='white'), pe.Normal()])
t = ax[0].text(0.5, 2.5, f'Prediction:{s_sample.argmax()}',
               verticalalignment='top', fontsize=9, color='tab:red')
t.set_path_effects([pe.Stroke(linewidth=2, foreground='white'), pe.Normal()])

# Lower panel: the NN's output for each class 0-9
b = ax[1].bar(np.arange(0, 10), s_sample, width=0.95)
b[s_sample.argmax()].set_facecolor('tab:red')  # Highlight the maximum in red

# X-axis settings
ax[1].tick_params(axis='x', bottom=False)
ax[1].set_xticks(np.arange(0, 10))
t = ax[1].set_xticklabels(np.arange(0, 10), fontsize=11)
t[s_sample.argmax()].set_color('tab:red')  # Highlight the maximum in red
# Shift the tick labels up by 0.03 inch
offset = ts.ScaledTranslation(0, 0.03, plt.gcf().dpi_scale_trans)
for label in ax[1].xaxis.get_majorticklabels():
    label.set_transform(label.get_transform() + offset)

# Y-axis settings
ax[1].tick_params(axis='y', direction='in')
ax[1].set_ylim(0, 1)
ax[1].set_yticks(np.linspace(0, 1, 5))
ax[1].set_axisbelow(True)
ax[1].grid(axis='y')

plt.show()
```
When you prepare handwritten digit images yourself, the **digit is not always located in the center of the image**.
Applying prediction (classification) to such an image as-is gives **terrible results**.
For this reason, before making a prediction, the image must be preprocessed so that the character is moved to the center and takes up about 90% of the image. It is also necessary to remove **dirt** and **dust** other than the character.
Here, I would like to do the following (automated) preprocessing.
Preprocessing

```python
import numpy as np
from PIL import Image, ImageChops, ImageFilter, ImageOps, ImageDraw
import matplotlib.pyplot as plt

# Add white margins of the given width to the top, bottom, left, and right
def add_margin(img, margin):
    w, h = img.size
    w2 = w + 2 * margin
    h2 = h + 2 * margin
    result = Image.new('L', (w2, h2), 255)
    result.paste(img, (margin, margin))
    return result

# Compute a square, centered on the given rectangle, whose side matches
# the rectangle's long side (made a little bigger)
def to_square(rect):
    x1, y1, x2, y2 = rect  # (x1, y1) is the upper-left, (x2, y2) the lower-right corner
    s = max(x2 - x1, y2 - y1)  # Length of the long side
    s = int(s * 1.3)           # A little bigger
    nx1 = (x1 + x2) / 2 - s / 2
    nx2 = (x1 + x2) / 2 + s / 2
    ny1 = (y1 + y2) / 2 - s / 2
    ny2 = (y1 + y2) / 2 + s / 2
    return (nx1, ny1, nx2, ny2)

img = Image.open('test-2x.png')
img = img.convert('L')
#display(img)

# Add white margins to the top, bottom, left, and right of the image
img = add_margin(img, int(max(img.size) * 0.2))
#display(img)

# Create an inverted image (character becomes bright, background dark)
img2 = ImageOps.invert(img)

# Blur (spreads out small specks of dirt so they fall below the threshold)
img2 = img2.filter(ImageFilter.GaussianBlur(1.5))
#display(img2)

# Binarize: pixels brighter than 150 become 255, all others 0
img2 = img2.point(lambda p: 255 if p > 150 else 0)
#display(img2)

# Get the smallest rectangle containing all non-black pixels
rect = img2.getbbox()
# tmp = img2.convert('RGB')
# ImageDraw.Draw(tmp).rectangle(rect, fill=None, outline='red')
# display(tmp)

# Convert the rectangle to a square that fits its long side
sqr = to_square(rect)
# tmp = img2.convert('RGB')
# ImageDraw.Draw(tmp).rectangle(sqr, fill=None, outline='red')
# display(tmp)

# Trim the (non-inverted) image with the square
img = img.crop(sqr)
#display(img)

# From here on, the same as before
img = img.convert('L')              # 1. Convert to grayscale
img = img.resize((28, 28))          # 2. Resize to 28x28
x_sample = np.array(img)            # 3. Convert to numpy.ndarray
x_sample = 1.0 - x_sample / 255.0   # 4. Invert and normalize to 0.0-1.0
y_sample = 2                        # Correct answer (label)

# Check the result
print(f'x_sample.type = {type(x_sample)}')
print(f'x_sample.shape = {x_sample.shape}')
plt.figure()
plt.imshow(x_sample, vmin=0., vmax=1., cmap='Greys')
plt.show()
```
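To make the geometry of `getbbox()` and `to_square()` concrete, here is a NumPy-only sketch on a synthetic, already inverted and binarized image (ink = 255). It skips the blur and Pillow itself. `bbox_of_ink` is a hypothetical helper that mimics `Image.getbbox()`, which returns `(left, upper, right, lower)` with right and lower exclusive. The `to_square` here repeats the arithmetic above, with the 1.3 factor exposed as a parameter.

```python
import numpy as np

def bbox_of_ink(arr):
    """Mimic PIL's Image.getbbox(): the smallest (left, upper, right, lower)
    box containing all non-zero pixels; right and lower are exclusive."""
    rows = np.flatnonzero(arr.any(axis=1))
    cols = np.flatnonzero(arr.any(axis=0))
    if rows.size == 0:
        return None  # completely black image: nothing to crop
    return (int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1)

def to_square(rect, scale=1.3):
    """Square centered on rect, with side = int(long side * scale)."""
    x1, y1, x2, y2 = rect
    s = int(max(x2 - x1, y2 - y1) * scale)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return (cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2)

# Synthetic 100x100 image: a 20-wide, 40-tall "stroke" near the top-left
# corner, like a digit that was not drawn in the center.
img2 = np.zeros((100, 100), dtype=np.uint8)
img2[10:50, 15:35] = 255

rect = bbox_of_ink(img2)
sqr = to_square(rect)
print(rect)  # (15, 10, 35, 50)
print(sqr)   # (-1.0, 4.0, 51.0, 56.0)
```

The negative left coordinate shows that the enlarged square can stick out past the image edge when the character is near it. That is presumably why the program above adds a white margin before cropping.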
Comparing the prediction (classification) results **without preprocessing** and **with preprocessing** reminded me once again how important preprocessing is, even before any trial and error on the prediction model itself.
Now that the groundwork is laid (the "outer moat has been filled in"), I would like to finally study **model construction** for neural networks.