This time we create training data from the image data collected last time. Passing the image data to TensorFlow as-is makes the calculation slow, so we convert it to NumPy array format to shorten the calculation time.
from PIL import Image
import os, glob
import numpy as np
from sklearn import model_selection
classes = ["monkey", "boar", "crow"]
num_classes = len(classes)
image_size = 50
X = []
Y = []
This time we classify monkey, boar, and crow, so we store those keywords in classes.
The image size is unified to 50x50.
X holds the image data, and Y holds the labels that indicate whether each image is a monkey (0), boar (1), or crow (2).
for index, classlabel in enumerate(classes):
    photos_dir = "./" + classlabel
    files = glob.glob(photos_dir + "/*.jpg")
    for i, file in enumerate(files):
        # Cap each class at 141 images, the smallest count among monkey, boar, and crow
        if i >= 141:
            break
        image = Image.open(file)
        image = image.convert("RGB")
        image = image.resize((image_size, image_size))
        data = np.asarray(image)
        X.append(data)
        Y.append(index)
X = np.array(X)
Y = np.array(Y)
glob.glob() is a function that returns a list of the files matching a wildcard pattern. Data like the following is stored in files.
['./monkey\\49757184328.jpg',
'./monkey\\49767449258.jpg',
...
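The wildcard matching described above can be tried without any real photos. The following is a minimal sketch that assumes a temporary directory with made-up file names; the helper list_jpgs is hypothetical and only for illustration.

```python
import glob
import os
import tempfile


def list_jpgs(directory):
    """Return the sorted paths of all .jpg files directly inside directory."""
    return sorted(glob.glob(os.path.join(directory, "*.jpg")))


# Build a throwaway directory with two .jpg files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.jpg", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in list_jpgs(tmp)]

print(found)  # only the .jpg files are listed
```

Only names that match the pattern exactly are returned, so a stray character in the pattern (for example a trailing space after ".jpg") would yield an empty list.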
Each image is opened, converted to RGB format (256 levels per channel), and resized to 50x50. It is then converted to a NumPy array (which seems to be faster than a Python list).
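The per-image steps can be checked in isolation. This is a small sketch that assumes an in-memory grayscale image as a stand-in for a real photo; the helper name image_to_array is made up for this example.

```python
from PIL import Image
import numpy as np


def image_to_array(image, size=50):
    """Convert a PIL image to a (size, size, 3) uint8 NumPy array."""
    rgb = image.convert("RGB")          # force three 8-bit channels
    resized = rgb.resize((size, size))  # unify the image size
    return np.asarray(resized)


# A 120x80 grayscale image stands in for a downloaded photo.
sample = Image.new("L", (120, 80), color=128)
arr = image_to_array(sample)
print(arr.shape, arr.dtype)
```

Whatever the original mode and size, the result has shape (50, 50, 3) and dtype uint8, which is what lets the images be stacked into a single array later.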
The X and Y created in this way contain the following data.
X: an array of shape (423, 50, 50, 3)
[[[[ 89 92 60]
[ 85 84 52]
[ 91 84 51]
...
[177 178 24]
[142 145 15]
[231 219 35]]
...
Y: an array of length 423
[0 0 ... 1 1 ... 2 2 ...]
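Before moving on, it can help to confirm that the arrays have the expected shapes and that each class has the same number of samples. The sketch below uses zero-filled dummy data with the same shapes as above; check_dataset is a hypothetical helper, not part of the original script.

```python
import numpy as np

num_classes, per_class, image_size = 3, 141, 50

# Dummy stand-ins with the same shapes as the real X and Y.
X_demo = np.zeros((num_classes * per_class, image_size, image_size, 3), dtype=np.uint8)
Y_demo = np.repeat(np.arange(num_classes), per_class)


def check_dataset(X, Y, n_classes):
    """Validate basic shapes and return the number of samples per class."""
    if X.ndim != 4 or X.shape[-1] != 3:
        raise ValueError("X must have shape (n, height, width, 3)")
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same length")
    return np.bincount(Y, minlength=n_classes)


counts = check_dataset(X_demo, Y_demo, num_classes)
print(X_demo.shape, counts)
```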
Two functions are used to convert to a NumPy array: data = np.asarray(image) and X = np.array(X). They behave the same when converting a list to a NumPy array, but differently when the input is already a NumPy array: np.array makes a copy by default, while np.asarray returns the input without copying when no conversion is needed.
Reference: https://punhundon-lifeshift.com/array_asarray
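The difference between the two functions shows up only when the input is already a NumPy array. A minimal sketch with a tiny made-up array:

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)   # np.array copies by default
same = np.asarray(source)   # np.asarray reuses the existing array

source[0] = 99              # modify the original afterwards

print(copied[0])            # 1: the copy is unaffected
print(same[0])              # 99: same underlying data as source
print(same is source)       # True
```

In the script above neither call receives an existing NumPy array that is modified later, so either function would give the same result there.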
Use the train_test_split function to split X and Y into training data and model validation data, and save them under the file name "animal.npy".
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, Y)
xy = (X_train, X_test, y_train, y_test)
np.save("./animal.npy", xy)
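Depending on the NumPy version, passing a tuple of arrays with different shapes to np.save may raise an error, because the tuple cannot be turned into one regular array. As an alternative sketch (not the method used in this article), np.savez stores each array under its own name. The file name "animal.npz" and the helpers save_split and load_split are assumptions for illustration.

```python
import os
import tempfile

import numpy as np


def save_split(path, X_train, X_test, y_train, y_test):
    """Save the four arrays into one .npz archive under explicit names."""
    np.savez(path, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test)


def load_split(path):
    """Load the four arrays back in the same order they were saved."""
    with np.load(path) as data:
        return data["X_train"], data["X_test"], data["y_train"], data["y_test"]


# Small dummy arrays stand in for the real split.
demo = (
    np.zeros((6, 50, 50, 3), dtype=np.uint8),
    np.zeros((2, 50, 50, 3), dtype=np.uint8),
    np.array([0, 1, 2, 0, 1, 2]),
    np.array([0, 1]),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "animal.npz")
    save_split(path, *demo)
    loaded = load_split(path)

print([a.shape for a in loaded])
```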
X_train and y_train are arrays of length 317, and X_test and y_test are arrays of length 106. That is, about 75% of X and Y goes to the training set and about 25% goes to the test set, which is the default split of train_test_split when no size is specified.
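The 317/106 split can be reproduced with dummy data of the same size. In this sketch, test_size=0.25 spells out the default, and random_state=0 is an assumption added only to make the shuffle repeatable.

```python
import numpy as np
from sklearn import model_selection

# Dummy stand-ins with the same shapes as the real X and Y.
X_demo = np.zeros((423, 50, 50, 3), dtype=np.uint8)
Y_demo = np.repeat(np.arange(3), 141)

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X_demo, Y_demo, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)  # (317, 50, 50, 3) (106, 50, 50, 3)
print(y_train.shape, y_test.shape)  # (317,) (106,)
```

The test count is rounded up (0.25 x 423 = 105.75, so 106 samples) and the remaining 317 samples go to training.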