With Python's ndarray (NumPy), I needed to convert an array of shape (32,32,3) into a 4D tensor of shape (1,32,32,1). The purpose is to prepare data for machine learning.
An ndarray with a shape such as (1,32,32,1) is called a "four-dimensional tensor". For image data, the four axes mean (number of images, image height, image width, number of channels). The number of channels is 1 for grayscale and 3 for RGB color. A single image is an ndarray of shape (32,32,3); since it has no leading "number of images" axis, you can tell it is one image and not an image dataset.
Addendum) When I say "4D tensor" to someone who specializes in mathematics, they seem to picture something different, but I like the phrasing "Keras cannot use the data unless it is a 4D tensor dataset", so I use it a lot (laughs).
I find it quite difficult to reshape an ndarray the way I want. As a first step, I confirmed that an ndarray can be reshaped as follows.
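As a small illustration of the (number of images, height, width, channels) layout described above, here is a minimal sketch that adds the missing axes to one grayscale image. The zero-filled array is a hypothetical stand-in for real image data, not something from the original code.

```python
import numpy as np

# Hypothetical stand-in for one grayscale image: height 32, width 32.
image = np.zeros((32, 32), dtype=np.uint8)

# Append a channel axis: (32, 32) -> (32, 32, 1)
with_channel = image[..., np.newaxis]

# Prepend a "number of images" axis: (32, 32, 1) -> (1, 32, 32, 1)
batch = np.expand_dims(with_channel, axis=0)

print(with_channel.shape)  # (32, 32, 1)
print(batch.shape)         # (1, 32, 32, 1)
print(batch.ndim)          # 4
```

Both np.newaxis and np.expand_dims only add an axis of length 1, so the pixel values themselves are unchanged.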
import numpy as np
a = np.arange(6)
a = a.reshape(2, 3)
print(a)
#↓ Output result
#[[0 1 2]
# [3 4 5]]
print("===============\n")
a = a.reshape(2,3,1)
print(a)
#↓ Output result
#[[[0]
#  [1]
#  [2]]
#
# [[3]
#  [4]
#  [5]]]
print("---------------\n")
a = a.reshape(1,2,3,1)
print(a)
#↓ Output result
#[[[[0]
#   [1]
#   [2]]
#
#  [[3]
#   [4]
#   [5]]]]
Now the array can be passed to the following predict function.
y_pred = model.predict(x)
x must be an ndarray of shape (1, 32, 16, 1); otherwise an error occurs. Passing (32, 16, 1) also raises an error.
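To avoid that kind of shape error, the conversion can be wrapped in a small function. This is only a sketch: to_batch is a hypothetical helper name, not part of NumPy or Keras, and it assumes the channels-last layout used in this article.

```python
import numpy as np

def to_batch(x):
    """Return x as a 4D array (images, height, width, channels).

    Hypothetical helper for illustration; not part of NumPy or Keras.
    """
    x = np.asarray(x)
    if x.ndim == 2:
        # (height, width) -> (1, height, width, 1)
        return x[np.newaxis, :, :, np.newaxis]
    if x.ndim == 3:
        # (height, width, channels) -> (1, height, width, channels)
        return x[np.newaxis, ...]
    if x.ndim == 4:
        # Already (images, height, width, channels)
        return x
    raise ValueError(f"expected 2, 3 or 4 dimensions, got {x.ndim}")

x = to_batch(np.zeros((32, 16, 1)))
print(x.shape)  # (1, 32, 16, 1)
```

The result could then be passed to model.predict(x), assuming the model was built for that input size.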
from PIL import Image
import numpy as np
# The 3 * 2 here is a small stand-in; in practice, replace it with 32 * 32 or similar.
c = np.arange(3 * 2)
c = c.reshape(3, 2)
pilImg = Image.fromarray(np.uint8(c))
# pilImg_1 = pilImg.convert("RGB")
pilImg_1 = pilImg.convert("L")
data = np.array(pilImg_1, dtype='int64')
print(type(data))
print(data)
print(data.shape)
a = data
print("===============\n")
a = a.reshape(3,2,1)
print(a)
print("===============\n")
a = data.reshape(1,3,2,1)
print(a)
This part is a bonus. It is for converting an RGB image to grayscale before using it. I don't know how much demand there is.
from PIL import Image
import numpy as np
file = "neko.png"
image = Image.open(file)
image = image.convert("RGB")
data_rgb = np.array(image, dtype='int64')
# RGB, so the array has shape (height, width, 3)
print(type(data_rgb))
print("data_rgb ... " + str(data_rgb.shape))
pilImg_rgb = Image.fromarray(np.uint8(data_rgb))
pilImg_gray = pilImg_rgb.convert("L")
data_gray = np.array(pilImg_gray, dtype='int64')
# Grayscale, so the array has shape (height, width)
print(type(data_gray))
print("data_gray ... " + str(data_gray.shape))
#
pilImg_rgb_2 = Image.fromarray(np.uint8(data_gray))
pilImg_rgb_2 = pilImg_rgb_2.convert("RGB")
data_rgb_2 = np.array(pilImg_rgb_2, dtype='int64')
# Converted back to RGB, so the array has shape (height, width, 3) again
print(type(data_rgb_2))
print("data_rgb_2 ... " + str(data_rgb_2.shape))
So this was an example of converting (height, width) ⇔ (height, width, 3). Note that the grayscale array has shape (height, width), not (height, width, 1).
P.S. The above was badly written. In the end, I think the code below is enough.
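Since the grayscale array has no channel axis, one may have to be added by hand. The following is a minimal NumPy-only sketch; the small arange array is a hypothetical stand-in for data_gray, and repeating the gray channel three times is just one simple way to get a 3-channel array, not necessarily what the code above does internally.

```python
import numpy as np

# Hypothetical stand-in for data_gray: shape (height, width) = (3, 2)
data_gray = np.arange(6, dtype=np.int64).reshape(3, 2)

# (height, width) -> (height, width, 1)
gray_hw1 = data_gray[:, :, np.newaxis]

# (height, width, 1) -> (height, width)
gray_hw = np.squeeze(gray_hw1, axis=-1)

# (height, width, 1) -> (height, width, 3) by repeating the gray channel
gray_as_rgb = np.repeat(gray_hw1, 3, axis=-1)

print(gray_hw1.shape)     # (3, 2, 1)
print(gray_hw.shape)      # (3, 2)
print(gray_as_rgb.shape)  # (3, 2, 3)
```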
from PIL import Image
import numpy as np
file = "neko.png"
image = Image.open(file)
image = image.convert("RGB")
data_rgb = np.array(image, dtype='int64')
# RGB, so the array has shape (height, width, 3)
print(type(data_rgb))
print("data_rgb ... " + str(data_rgb.shape))
pilImg_rgb = Image.fromarray(np.uint8(data_rgb))
pilImg_gray = pilImg_rgb.convert("L")
data_gray = np.array(pilImg_gray, dtype='int64')
# Grayscale, so the array has shape (height, width)
print(type(data_gray))
print("data_gray ... " + str(data_gray.shape))
a = data_gray.reshape(1, image.height, image.width, 1)
print(a.shape)
#Execution result
# <class 'numpy.ndarray'>
# data_rgb ... (210, 160, 3)
# <class 'numpy.ndarray'>
# data_gray ... (210, 160)
# (1, 210, 160, 1)
The result has shape (1, 210, 160, 1), which is the same form as (1,32,32,1). It can now be used for prediction in machine learning. Color images are more common, though, in which case the last dimension is 3 instead of 1. If you are learning characters and the like, grayscale is fine, so I think the sample in this article can be used.
Addendum) I got it working as follows.
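For the color case mentioned above, the same idea applies with 3 as the last dimension. Here is a minimal sketch, assuming several images of identical size; the zero-filled arrays are hypothetical placeholders for real image data.

```python
import numpy as np

# Hypothetical stand-ins for three color images of the same size.
images = [np.zeros((210, 160, 3), dtype=np.uint8) for _ in range(3)]

# One color image as a batch of one: (210, 160, 3) -> (1, 210, 160, 3)
single = images[0][np.newaxis, ...]

# Several images stacked along a new first axis: (3, 210, 160, 3)
batch = np.stack(images, axis=0)

print(single.shape)  # (1, 210, 160, 3)
print(batch.shape)   # (3, 210, 160, 3)
```

np.stack requires every image to have the same shape, so images of different sizes would need to be resized first.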
print("img ... " + str(img.shape))
# img ... (1, 32, 32, 3)
print("img ..." + str(img[0].shape))
# img ... (32, 32, 3)
imwrite(img_path, img)
#↑ This is an error
imwrite(img_path, img[0])
#↑ This is a success
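The img[0] step above goes in the opposite direction: from a 4D batch back to a single 3D image. A minimal sketch of that, using a hypothetical zero-filled batch and no image-writing function:

```python
import numpy as np

# Hypothetical batch of two color images: (2, 32, 32, 3)
batch = np.zeros((2, 32, 32, 3), dtype=np.uint8)

# Iterating along the first axis yields one 3D image at a time.
shapes = [image.shape for image in batch]
print(shapes)  # [(32, 32, 3), (32, 32, 3)]

# For a batch of exactly one image, squeeze(axis=0) gives the same result as img[0].
img = batch[:1]                       # (1, 32, 32, 3)
one_image = np.squeeze(img, axis=0)   # (32, 32, 3)
print(one_image.shape)
```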