MNIST is a collection of sample training and test data for machine learning.
It consists of image data of handwritten digits from 0 to 9, and is used for machine learning tasks such as "identify and classify handwritten digits with AI". You can download it for free from http://yann.lecun.com/exdb/mnist/.
When you decompress the .gz files, you get binary files like the one below.
t10k-images.idx3-ubyte
Even though this is image data, it is not in a format such as .jpg, so it cannot be previewed as is.
However, if you write Python code that outputs it as PNG with numpy and PIL, you can view it as an ordinary image file.
If wget and gzip (which provides gunzip) are not installed yet, install them. On Ubuntu:
apt-get install -y wget
apt-get install -y gzip
First, download the .gz files with wget.
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Then decompress them with gunzip.
gunzip train-images-idx3-ubyte.gz
gunzip train-labels-idx1-ubyte.gz
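If gunzip is not available, the same decompression can be done from Python. The following is a minimal sketch using the standard-library gzip and shutil modules; the helper name decompress_gz is made up for this example and is not part of the original procedure.

```python
import gzip
import shutil


def decompress_gz(src_path, dst_path):
    """Decompress a .gz file to dst_path, copying in chunks instead of loading it all."""
    with gzip.open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)


# Example usage (assumes the .gz files are in the current directory):
# decompress_gz('train-images-idx3-ubyte.gz', 'train-images-idx3-ubyte')
# decompress_gz('train-labels-idx1-ubyte.gz', 'train-labels-idx1-ubyte')
```

Unlike gunzip, this sketch leaves the original .gz file in place.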
The decompressed files look like this:
-rw-r--r--. 1 root root 47040016 Jul 21 2000 train-images-idx3-ubyte
-rw-r--r--. 1 root root 60008 Jul 21 2000 train-labels-idx1-ubyte
Next, write the Python code.
vi test.py
# Only struct, numpy and PIL are needed to convert one image to PNG.
import struct
import numpy as np
from PIL import Image
# Open the training image file in binary mode.
trainImagesFile = open('./train-images-idx3-ubyte', 'rb')
f = trainImagesFile
# The header holds four big-endian 32-bit integers ('>i').
magic_number = f.read(4)
magic_number = struct.unpack('>i', magic_number)[0]
number_of_images = f.read(4)
number_of_images = struct.unpack('>i', number_of_images)[0]
number_of_rows = f.read(4)
number_of_rows = struct.unpack('>i', number_of_rows)[0]
number_of_columns = f.read(4)
number_of_columns = struct.unpack('>i', number_of_columns)[0]
# Each pixel is one unsigned byte, so one image occupies rows * columns bytes.
bytes_per_image = number_of_rows * number_of_columns
raw_img = f.read(bytes_per_image)
# Unpack the bytes as unsigned 8-bit values ('B') and reshape them into rows and columns.
fmt = '%dB' % bytes_per_image
lin_img = struct.unpack(fmt, raw_img)
np_ary = np.asarray(lin_img).astype('uint8')
np_ary = np.reshape(np_ary, (number_of_rows, number_of_columns), order='C')
# Save the array as a grayscale PNG.
pil_img = Image.fromarray(np_ary)
pil_img.save("output.png")
python test.py
This writes the first image to output.png.
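The four separate header reads in test.py can also be collapsed into a single call. Below is a minimal sketch, assuming the same big-endian layout; read_image_header is a hypothetical helper name, and an in-memory io.BytesIO object stands in for the real file so the snippet runs without the dataset.

```python
import io
import struct


def read_image_header(f):
    """Read the 16-byte header of an IDX3 image file.

    Returns (magic_number, number_of_images, number_of_rows, number_of_columns).
    """
    header = f.read(16)
    if len(header) != 16:
        raise ValueError('file too short to contain an image header')
    # '>4i' = four big-endian 32-bit signed integers.
    return struct.unpack('>4i', header)


# Synthetic header carrying the values that train-images-idx3-ubyte reports.
fake_file = io.BytesIO(struct.pack('>4i', 2051, 60000, 28, 28))
print(read_image_header(fake_file))  # (2051, 60000, 28, 28)
```

With the real data, the same function could be called on the object returned by open('./train-images-idx3-ubyte', 'rb').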
The structure of the training image file train-images-idx3-ubyte is as follows (from http://yann.lecun.com/exdb/mnist/).
TRAINING SET IMAGE FILE (train-images-idx3-ubyte):
[offset] [type]          [value]           [description]
0000     32 bit integer  0x00000803(2051)  magic number
0004     32 bit integer  60000             number of images
0008     32 bit integer  28                number of rows
0012     32 bit integer  28                number of columns
0016     unsigned byte   ??                pixel
0017     unsigned byte   ??                pixel
........
xxxx     unsigned byte   ??                pixel
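Following this layout, the header is read sequentially from offset 0, 4 bytes at a time, and each value is decoded with struct.unpack('>i', ...) as a big-endian 32-bit integer.
magic_number = f.read(4)
After unpacking, the result is 2051.
number_of_images = f.read(4)
After unpacking, the result is 60000.
number_of_rows = f.read(4)
After unpacking, the result is 28.
number_of_columns = f.read(4)
After unpacking, the result is 28.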
To see the actual values, print them:
print('--------------------')
print('magic_number')
print(magic_number)
print('--------------------')
print('number_of_images')
print(number_of_images)
print('--------------------')
print('number_of_rows')
print(number_of_rows)
print('--------------------')
print('number_of_columns')
print(number_of_columns)
The output is as follows.
--------------------
magic_number
2051
--------------------
number_of_images
60000
--------------------
number_of_rows
28
--------------------
number_of_columns
28
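These header values also account for the size of train-images-idx3-ubyte in the earlier file listing: 16 header bytes plus one byte per pixel. A small sketch of that arithmetic follows; the function name expected_image_file_size is chosen for this example only.

```python
def expected_image_file_size(number_of_images, number_of_rows, number_of_columns):
    """Size in bytes of an IDX3 unsigned-byte image file: 16-byte header plus 1 byte per pixel."""
    header_bytes = 16
    return header_bytes + number_of_images * number_of_rows * number_of_columns


size = expected_image_file_size(60000, 28, 28)
print(size)  # 47040016, the size shown for train-images-idx3-ubyte
```

Comparing this number with os.path.getsize('./train-images-idx3-ubyte') is a quick way to check that the download and decompression completed.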
The layout also contains this entry:
[offset] [type]         [value] [description]
0016     unsigned byte  ??      pixel
so the pixel data starts at offset 16, immediately after the header. One image can therefore be read with
bytes_per_image = number_of_rows * number_of_columns
raw_img = f.read(bytes_per_image)
The bytes are then loaded into a numpy array and saved in PNG format.
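The struct.unpack and np.asarray steps in test.py work, but numpy can also interpret the raw bytes directly. The sketch below uses np.frombuffer for that, with a synthetic 28 x 28 byte string in place of the real f.read(bytes_per_image); bytes_to_image is a hypothetical helper name.

```python
import numpy as np


def bytes_to_image(raw_img, number_of_rows, number_of_columns):
    """Interpret raw unsigned bytes as a (rows, columns) uint8 array in row-major order."""
    expected = number_of_rows * number_of_columns
    if len(raw_img) != expected:
        raise ValueError('expected %d bytes, got %d' % (expected, len(raw_img)))
    return np.frombuffer(raw_img, dtype=np.uint8).reshape(number_of_rows, number_of_columns)


# Synthetic stand-in for f.read(bytes_per_image): 784 bytes cycling through 0..255.
raw_img = bytes(i % 256 for i in range(28 * 28))
np_ary = bytes_to_image(raw_img, 28, 28)
print(np_ary.shape, np_ary.dtype)  # (28, 28) uint8
```

An array created this way from a bytes object is read-only; call np_ary.copy() first if the pixels need to be modified.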
If you run the PNG output step in a loop instead of once, as shown below, you can convert images one after another. For example, range(10) outputs 10 images; set the loop count to whatever you like.
for num in range(10):
    raw_img = f.read(bytes_per_image)
    fmt = '%dB' % bytes_per_image
    lin_img = struct.unpack(fmt, raw_img)
    np_ary = np.asarray(lin_img).astype('uint8')
    np_ary = np.reshape(np_ary, (28, 28), order='C')
    pil_img = Image.fromarray(np_ary)
    pil_img.save("output" + str(num) + ".png")
This produces the files output0.png through output9.png.
Printing the numpy array np_ary with print() shows the following data.
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 3 18 18 18 126 136 175 26 166 255 247 127 0 0 0 0]
[ 0 0 0 0 0 0 0 0 30 36 94 154 170 253 253 253 253 253 225 172 253 242 195 64 0 0 0 0]
[ 0 0 0 0 0 0 0 49 238 253 253 253 253 253 253 253 253 251 93 82 82 56 39 0 0 0 0 0]
[ 0 0 0 0 0 0 0 18 219 253 253 253 253 253 198 182 247 241 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 80 156 107 253 253 205 11 0 43 154 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 14 1 154 253 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 139 253 190 2 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 11 190 253 70 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 35 241 225 160 108 1 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 81 240 253 253 119 25 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45 186 253 253 150 27 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 93 252 253 187 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 249 253 249 64 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 46 130 183 253 253 207 2 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 39 148 229 253 253 253 250 182 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 24 114 221 253 253 253 253 201 78 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 23 66 213 253 253 253 253 198 81 2 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 18 171 219 253 253 253 253 195 80 9 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 55 172 226 253 253 253 253 244 133 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 136 253 253 253 212 135 132 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Each entry corresponds to one pixel position in the image, and its value (0 to 255) is that pixel's grayscale intensity.
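To make the link between the array values and the picture visible without opening a PNG, the values can be mapped to characters. This is only a rough sketch: the threshold of 128 and the characters '#' and '.' are arbitrary choices for this example, and a tiny 3 x 4 array is used instead of a real 28 x 28 image.

```python
import numpy as np


def ascii_preview(np_ary, threshold=128):
    """Return a text rendering: '#' where the pixel value is >= threshold, '.' elsewhere."""
    lines = []
    for row in np_ary:
        lines.append(''.join('#' if value >= threshold else '.' for value in row))
    return '\n'.join(lines)


# Small hand-made array standing in for an MNIST image.
sample = np.array([[0, 200, 255, 0],
                   [0, 253, 30, 0],
                   [0, 128, 127, 0]], dtype=np.uint8)
print(ascii_preview(sample))
# .##.
# .#..
# .#..
```

Passing the np_ary from the loop above to ascii_preview would print one text row per image row.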
The implementation above converts the image data (Images) to PNG. In addition, the label data (Labels) also needs to be checked. The implementation for that is as follows.
# Only struct is needed to read the label file.
import struct
# Open the training label file in binary mode.
trainLabelsFile = open('./train-labels-idx1-ubyte', 'rb')
f = trainLabelsFile
# The header holds two big-endian 32-bit integers: magic number and item count.
magic_number = f.read(4)
magic_number = struct.unpack('>i', magic_number)[0]
number_of_images = f.read(4)
number_of_images = struct.unpack('>i', number_of_images)[0]
print("--------------------")
print("magic_number")
print(magic_number)
print("--------------------")
print("number_of_image")
print(number_of_images)
print("--------------------")
# Each label is a single unsigned byte following the header.
label_byte = f.read(1)
label_int = int.from_bytes(label_byte, byteorder='big')
print(label_int)
--------------------
magic_number
2049
--------------------
number_of_image
60000
--------------------
5
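The two magic numbers seen so far (2051 for images, 2049 for labels) differ only in their last byte. In the IDX format described on the MNIST page, the first two bytes are zero, the third byte encodes the data type (0x08 for unsigned byte), and the fourth gives the number of dimensions. The sketch below decodes this; decode_magic is a name invented for this example.

```python
def decode_magic(magic_number):
    """Split an IDX magic number into (data_type_code, number_of_dimensions)."""
    if magic_number >> 16 != 0:
        raise ValueError('the first two bytes of an IDX magic number must be zero')
    data_type_code = (magic_number >> 8) & 0xFF
    number_of_dimensions = magic_number & 0xFF
    return data_type_code, number_of_dimensions


print(decode_magic(2051))  # (8, 3): unsigned bytes, 3 dimensions (images, rows, columns)
print(decode_magic(2049))  # (8, 1): unsigned bytes, 1 dimension (labels)
```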
The structure of the label file train-labels-idx1-ubyte is as follows.
TRAINING SET LABEL FILE (train-labels-idx1-ubyte):
[offset] [type]          [value]           [description]
0000     32 bit integer  0x00000801(2049)  magic number (MSB first)
0004     32 bit integer  60000             number of items
0008     unsigned byte   ??                label
0009     unsigned byte   ??                label
........
xxxx     unsigned byte   ??                label
The label values are 0 to 9.
In other words, reading one byte at a time from offset 8 onward yields the labels in order. Replacing the single read with a loop looks like this:
for num in range(10):
    label_byte = f.read(1)
    label_int = int.from_bytes(label_byte, byteorder='big')
    print(label_int)
The output is as follows.
5
0
4
1
9
2
1
3
1
4
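Reading one byte at a time is fine for a handful of labels; for many labels, the bytes after the header can be read in one call. The following is a sketch under the same layout assumptions, using np.frombuffer; read_labels is a hypothetical helper, and the in-memory file holds only the ten labels printed above.

```python
import io
import struct

import numpy as np


def read_labels(f):
    """Read an IDX1 label file object and return the labels as a uint8 array."""
    magic_number, number_of_items = struct.unpack('>2i', f.read(8))
    if magic_number != 2049:
        raise ValueError('unexpected magic number: %d' % magic_number)
    labels = np.frombuffer(f.read(number_of_items), dtype=np.uint8)
    if labels.size != number_of_items:
        raise ValueError('file ended before all labels were read')
    return labels


# Synthetic file holding only the first ten labels.
first_ten = bytes([5, 0, 4, 1, 9, 2, 1, 3, 1, 4])
fake_file = io.BytesIO(struct.pack('>2i', 2049, len(first_ten)) + first_ten)
labels = read_labels(fake_file)
print(labels.tolist())  # [5, 0, 4, 1, 9, 2, 1, 3, 1, 4]
```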
Compare this with the PNG images output earlier.
Each label correctly states which digit the corresponding PNG image shows.
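Since the n-th label belongs to the n-th image, the two files can be walked in step. The generator below is a minimal sketch of that pairing, not a definitive loader; iter_image_label_pairs is a made-up name, and a synthetic dataset of two 2 x 2 images is used so the code runs without the real files.

```python
import io
import struct

import numpy as np


def iter_image_label_pairs(images_file, labels_file):
    """Yield (image, label) pairs from open IDX3 image and IDX1 label file objects."""
    _, number_of_images, rows, columns = struct.unpack('>4i', images_file.read(16))
    _, number_of_labels = struct.unpack('>2i', labels_file.read(8))
    if number_of_images != number_of_labels:
        raise ValueError('image and label counts differ')
    for _ in range(number_of_images):
        raw_img = images_file.read(rows * columns)
        image = np.frombuffer(raw_img, dtype=np.uint8).reshape(rows, columns)
        label = labels_file.read(1)[0]  # indexing a bytes object yields an int
        yield image, label


# Synthetic dataset: two 2x2 images with labels 5 and 0.
fake_images = io.BytesIO(struct.pack('>4i', 2051, 2, 2, 2) + bytes([0, 255, 255, 0, 9, 9, 9, 9]))
fake_labels = io.BytesIO(struct.pack('>2i', 2049, 2) + bytes([5, 0]))
for image, label in iter_image_label_pairs(fake_images, fake_labels):
    print(label, image.tolist())
```

With the real files, each image could then be saved under a name that includes its label, for example Image.fromarray(image).save('img_%d_label_%d.png' % (index, label)), where index is a counter you keep yourself.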