CNNs are relatively good at extracting features from black-and-white 2D images via convolution. A QR code is exactly such a black-and-white image, so let's see whether a CNN can read the value encoded in one. Strictly speaking, which module is black or white maps to which bit value on a rule basis, so a plain NN without convolution would suffice, but here I deliberately use a CNN.
The size of a QR code and the number of characters it can hold depend on its version. For example, version 1 is 21x21 modules and can hold a 17-character string such as "www.wikipedia.org". The blocks E1 to E7 are error-correction data, so reading them is not essential. In other words, each character is 8 bits, so checking 136 bits in total (17 characters x 8 bits) is enough to read what is written at this size, even on a rule basis. For now, the goal is to read the number written in this smallest 21x21 QR code.
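As a quick sanity check, here is a minimal sketch using the qrcode library (the same one used in the code below) that confirms a version-1 code is a 21x21 module matrix; the test string and variable names are mine, not from the original:

import qrcode

# Sketch: verify that a version-1 QR code is a 21x21 matrix of modules.
qr = qrcode.QRCode(version=1, border=0)
qr.add_data('www.wikipedia.org')    # a 17-character string
qr.make()
matrix = qr.get_matrix()            # list of rows of booleans
print(len(matrix), len(matrix[0]))  # -> 21 21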
I wrote the following code in Keras. I converted 6-digit numbers into strings and generated 40,000 QR codes of the smallest size to use as training and test data. Also, a typical CNN might use pooling to halve the image size at each stage, but since the input here is only 21x21, the convolutional part consists of Conv2D layers alone.
qr.py
import qrcode
import numpy as np
import random
from keras.utils import np_utils
from keras.layers import Input, Conv2D, MaxPooling2D, AveragePooling2D, BatchNormalization, Concatenate
from keras.models import Model

batch_size = 128
num_classes = 10
epochs = 30

# Generate 40,000 version-1 (21x21) QR codes of random 6-digit numbers.
X, Y = [], []
sample_list = random.sample(range(10**6), k=40000)
for i in sample_list:
    qr = qrcode.QRCode(
        version=1,
        error_correction=qrcode.constants.ERROR_CORRECT_H,
        box_size=1, border=0)
    qr.add_data('%06d' % (i))
    qr.make()
    img = qr.make_image()
    X.append(np.asarray(img))
    Y.append([int(d) for d in format(i, '06d')])

# X: (40000, 21, 21, 1) 0/1 images; Y: one-hot digits, shape (40000, 1, 6, 10).
X = np.reshape(np.asarray(X), (-1, 21, 21, 1)) / 1.0
Y = np.reshape(np_utils.to_categorical(np.asarray(Y), num_classes), (-1, 1, 6, 10))
print(X.shape)
print(Y.shape)

# Convolution-only feature extractor: no pooling until the very end,
# so positional information is preserved through all Conv2D layers.
inputs = Input((21, 21, 1))
x = Conv2D(256, (3, 3), padding='same', activation='relu')(inputs)
x = BatchNormalization()(x)
x = Conv2D(256, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(256, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(256, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(256, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(512, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(512, (3, 3), padding='same', activation='relu')(x)
# Global max pooling over the whole 21x21 map -> (None, 1, 1, 512).
x = MaxPooling2D(pool_size=(21, 21))(x)
# One 10-way softmax head per digit, concatenated to (None, 1, 6, 10).
y = [Conv2D(10, (1, 1), activation='softmax')(x) for i in range(6)]
y = Concatenate(axis=-2)(y)
model = Model(inputs=inputs, outputs=y)
model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])
history = model.fit(X[:30000], Y[:30000], batch_size=batch_size, epochs=epochs,
                    verbose=1, validation_data=(X[30000:], Y[30000:]))
model.save('qr_model.h5', include_optimizer=False)
Incidentally, the following part of qr.py:
qr.py
y = [Conv2D(10, (1,1), activation='softmax')(x) for i in range(6)]
y = Concatenate(axis=-2)(y)
can also be written out explicitly as follows.
qr.py
y1 = Conv2D(10, (1,1), activation='softmax')(x)
y2 = Conv2D(10, (1,1), activation='softmax')(x)
y3 = Conv2D(10, (1,1), activation='softmax')(x)
y4 = Conv2D(10, (1,1), activation='softmax')(x)
y5 = Conv2D(10, (1,1), activation='softmax')(x)
y6 = Conv2D(10, (1,1), activation='softmax')(x)
y = Concatenate(axis=-2)([y1,y2,y3,y4,y5,y6])
Running the code produces the following output.
(40000, 21, 21, 1)
(40000, 1, 6, 10)
...
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 21, 21, 1) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 21, 21, 256) 2560 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 21, 21, 256) 1024 conv2d_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 21, 21, 256) 590080 batch_normalization_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 21, 21, 256) 1024 conv2d_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 21, 21, 256) 590080 batch_normalization_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 21, 21, 256) 1024 conv2d_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 21, 21, 256) 590080 batch_normalization_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 21, 21, 256) 1024 conv2d_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 21, 21, 256) 590080 batch_normalization_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 21, 21, 256) 1024 conv2d_5[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, 21, 21, 512) 1180160 batch_normalization_5[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 21, 21, 512) 2048 conv2d_6[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 21, 21, 512) 2359808 batch_normalization_6[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 1, 1, 512) 0 conv2d_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, 1, 1, 10) 5130 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 1, 1, 10) 5130 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 1, 1, 10) 5130 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, 1, 1, 10) 5130 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D) (None, 1, 1, 10) 5130 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_13 (Conv2D) (None, 1, 1, 10) 5130 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 1, 6, 10) 0 conv2d_8[0][0]
conv2d_9[0][0]
conv2d_10[0][0]
conv2d_11[0][0]
conv2d_12[0][0]
conv2d_13[0][0]
==================================================================================================
Total params: 5,940,796
Trainable params: 5,937,212
Non-trainable params: 3,584
__________________________________________________________________________________________________
Train on 30000 samples, validate on 10000 samples
Epoch 1/30
30000/30000 [==============================] - 66s 2ms/step - loss: 2.7801 - acc: 0.1714 - val_loss: 2.3467 - val_acc: 0.2484
Epoch 2/30
30000/30000 [==============================] - 62s 2ms/step - loss: 1.8426 - acc: 0.3493 - val_loss: 1.6885 - val_acc: 0.3941
Epoch 3/30
30000/30000 [==============================] - 64s 2ms/step - loss: 1.4841 - acc: 0.4555 - val_loss: 1.4549 - val_acc: 0.4547
...
Epoch 28/30
30000/30000 [==============================] - 64s 2ms/step - loss: 0.0401 - acc: 0.9868 - val_loss: 0.3695 - val_acc: 0.9110
Epoch 29/30
30000/30000 [==============================] - 64s 2ms/step - loss: 0.0435 - acc: 0.9853 - val_loss: 0.3403 - val_acc: 0.9184
Epoch 30/30
30000/30000 [==============================] - 63s 2ms/step - loss: 0.0339 - acc: 0.9889 - val_loss: 0.3164 - val_acc: 0.9231
The final accuracy is **acc: 0.9889, val_acc: 0.9231**. The number of epochs and the model architecture were chosen fairly arbitrarily, so tuning them should improve the accuracy a little further.
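For example, one simple tuning step would be to train for more epochs with early stopping on the validation loss (a sketch using the standard Keras EarlyStopping callback; the patience value here is my own arbitrary choice):

from keras.callbacks import EarlyStopping

# Stop when val_loss hasn't improved for 5 epochs.
early_stop = EarlyStopping(monitor='val_loss', patience=5)
history = model.fit(X[:30000], Y[:30000],
                    batch_size=batch_size, epochs=100,  # allow more epochs, stop early
                    validation_data=(X[30000:], Y[30000:]),
                    callbacks=[early_stop], verbose=1)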
I wrote the following verification code. The setting is quite limited (a 21x21, version-1 QR code), but the model was able to predict the 6-digit numbers reasonably well.
qr2.py
import qrcode
import numpy as np
import random
from keras.models import load_model

# Generate 10 fresh random QR codes for verification.
X, Y = [], []
sample_list = random.sample(range(10**6), k=10)
for i in sample_list:
    qr = qrcode.QRCode(
        version=1,
        error_correction=qrcode.constants.ERROR_CORRECT_H,
        box_size=1, border=0)
    qr.add_data('%06d' % (i))
    qr.make()
    img = qr.make_image()
    X.append(np.asarray(img))
    Y.append(i)
X = np.reshape(np.asarray(X), (-1, 21, 21, 1)) / 1.0

model = load_model('qr_model.h5')
Y_pred = model.predict(X)

# Reassemble each 6-digit number from the per-digit softmax outputs:
# take the argmax of each digit head and accumulate in base 10.
# After the inner loop the value is 10x too large, hence the final //10.
Y_pred_list = []
for i in range(10):
    Y_pred_value = 0
    for j in range(6):
        Y_pred_value += np.argmax(Y_pred[i, 0, j])
        Y_pred_value *= 10
    Y_pred_list.append(Y_pred_value // 10)
print(Y)
print(Y_pred_list)
[89127, 306184, 427806, 501649, 727976, 232504, 427216, 893062, 127368, 100207]
[89127, 306184, 427806, 501649, 727976, 234506, 431222, 893062, 127378, 100207]
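Incidentally, the digit-reassembly loop above can also be written in vectorized form (a sketch; `digits` and `numbers` are my own variable names):

# argmax over the class axis yields a (10, 6) array of predicted digits.
digits = np.argmax(Y_pred[:, 0, :, :], axis=-1)
numbers = [int(''.join(str(d) for d in row)) for row in digits]
print(numbers)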
In general, MaxPooling2D tells subsequent layers whether a feature exists somewhere in the image, but not where it is. CNNs are most often used for classification problems like "is there a dog or a cat somewhere in this image", but I wondered whether they could also extract location-specific features without using Flatten, and it turns out they can. Presumably the short-range convolutions extract the data values themselves, while the stacked, long-range convolutions capture each module's position via its distance from the three square finder patterns in the top-left, top-right, and bottom-left corners. If only the data values had to be read (8 bits laid out as 2x4 or 4x2 modules), two Conv2D layers (equivalent to a 5x5 receptive field) should be enough, but in practice a two-layer Conv2D model did not reach high accuracy (val_acc around 0.5); a sketch of such a model is shown below. So to read a QR code with a CNN, it seems necessary to extract positional features through long-range convolution. (Or maybe it would just be easier to use Flatten in the model...)
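For reference, the two-layer variant mentioned above would look roughly like the following (a sketch under my own assumptions about filter counts and output heads; the exact architecture tried is not given):

# Sketch: only two 3x3 Conv2D layers (a 5x5 receptive field) before pooling.
inputs = Input((21, 21, 1))
x = Conv2D(256, (3, 3), padding='same', activation='relu')(inputs)
x = Conv2D(256, (3, 3), padding='same', activation='relu')(x)
x = MaxPooling2D(pool_size=(21, 21))(x)
y = [Conv2D(10, (1, 1), activation='softmax')(x) for i in range(6)]
y = Concatenate(axis=-2)(y)
shallow_model = Model(inputs=inputs, outputs=y)
# A model of roughly this shape plateaued around val_acc 0.5.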