This time I implemented ESPCN (Efficient Sub-Pixel Convolutional Neural Network), one of the super-resolution methods, so this post is a summary of it. The original paper is here → [Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network](https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Shi_Real-Time_Single_Image_CVPR_2016_paper.html)
1. Introduction
2. What is ESPCN?
3. PC environment
4. Code description
5. Conclusion
Super-resolution is a technique for improving the resolution of low-resolution images and videos, and ESPCN is a method proposed in 2016. (For reference, SRCNN, often cited as the first deep-learning-based method, appeared in 2014.) SRCNN improved resolution by combining deep learning with a conventional upscaling method such as bicubic interpolation, whereas ESPCN incorporates the upscaling step into the deep learning model itself, so images can be enlarged at an arbitrary magnification. This time I implemented this method in Python, so I would like to introduce the code. The full code is also posted on GitHub, so please check there. https://github.com/nekononekomori/espcn_keras
ESPCN is a method that improves resolution by introducing sub-pixel convolution (pixel shuffle) into the deep learning model. Since the main purpose here is to post the code, I will omit the detailed explanation, but here are some sites that explain ESPCN. https://buildersbox.corp-sansan.com/entry/2019/03/20/110000 https://qiita.com/oki_uta_aiota/items/74c056718e69627859c0 https://qiita.com/jiny2001/items/e2175b52013bf655d617
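As a quick illustration of the pixel shuffle (sub-pixel convolution) idea, here is a minimal sketch using tf.nn.depth_to_space; the tensor sizes are just example values I chose, not anything from the repository.

import tensorflow as tf

scale = 3
#A feature map with scale**2 = 9 channels and spatial size 17x17 (batch of 1)
features = tf.random.normal((1, 17, 17, scale ** 2))

#depth_to_space rearranges the 9 channels into a 3x larger single-channel image
upscaled = tf.nn.depth_to_space(features, scale)
print(upscaled.shape)  #(1, 51, 51, 1)

In other words, the convolutions only ever run at the low resolution, and the enlargement is done by rearranging channels at the very end.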
- CPU: Intel Core i7 (8th Gen)
- GPU: NVIDIA GeForce GTX 1080 Ti
- OS: Ubuntu 20.04
As you can see on GitHub, the project mainly consists of three files.
- data_create.py → dataset generation program
- model.py → ESPCN model definition
- main.py → execution program
The functions are defined in data_create.py and model.py, and executed from main.py.
data_create.py
import cv2
import os
import random
import glob
import numpy as np
import tensorflow as tf

#A program that cuts out an arbitrary number of patches
def save_frame(path,        #The path of the folder that contains the data
               data_number, #Number of patches to cut from one image
               cut_height,  #Patch height (low resolution)
               cut_width,   #Patch width (low resolution)
               mag,         #Magnification
               ext='jpg'):

    #Generate the lists for the dataset
    low_data_list = []
    high_data_list = []

    path = path + "/*"
    files = glob.glob(path)

    for img in files:
        img = cv2.imread(img, cv2.IMREAD_GRAYSCALE)
        H, W = img.shape
        cut_height_mag = int(cut_height * mag)  #cast in case mag is given as a float
        cut_width_mag = int(cut_width * mag)

        if cut_height_mag > H or cut_width_mag > W:
            return

        for q in range(data_number):
            ram_h = random.randint(0, H - cut_height_mag)
            ram_w = random.randint(0, W - cut_width_mag)

            cut_img = img[ram_h : ram_h + cut_height_mag, ram_w : ram_w + cut_width_mag]

            #Blur with a Gaussian filter, then shrink
            img1 = cv2.GaussianBlur(img, (5, 5), 0)
            img2 = img1[ram_h : ram_h + cut_height_mag, ram_w : ram_w + cut_width_mag]
            img3 = cv2.resize(img2, (cut_width, cut_height))  #cv2.resize takes (width, height)

            high_data_list.append(cut_img)
            low_data_list.append(img3)

    #numpy → tensor + normalization
    low_data_list = tf.convert_to_tensor(low_data_list, np.float32)
    high_data_list = tf.convert_to_tensor(high_data_list, np.float32)
    low_data_list /= 255
    high_data_list /= 255

    return low_data_list, high_data_list
This is the program that generates the dataset. Let's go through it piece by piece.
def save_frame(path,        #The path of the folder that contains the data
               data_number, #Number of patches to cut from one image
               cut_height,  #Patch height (low resolution)
               cut_width,   #Patch width (low resolution)
               mag,         #Magnification
               ext='jpg'):
Here is the definition of the function. As written in the comments, path is the path of the folder. (For example, if your photos are in a folder named file, pass "./file".) data_number is the number of patches cut from each photo, which is how the data is augmented. cut_height and cut_width are the sizes of the low-resolution image. The final output will be those values multiplied by the magnification mag. (If cut_height = 300, cut_width = 300 and mag = 3, the result is an image of size 900 * 900.)
path = path + "/*"
files = glob.glob(path)
This creates a list of all the photos in the folder.
for img in files:
    img = cv2.imread(img, cv2.IMREAD_GRAYSCALE)
    H, W = img.shape
    cut_height_mag = int(cut_height * mag)
    cut_width_mag = int(cut_width * mag)

    if cut_height_mag > H or cut_width_mag > W:
        return

    for q in range(data_number):
        ram_h = random.randint(0, H - cut_height_mag)
        ram_w = random.randint(0, W - cut_width_mag)

        cut_img = img[ram_h : ram_h + cut_height_mag, ram_w : ram_w + cut_width_mag]

        #Blur with a Gaussian filter, then shrink
        img1 = cv2.GaussianBlur(img, (5, 5), 0)
        img2 = img1[ram_h : ram_h + cut_height_mag, ram_w : ram_w + cut_width_mag]
        img3 = cv2.resize(img2, (cut_width, cut_height))

        high_data_list.append(cut_img)
        low_data_list.append(img3)
Here, the photos listed above are taken one at a time, and data_number patches are cut from each. random.randint is used so that the crop location is chosen at random. The cropped region is then blurred with a Gaussian filter and shrunk to generate the low-resolution image. Finally, both patches are added to their lists with append.
#numpy → tensor + normalization
low_data_list = tf.convert_to_tensor(low_data_list, np.float32)
high_data_list = tf.convert_to_tensor(high_data_list, np.float32)
low_data_list /= 255
high_data_list /= 255
return low_data_list, high_data_list
Keras and TensorFlow need tensors rather than numpy arrays, so the conversion is done here. Normalization is also done at the same time.
Finally, the function returns a list of low-resolution images and a list of the corresponding high-resolution images.
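For reference, a hypothetical call to this function might look like the following (the folder name "./file" and the numbers are just examples; the actual values used for training appear later in main.py).

#Assuming ./file contains JPEG photos: 10 random 17x17 low-resolution patches
#and the matching 51x51 high-resolution patches are cut from each image
low_imgs, high_imgs = save_frame("./file", 10, 17, 17, 3)
print(low_imgs.shape)   #(number of patches, 17, 17)
print(high_imgs.shape)  #(number of patches, 51, 51)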
model.py
import tensorflow as tf
from tensorflow.python.keras.models import Model
from tensorflow.python.keras.layers import Conv2D, Input, Lambda

def ESPCN(upsampling_scale):
    input_shape = Input((None, None, 1))

    conv2d_0 = Conv2D(filters = 64,
                      kernel_size = (5, 5),
                      padding = "same",
                      activation = "relu",
                      )(input_shape)
    conv2d_1 = Conv2D(filters = 32,
                      kernel_size = (3, 3),
                      padding = "same",
                      activation = "relu",
                      )(conv2d_0)
    conv2d_2 = Conv2D(filters = upsampling_scale ** 2,
                      kernel_size = (3, 3),
                      padding = "same",
                      )(conv2d_1)

    pixel_shuffle = Lambda(lambda z: tf.nn.depth_to_space(z, upsampling_scale))(conv2d_2)

    model = Model(inputs = input_shape, outputs = [pixel_shuffle])

    model.summary()

    return model
As expected, it's short.
By the way, the ESPCN paper describes the network as having exactly this structure. See the Keras documentation for details on the Convolution layers. Pixel shuffle is not provided as a standard Keras layer, so I substituted a Lambda layer; Lambda lets you wrap an arbitrary expression as a layer in the model. Lambda documentation → https://keras.io/ja/layers/core/#lambda tensorflow documentation → https://www.tensorflow.org/api_docs/python/tf/nn/depth_to_space
There seem to be various ways to handle the pixel shuffle here.
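For example, if you do not want to rely on tf.nn.depth_to_space, the same rearrangement can be written by hand with reshape and transpose. This is only a sketch I wrote for illustration (single-channel output assumed), not what the repository uses:

import tensorflow as tf

def pixel_shuffle(x, r):
    #x has shape (batch, H, W, r*r); split the channels into an r x r block
    _, h, w, _ = x.shape
    x = tf.reshape(x, (-1, h, w, r, r))
    #interleave the block rows/columns with the spatial rows/columns
    x = tf.transpose(x, (0, 1, 3, 2, 4))
    return tf.reshape(x, (-1, h * r, w * r, 1))

x = tf.random.normal((1, 17, 17, 9))
print(pixel_shuffle(x, 3).shape)  #(1, 51, 51, 1), same result as tf.nn.depth_to_space(x, 3)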
main.py
import model
import data_create
import argparse
import os
import cv2
import numpy as np
import tensorflow as tf

if __name__ == "__main__":

    def psnr(y_true, y_pred):
        return tf.image.psnr(y_true, y_pred, 1, name=None)

    train_height = 17
    train_width = 17
    test_height = 200
    test_width = 200

    mag = 3  #must be an integer because it is used as the block size of the pixel shuffle

    cut_traindata_num = 10
    cut_testdata_num = 1

    train_file_path = "../photo_data/DIV2K_train_HR" #Folder with photos
    test_file_path = "../photo_data/DIV2K_valid_HR" #Folder with photos

    BATCH_SIZE = 256
    EPOCHS = 1000

    opt = tf.keras.optimizers.Adam(learning_rate=0.0001)

    parser = argparse.ArgumentParser()
    parser.add_argument('--mode', type=str, default='espcn', help='espcn, evaluate')
    args = parser.parse_args()

    if args.mode == "espcn":
        train_x, train_y = data_create.save_frame(train_file_path,   #Path of the images to be cropped
                                                  cut_traindata_num, #Number of patches generated per image
                                                  train_height,      #Patch size
                                                  train_width,
                                                  mag)               #Magnification

        model = model.ESPCN(mag)
        model.compile(loss = "mean_squared_error",
                      optimizer = opt,
                      metrics = [psnr])
        #https://keras.io/ja/getting-started/faq/
        model.fit(train_x,
                  train_y,
                  epochs = EPOCHS,
                  batch_size = BATCH_SIZE)

        model.save("espcn_model.h5")

    elif args.mode == "evaluate":
        path = "espcn_model"
        exp = ".h5"
        new_model = tf.keras.models.load_model(path + exp, custom_objects={'psnr': psnr})
        new_model.summary()

        test_x, test_y = data_create.save_frame(test_file_path,    #Path of the images to be cropped
                                                cut_testdata_num,  #Number of patches generated per image
                                                test_height,       #Patch size
                                                test_width,
                                                mag)               #Magnification
        print(len(test_x))

        pred = new_model.predict(test_x)
        path = "result_" + path
        os.makedirs(path, exist_ok = True)
        path = path + "/"

        for i in range(10):
            ps = psnr(tf.reshape(test_y[i], [test_height, test_width, 1]), pred[i])
            print("psnr:{}".format(ps))

            before_res = tf.keras.preprocessing.image.array_to_img(tf.reshape(test_x[i], [int(test_height / mag), int(test_width / mag), 1]))
            change_res = tf.keras.preprocessing.image.array_to_img(tf.reshape(test_y[i], [test_height, test_width, 1]))
            y_pred = tf.keras.preprocessing.image.array_to_img(pred[i])

            before_res.save(path + "low_" + str(i) + ".jpg")
            change_res.save(path + "high_" + str(i) + ".jpg")
            y_pred.save(path + "pred_" + str(i) + ".jpg")

    else:
        raise Exception("Unknown --mode")
main.py is rather long, and my impression is that it could probably be made shorter. The contents are explained below.
import model
import data_create
import argparse
import os
import cv2
import numpy as np
import tensorflow as tf
Here the other files in the same directory are imported as modules. data_create.py, model.py and main.py should therefore be placed in the same directory.
def psnr(y_true, y_pred):
    return tf.image.psnr(y_true, y_pred, 1, name=None)
This time I used PSNR as the criterion for judging the quality of the generated images, so it is defined here. PSNR is the peak signal-to-noise ratio; roughly speaking, it is computed from the difference between the pixel values of the images being compared. I will omit the detailed explanation here, but there are articles that cover this and several other evaluation metrics in detail.
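To make the metric a little more concrete, here is a small sanity check I wrote (not part of the repository): for images normalized to [0, 1], PSNR is 10 * log10(MAX^2 / MSE) with MAX = 1, which should agree with tf.image.psnr.

import tensorflow as tf

a = tf.random.uniform((17, 17, 1))
b = tf.random.uniform((17, 17, 1))

mse = tf.reduce_mean(tf.square(a - b))
manual_psnr = 10.0 * tf.math.log(1.0 / mse) / tf.math.log(10.0)

print(float(manual_psnr))                       #computed from the definition
print(float(tf.image.psnr(a, b, max_val=1.0)))  #same value from TensorFlow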
train_height = 17
train_width = 17
test_height = 200
test_width = 200

mag = 3
cut_traindata_num = 10
cut_testdata_num = 1

train_file_path = "../photo_data/DIV2K_train_HR" #Folder with photos
test_file_path = "../photo_data/DIV2K_valid_HR" #Folder with photos

BATCH_SIZE = 256
EPOCHS = 1000

opt = tf.keras.optimizers.Adam(learning_rate=0.0001)
The values used this time are set here. Separating them into something like a config.py, as is often done on GitHub, would also be fine, but since this is not a large-scale program they are kept together here.
As for the training patch size, 17 * 17 was adopted, i.e. the value divided by mag, because the paper states that the training patches were 51 * 51. The test size is simply made larger so that the results are easier to see. __The output is three times this size.__ The number of training samples is 10 times the number of images in the folder (for 800 images, that is 8,000 samples).
This time I used the DIV2K dataset, which is often used for super-resolution. Since the quality of the data is good, it is said that a certain level of accuracy can be obtained even with a small amount of data.
parser = argparse.ArgumentParser()
parser.add_argument('--mode', type=str, default='espcn', help='espcn, evaluate')
args = parser.parse_args()
I wanted to separate training and evaluation of the model, so the script takes a --mode option to select between them. I will not explain argparse in detail here; see the official Python documentation. https://docs.python.org/ja/3/library/argparse.html
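For reference, assuming the file is named main.py as above, training would presumably be started with `python main.py --mode espcn` (or simply `python main.py`, since espcn is the default), and evaluation with `python main.py --mode evaluate`.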
if args.mode == "espcn":
train_x, train_y = data_create.save_frame(train_file_path, #Path of the image to be cropped
cut_traindata_num, #Number of datasets generated
train_height, #Storage size
train_width,
mag) #magnification
model = model.ESPCN(mag)
model.compile(loss = "mean_squared_error",
optimizer = opt,
metrics = [psnr])
#https://keras.io/ja/getting-started/faq/
model.fit(train_x,
train_y,
epochs = EPOCHS)
model.save("espcn_model.h5")
The training happens here. When espcn is selected with --mode (as described above), this block runs.
data_create.save_frame calls the save_frame function defined in data_create.py. Once the data is in train_x and train_y, the model is loaded in the same way, then compiled and fitted.
See the Keras documentation for details on compile and the other methods. The same settings as the paper are used.
Finally, save the model and training is done.
elif args.mode == "evaluate":
path = "espcn_model"
exp = ".h5"
new_model = tf.keras.models.load_model(path + exp, custom_objects={'psnr':psnr})
new_model.summary()
test_x, test_y = data_create.save_frame(test_file_path, #Path of the image to be cropped
cut_testdata_num, #Number of datasets generated
test_height, #Storage size
test_width,
mag) #magnification
print(len(test_x))
pred = new_model.predict(test_x)
path = "resurt_" + path
os.makedirs(path, exist_ok = True)
path = path + "/"
for i in range(10):
ps = psnr(tf.reshape(test_y[i], [test_height, test_width, 1]), pred[i])
print("psnr:{}".format(ps))
before_res = tf.keras.preprocessing.image.array_to_img(tf.reshape(test_x[i], [int(test_height / mag), int(test_width / mag), 1]))
change_res = tf.keras.preprocessing.image.array_to_img(tf.reshape(test_y[i], [test_height, test_width, 1]))
y_pred = tf.keras.preprocessing.image.array_to_img(pred[i])
before_res.save(path + "low_" + str(i) + ".jpg ")
change_res.save(path + "high_" + str(i) + ".jpg ")
y_pred.save(path + "pred_" + str(i) + ".jpg ")
else:
raise Exception("Unknow --mode")
Finally, the explanation of the last part. First, the model saved earlier is loaded, passing custom_objects so that psnr can be used. Next, a test dataset is generated and images are generated with predict.
I wanted to see the PSNR value on the spot, so it is computed here. To save the images, the tensors are converted back to image objects and written to files, and that's it!
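As a side note, since the model's input shape is (None, None, 1), the trained model should also be able to upscale a grayscale image of arbitrary size directly. A rough sketch of that (not part of the repository; "input.jpg" is just a placeholder file name):

import cv2
import numpy as np
import tensorflow as tf

#same psnr definition as above, needed only to load the saved model
def psnr(y_true, y_pred):
    return tf.image.psnr(y_true, y_pred, 1, name=None)

model = tf.keras.models.load_model("espcn_model.h5", custom_objects={'psnr': psnr})

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
x = img[np.newaxis, :, :, np.newaxis]   #shape (1, H, W, 1)
pred = model.predict(x)[0, :, :, 0]     #shape (H * mag, W * mag)
cv2.imwrite("upscaled.jpg", np.clip(pred * 255.0, 0, 255).astype(np.uint8))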
As you can see from the saved results, the resolution is clearly improved.
This time I implemented ESPCN. I am now wondering which paper to implement next. Requests and questions are always welcome. Thank you for reading.