Super-resolution technology-SRCNN-Implemented (Tensorflow 2.0) Learning phase

Overview

Super-resolution is a technique that turns a low-resolution image into a high-resolution one. This time I implemented SRCNN, a super-resolution method that is relatively easy to implement.

This article covers the **SRCNN learning phase**. The next one will cover SRCNN's prediction phase. I actually implemented SRGAN in addition to SRCNN, so I plan to cover that as well in a series on super-resolution technology.

Environment

-Software-
Windows 10 Home
Anaconda3 64-bit (Python 3.7)
Spyder

-Library-
Tensorflow 2.1.0
opencv-python 4.1.2.30

-Hardware-
CPU: Intel Core i9 9900K
GPU: NVIDIA GeForce RTX 2080 Ti
RAM: 16GB 3200MHz

References

Sites
・ SRCNN paper
・ [Intern CV Report] Exploring the history of super-resolution -2016 edition-
・ [Sparse coding] Advantages of sparse data representation
・ Keras: Super-resolution
・ I tried a simple super-resolution with deep learning
・ [Introduction to PyTorch and Super-Resolution](https://buildersbox.corp-sansan.com/entry/2019/02/21/110000)
・ [I tried to implement the model SRCNN that makes the image super resolution with pytorch](https://nykergoto.hatenablog.jp/entry/2019/05/18/%E7%94%BB%E5%83%8F%E3%81%AE%E8%B6%85%E8%A7%A3%E5%83%8F%E5%BA%A6%E5%8C%96%E3%82%92%E3%81%99%E3%82%8B%E3%83%A2%E3%83%87%E3%83%AB_SRCNN_%E3%82%92_pytorch_%E3%81%A7%E5%AE%9F%E8%A3%85%E3%81%97%E3%81%A6)

Program

The code is posted on GitHub. https://github.com/himazin331/Super-resolution-CNN The repository contains both the learning phase and the prediction phase.

This time, I used General-100 for the dataset. I also put the dataset in the GitHub repository for demonstration purposes.

Source code

**Please note that the code is messy...**

`srcnn_tr.py`


import argparse as arg
import os
import sys

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' #Hide TF message

import tensorflow as tf
import tensorflow.keras.layers as kl
from tensorflow.keras import backend as K

import cv2
import numpy as np

import matplotlib.pyplot as plt

# SRCNN
class SRCNN(tf.keras.Model):

    def __init__(self, h, w):
        super(SRCNN, self).__init__()

        self.conv1 = kl.Conv2D(64, 3, padding='same', activation='relu', input_shape=(h, w, 3))
        self.conv2 = kl.Conv2D(32, 3, padding='same', activation='relu')
        self.conv3 = kl.Conv2D(3, 3, padding='same', activation='relu')

    def call(self, x):
        
        h1 = self.conv1(x)
        h2 = self.conv2(h1)
        h3 = self.conv3(h2)

        return h3

#Learning
class trainer(object):

    def __init__(self, h, w):
        
        self.model = SRCNN(h, w)
        
        self.model.compile(optimizer=tf.keras.optimizers.Adam(),
                           loss=tf.keras.losses.MeanSquaredError(),
                            metrics=[self.psnr])
        
    def train(self, lr_imgs, hr_imgs, out_path, batch_size, epochs):

        #Learning
        his = self.model.fit(lr_imgs, hr_imgs, batch_size=batch_size, epochs=epochs)

        print("___Training finished\n\n")

        #Save parameters
        print("___Saving parameter...")
        self.model.save_weights(out_path)
        print("___Successfully completed\n\n")

        return his, self.model

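    # PSNR(Peak signal-to-noise ratio)
    # Keras calls a metric as fn(y_true, y_pred), so h3 actually receives the
    # target and hr_imgs the prediction; the squared error is symmetric, so the
    # value is unaffected.
    def psnr(self, h3, hr_imgs):
        
        return -10*K.log(K.mean(K.flatten((h3 - hr_imgs))**2))/np.log(10)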

#Data set creation
def create_dataset(data_dir, h, w, mag):

    print("\n___Creating a dataset...")
    
    prc = ['/', '-', '\\', '|']
    cnt = 0

    #Number of image data
    print("Number of image in a directory: {}".format(len(os.listdir(data_dir))))

    lr_imgs = []
    hr_imgs = []

    for c in os.listdir(data_dir):

        d = os.path.join(data_dir, c)

        _, ext = os.path.splitext(c)
        #Skip everything except bitmap images (this also skips Thumbs.db)
        if ext.lower() != '.bmp':
            continue

        #Read, resize(High resolution image)
        img = cv2.imread(d)
        if img is None: #Skip files that OpenCV cannot decode
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (w, h)) #dsize is (width, height)

        #Low resolution image
        img_low = cv2.resize(img, (int(w/mag), int(h/mag)))
        img_low = cv2.resize(img_low, (w, h))

        lr_imgs.append(img_low)
        hr_imgs.append(img)

        cnt += 1

        print("\rLoading a LR-images and HR-images...{}    ({} / {})".format(prc[cnt%4], cnt, len(os.listdir(data_dir))), end='')

    print("\rLoading a LR-images and HR-images...Done    ({} / {})".format(cnt, len(os.listdir(data_dir))), end='')

    #Normalization
    lr_imgs = tf.convert_to_tensor(lr_imgs, np.float32)
    lr_imgs /= 255
    hr_imgs = tf.convert_to_tensor(hr_imgs, np.float32)
    hr_imgs /= 255
    
    print("\n___Successfully completed\n")

    return lr_imgs, hr_imgs

# PSNR,Loss value graph output
def graph_output(history):
    
    #PSNR graph
    plt.plot(history.history['psnr'])
    plt.title('Model PSNR')
    plt.ylabel('PSNR')
    plt.xlabel('Epoch')
    plt.legend(['Train'], loc='upper left')
    plt.show()  

    #Loss value graph
    plt.plot(history.history['loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train'], loc='upper left')
    plt.show()

def main():

    #Command line option creation
    parser = arg.ArgumentParser(description='Super-resolution CNN training')
    parser.add_argument('--data_dir', '-d', type=str, default=None,
                        help='Specifying the image folder path(Error if not specified)')
    parser.add_argument('--out', '-o', type=str,
                        default=os.path.dirname(os.path.abspath(__file__)),
                        help='Specify the save destination of parameters(Default value=./srcnn.h5)')
    parser.add_argument('--batch_size', '-b', type=int, default=32,
                        help='Specifying mini-batch size(Default value=32)')
    parser.add_argument('--epoch', '-e', type=int, default=3000,
                        help='Specifying the number of learning(Default value=3000)')
    parser.add_argument('--he', '-he', type=int, default=256,
                        help='Resize height specification(Default value=256)')      
    parser.add_argument('--wi', '-wi', type=int, default=256,
                        help='Resize width specification(Default value=256)')
    parser.add_argument('--mag', '-m', type=int, default=2,
                        help='Specifying the reduction magnification(Default value=2)')                           
    args = parser.parse_args()

    #Image folder path not specified->exception
    if args.data_dir is None:
        print("\nException: Folder not specified.\n")
        sys.exit()
    #When specifying an image folder that does not exist->exception
    if not os.path.isdir(args.data_dir):
        print("\nException: Folder \"{}\" is not found.\n".format(args.data_dir))
        sys.exit()
    #When 0 or less is entered for the width, height, or reduction magnification->exception
    if args.he <= 0 or args.wi <= 0 or args.mag <= 0:
        print("\nInvalid value has been entered.\n")
        sys.exit()

    #Create output folder(Do not create if the folder exists)
    os.makedirs(args.out, exist_ok=True)
    out_path = os.path.join(args.out, "srcnn.h5")

    #Setting information output
    print("=== Setting information ===")
    print("# Images folder: {}".format(os.path.abspath(args.data_dir)))
    print("# Output folder: {}".format(out_path))
    print("# Minibatch-size: {}".format(args.batch_size))
    print("# Epoch: {}".format(args.epoch))
    print("")
    print("# Height: {}".format(args.he))
    print("# Width: {}".format(args.wi))
    print("# Magnification: {}".format(args.mag))
    print("===========================\n")

    #Data set creation
    lr_imgs, hr_imgs = create_dataset(args.data_dir, args.he, args.wi, args.mag)
    
    #Start learning
    print("___Start training...")
    Trainer = trainer(args.he, args.wi)
    his, model = Trainer.train(lr_imgs, hr_imgs, out_path=out_path, batch_size=args.batch_size, epochs=args.epoch)

    # PSNR,Loss value graph output
    graph_output(his)

if __name__ == '__main__':
    main()

Execution result

Training ran for 3000 epochs with a mini-batch size of 32.

The graph below shows the PSNR (peak signal-to-noise ratio) recorded during training. Details are described later. The PSNR tops out at about 30 dB...

The graph below shows the recorded loss values.

Note that these graphs are not saved.
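Since the graphs are only displayed on screen, one way to keep the numbers is to write the per-epoch values to a file. The following is a minimal sketch and not part of the original script: `save_history_csv` is a hypothetical helper, and the plain dictionary stands in for `his.history` returned by `model.fit()`, filled with made-up values.

```python
import csv
import io


def save_history_csv(history, fileobj):
    """Write per-epoch metrics (a dict of equal-length lists) as CSV rows."""
    keys = sorted(history)
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["epoch"] + keys)
    for epoch, values in enumerate(zip(*(history[k] for k in keys)), start=1):
        writer.writerow([epoch] + list(values))


# Stand-in for his.history; the numbers are illustrative, not measured results.
example_history = {"loss": [0.02, 0.01], "psnr": [17.0, 20.0]}

buffer = io.StringIO()
save_history_csv(example_history, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In the real script, an opened file object would take the place of the in-memory buffer.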

Command
python srcnn_tr.py -d <folder> -e <number of epochs> -b <batch size> (-o <save destination> -he <height> -wi <width> -m <magnification (integer)>)
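To see how these options are interpreted, the sketch below rebuilds a reduced copy of the script's parser and feeds it an argument list directly. The flags and defaults are copied from `srcnn_tr.py`; the folder name `./General-100` is a hypothetical path used only for illustration.

```python
import argparse

# Reduced copy of the parser in srcnn_tr.py: same flags and defaults,
# help texts and the --out option omitted for brevity.
parser = argparse.ArgumentParser(description="Super-resolution CNN training")
parser.add_argument("--data_dir", "-d", type=str, default=None)
parser.add_argument("--batch_size", "-b", type=int, default=32)
parser.add_argument("--epoch", "-e", type=int, default=3000)
parser.add_argument("--he", "-he", type=int, default=256)
parser.add_argument("--wi", "-wi", type=int, default=256)
parser.add_argument("--mag", "-m", type=int, default=2)

# "./General-100" is a hypothetical folder name used only for illustration.
args = parser.parse_args(["-d", "./General-100", "-e", "500", "-b", "16"])
print(args.data_dir, args.epoch, args.batch_size, args.he, args.wi, args.mag)
```

Options that are left out, here the height, width, and magnification, fall back to their defaults.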

Description

I will explain the code.

Network model

The network model is not much different from a regular CNN.

`SRCNN class`


# SRCNN
class SRCNN(tf.keras.Model):

    def __init__(self, h, w):
        super(SRCNN, self).__init__()

        self.conv1 = kl.Conv2D(64, 3, padding='same', activation='relu', input_shape=(h, w, 3))
        self.conv2 = kl.Conv2D(32, 3, padding='same', activation='relu')
        self.conv3 = kl.Conv2D(3, 3, padding='same', activation='relu')

    def call(self, x):
        
        h1 = self.conv1(x)
        h2 = self.conv2(h1)
        h3 = self.conv3(h2)

        return h3

The difference from a typical CNN is the direction in which the channel count changes. In a typical CNN the number of output channels grows from layer to layer, whereas in SRCNN it shrinks (here 64, then 32, then 3).
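The shrinking channel count can be made concrete by counting parameters per layer. This is a small sketch under the assumption that each layer has the usual kernel weights plus one bias per output channel, which is the Conv2D default; `conv2d_params` is a hypothetical helper name.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights plus biases of one 2-D convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


# (kernel size, input channels, output channels) for conv1..conv3 above.
layers = [(3, 3, 64), (3, 64, 32), (3, 32, 3)]
params = [conv2d_params(k, c_in, c_out) for k, c_in, c_out in layers]

for (k, c_in, c_out), n in zip(layers, params):
    print("{:>2} -> {:>2} channels: {} parameters".format(c_in, c_out, n))
print("total:", sum(params))
```

With these settings the whole network has only about 21,000 parameters, most of them in the second layer.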

SRCNN generally consists of three convolution layers.

The first layer performs **patch extraction and sparse representation in the low-resolution space**. The second layer performs a **non-linear mapping of the representation acquired in the first layer to the high-resolution space**. The third layer **reconstructs the high-resolution image**. (From [Intern CV Report] Exploring the history of super-resolution -2016 edition-)

**Sparse representation** (sparse coding) means **preparing a dictionary for expressing data and expressing each piece of data with as few dictionary elements as possible**. (From [Sparse coding] Advantages of sparse data representation)

Put a little more simply, sparse representation asks how faithfully the input image can be approximated by combining only a small number of feature maps. Approximation accuracy tends to improve when many feature maps are combined, but sparse representation deliberately avoids this; restricting itself to a few elements lets it extract a meaningful representation. In other words, it **clarifies which elements are useful for representing the data, and how useful they are**.
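The idea of expressing data with few dictionary elements can be tried on a toy example. The sketch below is only an illustration of sparse representation in general, not of what SRCNN computes internally: the 4-element dictionary `D` (a Haar-like orthonormal basis), the signal, and the helper `reconstruct` are all assumptions chosen to keep the arithmetic easy to follow.

```python
import numpy as np

# A small orthonormal dictionary (Haar-like basis); each row is one atom.
# It is an assumption chosen for illustration, not a dictionary SRCNN learns.
D = np.array([
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1,  0,  0],
    [0,  0,  1, -1],
], dtype=np.float64)
D /= np.linalg.norm(D, axis=1, keepdims=True)

signal = np.array([4.0, 4.0, 2.0, 2.0])

# With an orthonormal dictionary the coefficients are plain inner products.
coeffs = D @ signal


def reconstruct(coeffs, k):
    """Rebuild the signal from only the k largest-magnitude coefficients."""
    keep = np.argsort(-np.abs(coeffs))[:k]
    sparse = np.zeros_like(coeffs)
    sparse[keep] = coeffs[keep]
    return sparse @ D


for k in range(1, 5):
    err = np.linalg.norm(signal - reconstruct(coeffs, k))
    print("atoms used: {}  reconstruction error: {:.3f}".format(k, err))
```

For this signal only two of the four coefficients are non-zero, so two atoms already reproduce it exactly and the remaining atoms add nothing.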

Data set creation

**Only high-resolution images** are needed as data. The low-resolution images are created from the high-resolution ones.

`create_dataset function`


#Data set creation
def create_dataset(data_dir, h, w, mag):

    print("\n___Creating a dataset...")
    
    prc = ['/', '-', '\\', '|']
    cnt = 0

    #Number of image data
    print("Number of image in a directory: {}".format(len(os.listdir(data_dir))))

    lr_imgs = []
    hr_imgs = []

    for c in os.listdir(data_dir):

        d = os.path.join(data_dir, c)

        _, ext = os.path.splitext(c)
        #Skip everything except bitmap images (this also skips Thumbs.db)
        if ext.lower() != '.bmp':
            continue

        #Read, resize(High resolution image)
        img = cv2.imread(d)
        if img is None: #Skip files that OpenCV cannot decode
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (w, h)) #dsize is (width, height)

        #Low resolution image
        img_low = cv2.resize(img, (int(w/mag), int(h/mag)))
        img_low = cv2.resize(img_low, (w, h))

        lr_imgs.append(img_low)
        hr_imgs.append(img)

        cnt += 1

        print("\rLoading a LR-images and HR-images...{}    ({} / {})".format(prc[cnt%4], cnt, len(os.listdir(data_dir))), end='')

    print("\rLoading a LR-images and HR-images...Done    ({} / {})".format(cnt, len(os.listdir(data_dir))), end='')

    #Normalization
    lr_imgs = tf.convert_to_tensor(lr_imgs, np.float32)
    lr_imgs /= 255
    hr_imgs = tf.convert_to_tensor(hr_imgs, np.float32)
    hr_imgs /= 255
    
    print("\n___Successfully completed\n")

    return lr_imgs, hr_imgs

First, load the image. OpenCV reads pixels in BGR order, so convert them to RGB with cv2.cvtColor(img, cv2.COLOR_BGR2RGB). Then resize the image to the specified size. This gives the high-resolution image.

Next, create the low-resolution image. Shrink the high-resolution image to its width and height divided by the specified reduction magnification, then resize it back to the size it had before shrinking. The result has the same dimensions as the high-resolution image but less detail.

        #Read, resize(High resolution image)
        img = cv2.imread(d)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (w, h)) #dsize is (width, height)

        #Low resolution image
        img_low = cv2.resize(img, (int(w/mag), int(h/mag)))
        img_low = cv2.resize(img_low, (w, h))

It looks like this using a diagram.

By default, cv2.resize() uses bilinear interpolation (cv2.INTER_LINEAR).
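The shrink-then-enlarge step can also be tried without OpenCV. The following is a minimal sketch with a swapped-in technique: the hypothetical helper `degrade` uses block averaging and pixel repetition instead of bilinear interpolation, and it assumes the height and width are divisible by `mag`. It only shows that the shape survives the round trip while fine detail does not.

```python
import numpy as np


def degrade(img, mag):
    """Shrink by block averaging, then enlarge back by pixel repetition.

    Stand-in for the two cv2.resize() calls: the interpolation differs
    (the article's code uses OpenCV's bilinear default), but the shape
    round trip and the loss of detail are the same idea.
    """
    h, w, c = img.shape
    small = img.reshape(h // mag, mag, w // mag, mag, c).mean(axis=(1, 3))
    return small.repeat(mag, axis=0).repeat(mag, axis=1)


# 4x4 RGB checkerboard with values already normalized to 0..1.
hr = np.zeros((4, 4, 3), dtype=np.float32)
hr[::2, ::2] = 1.0
hr[1::2, 1::2] = 1.0

lr = degrade(hr, mag=2)
print(hr.shape, lr.shape)   # same shape before and after
print(np.unique(lr))        # the checkerboard pattern is averaged away
```

The degraded image keeps the 4x4 size, but every pixel becomes 0.5, so the pattern that the network would have to restore is gone.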

Learning

The trainer class handles both the setup before training and the training itself. ~~I don't think the word "trainer" is used much in Tensorflow, but I started out with Chainer...~~

First, the __init__ method.

`trainer class(Instance method)`


#Learning
class trainer(object):

    def __init__(self, h, w):
        
        self.model = SRCNN(h, w)
        
        self.model.compile(optimizer=tf.keras.optimizers.Adam(),
                           loss=tf.keras.losses.MeanSquaredError(),
                            metrics=[self.psnr])

The __init__ method is called when the class is instantiated; it builds the network model and sets the optimization algorithm. The height and width are passed to the SRCNN constructor with self.model = SRCNN(h, w). Once the model is built, the optimization algorithm and the loss function are set on it. Here the optimization algorithm is **Adam** and the loss function is the **mean squared error**.

The idea is that by computing the **mean squared error between the network's output for the low-resolution image and the high-resolution image** and reducing that value, **the output gradually approaches the high-resolution image**.

metrics=[self.psnr] sets PSNR as the evaluation metric. Details are described later.

Next is the train method.

`trainer class(train method)`


    def train(self, lr_imgs, hr_imgs, out_path, batch_size, epochs):

        #Learning
        his = self.model.fit(lr_imgs, hr_imgs, batch_size=batch_size, epochs=epochs)

        print("___Training finished\n\n")

        #Save parameters
        print("___Saving parameter...")
        self.model.save_weights(out_path)
        print("___Successfully completed\n\n")

        return his, self.model

The low-resolution images lr_imgs are passed to self.model.fit() as the training data and the high-resolution images hr_imgs as the targets, and training starts. The parameters are saved as soon as training finishes.
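As a rough feel for how much work the default settings imply, the number of weight updates can be estimated from the dataset size and the batch size. This is a back-of-the-envelope sketch that assumes all 100 images of General-100 are loaded and that the last, smaller batch of each epoch is also used.

```python
import math

num_images = 100    # assumption: every image of General-100 is loaded
batch_size = 32     # default of --batch_size
epochs = 3000       # default of --epoch

steps_per_epoch = math.ceil(num_images / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)   # 4 (three batches of 32 and one of 4)
print(total_updates)     # 12000
```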

Finally, I will explain the psnr method.

**PSNR** (peak signal-to-noise ratio) is an **evaluation index** of image degradation. You may wonder what the peak signal and the noise are; I'm sorry, but I'm not a specialist, so I can't explain that.

The unit of this index is dB (decibel). In general, an image with a **PSNR of 30 dB or more is said to look good**. Note, however, that **human perception and PSNR values do not always agree**.
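To give a feel for what 30 dB means, the sketch below evaluates the definition given further down (equation (1.1)) with $MAX = 1$ for a few error levels. `psnr_from_mse` is a hypothetical helper name, and the MSE values are arbitrary examples rather than results from this training run.

```python
import math


def psnr_from_mse(mse, max_value=1.0):
    """PSNR in dB for a given mean squared error, following equation (1.1)."""
    return 10 * math.log10(max_value ** 2 / mse)


for mse in (1e-1, 1e-2, 1e-3, 1e-4):
    print("MSE = {:<6g} -> PSNR = {:.1f} dB".format(mse, psnr_from_mse(mse)))
```

Under these assumptions, 30 dB corresponds to an MSE of 0.001, that is, a root-mean-square error of about 0.032 on the 0 to 1 scale, or roughly 8 levels out of 255. Each tenfold reduction of the MSE adds 10 dB.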

The definition is as follows.

PSNR = 10 \log_{10}\frac{MAX^2}{MSE}\qquad(1.1)

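$MSE$ is the mean squared error.

MSE = \frac{1}{n} \sum_{i=1}^{n} (SR_i - HR_i)^2\qquad(2)

Here $SR_i$ and $HR_i$ are the $i$-th values of the super-resolved output and of the high-resolution image, and $n$ is the number of values. $MAX$ is **the maximum value that a pixel can take**. Because the pixel values were divided by 255 and normalized to the range 0 to 1, **$MAX$ is 1**. Substituting 1 for $MAX$ in equation (1.1) gives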

PSNR = 10 \log_{10}\frac{1}{MSE}\qquad(1.2)

By the logarithm rule for quotients, $\log\frac{1}{x} = -\log x$, this becomes

PSNR = -10 \log_{10}MSE\qquad(1.3)

This time, tf.keras.backend.flatten() is used in computing the mean squared error $MSE$. Once tf.keras.backend functions are involved, the values are tensors, and applying NumPy functions to them gave me an error. The logarithm is therefore computed with tf.keras.backend.log(), which is the natural logarithm, not the common logarithm. Equation (1.3) then has to be rewritten with the change-of-base formula, $\log_{10}x = \frac{\ln x}{\ln 10}$, as

PSNR = -10 \frac{\ln MSE}{\ln 10}\qquad(1.4)
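Equations (1.3) and (1.4) should give the same number, which is easy to confirm with the standard library. This is just a numerical check; the MSE value is an arbitrary example.

```python
import math

mse = 0.0025  # arbitrary example value

common = -10 * math.log10(mse)                 # equation (1.3)
natural = -10 * math.log(mse) / math.log(10)   # equation (1.4)

print(common, natural)
```

Both expressions agree up to floating-point rounding, so replacing the common logarithm with a natural logarithm divided by $\ln 10$ does not change the PSNR.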

I'm ashamed to say that I did not know tf.keras.backend.log() was not the common logarithm, and since I'm not good at math, I could not work out from the literature why the formula was transformed this way. So, as a memo to myself, I have written out the transformation in detail.

The code for equation (1.4) is shown below.

`trainer class(PSNR method)`


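    # PSNR(Peak signal-to-noise ratio)
    # Keras calls a metric as fn(y_true, y_pred), so h3 actually receives the
    # target and hr_imgs the prediction; the squared error is symmetric, so the
    # value is unaffected.
    def psnr(self, h3, hr_imgs):
        
        return -10*K.log(K.mean(K.flatten((h3 - hr_imgs))**2))/np.log(10)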

K.mean(K.flatten((h3 - hr_imgs))**2) corresponds to $MSE$ (mean squared error). The denominator uses NumPy's log, which is also the natural logarithm. For some reason, when I tried to use tf.keras.backend.log() for the denominator as well, I got an error. I wonder why? A likely cause is that K.log(10) receives a Python integer, and TensorFlow's logarithm is not defined for integer tensors; passing a float such as 10.0 should avoid this.

With this, the PSNR formula is defined. By the way, if $SR_i$ and $HR_i$ in equation (2) are the same image, the PSNR is $+\infty$ dB.
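The same formula can be written in plain NumPy to check the metric outside of Keras. This is a sketch, not the article's code: `psnr_numpy` is a hypothetical helper, the 2x2 arrays are made-up data, and images are assumed to be scaled to the range 0 to 1.

```python
import numpy as np


def psnr_numpy(sr, hr):
    """NumPy counterpart of the psnr method for images scaled to 0..1."""
    diff = np.asarray(sr, dtype=np.float64) - np.asarray(hr, dtype=np.float64)
    mse = np.mean(diff ** 2)
    with np.errstate(divide="ignore"):   # log(0) yields -inf for identical images
        return -10 * np.log(mse) / np.log(10)


hr = np.array([[0.2, 0.4], [0.6, 0.8]])
sr = hr + 0.1                   # every value is off by 0.1

print(psnr_numpy(sr, hr))       # about 20 dB, since the MSE is 0.01
print(psnr_numpy(hr, sr))       # same value: the squared error is symmetric
print(psnr_numpy(hr, hr))       # inf
```

The second call shows that swapping the two arguments does not change the result, and the third reproduces the infinite PSNR for identical images.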

In conclusion

I tried to explain a simple way of implementing SRCNN, one of the super-resolution techniques. How was it? In the next article, "Super-Resolution Technology-SRCNN-Implemented (Tensorflow 2.0) Prediction Phase Edition", we will actually perform super-resolution.