In image processing and image-based deep learning, reading images from disk is a very common operation.

A few hundred images are not a problem, but at the scale of tens of thousands, reading alone takes several minutes. That is tolerable if you read the data only once, but experiments read the same images many times, so every bit of speed helps.

Here, we compare several libraries and show how to reduce the loading time.

Conclusion

- **Save the images in advance with pickle or np.save** (the data size will increase)
  - Use a newer pickle protocol
  - Save as np.uint8

Execution environment

macOS Mojave 10.14.6
Python 3.6

Comparison of reading speed

The libraries used and the time each took to load one image (returning the data as a numpy.array) are as follows.

Library	Load time
OpenCV	4.23 ms
matplotlib	4.37 ms
keras.preprocessing	3.49 ms
skimage	2.56 ms
PIL	2.63 ms
numpy	333 µs
pickle(protocol=1)	597 µs
pickle(protocol=2)	599 µs
pickle(protocol=3)	112 µs
pickle(protocol=4)	118 µs
_pickle(protocol=4)	117 µs

The image used is a 512 x 512 PNG file.

For numpy and pickle, the image was converted and saved as .npy and .pickle beforehand, and the timings are for loading those files, so this is not a fair comparison. **The table only shows the speed you can expect if you convert the images in advance**; it does not show that numpy and pickle are faster in general.

pickle.dump accepts a protocol argument, and **newer protocols are said to load faster**, so the image data was saved once per protocol. In the table above, protocols 3 and 4 load roughly five times faster than protocols 1 and 2.
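The per-protocol files could be produced along these lines. This is a minimal sketch rather than the code actually used for the measurements: the helper name `dump_with_protocols`, the file names, and the synthetic array standing in for a decoded image are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def dump_with_protocols(arr, directory, protocols=(1, 2, 3, 4)):
    """Pickle `arr` once per protocol and return {protocol: path}."""
    paths = {}
    for protocol in protocols:
        # Hypothetical naming scheme: image_p1.pickle, image_p2.pickle, ...
        path = Path(directory) / f"image_p{protocol}.pickle"
        with open(path, mode="wb") as f:
            pickle.dump(arr, f, protocol=protocol)
        paths[protocol] = path
    return paths


def pickle_load(path):
    # The protocol is detected from the file itself when loading.
    with open(path, mode="rb") as f:
        return pickle.load(f)


# Synthetic stand-in for a decoded 512 x 512 RGB image (assumption).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

tmp_dir = tempfile.mkdtemp()
pickle_paths = dump_with_protocols(img, tmp_dir)
```

Note that the protocol only has to be chosen when writing; `pickle.load` takes no protocol argument.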

There are also faster libraries such as accimage, but it was not used here because it does not support macOS.

HDF5 is another option, but it was not examined here.

The code used is as follows (run in a Jupyter notebook).

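import pickle
import _pickle  # C implementation that the pickle module uses internally

import cv2
import matplotlib.pyplot as plt
import numpy as np
from keras.preprocessing import image
from PIL import Image
from skimage import io

# img_path, npy_path and pickle_path_1 .. pickle_path_4 must be defined
# beforehand: the PNG file and its pre-converted copies.

def imread1(path):
    # OpenCV: returns a uint8 array in BGR channel order
    return cv2.imread(path)

def imread2(path):
    # matplotlib: for PNG files, returns float32 values in [0, 1]
    return plt.imread(path)

def imread3(path):
    # keras: load as a PIL image, then convert to a float32 array
    img = image.load_img(path)
    return image.img_to_array(img)

def imread4(path):
    # scikit-image
    return io.imread(path)

def imread5(path):
    # PIL: open the file, then convert to a numpy array
    img = Image.open(path)
    return np.asarray(img)

def numpy_load(path):
    # Load an array previously saved with np.save
    return np.load(path)

def pickle_load(path):
    # Load an array previously saved with pickle.dump
    with open(path, mode='rb') as f:
        return pickle.load(f)

def _pickle_load(path):
    # Same as pickle_load, but calls the C module directly
    with open(path, mode='rb') as f:
        return _pickle.load(f)

# %timeit is an IPython/Jupyter magic command
%timeit img = imread1(img_path)
%timeit img = imread2(img_path)
%timeit img = imread3(img_path)
%timeit img = imread4(img_path)
%timeit img = imread5(img_path)
%timeit img = numpy_load(npy_path)
%timeit img = pickle_load(pickle_path_1)
%timeit img = pickle_load(pickle_path_2)
%timeit img = pickle_load(pickle_path_3)
%timeit img = pickle_load(pickle_path_4)
%timeit img = _pickle_load(pickle_path_4)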
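
The `%timeit` lines only work inside IPython or Jupyter. Outside a notebook, a similar measurement can be sketched with the standard `timeit` module. The helper `time_loader` and the all-zero test array below are assumptions for illustration. The helper reports the fastest of several repeats, which is not the same statistic `%timeit` prints, so the numbers are not directly comparable with the table.

```python
import os
import tempfile
import timeit

import numpy as np


def time_loader(loader, path, number=100, repeat=5):
    """Return the best average time per call, in seconds."""
    # Each repeat calls loader(path) `number` times and returns the total time.
    totals = timeit.repeat(lambda: loader(path), number=number, repeat=repeat)
    return min(totals) / number


# Stand-in for a pre-converted 512 x 512 RGB image (assumption).
img = np.zeros((512, 512, 3), dtype=np.uint8)
npy_path = os.path.join(tempfile.mkdtemp(), "image.npy")
np.save(npy_path, img)

seconds = time_loader(np.load, npy_path)
print(f"np.load: {seconds * 1e6:.0f} µs per call")
```

Any of the loader functions above can be passed in place of `np.load`, together with the matching file path.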

Data size comparison

The sizes of the 512 x 512 PNG image after saving it with numpy and pickle are as follows.

Library	Data type	Size
Original PNG	-	236 KB
numpy	np.uint8	820 KB
pickle(protocol=1)	np.uint8	820 KB
pickle(protocol=2)	np.uint8	820 KB
pickle(protocol=3)	np.uint8	787 KB
pickle(protocol=4)	np.uint8	787 KB
numpy	np.float32	3.1 MB
pickle(protocol=1)	np.float32	4.9 MB
pickle(protocol=2)	np.float32	4.8 MB
pickle(protocol=3)	np.float32	3.1 MB
pickle(protocol=4)	np.float32	3.1 MB

Even as np.uint8, the saved data takes more than three times the space of the original file.
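
The growth is expected, because PNG is compressed while .npy and pickle store the raw pixel values. The raw payload can be estimated as follows, assuming the image has three channels (the channel count is not stated above, so this is an assumption). The file formats add their own overhead on top of these numbers.

```python
import numpy as np

shape = (512, 512, 3)  # assumed: height, width, RGB channels

# One byte per value for uint8, four bytes per value for float32.
uint8_bytes = np.zeros(shape, dtype=np.uint8).nbytes
float32_bytes = np.zeros(shape, dtype=np.float32).nbytes

print(uint8_bytes)    # 786432
print(float32_bytes)  # 3145728
```

These values are in line with the roughly 787 KB and 3.1 MB rows in the table, and they show why saving as np.uint8 rather than np.float32 cuts the size to a quarter.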

If you have enough storage space and want to read as fast as possible, it seems best to convert the images once into .npy or pickle files that load quickly.
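
One way to do the one-time conversion is to cache each decoded image as a .npy file next to the original. This is only a sketch: `load_cached` is a hypothetical helper, and `fake_decode` stands in for a real decoder such as `cv2.imread` or `io.imread`. It assumes the decoder returns a np.uint8 array, and it does not detect whether the original image has changed since the cache was written.

```python
import os
import tempfile

import numpy as np


def load_cached(image_path, decode):
    """Return the decoded image, using a .npy file next to it as a cache."""
    cache_path = os.path.splitext(image_path)[0] + ".npy"
    if os.path.exists(cache_path):
        # Fast path: the image was converted on an earlier call.
        return np.load(cache_path)
    # Slow path: decode once, then store the array for later runs.
    arr = np.asarray(decode(image_path))
    np.save(cache_path, arr)
    return arr


calls = []


def fake_decode(path):
    # Hypothetical stand-in for a real image decoder; records each call.
    calls.append(path)
    return np.full((512, 512, 3), 127, dtype=np.uint8)


image_path = os.path.join(tempfile.mkdtemp(), "sample.png")
first = load_cached(image_path, fake_decode)   # decodes and writes sample.npy
second = load_cached(image_path, fake_decode)  # served from sample.npy
```

Only the first call pays the decoding cost; later calls, including those in later experiment runs, read the .npy file directly.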
