In image processing and image-based deep learning, images have to be read over and over.
A few hundred images are no problem, but at the scale of tens of thousands, just reading them takes several minutes. That is tolerable if it happens once, but when the images are read many times for experiments, it pays to make loading faster.
Here, we will compare several libraries and show you how to reduce the loading time.
In short:

- **Save the images with pickle or np.save** (the data size increases)
    - Use a newer pickle protocol
    - Save as np.uint8
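A minimal sketch of the one-time conversion, assuming the image has already been decoded into a numpy array with pixel values in 0-255 (for example by one of the readers compared below). `save_converted` and the file naming are hypothetical. Protocol 4 is used because protocols 3 and 4 loaded fastest in the measurements here.

```python
import pickle
from pathlib import Path

import numpy as np


def save_converted(img, stem):
    """Save an already-decoded image as .npy and .pickle (hypothetical helper)."""
    # Assumes pixel values are already integers in 0-255; uint8 keeps 1 byte per value
    arr = np.asarray(img, dtype=np.uint8)
    stem = Path(stem)
    npy_path = stem.with_suffix(".npy")
    pickle_path = stem.with_suffix(".pickle")
    np.save(npy_path, arr)
    with open(pickle_path, mode="wb") as f:
        # A newer protocol stores the pixel buffer as raw bytes
        pickle.dump(arr, f, protocol=4)
    return npy_path, pickle_path
```

After this runs once per image, later experiments only call `np.load` or `pickle.load` on the saved files.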
The libraries used and the time each took to load one image (returning the data as a numpy.array) are as follows.
Library | Load time |
---|---|
OpenCV | 4.23 ms |
matplotlib | 4.37 ms |
keras.preprocessing | 3.49 ms |
skimage | 2.56 ms |
PIL | 2.63 ms |
numpy | 333 µs |
pickle(protocol=1) | 597 µs |
pickle(protocol=2) | 599 µs |
pickle(protocol=3) | 112 µs |
pickle(protocol=4) | 118 µs |
_pickle(protocol=4) | 117 µs |
The image read was a 512 x 512 PNG file.
For numpy and pickle, the image had been converted in advance and saved as .npy and .pickle files, and those files were loaded, so this is not a fair comparison.
**The table shows the speed you can get if you convert the images in advance**; it does not follow that numpy and pickle are faster in general.
With pickle, the protocol can be specified in pickle.dump, and **the newer the protocol, the faster the loading is said to be**, so I saved the image data once for each protocol.
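A small check of why the protocol matters, using a random uint8 array as a stand-in for an image (an assumption: the exact sizes depend on the pixel values). In Python 3, protocols 0-2 have no opcode for raw bytes, so the array's buffer is stored as a latin-1 text string that is UTF-8 encoded inside the pickle, taking two bytes for every value of 128 or more; protocols 3 and 4 store the buffer as raw bytes. The sketch only measures size, not speed.

```python
import pickle

import numpy as np

# Stand-in for a 512 x 512 RGB image (assumption: uniformly random pixel values)
rng = np.random.default_rng(0)
img = rng.integers(0, 255, size=(512, 512, 3), endpoint=True, dtype=np.uint8)

sizes = {}
for protocol in range(1, 5):
    # Serialize in memory; the length equals what pickle.dump would write to a file
    sizes[protocol] = len(pickle.dumps(img, protocol=protocol))

print(f"raw pixel data: {img.nbytes:,} bytes")
for protocol, size in sizes.items():
    print(f"protocol {protocol}: {size:,} bytes")
```

With random data roughly half of the values are 128 or more, so protocols 1 and 2 come out around 1.5 times the raw buffer, while protocols 3 and 4 add only a small header.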
There is also a fast library called accimage, but I did not use it because it does not support macOS.
HDF5 is another option, but I have not examined it.
The code used is as follows (run in a Jupyter notebook).
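```python
import cv2
import matplotlib.pyplot as plt
import pickle
import numpy as np
from keras.preprocessing import image
from PIL import Image
from skimage import io
import _pickle

# img_path points to the 512 x 512 PNG; npy_path and pickle_path_1 ... pickle_path_4
# point to the same image saved in advance with np.save and pickle.dump (protocols 1-4).
# They are defined elsewhere in the notebook.

def imread1(path):
    # OpenCV: returns a uint8 array in BGR channel order
    return cv2.imread(path)

def imread2(path):
    # matplotlib: for PNG files, returns float32 values in [0, 1]
    return plt.imread(path)

def imread3(path):
    # Keras: load as a PIL image, then convert to a float32 array
    img = image.load_img(path)
    return image.img_to_array(img)

def imread4(path):
    # scikit-image
    return io.imread(path)

def imread5(path):
    # Pillow: open the file and convert it to a numpy array
    img = Image.open(path)
    return np.asarray(img)

def numpy_load(path):
    # Load an array saved with np.save
    return np.load(path)

def pickle_load(path):
    with open(path, mode='rb') as f:
        return pickle.load(f)

def _pickle_load(path):
    # _pickle is the C implementation that pickle already uses in Python 3
    with open(path, mode='rb') as f:
        return _pickle.load(f)

%timeit img = imread1(img_path)
%timeit img = imread2(img_path)
%timeit img = imread3(img_path)
%timeit img = imread4(img_path)
%timeit img = imread5(img_path)
%timeit img = numpy_load(npy_path)
%timeit img = pickle_load(pickle_path_1)
%timeit img = pickle_load(pickle_path_2)
%timeit img = pickle_load(pickle_path_3)
%timeit img = pickle_load(pickle_path_4)
%timeit img = _pickle_load(pickle_path_4)
```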
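`%timeit` only works in IPython and Jupyter. Outside a notebook, a rough equivalent with the standard-library `timeit` module could look like this sketch; `time_per_call` is a hypothetical helper, and it reports the mean and standard deviation per call over several runs, similar to what recent IPython versions print for `%timeit`.

```python
import statistics
import timeit


def time_per_call(func, *args, repeat=7, number=100):
    """Mean and standard deviation of one call's runtime, in seconds (hypothetical helper).

    repeat must be at least 2 so that a standard deviation can be computed.
    """
    # timeit.repeat returns the total time of `number` calls for each of `repeat` runs
    runs = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    per_call = [t / number for t in runs]
    return statistics.mean(per_call), statistics.stdev(per_call)
```

For example, `mean, std = time_per_call(numpy_load, npy_path)` times the `.npy` load from the code above.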
The file sizes of the 512 x 512 .png image after saving it with numpy and pickle are as follows.
Library | Data type | File size |
---|---|---|
original PNG | - | 236 KB |
numpy | np.uint8 | 820 KB |
pickle(protocol=1) | np.uint8 | 820 KB |
pickle(protocol=2) | np.uint8 | 820 KB |
pickle(protocol=3) | np.uint8 | 787 KB |
pickle(protocol=4) | np.uint8 | 787 KB |
numpy | np.float32 | 3.1 MB |
pickle(protocol=1) | np.float32 | 4.9 MB |
pickle(protocol=2) | np.float32 | 4.8 MB |
pickle(protocol=3) | np.float32 | 3.1 MB |
pickle(protocol=4) | np.float32 | 3.1 MB |
Even with np.uint8, the saved files are more than three times the size of the original data, because PNG is compressed while .npy and pickle store the uncompressed pixel array.
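The uncompressed sizes can be checked with a quick calculation. The sketch assumes the PNG has three 8-bit color channels (the channel count is not stated above); `raw_size_bytes` is a hypothetical helper.

```python
import numpy as np


def raw_size_bytes(height, width, channels, dtype):
    """Bytes needed for an uncompressed height x width x channels array."""
    return height * width * channels * np.dtype(dtype).itemsize


# channels = 3 is an assumption (RGB)
for dtype in (np.uint8, np.float32):
    nbytes = raw_size_bytes(512, 512, 3, dtype)
    print(f"{np.dtype(dtype).name}: {nbytes:,} bytes = {nbytes / 1e6:.2f} MB")
```

This prints 786,432 bytes (0.79 MB) for uint8 and 3,145,728 bytes (3.15 MB) for float32, in line with the protocol 3 and 4 rows of the table.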
If you have enough storage and want to load images as fast as possible, it seems better to convert them once to .npy or pickle files so that they can be loaded quickly.
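One way to apply this is to convert lazily: decode each image the first time it is needed, save an .npy file next to it, and load the .npy on later runs. This is a sketch under that assumption; `load_with_cache` and its `decode` argument are hypothetical names, and `decode` can be any of the readers compared above (for example `imread5`). The array is cached exactly as decoded, so a float32 decoder produces a cache about four times larger than a uint8 one.

```python
from pathlib import Path

import numpy as np


def load_with_cache(image_path, decode):
    """Return the image as a numpy array, decoding it only on the first call.

    decode: any function that takes a path and returns an array
    (hypothetical parameter, e.g. a PIL- or OpenCV-based reader).
    """
    cache_path = Path(image_path).with_suffix(".npy")
    if cache_path.exists():
        # Fast path: the image was converted on an earlier run
        return np.load(cache_path)
    # Slow path: decode once, then save the array for next time
    arr = np.asarray(decode(image_path))
    np.save(cache_path, arr)
    return arr
```

The cache is not refreshed if the source image changes; deleting the .npy files forces a new conversion.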