This time, I saved large image data in HDF5 format. With h5py, you can gzip-compress a dataset by writing the following:
```python
import cv2
import h5py

x = cv2.imread("Big image 1.png")
y = cv2.imread("Big image 2.png")

# In h5py the gzip level (0-9) is passed via compression_opts.
with h5py.File("out.h5", "w") as f:
    f.create_dataset("data1", data=x, compression="gzip", compression_opts=4)
    f.create_dataset("data2", data=y, compression="gzip", compression_opts=4)
```
The times below are from single runs, not averages over several runs, so treat them as rough figures (a sketch of how this kind of measurement can be taken follows the table).
| Compression level | Output file size (GB) | Write time (sec) | Load time (sec) |
|---|---|---|---|
| Uncompressed | 6.83 | 7.7 | 10.1 |
| 1 | 1.48 | 81.3 | 53.7 |
| 4 (default) | 1.47 | 107.8 | 57.2 |
| 9 | 1.46 | 204.3 | 56.6 |
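For reference, here is a minimal single-run sketch of how such a measurement could be taken. This is not the script behind the table above; the `bench.h5` filename and the random dummy data are stand-ins:

```python
import time

import h5py
import numpy as np

# Dummy stand-in for the image data; random noise compresses far worse
# than real images, so only the timing pattern is comparable.
data = np.random.randint(0, 256, size=(4000, 4000, 3), dtype=np.uint8)

start = time.perf_counter()
with h5py.File("bench.h5", "w") as f:
    f.create_dataset("data1", data=data, compression="gzip", compression_opts=4)
print(f"write: {time.perf_counter() - start:.1f} sec")

start = time.perf_counter()
with h5py.File("bench.h5", "r") as f:
    _ = f["data1"][...]
print(f"load:  {time.perf_counter() - start:.1f} sec")
```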
The results may vary with different data, but:

- There is a large difference in file size between uncompressed and compressed output, but the I/O time also increases significantly.
- Raising the compression level makes writing considerably slower while barely improving the compression ratio.
- Load time hardly changes regardless of compression level.
I think this is simply the same trend as a plain gzip benchmark; depending on the use case, the difference in I/O time between compressed and uncompressed data can be a real concern.