In recent years, machine learning workloads have increasingly needed to read large numbers of files. When a program reads many files, however, the file-reading overhead can exceed the cost of its main processing. CIFAR-10, for example, stores multiple images in a single file to reduce that overhead.
I was curious how large this effect is, so I used CIFAR-10 to measure how file-reading overhead affects a program.
The dataset used for the measurement is CIFAR-10, a well-known dataset in image recognition. CIFAR-10 is a set of 32 x 32 pixel images in 10 classes. I use the version distributed as binary files on the CIFAR-10 site. Each binary file holds the data for 10,000 images, stored as consecutive records of one label byte followed by the image data.
Each image takes 1 byte for the label + 32 x 32 x 3 bytes for the image data = 3073 bytes, so one binary file is about 30 MB. The file-reading overhead is measured by reading this file.
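The size arithmetic above can be written down as compile-time constants. This is only a sketch: the names `kRecordBytes`, `kFileBytes`, and `pixel_offset` are made up for illustration and do not appear in the measured programs. The offset it computes is the same `1L+3073L*j` that open_time_loop passes to fseek().

```cpp
#include <cstddef>

// Record layout of one CIFAR-10 binary batch file, as described in the text.
// All names here are hypothetical and chosen for this sketch.
constexpr std::size_t kLabelBytes    = 1;
constexpr std::size_t kPixelBytes    = 32 * 32 * 3;               // 3072
constexpr std::size_t kRecordBytes   = kLabelBytes + kPixelBytes; // 3073
constexpr std::size_t kImagesPerFile = 10000;
constexpr std::size_t kFileBytes     = kRecordBytes * kImagesPerFile;

// Byte offset, from the start of the file, of the pixel data of image `index`.
constexpr std::size_t pixel_offset(std::size_t index) {
    return index * kRecordBytes + kLabelBytes;
}

static_assert(kRecordBytes == 3073, "one record is 3073 bytes");
static_assert(kFileBytes == 30730000, "one batch file is about 30 MB");
```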
The following three programs are prepared to measure the overhead of reading a file.
open_time.cpp
open_time_individual.cpp
open_time_loop.cpp
open_time
A program that reads the CIFAR-10 binary file directly. fopen() and fclose() are each called only once during execution.
open_time_individual
A program that reads the images from a directory in which each image of the CIFAR-10 binary file has been saved in advance as a separate file. fopen() and fclose() are each called 10,000 times, once per image.
open_time_loop
A program that reads the CIFAR-10 binary file directly but, unlike open_time, calls fopen() and fclose() for every image. As in open_time_individual, fopen() and fclose() are each called 10,000 times during execution.
Apart from the file reading described above, the three programs share the following processing.
The execution time is measured with system_clock from the chrono library.
As mentioned in the dataset description, the first byte of each record is the image label, so fseek(fp, 1L, SEEK_CUR) skips 1 byte. The image is read with fread(pic, sizeof(uint8_t), 3072, fp), and the loop body loads each pixel value, increments it, and stores it back.
Note that error handling for file operations is omitted.
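For reference, here is a rough sketch of the shared skip-and-read step with the omitted error checks added. The helper name `read_pixels` is hypothetical and is not part of the measured programs. It only checks the return values of fseek() and fread().

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: skips the 1-byte label and reads the 3072 pixel bytes
// of the next record. Returns true only if the whole image was read.
bool read_pixels(std::FILE* fp, std::uint8_t* pixels) {
    if (fp == nullptr) {
        return false;  // fopen() failed earlier
    }
    if (std::fseek(fp, 1L, SEEK_CUR) != 0) {
        return false;  // could not skip the label byte
    }
    // fread() returns the number of items read; a short count means EOF or error.
    return std::fread(pixels, sizeof(std::uint8_t), 3072, fp) == 3072;
}
```

In the measured loops, a call such as `if (!read_pixels(fp, pic)) break;` would take the place of the unchecked fseek() and fread() pair. The extra branches may shift the timings slightly.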
open_time.cpp
#include <stdio.h>
#include <stdint.h>
#include <chrono>
using namespace std;

int main(int argc, char** argv) {
    chrono::system_clock::time_point start, end;
    uint8_t pic[3072] = {0};
    start = chrono::system_clock::now();
    // Open the batch file once and read all 10,000 images from it.
    auto fp = fopen("./cifar-10-batches-bin/data_batch_1.bin", "rb");
    for (int j = 0; j < 10000; ++j) {
        fseek(fp, 1L, SEEK_CUR);  // skip the 1-byte label
        fread(pic, sizeof(uint8_t), 3072, fp);
        for (int i = 0; i < 3072; ++i) {
            pic[i]++;
        }
    }
    fclose(fp);
    end = chrono::system_clock::now();
    double time = static_cast<double>(chrono::duration_cast<chrono::microseconds>(end - start).count() / 1000.0);
    printf("time %lf[ms]\n", time);
    return 0;
}
open_time_individual.cpp
#include <stdio.h>
#include <stdint.h>
#include <chrono>
#include <string>
using namespace std;

int main(int argc, char** argv) {
    chrono::system_clock::time_point start, end;
    // Build the file names before the timer starts.
    std::string filenames[10000] = {""};
    for (int j = 0; j < 10000; ++j) {
        filenames[j] = "./cifar10-raw/" + std::to_string(j) + ".bin";
    }
    uint8_t pic[3072] = {0};
    start = chrono::system_clock::now();
    for (int j = 0; j < 10000; ++j) {
        // Open one file per image.
        auto fp = fopen(filenames[j].c_str(), "rb");
        fseek(fp, 1L, SEEK_CUR);  // skip the 1-byte label
        fread(pic, sizeof(uint8_t), 3072, fp);
        for (int i = 0; i < 3072; ++i) {
            pic[i]++;
        }
        fclose(fp);
    }
    end = chrono::system_clock::now();
    double time = static_cast<double>(chrono::duration_cast<chrono::microseconds>(end - start).count() / 1000.0);
    printf("time %lf[ms]\n", time);
    return 0;
}
open_time_loop.cpp
#include <stdio.h>
#include <stdint.h>
#include <chrono>
using namespace std;

int main(int argc, char** argv) {
    chrono::system_clock::time_point start, end;
    uint8_t pic[3072] = {0};
    start = chrono::system_clock::now();
    for (int j = 0; j < 10000; ++j) {
        // Reopen the same batch file for every image.
        auto fp = fopen("./cifar-10-batches-bin/data_batch_1.bin", "rb");
        // Jump to image j and skip its 1-byte label.
        fseek(fp, 1L + 3073L * j, SEEK_CUR);
        fread(pic, sizeof(uint8_t), 3072, fp);
        for (int i = 0; i < 3072; ++i) {
            pic[i]++;
        }
        fclose(fp);
    }
    end = chrono::system_clock::now();
    double time = static_cast<double>(chrono::duration_cast<chrono::microseconds>(end - start).count() / 1000.0);
    printf("time %lf[ms]\n", time);
    return 0;
}
The result of the actual execution is shown below.
% ./open_time
time 62.964000[ms]
% ./open_time_individual
time 1154.943000[ms]
% ./open_time_loop
time 1086.277000[ms]
Compared with open_time, open_time_individual and open_time_loop take close to 20 times as long to run (about 18x and 17x, respectively).
The execution times of open_time_individual and open_time_loop are also about the same.
open_time and open_time_loop read the same data region, so the difference in their execution times comes from the fopen() calls.
Also, since open_time_individual and open_time_loop take about the same time, the execution time depends on the number of fopen() calls, not on which files are opened.
fopen() has to open the file through a system call, which requires switching from user mode to kernel mode. Memory access, by contrast, has no such switching overhead once the address space has been allocated. For data as small as a CIFAR-10 image, the fopen() call therefore takes longer than the memory accesses that process the image.
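One way to look at the open/close cost in isolation is to time fopen()/fclose() pairs with no reading in between. The following is a minimal sketch under some assumptions: the function name `time_open_close_ms` is hypothetical, and it uses steady_clock (a monotonic clock) rather than the system_clock used in the programs above.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical helper: returns the time in milliseconds spent on `count`
// fopen()/fclose() pairs on `path`, or -1.0 if the file cannot be opened.
double time_open_close_ms(const char* path, int count) {
    using clock = std::chrono::steady_clock;  // assumption: monotonic clock
    const auto start = clock::now();
    for (int i = 0; i < count; ++i) {
        std::FILE* fp = std::fopen(path, "rb");
        if (fp == nullptr) {
            return -1.0;
        }
        std::fclose(fp);
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count();
}
```

Calling it as `time_open_close_ms("./cifar-10-batches-bin/data_batch_1.bin", 10000)` gives a figure that can be compared with the gap between open_time and open_time_loop. The number will vary with the OS, the file system, and the page-cache state.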
Appendix
Shell script used to generate per-image binary files from the CIFAR-10 binary file
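mkdir -p cifar10-raw
for i in $(seq 0 9999)
do
    # tail -c +N starts at byte N, counted from 1, so add 1 to the record offset
    t=$((i * 3073 + 1))
    tail -c +$t cifar-10-batches-bin/data_batch_1.bin | head -c 3073 > "cifar10-raw/$i.bin"
done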
Python script that converts a split binary file to PNG to check that it is a valid image
import numpy as np
from PIL import Image

with open("sample.bin", "rb") as fp:
    label = fp.read(1)  # the first byte is the class label
    data = np.frombuffer(fp.read(3072), dtype=np.uint8)

# The pixels are stored as three 32 x 32 channel planes: (3, 32, 32).
# Reorder the axes to (32, 32, 3), the layout PIL expects.
data = data.reshape(3, 32, 32)
data = np.swapaxes(data, 0, 2)
data = np.swapaxes(data, 0, 1)
with Image.fromarray(data) as img:
    img.save("sample.png")