It is often said that, when reading a file, reading it line by line uses less memory than reading it all at once. I would like to run an experiment to see whether that is true.
The text to be read is as follows.
test.txt
john m 19
micheal m 28
abbie f 31
dabid m 17
claire f 26
First, read the whole file at once with the following code (all_read.rb).
require 'objspace'
start_time = Time.new
puts File.read("test.txt")
end_time = Time.new
puts end_time - start_time
puts "#{ObjectSpace.memsize_of_all * 0.001 * 0.001} MB"
I kept the part that does the reading very short. The result is:
$ ruby all_read.rb
john m 19
micheal m 28
abbie f 31
dabid m 17
claire f 26
4.9e-05
2.951902 MB
Next, read the file line by line with the following code (each_read.rb).
require 'objspace'
start_time = Time.new
File.open("test.txt") do |text|
  text.each_line do |line|
    puts line
  end
end
end_time = Time.new
puts end_time - start_time
puts "#{ObjectSpace.memsize_of_all * 0.001 * 0.001} MB"
The code is a few lines longer than the read-at-once version. The result is:
$ ruby each_read.rb
john m 19
micheal m 28
abbie f 31
dabid m 17
claire f 26
0.000112
2.9598400000000002 MB
I repeated the same measurement several times. When the number of lines is small, it seems that reading at once actually uses slightly less memory.
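As a side note, the line-by-line version can also be written with `File.foreach`, which opens the file, yields one line at a time, and closes the file afterwards. The following is only a minimal sketch: the helper name `each_line_count` and the temporary file standing in for test.txt are assumptions for illustration, not part of the original scripts.

```ruby
require 'tempfile'

# Hypothetical helper: yields each line of the file at `path` to the block
# (if one is given) and returns the number of lines read.
def each_line_count(path)
  count = 0
  File.foreach(path) do |line|
    yield line if block_given?
    count += 1
  end
  count
end

# Usage with a small temporary file standing in for test.txt.
Tempfile.create('test') do |file|
  file.write("john m 19\nmicheal m 28\nabbie f 31\n")
  file.flush
  total = each_line_count(file.path) { |line| puts line }
  puts "#{total} lines"
end
```

Like `each_line` inside `File.open`, this holds only the current line in a variable at any moment.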
Now let's see what happens with a longer text.
test.txt
john m 19
micheal m 28
abbie f 31
dabid m 17
claire f 26
(Omitted)
john m 19
micheal m 28
abbie f 31
dabid m 17
claire f 26
For now I simply repeated the same lines. I would like to look at files of 1000 lines, 5000 lines, and 10000 lines.
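The post does not show how the longer files were made, so the following is only one possible way to do it: a sketch that cycles through the five sample lines until the requested number of lines has been written. The helper `write_test_file` and the output file names such as `test_1000.txt` are hypothetical.

```ruby
# The five sample records from test.txt.
SAMPLE_LINES = [
  "john m 19",
  "micheal m 28",
  "abbie f 31",
  "dabid m 17",
  "claire f 26"
].freeze

# Hypothetical helper: writes `line_count` lines to `path` by cycling
# through SAMPLE_LINES in order.
def write_test_file(path, line_count)
  File.open(path, "w") do |file|
    line_count.times do |i|
      file.puts SAMPLE_LINES[i % SAMPLE_LINES.size]
    end
  end
end

# Only generate the files when this script is run directly.
# The file names are an assumption, not taken from the original post.
if __FILE__ == $PROGRAM_NAME
  [1000, 5000, 10000].each do |n|
    write_test_file("test_#{n}.txt", n)
  end
end
```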
I measured each case five times.
1000 lines

| | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
|---|---|---|---|---|---|
| Read at once | 2.965176 MB | 2.963405 MB | 2.965656 MB | 2.965656 MB | 2.965656 MB |
| Read line by line | 3.002243 MB | 3.000736 MB | 2.999688 MB | 2.999808 MB | 3.002083 MB |

5000 lines

| | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
|---|---|---|---|---|---|
| Read at once | 3.010936 MB | 3.011384 MB | 3.009285 MB | 3.008709 MB | 3.008349 MB |
| Read line by line | 2.542326 MB | 2.542286 MB | 2.542246 MB | 2.542286 MB | 2.54435 MB |

10000 lines

| | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
|---|---|---|---|---|---|
| Read at once | 3.065925 MB | 3.065341 MB | 3.065173 MB | 3.068216 MB | 3.067936 MB |
| Read line by line | 2.403886 MB | 2.404046 MB | 2.404534 MB | 2.404366 MB | 2.403886 MB |
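The trend is clear. As the number of lines increases, the memory reported for "read at once" rises (from about 2.97 MB to about 3.07 MB), while the memory reported for "read line by line" falls (from about 3.00 MB to about 2.40 MB). If you want to read a large file, it is better to read one line at a time.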
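One plausible explanation, offered here as an assumption rather than something this experiment verified: `File.read` has to keep the entire file in a single String, so that object grows with the file, whereas line-by-line reading only ever holds one short String per line. The sketch below compares the sizes of those String objects with `ObjectSpace.memsize_of`, which Ruby documents as a hint rather than an exact figure. The helper names and the temporary file are hypothetical.

```ruby
require 'objspace'
require 'tempfile'

# Hypothetical helper: size in bytes reported for the single String
# returned by File.read.
def whole_file_string_size(path)
  ObjectSpace.memsize_of(File.read(path))
end

# Hypothetical helper: size in bytes reported for the largest single
# line String seen while iterating line by line.
def largest_line_string_size(path)
  largest = 0
  File.foreach(path) do |line|
    size = ObjectSpace.memsize_of(line)
    largest = size if size > largest
  end
  largest
end

# Usage with a temporary 10000-line file.
Tempfile.create('sample') do |file|
  10_000.times { file.puts "john m 19" }
  file.flush
  puts "whole file string:   #{whole_file_string_size(file.path)} bytes"
  puts "largest line string: #{largest_line_string_size(file.path)} bytes"
end
```

This only looks at the String objects themselves, so it does not explain every byte of the `memsize_of_all` totals in the tables.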
However, in terms of processing time, reading at once was faster (4.9e-05 s versus 0.000112 s in the five-line run above). I wonder whether that holds in general.
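A single measurement of a few dozen microseconds is noisy, so one way to check the timing more carefully is to repeat each read many times and average. The following is a sketch under assumptions: the helper `average_seconds`, the repeat count of 100, and the temporary file are all hypothetical, and unlike the scripts above it times only the reading, not the `puts` output.

```ruby
require 'tempfile'

# Hypothetical helper: runs the block `repeats` times and returns the
# average elapsed time in seconds, measured with the monotonic clock.
def average_seconds(repeats)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  repeats.times { yield }
  finish = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  (finish - start) / repeats
end

# Usage with a temporary 10000-line file.
Tempfile.create('timing') do |file|
  10_000.times { file.puts "john m 19" }
  file.flush
  path = file.path

  at_once = average_seconds(100) { File.read(path) }
  by_line = average_seconds(100) { File.foreach(path) { |line| line } }

  puts "read at once:      #{at_once} s"
  puts "read line by line: #{by_line} s"
end
```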
http://simplesandsamples.com/readlines.rb.html
https://techacademy.jp/magazine/7797
https://blog.freedom-man.com/measure-ruby-memory-usage