[RUBY] Read the file line by line VS read at once

It is often said that reading a file line by line uses less memory than reading it all at once. I would like to run an experiment to see whether that is true.

The text to be read is as follows.

test.txt
john	m	19
micheal	m	28
abbie	f	31
dabid	m	17
claire	f	26

First, try reading at once

Try it with the following code

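require 'objspace'

# Record the start time
start_time = Time.new

# Read the whole file into one string and print it
puts File.read("test.txt")

end_time = Time.new

# Elapsed time in seconds
puts end_time - start_time
# Total size of all living objects, converted from bytes to MB
puts "#{ObjectSpace.memsize_of_all * 0.001 * 0.001} MB"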

The code that does the reading is very short. The result is as follows.

$ ruby all_read.rb
john	m	19
micheal	m	28
abbie	f	31
dabid	m	17
claire	f	26
4.9e-05
2.951902 MB

Then read line by line

Try it with the following code

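require 'objspace'

# Record the start time
start_time = Time.new

# Open the file and print it one line at a time
File.open("test.txt") do |text|
  text.each_line do |line|
    puts line
  end
end

end_time = Time.new

# Elapsed time in seconds
puts end_time - start_time
# Total size of all living objects, converted from bytes to MB
puts "#{ObjectSpace.memsize_of_all * 0.001 * 0.001} MB"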

The code is a few lines longer than the read-at-once version. The result is as follows.

$ ruby each_read.rb
john	m	19
micheal	m	28
abbie	f	31
dabid	m	17
claire	f	26
0.000112
2.9598400000000002 MB
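
The long tail in 2.9598400000000002 comes from floating-point multiplication by 0.001, not from the measurement itself. The following is a small sketch, assuming a hypothetical helper named bytes_to_mb that is not part of the original scripts, which divides once and rounds so that the printed figure stays tidy.

```ruby
require 'objspace'

# Hypothetical helper (not in the original scripts).
# Converts a byte count to megabytes, using 1 MB = 1,000,000 bytes as the
# scripts above do, and rounds to the given number of decimal places.
def bytes_to_mb(bytes, digits = 6)
  (bytes / 1_000_000.0).round(digits)
end

# Same measurement as in the scripts above, with tidier formatting
puts "#{bytes_to_mb(ObjectSpace.memsize_of_all)} MB"
```

The measured value is the same as before; only the way it is printed changes.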

I ran both scripts several times. When the file has only a few lines, reading it all at once seems to use slightly less memory.

Now let's think about what would happen if we tried it with a longer text.

Trying a longer text

test.txt
john	m	19
micheal	m	28
abbie	f	31
dabid	m	17
claire	f	26

(Omitted)

john	m	19
micheal	m	28
abbie	f	31
dabid	m	17
claire	f	26

For now I simply repeated the same lines. I will test files of 1000, 5000, and 10000 lines.
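
A file of a given length can be generated with a few lines of Ruby instead of copying by hand. This is a minimal sketch; the helper name write_test_file and the constant SAMPLE_LINES are made up for illustration, and it assumes the file is built by cycling through the five sample records in order.

```ruby
# The five sample records used in this article (tab-separated)
SAMPLE_LINES = [
  "john\tm\t19",
  "micheal\tm\t28",
  "abbie\tf\t31",
  "dabid\tm\t17",
  "claire\tf\t26"
].freeze

# Hypothetical helper: writes `line_count` lines to `path`,
# repeating the sample records in order. Returns the path.
def write_test_file(path, line_count)
  File.open(path, "w") do |file|
    line_count.times do |i|
      file.puts SAMPLE_LINES[i % SAMPLE_LINES.size]
    end
  end
  path
end

# Example: write_test_file("test.txt", 1000)
```

Calling it with 1000, 5000, and 10000 in turn gives the three file sizes tested below.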

Read 1000 lines of text

I measured each method 5 times.

Run                1            2            3            4            5
Read at once       2.965176 MB  2.963405 MB  2.965656 MB  2.965656 MB  2.965656 MB
Read line by line  3.002243 MB  3.000736 MB  2.999688 MB  2.999808 MB  3.002083 MB

Read 5000 lines of text

Run                1            2            3            4            5
Read at once       3.010936 MB  3.011384 MB  3.009285 MB  3.008709 MB  3.008349 MB
Read line by line  2.542326 MB  2.542286 MB  2.542246 MB  2.542286 MB  2.54435 MB

Read 10000 lines of text

Run                1            2            3            4            5
Read at once       3.065925 MB  3.065341 MB  3.065173 MB  3.068216 MB  3.067936 MB
Read line by line  2.403886 MB  2.404046 MB  2.404534 MB  2.404366 MB  2.403886 MB

Conclusion

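The trend is clear in these measurements. As the number of lines increases, the memory reported for "read at once" goes up, while the memory reported for "read line by line" goes down. Because ObjectSpace.memsize_of_all reports the total size of the objects alive at the moment it is called, the drop for "read line by line" probably reflects garbage collection reclaiming lines that were already printed. If you need to read a large file, it is better to read it one line at a time.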
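
One way to see why the two methods scale differently is to compare the size of the strings each one has to hold at a time. The following is a minimal sketch and was not part of the measurements above; the method names whole_file_string_size and largest_line_string_size are made up for illustration. It uses ObjectSpace.memsize_of to measure individual strings.

```ruby
require 'objspace'

# Size in bytes of the single string that File.read returns
def whole_file_string_size(path)
  ObjectSpace.memsize_of(File.read(path))
end

# Size in bytes of the largest single line string seen while
# reading the file one line at a time
def largest_line_string_size(path)
  largest = 0
  File.foreach(path) do |line|
    size = ObjectSpace.memsize_of(line)
    largest = size if size > largest
  end
  largest
end

# Example (assumes test.txt exists):
#   puts whole_file_string_size("test.txt")
#   puts largest_line_string_size("test.txt")
```

With File.read the single string grows with the file, whereas with line-by-line reading each string is only as large as one line, so earlier lines can be garbage-collected while later ones are read.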

In terms of processing time, however, "read at once" was faster in the small-file run above (4.9e-05 seconds versus 0.000112 seconds). I am not sure whether that holds in general.
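
A single run on a five-line file is too short to say much about speed. The following sketch, assuming hypothetical helpers named read_at_once, read_line_by_line, and average_seconds, uses the standard benchmark library to average several runs. It counts bytes instead of printing, so terminal output does not dominate the timing.

```ruby
require 'benchmark'

# Reads the whole file into one string and returns its size in bytes
def read_at_once(path)
  File.read(path).bytesize
end

# Reads the file one line at a time and returns the total bytes seen
def read_line_by_line(path)
  total = 0
  File.foreach(path) { |line| total += line.bytesize }
  total
end

# Runs the given block `runs` times and returns the average
# wall-clock time per run in seconds
def average_seconds(runs)
  total = Benchmark.realtime { runs.times { yield } }
  total / runs
end

# Example (assumes test.txt exists):
#   puts average_seconds(100) { read_at_once("test.txt") }
#   puts average_seconds(100) { read_line_by_line("test.txt") }
```

Comparing the two averages for the 1000, 5000, and 10000 line files would show whether the speed difference seen above persists as the file grows.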

References

http://simplesandsamples.com/readlines.rb.html
https://techacademy.jp/magazine/7797
https://blog.freedom-man.com/measure-ruby-memory-usage
