At work I had to process a large amount of text, so I wrote the tool in Ruby, but it was really slow. There is room to tune the code, but tuning is beside the point if the language itself is slow to begin with, so I decided to measure the speed difference among the major lightweight languages (LL): Ruby, Python, and Perl.
Perl wins on the small text (6MB) and Python wins on the large text. In both cases Ruby narrowly avoids last place, but it is still slow. Perl and Python, by contrast, each show a clear strength and a clear weakness.
- Extract IP addresses from an Nginx access log with a regular expression
- Write the extraction results to a file
- The machine used is an FD Core i5 iMac
- Every measurement is taken 3 times
- No abnormal values (extreme outliers) were observed
First up is Ruby.
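The post does not show how the timings were taken, so the following is only a sketch of one way to run each script 3 times and record wall-clock time. The helper name `time_command` is hypothetical, and the sketch assumes Python 3 (`subprocess.run`, `time.perf_counter`) rather than the Python 2.7 being benchmarked.

```python
import subprocess
import time


def time_command(cmd, runs=3):
    """Run `cmd` (a list of argv strings) `runs` times.

    Returns the wall-clock duration of each run in seconds.
    The time includes interpreter startup, not just the regex work.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return timings
```

For example, `time_command(["ruby", "regex_test.rb"])` would return a list of three durations. Keeping all three values, rather than only an average, makes it easy to spot an outlier.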
show_ruby_version
$ ruby -v
ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-darwin12.3.0]
The code used is as follows.
regex_test.rb
#!/usr/bin/env ruby
# Matches a dotted-quad IPv4 address (each octet 0-255).
re_addr = /((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))/
fh2 = open("./result_rb.txt", "w")
open("./access.log.1") { |fh|
  # Read one line at a time and write the first address found on it.
  while line = fh.gets
    if m = re_addr.match(line)
      fh2.puts m[1]
    end
  end
}
fh2.close
Next is Python.
show_python_version
$ python --version
Python 2.7.2
The code used is as follows.
regex_test.py
#!/usr/bin/env python
import re
# Matches a dotted-quad IPv4 address (each octet 0-255).
re_addr = re.compile(r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))")
fh2 = open('./result_py.txt', 'w')
fh = open('./access.log.1')
# readlines() loads every line into a list before the loop starts.
for line in fh.readlines():
    m = re_addr.search(line)
    if m is not None:
        fh2.write(m.group(1))
        fh2.write("\n")
fh.close()
fh2.close()
Finally, the great veteran, Perl.
show_perl_version
$ perl -v
This is perl 5, version 12, subversion 4 (v5.12.4) built for darwin-thread-multi-2level
# (The rest of the output is long, so it is omitted)
The code used is as follows.
regex_test.pl
#!/usr/bin/env perl
open(FH2, ">", "./result_pl.txt");
open(FH, "<", "./access.log.1");
# Read one line at a time and write the first address found on it.
while ($line = readline FH) {
    if ($line =~ /((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))/) {
        print FH2 $1."\n";
    }
}
close(FH);
close(FH2);
As you can see, the logic is nearly identical in each language, and none of it does anything elaborate.
Because this combines regular-expression matching with file writing, it is not a comparison of pure regex performance. Please bear with that, though: the whole motivation for this benchmark was processing a large amount of text, that is, extracting specific data with a regular expression and writing it to a file.
I have not properly accounted for how each language releases memory, so the results might change if that were handled properly. Please treat them as a rough reference only.
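To get a rough feel for how much of the time is matching and how much is writing, the two phases can be timed separately. The following is a sketch only: `extract` and `time_phases` are hypothetical names, the pattern is the one used in the scripts above, and holding all results in memory before writing differs from the line-by-line writes that were actually benchmarked.

```python
import re
import time

# Same pattern as in the benchmark scripts.
RE_ADDR = re.compile(
    r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))"
)


def extract(lines):
    """Return the first IPv4-looking match from each line that has one."""
    return [m.group(1) for m in map(RE_ADDR.search, lines) if m]


def time_phases(lines, out_path):
    """Time regex matching and file writing as two separate phases.

    Returns (addresses, match_seconds, write_seconds).
    """
    t0 = time.perf_counter()
    addrs = extract(lines)
    t1 = time.perf_counter()
    with open(out_path, "w") as out:
        out.write("".join(addr + "\n" for addr in addrs))
    t2 = time.perf_counter()
    return addrs, t1 - t0, t2 - t1
```

Here `lines` is a list of log lines already read into memory, so the time spent reading the log is excluded from both numbers.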
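One memory-related difference is visible in the scripts themselves: the Python version calls `readlines()`, which builds a list of every line before the loop starts, while the Ruby and Perl versions read one line at a time. Below is a minimal sketch of a streaming Python variant. The function name `extract_streaming` is hypothetical, and whether this changes the timings was not measured here.

```python
import re

# Same pattern as in the benchmark scripts.
RE_ADDR = re.compile(
    r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))"
)


def extract_streaming(in_path, out_path):
    """Copy the first address on each matching line of in_path to out_path.

    Iterating over the file object yields one line at a time, so the
    whole log is never held in memory. Returns the number of lines written.
    """
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            m = RE_ADDR.search(line)
            if m is not None:
                dst.write(m.group(1) + "\n")
                count += 1
    return count
```

A call such as `extract_streaming("./access.log.1", "./result_py.txt")` would mirror the original script's input and output paths.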