I compared the speed of regular expressions in Ruby, Python, and Perl (2013 version)

At work I had to process a large amount of text, so I wrote the tool in Ruby, but it turned out to be really slow. There is room to tune the code, but tuning is beside the point if the language itself is slow for this job, so I decided to measure the speed difference among the major lightweight languages (LL).

First, the measurement results

Perl wins on the small text (6 MB), and Python wins on the large text. In both cases Ruby narrowly avoids last place, but it is still slow. Perl and Python, by contrast, each have a clear strength and a clear weakness depending on the input size.

[Figure: measurement results]

Measurement conditions

- Extract IP addresses from an Nginx access log with a regular expression
- Write the extracted results to a file
- The machine used is an FD Core i5 iMac
- Every case was measured 3 times
- No outliers (values far out of line with the others) were observed
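The "measure 3 times" step can be scripted along the following lines. This is only a sketch and not the harness behind the numbers above, since the article does not say how the timings were taken. The helper name `time_command` is hypothetical, the code assumes Python 3, and measuring wall-clock time with `time.perf_counter` is my assumption.

```python
import subprocess
import sys
import time

def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the command exits non-zero
        timings.append(time.perf_counter() - start)
    return timings

# Demo with a command that is always available: the running interpreter itself.
demo = time_command([sys.executable, "-c", "pass"])
print(len(demo))
```

For the scripts in this article the call would look like `time_command(["ruby", "regex_test.rb"])`, run from the directory that holds the script and `access.log.1`.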

Introducing the contestants

First up, Ruby

`show_ruby_version`


$ ruby -v
ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-darwin12.3.0]

The code used is as follows.

`regex_test.rb`


#!/usr/bin/env ruby

re_addr = /((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))/

fh2 = open("./result_rb.txt", "w")
open("./access.log.1") { |fh|
  while line = fh.gets
    if m = re_addr.match(line)
      fh2.puts m[1]
    end
  end
}
fh2.close

Next, Python

`show_python_version`


$ python --version
Python 2.7.2

The code used is as follows.

`regex_test.py`


#!/usr/bin/env python

import re
re_addr = re.compile(r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))")

fh2 = open('./result_py.txt', 'w')
fh = open('./access.log.1')
for line in fh.readlines():
    m = re_addr.search(line)
    if m is not None:
        fh2.write(m.group(1))
        fh2.write("\n")
fh.close()
fh2.close()

Finally, the great veteran, Perl

`show_perl_version`


$ perl -v

This is perl 5, version 12, subversion 4 (v5.12.4) built for darwin-thread-multi-2level
# (the rest of the output is long, so it is omitted)

The code used is as follows.

`regex_test.pl`


#!/usr/bin/env perl

open(FH2, ">", "./result_pl.txt");
open(FH, "<", "./access.log.1");
while($line = readline FH) {
  if ($line =~ /((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))/) {
    print FH2 $1."\n";
  }
}
close(FH);
close(FH2);

As you can see, the logic is nearly identical in each language, and none of the scripts does anything elaborate.
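To make concrete what each script pulls out of a line, here is the same pattern applied to a single record. It is a small illustration and not part of the benchmark: the log line is made up (it uses a documentation-range address), the helper name `extract_ip` is hypothetical, and the code assumes Python 3.

```python
import re

# Same pattern as in regex_test.py, split across two raw strings for readability.
re_addr = re.compile(
    r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))"
)

def extract_ip(line):
    """Return the first IPv4-looking address in line, or None if there is none."""
    m = re_addr.search(line)
    return m.group(1) if m else None

# Hypothetical access-log line, not taken from the real access.log.1.
sample = '192.0.2.10 - - [01/Mar/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 612'
print(extract_ip(sample))             # 192.0.2.10
print(extract_ip("no address here"))  # None
```

Only the first match on each line is used, which mirrors the `match` / `search` / `=~` calls in the three scripts.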

Afterword

Because the benchmark combines regular expression matching with file writing, it is not a comparison of pure regular expression performance. Please bear with that, since the motivation for the benchmark was bulk text processing (extracting specific data with a regular expression and writing it to a file).
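If you want to look at the regular expression cost on its own, one option is to time the matching loop over lines that are already in memory. The following is a sketch of that idea and was not part of the measurements in this article. The helper name `time_match_only` and the generated log lines are hypothetical, and the code assumes Python 3.

```python
import re
import time

# Same pattern as in regex_test.py.
re_addr = re.compile(
    r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))"
)

def time_match_only(lines):
    """Return (matches, seconds) for the matching loop alone, with no file I/O."""
    start = time.perf_counter()
    matches = [m.group(1) for m in map(re_addr.search, lines) if m]
    elapsed = time.perf_counter() - start
    return matches, elapsed

# Hypothetical in-memory stand-in for the access log (documentation-range IPs).
lines = ['203.0.113.%d - - "GET / HTTP/1.1" 200 0\n' % (i % 256)
         for i in range(1000)]
matches, seconds = time_match_only(lines)
print(len(matches), "matches")
```

Timing the read and the write separately in the same way would show how much of the total each part accounts for.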

I did not give proper consideration to how each language releases memory, so the results might change if that were handled properly. Please treat these numbers as a rough reference only.
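One concrete memory difference between the scripts is that `regex_test.py` calls `readlines()`, which loads the whole log into a list, while the Ruby and Perl versions read one line at a time. A line-by-line Python variant might look like the sketch below. I have not measured whether it changes the timings. The function name `extract_streaming` is hypothetical, the demo uses a throwaway temporary directory instead of the real log, and the code assumes Python 3.

```python
import os
import re
import tempfile

# Same pattern as in regex_test.py.
re_addr = re.compile(
    r"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))"
)

def extract_streaming(src_path, dst_path):
    """Write the first address found on each line of src_path to dst_path.

    Iterating over the file object reads one line at a time, so the whole
    log is never held in memory. Returns the number of addresses written.
    """
    count = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            m = re_addr.search(line)
            if m is not None:
                dst.write(m.group(1) + "\n")
                count += 1
    return count

# Demo on a tiny made-up log in a temporary directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "access.log.1")
dst_path = os.path.join(workdir, "result_py.txt")
with open(src_path, "w") as f:
    f.write('198.51.100.7 - - "GET / HTTP/1.1" 200 0\n')
    f.write("line without an address\n")
print(extract_streaming(src_path, dst_path))
```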
