There are several ways to check whether a string starts with a given prefix in Python. This article compares the speed of three typical ones: a regular expression, the `startswith` method, and a slice comparison.
The measurements were run in the following environment.
| item | value |
|---|---|
| Python Version | 3.8.2 |
| OS | Ubuntu 20.04 |
The measurements use the program below. The role of each variable and function is described in the following table. Change the variables according to the characteristic you want to measure.
| variable/function | Description |
|---|---|
| time_logging | Decorator for measuring time |
| compare_regex | Checks each string in the argument list against a regular expression |
| compare_startswith | Checks each string in the argument list with the `startswith` method |
| compare_str | Checks whether the leading slice of each string in the argument list equals `target_word` |
| target_word | Prefix string to search for |
| match_word | Prefix for generated strings that match `target_word` |
| not_match_word | Prefix for generated strings that do not match `target_word` |
| compare_word_num | Total number of strings to compare |
| match_rate | Percentage of generated strings that match `target_word` |
| compare_func | Function to measure |
| main | Function to be called |

    import re
    import time


    def time_logging(func):
        """Decorator that prints the wall-clock time taken by func."""
        def deco(*args, **kwargs):
            # perf_counter is a monotonic, high-resolution clock meant for timing
            stime = time.perf_counter()
            res = func(*args, **kwargs)
            etime = time.perf_counter()
            print(f'Finish {func.__name__}. Takes {round(etime - stime, 3)}s.', flush=True)
            return res
        return deco


    @time_logging
    def compare_regex(compare_words):
        # re.escape makes the prefix literal even if it contains metacharacters
        pattern = re.compile(f'^{re.escape(target_word)}')
        for word in compare_words:
            if pattern.match(word):
                pass


    @time_logging
    def compare_startswith(compare_words):
        for word in compare_words:
            if word.startswith(target_word):
                pass


    @time_logging
    def compare_str(compare_words):
        # Compute the prefix length once, outside the loop
        length = len(target_word)
        for word in compare_words:
            if word[:length] == target_word:
                pass


    # Parameters to change for each measurement
    target_word = 'foo'
    match_word = target_word
    not_match_word = 'bar'
    compare_word_num = 100_000_000
    match_rate = 50  # percentage of words that match
    compare_func = compare_regex


    def main():
        compare_words = []
        for index in range(compare_word_num):
            # index % 100 takes the values 0..99, so match_rate percent of the words match
            if index % 100 < match_rate:
                compare_words.append(f'{match_word}_{index}')
            else:
                compare_words.append(f'{not_match_word}_{index}')
        compare_func(compare_words)


    if __name__ == '__main__':
        main()

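
Before timing anything, it can help to confirm that the three approaches give the same answer. The following is a minimal sketch, not part of the measured program. The helper names `by_regex`, `by_startswith`, and `by_slice` are hypothetical, and the use of `re.escape` assumes the prefix should be matched literally even when it contains regex metacharacters.

```python
import re


def by_regex(word, prefix):
    # re.match anchors at the start of the string; re.escape makes the
    # prefix literal (assumption: metacharacters are not wanted here).
    return re.match(re.escape(prefix), word) is not None


def by_startswith(word, prefix):
    return word.startswith(prefix)


def by_slice(word, prefix):
    # Slicing never raises, even when word is shorter than prefix.
    return word[:len(prefix)] == prefix


samples = ['foo_1', 'bar_2', 'fo', '', 'a.c_3', 'abc_4']
for prefix in ['foo', 'a.c']:
    for word in samples:
        results = {by_regex(word, prefix),
                   by_startswith(word, prefix),
                   by_slice(word, prefix)}
        # All three approaches must agree on every sample.
        assert len(results) == 1, (word, prefix)

# Without re.escape, '.' matches any character, so 'abc_4' would match 'a.c'.
unescaped_matches = re.match('a.c', 'abc_4') is not None
```

The last line shows why an unescaped pattern is not a drop-in replacement for `startswith`: the regular expression treats `.` as a wildcard, while the other two approaches compare characters literally.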
Because the trend in execution speed may depend on the length of the string being compared, I measured the execution time of compare_regex, compare_startswith, and compare_str with target_word set to 5, 10, 50, 100, and 500 characters. All times are in seconds.

| function \ target_word length (characters) | 5 | 10 | 50 | 100 | 500 |
|---|---|---|---|---|---|
| compare_regex | 11.617 | 12.044 | 16.126 | 18.837 | 66.463 |
| compare_startswith | 6.647 | 6.401 | 6.241 | 6.297 | 6.931 |
| compare_str | 5.941 | 5.993 | 4.870 | 5.449 | 8.875 |
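
To reproduce the trend on a smaller scale, the standard `timeit` module can run each approach several times and keep the best result. This is only a sketch under assumptions that differ from the program above: the function name `measure`, the default of 10,000 words, and the prefixes built from repeated `a` and `b` characters are all hypothetical, so the absolute numbers will not match the table.

```python
import re
import timeit


def measure(prefix_length, n_words=10_000, repeat=3):
    """Return best-of-`repeat` seconds per approach for one prefix length."""
    prefix = 'a' * prefix_length
    other = 'b' * prefix_length
    # Half of the words start with the prefix, half do not.
    words = [f'{prefix if i % 2 == 0 else other}_{i}' for i in range(n_words)]
    pattern = re.compile(re.escape(prefix))

    candidates = {
        'regex': lambda: [pattern.match(w) is not None for w in words],
        'startswith': lambda: [w.startswith(prefix) for w in words],
        'slice': lambda: [w[:prefix_length] == prefix for w in words],
    }
    # timeit.repeat accepts a callable; min() discards runs slowed by noise.
    return {
        name: min(timeit.repeat(func, number=1, repeat=repeat))
        for name, func in candidates.items()
    }


if __name__ == '__main__':
    for length in (5, 10, 50, 100, 500):
        timings = measure(length)
        row = ' | '.join(f'{name}: {sec:.4f}s' for name, sec in timings.items())
        print(f'{length:>3} chars | {row}')
```

Taking the minimum of several runs is a common way to reduce noise from other processes. A single timed run, as in the decorator-based program, is more exposed to such noise.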

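In terms of speed, prefix matching should be implemented with `startswith` or a slice comparison (`word[:length] == target_word`) regardless of the prefix length: in these measurements the regular expression was roughly 1.7 to 9.6 times slower than `startswith`. I recommend `startswith` most, because its time was the least affected by the prefix length (6.2 to 6.9 seconds across all lengths).
I also find it the most readable of the three.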
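
The numbers behind this conclusion can be checked with a few lines of arithmetic. The sketch below copies the times from the results table and computes, for each function, its ratio to `compare_startswith` at every length and the spread between its slowest and fastest measurement. The variable names are chosen for illustration only.

```python
# Times in seconds, copied from the results table.
lengths = [5, 10, 50, 100, 500]
times = {
    'compare_regex': [11.617, 12.044, 16.126, 18.837, 66.463],
    'compare_startswith': [6.647, 6.401, 6.241, 6.297, 6.931],
    'compare_str': [5.941, 5.993, 4.87, 5.449, 8.875],
}

baseline = times['compare_startswith']
ratios = {}   # time relative to compare_startswith, per prefix length
spreads = {}  # slowest time divided by fastest time, per function
for name, values in times.items():
    ratios[name] = [round(v / b, 2) for v, b in zip(values, baseline)]
    spreads[name] = round(max(values) / min(values), 2)
    print(f'{name}: ratio to startswith {ratios[name]}, max/min {spreads[name]}')

# The function whose time varies least across prefix lengths.
most_stable = min(spreads, key=spreads.get)
print(f'Least affected by prefix length: {most_stable}')
```

A max/min spread close to 1 means the function is barely affected by the prefix length, which is the property used here to prefer `startswith`.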