`sys.intern()` returns the "same" string object within the running interpreter for any "equal" string, that is, a string with the same content. The returned strings can therefore be compared with `is` instead of `==`.
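As a minimal sketch of this behavior (assuming CPython; the variable names are only illustrative), two equal strings built at run time are distinct objects until they are passed through `sys.intern()`:

```python
import sys

# Build two equal strings at run time so that they are distinct objects.
a = ''.join(str(i) for i in range(1000))
b = ''.join(str(i) for i in range(1000))

print(a == b)    # same content
print(a is b)    # distinct objects in CPython

# sys.intern() returns one shared object for equal content.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # identity comparison now succeeds
print(ia == a)   # the content is unchanged
```

The interned results have to be kept (here in `ia` and `ib`) and used in the comparison; `a` and `b` themselves are not both replaced by the shared object.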
You can expect a significant speedup in comparisons with `sys.intern()`, especially if the strings are long. An example is shown below.
intern_test.py
import sys
import timeit

def comp1(a, b):
    # Compare by value.
    return a == b

def comp2(a, b):
    # Compare by identity.
    return a is b

for n in [10000, 50000]:
    # Build two equal strings at run time so that they are distinct objects.
    a = ''.join([str(s) for s in range(0, n)])
    b = ''.join([str(s) for s in range(0, n)])
    # Interned versions share a single object.
    ia = sys.intern(a)
    ib = sys.intern(b)
    print("--{}--".format(n))
    print("comp1(a, b)", comp1(a, b),
          timeit.timeit("comp1(a, b)", globals=globals()), sep='\t')
    print("comp2(a, b)", comp2(a, b),
          timeit.timeit("comp2(a, b)", globals=globals()), sep='\t')
    print("comp1(ia, ib)", comp1(ia, ib),
          timeit.timeit("comp1(ia, ib)", globals=globals()), sep='\t')
    print("comp2(ia, ib)", comp2(ia, ib),
          timeit.timeit("comp2(ia, ib)", globals=globals()), sep='\t')
Sample output from a run on Python 3.6.2 is shown below.
$ python intern_test.py
--10000--
comp1(a, b) True 1.5900884549773764
comp2(a, b) False 0.12032010598341003
comp1(ia, ib) True 0.13831643099547364
comp2(ia, ib) True 0.13083625899162143
--50000--
comp1(a, b) True 11.056225399981486
comp2(a, b) False 0.11997383600100875
comp1(ia, ib) True 0.13671555201290175
comp2(ia, ib) True 0.12875197199173272
You can see that `comp2(ia, ib)` is faster than `comp1(a, b)`. Also, as expected, `==` appears to take O(n) time in the length of the strings, whereas `is` is O(1).
Incidentally, `comp1(ia, ib)` (`ia == ib`) is also fast enough, but it is a little under 10 ms slower than `comp2(ia, ib)` (`ia is ib`) over the whole `timeit` run (1,000,000 calls by default). Is it the cost of an extra if branch or something similar?
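To look at the comparison operators without the overhead of calling `comp1` / `comp2`, the statements can also be timed directly. The following is only a rough sketch: the helper `time_compare` and the value of `number` are assumptions made for illustration, and the absolute timings will vary by machine and Python version.

```python
import sys
import timeit

def time_compare(n, number=10000):
    """Time the bare comparison statements for strings built from range(n)."""
    a = ''.join(str(i) for i in range(n))
    b = ''.join(str(i) for i in range(n))
    # Namespace handed to timeit instead of the module globals.
    ns = {"a": a, "b": b, "ia": sys.intern(a), "ib": sys.intern(b)}
    return {
        "a == b": timeit.timeit("a == b", globals=ns, number=number),
        "ia == ib": timeit.timeit("ia == ib", globals=ns, number=number),
        "ia is ib": timeit.timeit("ia is ib", globals=ns, number=number),
    }

for n in (10000, 50000):
    print("--{}--".format(n))
    for stmt, seconds in time_compare(n).items():
        print(stmt, seconds, sep='\t')
```

If the observations above hold, only the `a == b` row should grow noticeably with `n`.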
Addendum (2017-09-01): Corrected the content using Python 3.6.2. Addendum (2017-09-04): Corrected the text a little.