`sys.intern()` returns a single canonical string object, shared across the running interpreter, for any strings with equal content. Interned strings can therefore be compared with `is` instead of `==`.
Comparing interned strings with `is` can be significantly faster than comparing ordinary strings with `==`, especially when the strings are long. An example is shown below.
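As a minimal sketch of this behaviour (the variable names are only illustrative, and the `a is b` result assumes CPython, where two strings built separately at runtime are normally distinct objects):

```python
import sys

# Build two equal strings at runtime so they are not merged into one constant.
a = ''.join(['inter', 'ned ', 'text'])
b = ''.join(['inter', 'ned ', 'text'])

print(a == b)    # same content
print(a is b)    # usually False in CPython: two separate objects

# sys.intern() maps both to one shared object.
ia = sys.intern(a)
ib = sys.intern(b)

print(ia is ib)  # True: both names refer to the interned object
```

Note that `is` is only a safe substitute for `==` when both operands have been passed through `sys.intern()`.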
intern_test.py
import sys
import timeit


def comp1(a, b):
    # Compare by value.
    return a == b


def comp2(a, b):
    # Compare by identity.
    return a is b


for n in [10000, 50000]:
    # Two equal but separately built strings: "0123...".
    a = ''.join([str(s) for s in range(0, n)])
    b = ''.join([str(s) for s in range(0, n)])
    # Interned versions of the same strings.
    ia = sys.intern(a)
    ib = sys.intern(b)
    print("--{}--".format(n))
    # Each line prints: expression, result, total time in seconds.
    print("comp1(a, b)", comp1(a, b),
          timeit.timeit("comp1(a, b)", globals=globals()), sep='\t')
    print("comp2(a, b)", comp2(a, b),
          timeit.timeit("comp2(a, b)", globals=globals()), sep='\t')
    print("comp1(ia, ib)", comp1(ia, ib),
          timeit.timeit("comp1(ia, ib)", globals=globals()), sep='\t')
    print("comp2(ia, ib)", comp2(ia, ib),
          timeit.timeit("comp2(ia, ib)", globals=globals()), sep='\t')
The output on Python 3.6.2 is shown below. Each time is the total in seconds for the 1,000,000 calls that `timeit.timeit()` runs by default.
$ python intern_test.py
--10000--
comp1(a, b) True 1.5900884549773764
comp2(a, b) False 0.12032010598341003
comp1(ia, ib) True 0.13831643099547364
comp2(ia, ib) True 0.13083625899162143
--50000--
comp1(a, b) True 11.056225399981486
comp2(a, b) False 0.11997383600100875
comp1(ia, ib) True 0.13671555201290175
comp2(ia, ib) True 0.12875197199173272
It can be seen that `comp2(ia, ib)` is much faster than `comp1(a, b)`. Also, as expected, `==` appears to take O(n) time in the length of the strings, whereas `is` is O(1).
Incidentally, `comp1(ia, ib)` (`ia == ib`) is also fast enough, but it is slightly slower than `comp2(ia, ib)` (`ia is ib`): a little under 10 ms over the 1,000,000 calls. Is that the cost of an extra if branch or something similar?
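The O(n) behaviour of `==` can also be checked in isolation, without the function-call overhead of `comp1`. The following is a rough sketch, not a rigorous benchmark: `time_eq` is a hypothetical helper introduced here, the string lengths and repeat count are arbitrary, and the measured times will vary by machine.

```python
import timeit


def time_eq(n, same_object=False, number=2000):
    """Return total seconds for `number` runs of `a == b` on strings of length n.

    `time_eq` is an illustrative helper, not part of the original script.
    """
    a = 'x' * n
    # Either reuse the very same object or build an equal but distinct copy.
    b = a if same_object else 'x' * n
    return timeit.timeit('a == b', globals={'a': a, 'b': b}, number=number)


for n in [10000, 100000, 1000000]:
    distinct = time_eq(n)                  # must compare the contents
    same = time_eq(n, same_object=True)    # both operands are one object
    print("n={}\tdistinct={:.6f}\tsame={:.6f}".format(n, distinct, same))
```

If `==` is O(n), the `distinct` column should grow roughly in proportion to `n`, while the `same` column should stay nearly flat, which would be consistent with the fast `ia == ib` result above.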
Addendum (2017-09-01): Corrected the content using Python 3.6.2. Addendum (2017-09-04): Corrected the text a little.