When performing a set operation, the execution speed can be improved by reducing the number of elements of the obtained result. Since the operation result is returned in a new set object, it takes time to create the object if the number of elements is large.
In ABC157 D Friend Suggestions, when I set len (XYZ)
to calculate the number of elements in a set, it became TLE. AC was done with len (X) -len (X & (Y | Z)) `.
I tried to verify why the speed is different.
On the premise of the problem
from timeit import timeit
xyz = 'X=set(range(10**5)); Y=set(range(10)); Z=set(range(5,15))'
timeit('len(X-Y-Z)', setup=xyz, number=100)
# 0.3884289239649661
timeit('len(X)-len(X&(Y|Z))', setup=xyz, number=100)
# 0.0001103340182453394
Although the results are the same, there is a 3520-fold difference in execution time.
Next, let X, Y, Z all be the same element of $ 10 ^ 5 $.
from timeit import timeit
xyz = 'X=set(range(10**5)); Y=set(range(10**5)); Z=set(range(10**5))'
timeit('len(X-Y-Z)', setup=xyz, number=100)
# 0.28364974400028586
timeit('len(X)-len(X&(Y|Z))', setup=xyz, number=100)
# 1.1718004010035656
Next timelen(X)-len(X&(Y|Z))
Was slower.X&(Y|Z)
Is the same as the original set, and it is considered that the number of elements in the result has increased.
On the other hand, len (X-Y-Z)
was shortened to about 1/3, probably because an empty set was obtained at the stage of X-Y
.
Simplify the problem and compare only the difference and product operations. The other side of the operation is an empty set, and the left and right sides are exchanged.
from timeit import timeit
xy = 'X=set(range(10**5)); Y=set()'
timeit('X-Y', setup=xy, number=100)
# 0.16930873499950394
timeit('Y-X', setup=xy, number=100)
# 1.7047044821083546e-05
timeit('X&Y', setup=xy, number=100)
# 1.0746996849775314e-05
timeit('Y&X', setup=xy, number=100)
# 1.502997474744916e-05
Even the difference set is fast when the number of elements in the result is small. Apparently, the difference in speed is not the content of the calculation.
See the Python documentation set.difference
Returns a new set with elements that are included in set and not included in all other.
a. Therefore, let's measure the generation time of a set with a large number of elements.
from timeit import timeit
timeit('set(X)', setup='X=set(range(10**5))', number=100)
# 0.16229172004386783
After all, when the number of elements in the result is large, it only takes time to generate the set to return.
Recommended Posts