pd.tseries.offsets.DateOffset can be quite slow if not used with caution

What I want to say

It is better not to multiply a DateOffset by an integer, that is, not to use pd.tseries.offsets.DateOffset.__mul__.

Background

pandas provides the pd.Timestamp class for handling dates and times. To compute a date and time that lies a given period away from another date and time, you can use pd.tseries.offsets.DateOffset. This post shows a surprising behavior of pd.tseries.offsets.DateOffset.

First, look at the execution result below.

$ python -m timeit -u msec -n 10 -s "import pandas as pd" "pd.Timestamp('2010-01-01 00:00:00') + 100 * pd.tseries.offsets.DateOffset(seconds=1)"
10 loops, best of 3: 0.438 msec per loop
$ python -m timeit -u msec -n 10 -s "import pandas as pd" "pd.Timestamp('2010-01-01 00:00:00') + 1000 * pd.tseries.offsets.DateOffset(seconds=1)"
10 loops, best of 3: 3.85 msec per loop
$ python -m timeit -u msec -n 10 -s "import pandas as pd" "pd.Timestamp('2010-01-01 00:00:00') + 10000 * pd.tseries.offsets.DateOffset(seconds=1)"
10 loops, best of 3: 41.9 msec per loop

Do you see what this means? Execution time increases linearly with the factor by which DateOffset is multiplied. Presumably, pd.Timestamp.__add__ is called internally as many times as the multiplier.
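The guess above can be pictured with a toy model. This is only a sketch of the suspected behavior, not pandas' actual implementation: LoopingOffset is a hypothetical class invented here, and it uses datetime.timedelta as the step so that it runs without pandas. Multiplying only records a larger count, and the addition is then repeated that many times.

```python
import datetime


class LoopingOffset:
    """Hypothetical offset (not a pandas class) that adds its step n times."""

    def __init__(self, step, n=1):
        self.step = step
        self.n = n
        self.additions = 0  # number of additions done by the last apply()

    def __rmul__(self, factor):
        # Multiplying only stores a larger n; nothing is folded into the step.
        return LoopingOffset(self.step, self.n * factor)

    def apply(self, start):
        # One addition per unit of n, so the cost grows linearly with n.
        result = start
        self.additions = 0
        for _ in range(self.n):
            result = result + self.step
            self.additions += 1
        return result


start = datetime.datetime(2010, 1, 1)
looped = 1000 * LoopingOffset(datetime.timedelta(seconds=1))
looped_result = looped.apply(start)

# The same end point reached with a single addition.
direct_result = start + datetime.timedelta(seconds=1000)

print(looped.additions, looped_result == direct_result)
```

In this model, a multiplier of 1000 costs 1000 additions while giving the same result as one addition of 1000 seconds, which would match the linear growth in the timings above.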

A better way to do the same thing is to pass the already multiplied value to DateOffset.

$ python -m timeit -u msec -n 10 -s "import pandas as pd" "pd.Timestamp('2010-01-01 00:00:00') + pd.tseries.offsets.DateOffset(seconds=1*100)"  
10 loops, best of 3: 0.0328 msec per loop
$ python -m timeit -u msec -n 10 -s "import pandas as pd" "pd.Timestamp('2010-01-01 00:00:00') + pd.tseries.offsets.DateOffset(seconds=1*1000)"   
10 loops, best of 3: 0.0373 msec per loop
$ python -m timeit -u msec -n 10 -s "import pandas as pd" "pd.Timestamp('2010-01-01 00:00:00') + pd.tseries.offsets.DateOffset(seconds=1*10000)"
10 loops, best of 3: 0.0336 msec per loop

This is fast, and the execution time no longer depends on the number of seconds (or rather, this is the behavior you would intuitively expect).
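As a quick sanity check, the two spellings can be compared directly to confirm that they produce the same timestamp and differ only in speed. This is a minimal sketch that assumes pandas is installed. The variable names are chosen here for illustration.

```python
import pandas as pd

start = pd.Timestamp("2010-01-01 00:00:00")
n = 10000

# Slow form from the timings above: the offset itself is multiplied.
multiplied = start + n * pd.tseries.offsets.DateOffset(seconds=1)

# Fast form: the multiplication happens before the offset is constructed.
folded = start + pd.tseries.offsets.DateOffset(seconds=1 * n)

print(multiplied, folded, multiplied == folded)
```

Both expressions should land on 2010-01-01 02:46:40, so the second form can replace the first without changing the result.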

By the way, here is the same calculation with the datetime module from the Python standard library. It is very fast.

$ python -m timeit -u msec -n 10 -s "import datetime" "datetime.datetime(2010,1,1) + 100 * datetime.timedelta(seconds=1)"
10 loops, best of 3: 0.0031 msec per loop
$ python -m timeit -u msec -n 10 -s "import datetime" "datetime.datetime(2010,1,1) + 1000 * datetime.timedelta(seconds=1)"
10 loops, best of 3: 0.00276 msec per loop
$ python -m timeit -u msec -n 10 -s "import datetime" "datetime.datetime(2010,1,1) + 10000 * datetime.timedelta(seconds=1)"
10 loops, best of 3: 0.00227 msec per loop
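The timings above barely change with the multiplier, which is consistent with timedelta multiplication producing a single new timedelta before the addition takes place. A short sketch using only the standard library shows that the product is an ordinary timedelta and that one addition gives the final value.

```python
import datetime

start = datetime.datetime(2010, 1, 1)

# Multiplying a timedelta yields one timedelta holding the whole span.
span = 10000 * datetime.timedelta(seconds=1)
same_span = datetime.timedelta(seconds=10000)

# A single addition then produces the result.
result = start + span

print(span == same_span, result)
```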
