Pandas and date and time data

Pandas is a very useful library that gives you the flexibility to read data and perform type conversions. However, compared to int and float type, ** datetime type (datetime64 type in 64bit OS) requires careful handling, and subtraction is especially easy to get hooked on **.

Date and time subtraction

Let's perform column subtraction between dates and times.

Data to use

`time_diff_test.csv`


A,B
2020-06-02 13:45:16,2020-06-02 13:50:23
2020-06-02 13:50:15,2020-06-02 14:55:19
2020-06-02 13:52:10,2020-06-03 13:57:21

Code that reads data and performs subtraction

Please note that if you do not specify an argument when reading with pd.read_csv (), it will be read as str type instead of datetime type. There are several ways to read it as a datetime type column, but it's easier to specify the argument parse_dates as shown below.

`time_diff_test.py`


import pandas as pd
df = pd.read_csv('time_diff_test.csv',parse_dates=['A', 'B'])
df['diff'] = df['B'] - df['A']
print(df)

#0   0 days 00:05:07
#1   0 days 01:05:04
#2   1 days 00:05:11
#Name: diff, dtype: timedelta64[ns]

If you simply subtract columns as described above, it will be output as timedelta64 type.

Get date and time subtraction in seconds (minutes, hours, days ...)

In actual use, there are many situations where you want to get it in seconds or minutes.

Code that doesn't work

If you want to convert timedelta type variables to seconds **. Total_seconds () ** is effective, It doesn't work even if I execute it as it is in the column (Series) of DateFrame

`python`


import pandas as pd
df = pd.read_csv('time_diff_test.csv',parse_dates=['A', 'B'])
df['diff'] = df['B'] - df['A']
print(df['diff'].total_seconds())

#'Series' object has no attribute 'total_seconds'

Code that works

There are several ways to convert a timedelta type Series to seconds, By combining ** map and lambda as shown below, it is easy to write in one line **

`time_diff_test_seconds.py`


import pandas as pd
df = pd.read_csv('time_diff_test.csv',parse_dates=['A', 'B'])
df['diff'] = df['B'] - df['A']
df['diff_ts'] = df['diff'].map(lambda x: x.total_seconds())
print(df['diff_ts'])

#0       307.0
#1     3904.0
#2    86711.0
#Name: diff_ts, dtype: float64

[Addition] Use Series.dt.total_seconds ()

Comment from @nkay It can be calculated more simply than map + lambda above, and it seems to be the best choice.

df['diff_ts'] = df['diff'].dt.total_seconds()
print(df['diff_ts'])

#0      307.0
#1     3904.0
#2    86711.0
#Name: diff, dtype: float64

Time difference in minutes and hours

total_minutes (), total_hours () ‥ ** There is no method, lol ** Divide total_seconds () by 60, 3600

print(df['diff'].map(lambda x: x.total_seconds()/60.0))
print(df['diff'].map(lambda x: x.total_seconds()/3600.0))

#0       5.116667
#1      65.066667
#2    1445.183333
#Name: diff, dtype: float64
#0     0.085278
#1     1.084444
#2    24.086389
#Name: diff, dtype: float64

Difference between total_seconds () and seconds

In addition to ** total_seconds () **, there are ** seconds ** as a way to convert timedelta to seconds. Let's find out the difference

`total_seconds()`


print(df['diff'].map(lambda x: x.total_seconds()))
#0      307.0
#1     3904.0
#2    86711.0
#Name: diff, dtype: float64

`seconds`


print(df['diff'].map(lambda x: x.total_seconds()))
#0     307
#1    3904
#2     311
#Name: diff, dtype: int64

According to here ** total_seconds (): Difference expressed in seconds (float type) ** ** seconds: The number of seconds (int type) of the difference decomposed into days, seconds, microseconds, milliseconds, minutes, hours, and weeks ** It seems that it is.

It seems that total_seconds () ** represents the so-called ** "time difference".

** This article is over ** Thank you for watching till the end!

Calculate the time difference between two columns with Pandas DataFrame