Pandas is a very useful library that gives you the flexibility to read data and perform type conversions. However, compared to int and float type, ** datetime type (datetime64 type in 64bit OS) requires careful handling, and subtraction is especially easy to get hooked on **.
Let's perform column subtraction between dates and times.
time_diff_test.csv
A,B
2020-06-02 13:45:16,2020-06-02 13:50:23
2020-06-02 13:50:15,2020-06-02 14:55:19
2020-06-02 13:52:10,2020-06-03 13:57:21
Please note that if you do not specify an argument when reading with pd.read_csv (), it will be read as str type instead of datetime type. There are several ways to read it as a datetime type column, but it's easier to specify the argument parse_dates as shown below.
time_diff_test.py
import pandas as pd
df = pd.read_csv('time_diff_test.csv',parse_dates=['A', 'B'])
df['diff'] = df['B'] - df['A']
print(df)
#0 0 days 00:05:07
#1 0 days 01:05:04
#2 1 days 00:05:11
#Name: diff, dtype: timedelta64[ns]
If you simply subtract columns as described above, it will be output as timedelta64 type.
In actual use, there are many situations where you want to get it in seconds or minutes.
If you want to convert timedelta type variables to seconds **. Total_seconds () ** is effective, It doesn't work even if I execute it as it is in the column (Series) of DateFrame
python
import pandas as pd
df = pd.read_csv('time_diff_test.csv',parse_dates=['A', 'B'])
df['diff'] = df['B'] - df['A']
print(df['diff'].total_seconds())
#'Series' object has no attribute 'total_seconds'
There are several ways to convert a timedelta type Series to seconds, By combining ** map and lambda as shown below, it is easy to write in one line **
time_diff_test_seconds.py
import pandas as pd
df = pd.read_csv('time_diff_test.csv',parse_dates=['A', 'B'])
df['diff'] = df['B'] - df['A']
df['diff_ts'] = df['diff'].map(lambda x: x.total_seconds())
print(df['diff_ts'])
#0 307.0
#1 3904.0
#2 86711.0
#Name: diff_ts, dtype: float64
df['diff_ts'] = df['diff'].dt.total_seconds()
print(df['diff_ts'])
#0 307.0
#1 3904.0
#2 86711.0
#Name: diff, dtype: float64
total_minutes (), total_hours () ‥ ** There is no method, lol ** Divide total_seconds () by 60, 3600
print(df['diff'].map(lambda x: x.total_seconds()/60.0))
print(df['diff'].map(lambda x: x.total_seconds()/3600.0))
#0 5.116667
#1 65.066667
#2 1445.183333
#Name: diff, dtype: float64
#0 0.085278
#1 1.084444
#2 24.086389
#Name: diff, dtype: float64
In addition to ** total_seconds () **, there are ** seconds ** as a way to convert timedelta to seconds. Let's find out the difference
total_seconds()
print(df['diff'].map(lambda x: x.total_seconds()))
#0 307.0
#1 3904.0
#2 86711.0
#Name: diff, dtype: float64
seconds
print(df['diff'].map(lambda x: x.total_seconds()))
#0 307
#1 3904
#2 311
#Name: diff, dtype: int64
According to here ** total_seconds (): Difference expressed in seconds (float type) ** ** seconds: The number of seconds (int type) of the difference decomposed into days, seconds, microseconds, milliseconds, minutes, hours, and weeks ** It seems that it is.
It seems that total_seconds () ** represents the so-called ** "time difference".
** This article is over ** Thank you for watching till the end!
Recommended Posts