This error occurs when a column contains a string that cannot be recognized as a date. I got stuck on this, so I've written it up here.
When a DataFrame is created and a non-date value is mixed in among the date strings, the column is read as object dtype. (If every value is a string that can be interpreted as a date, the column can be read as datetime.)
A common case in real data is a hyphen (-) appearing in place of a date, as in the example below.
import pandas as pd
temp = pd.DataFrame(["2020-04-09", "2020-04-10", "-", "2020-04-12"], columns=["date"])
| | date |
|---|---|
| 0 | 2020-04-09 |
| 1 | 2020-04-10 |
| 2 | - |
| 3 | 2020-04-12 |
If you try to convert this column to datetime at this point, an error occurs. (The exact exception type and message may differ depending on the pandas version.)
pd.to_datetime(temp.date)
#Error output
TypeError: Unrecognized value type: <class 'str'>
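To see the failure without stopping the script, the conversion can be wrapped in a try/except. This is only a minimal sketch: it assumes the exception is a TypeError or a ValueError (or a subclass), since the type reported can vary between pandas versions, and the variable name `raised` is made up for illustration.

```python
import pandas as pd

temp = pd.DataFrame(["2020-04-09", "2020-04-10", "-", "2020-04-12"], columns=["date"])

raised = None  # hypothetical name; holds the exception if the conversion fails
try:
    pd.to_datetime(temp["date"])
except (TypeError, ValueError) as exc:
    # Assumption: the failure surfaces as TypeError or ValueError (or a subclass)
    raised = exc
    print(f"{type(exc).__name__}: {exc}")
```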
The countermeasures are as follows.
Ignore conversion errors and force the conversion. Any value that cannot be converted becomes "NaT".
pd.to_datetime(temp.date, errors="coerce")
#Conversion result
0 2020-04-09
1 2020-04-10
2 NaT
3 2020-04-12
Name: date, dtype: datetime64[ns]
I think this is fine when you know in advance that hyphens are present and you intend to ignore them, as in this case. However, it is probably better not to use it as a quick fix just because an error occurred for reasons you don't understand.
Replace or remove the offending strings beforehand. Alternatively, deal with them at the data file or DB stage.
For example, replace the hyphen with an empty string and then convert. (With an empty string, pd.to_datetime() completes without raising an exception.) As with the forced conversion above, the value that cannot be converted becomes "NaT".
temp.date = temp.date.replace({"-":""})
pd.to_datetime(temp.date)
#Conversion result
0 2020-04-09
1 2020-04-10
2 NaT
3 2020-04-12
Name: date, dtype: datetime64[ns]
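If the rows with the placeholder are not needed at all, removing them is another option. The following is a minimal sketch that assumes "-" is the only invalid value in the column; `cleaned` is a name introduced here for illustration.

```python
import pandas as pd

temp = pd.DataFrame(["2020-04-09", "2020-04-10", "-", "2020-04-12"], columns=["date"])

# Assumption: "-" is the only placeholder, so keep only the rows that are not "-"
cleaned = temp[temp["date"] != "-"].copy()

# Every remaining value is a valid date string, so no errors= argument is needed
cleaned["date"] = pd.to_datetime(cleaned["date"])
print(cleaned)
```

Unlike the replacement approach, this leaves no "NaT" rows behind, at the cost of dropping those rows from the DataFrame.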
In the example above the stray hyphen is easy to spot because the data is small, but it becomes hard to find when the dataset is large. The code below is one simple way to check for invalid values: it prints each value before converting it, so the last value printed when the exception is raised is the first invalid one. To check everything, fix the value that was caught and run the check again, repeating until no exception occurs. (Note that this can become a huge amount of work if the invalid values come in many variations.)
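def check(x):
    # Print the value first so the last line printed identifies the one that fails
    print(x)
    pd.to_datetime(x)

# Apply the check to each value; an exception stops it at the first invalid value
temp.date.map(check)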
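When there may be many invalid values, it can be quicker to list them all in one pass instead of fixing them one at a time. The sketch below is one possible approach, not the method used above: it converts with errors="coerce" and then picks out the values that were present in the original column but became "NaT". The names `converted`, `failed`, and `invalid_values` are introduced here for illustration. Depending on the pandas version, valid dates written in a different format from the rest of the column may also show up in this list.

```python
import pandas as pd

temp = pd.DataFrame(["2020-04-09", "2020-04-10", "-", "2020-04-12"], columns=["date"])

# Convert without raising; anything unparseable becomes NaT
converted = pd.to_datetime(temp["date"], errors="coerce")

# A value failed if it was not missing originally but is NaT after conversion
failed = converted.isna() & temp["date"].notna()
invalid_values = temp.loc[failed, "date"]

# Show each distinct invalid string and how often it occurs
print(invalid_values.value_counts())
```

The counts give a quick overview of which strings need to be replaced or removed before converting without errors="coerce".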
That's all.