I ran into what looked like a bug when I tried to replace np.nan with None using the replace method of a pandas DataFrame
Environment
Run on Google Colaboratory.
The DataFrame used to check the behavior:
import datetime

import numpy as np
import pandas as pd

# Timestamps used as the index
indexes = [
    datetime.datetime(2020, 1, 1, 11, 50),
    datetime.datetime(2020, 1, 1, 12, 50),
    datetime.datetime(2020, 1, 1, 12, 52),
    datetime.datetime(2020, 1, 1, 18, 50),
    datetime.datetime(2020, 1, 1, 19, 50),
    datetime.datetime(2020, 1, 1, 21, 50),
]
# Columns mix floats, strings, booleans, np.nan and None
df = pd.DataFrame({
    'high': [1, np.nan, 3, np.nan, np.nan, 11],
    'close': [4, 5, 6, 7, np.nan, 2],
    'memo': ['sign', '', np.nan, 'sign2', np.nan, 'sign3'],
    'bool': [True, None, True, False, None, False],
    'stoploss': [True, None, True, False, None, False]
}, index=indexes)
df
->                   high  close   memo   bool  stoploss
2020-01-01 11:50:00   1.0    4.0   sign   True      True
2020-01-01 12:50:00   NaN    5.0          None      None
2020-01-01 12:52:00   3.0    6.0    NaN   True      True
2020-01-01 18:50:00   NaN    7.0  sign2  False     False
2020-01-01 19:50:00   NaN    NaN    NaN   None      None
2020-01-01 21:50:00  11.0    2.0  sign3  False     False
The version that looks buggy
df.replace(np.nan, None)
->                   high  close   memo   bool  stoploss
2020-01-01 11:50:00   1.0    4.0   sign   True      True
2020-01-01 12:50:00   1.0    5.0          True      True
2020-01-01 12:52:00   3.0    6.0          True      True
2020-01-01 18:50:00   3.0    7.0  sign2  False     False
2020-01-01 19:50:00   3.0    7.0  sign2  False     False
2020-01-01 21:50:00  11.0    2.0  sign3  False     False
...What is this?!
The cells that held np.nan did not become None. Instead, each one was filled with the previous value in its column (it looks as if fillna had been called with a forward fill).
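For comparison, here is what a plain forward fill does to the 'high' column from the DataFrame above. This is only a minimal sketch (the variable names are mine); it reproduces the 'high' values seen in the surprising output.

```python
import numpy as np
import pandas as pd

# The 'high' column from the example DataFrame above.
high = pd.Series([1, np.nan, 3, np.nan, np.nan, 11], dtype="float64")

# Forward fill: each NaN takes the most recent non-NaN value before it.
filled = high.ffill()

print(filled.tolist())  # [1.0, 1.0, 3.0, 3.0, 3.0, 11.0]
```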
The version that works (?)
df.replace({np.nan: None})
->                   high  close   memo   bool  stoploss
2020-01-01 11:50:00     1      4   sign   True      True
2020-01-01 12:50:00  None      5          None      None
2020-01-01 12:52:00     3      6   None   True      True
2020-01-01 18:50:00  None      7  sign2  False     False
2020-01-01 19:50:00  None   None   None   None      None
2020-01-01 21:50:00    11      2  sign3  False     False
As expected (?)
Wait, no: somehow the floats all look like integers now...
It was fine (phew)
I panicked for a moment (well, for more than 30 minutes), but when I looked closely, the values were still floats.
tmp_df = df.replace({np.nan: None})
tmp_df.values
-> array([[1.0, 4.0, 'sign', True, True],
          [None, 5.0, '', None, None],
          [3.0, 6.0, None, True, True],
          [None, 7.0, 'sign2', False, False],
          [None, None, None, None, None],
          [11.0, 2.0, 'sign3', False, False]], dtype=object)
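Another way to check what is actually stored, independent of how the table is rendered, is to look at the column dtype and the Python type of each element. The sketch below builds an object-dtype column by hand, on the assumption that it mirrors the 'high' column in the tmp_df.values output above; it does not call replace itself.

```python
import pandas as pd

# Assumption: an object-dtype column holding floats and None, like the
# 'high' column shown in the tmp_df.values output above.
high = pd.Series([1.0, None, 3.0, None, None, 11.0], dtype=object)

# The column is object dtype, but each non-missing element is still a float.
print(high.dtype)  # object
print([type(v).__name__ for v in high])
# ['float', 'NoneType', 'float', 'NoneType', 'NoneType', 'float']
```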
ε-(´∀｀*) Phew
I need to remember to write it this way... (..)φ df.replace({np.nan: None})
The official pandas documentation does mention this behavior. It took me a long time to find it, though, so I decided to write it down here.
When value=None and to_replace is a scalar, list or tuple, replace uses the method parameter (default ‘pad’) to do the replacement. So this is why the ‘a’ values are being replaced by 10 in rows 1 and 2 and ‘b’ in row 4 in this case. The command s.replace('a', None) is actually equivalent to s.replace(to_replace='a', value=None, method='pad'):
(Excerpt from the pandas.DataFrame.replace documentation)
If it had been written in Japanese, I might have noticed it a little earlier...
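To make the quoted passage concrete, here is a small sketch of the dict form applied to a Series like the one the excerpt talks about (values 10, 'a', 'a', 'b', 'a'). With a dict, None is the replacement value itself, so no fill method is involved. What s.replace('a', None) does may differ depending on the pandas version, so only the dict form is shown here.

```python
import pandas as pd

# A Series like the one described in the documentation excerpt.
s = pd.Series([10, 'a', 'a', 'b', 'a'])

# Dict form: each 'a' is replaced by None itself, not by a neighboring value.
result = s.replace({'a': None})

print(result.tolist())  # [10, None, None, 'b', None]
```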
This may be only loosely related, but going the other way, replacing None with np.nan, seems to cause a different problem.
StackOverflow : Replace None with NaN in pandas dataframe