I got a little stuck dealing with NaN in pandas, so here is a note.
When dealing with data that mixes str and float types, use pd.isnull() instead of math.isnan() or np.isnan().
First, read the data.
read_csv.py
import pandas as pd
import numpy as np
import math
data = pd.read_csv('test.csv', encoding='utf-8')
data looks like this.
| | hoge | foo |
|---|---|---|
| 0 | 0 | NaN |
| 1 | a | 1.0 |
| 2 | NaN | b |
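The contents of test.csv are not shown in this note. As a minimal sketch, the table above can be reproduced from an in-memory CSV. The text in `csv_text` is an assumption reconstructed from the table, not the actual file.

```python
import io

import pandas as pd

# Assumed contents of test.csv, reconstructed from the table (hypothetical).
csv_text = "hoge,foo\n0,\na,1.0\n,b\n"

# Empty fields are read as NaN by read_csv.
data = pd.read_csv(io.StringIO(csv_text))
print(data)

# Boolean table marking where the NaN values are.
print(data.isnull())
```

Reading from a string this way makes the rest of the note reproducible without the file on disk.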
I want to replace the NaN in column 'hoge' with the string 'No data'.
type.py
for i in range(len(data)):
    print(type(data['hoge'][i]))
result
<class 'str'>
<class 'str'>
<class 'float'>
Summarized per cell, the types are as follows. Only NaN is of float type.
| | hoge | foo |
|---|---|---|
| 0 | str | float |
| 1 | str | str |
| 2 | float | str |
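The type table above can also be produced in one pass per column. This is only a sketch: the DataFrame is rebuilt by hand with dtype=object as a stand-in for the data read from test.csv, which is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in for the data read from test.csv (assumed values, object dtype).
data = pd.DataFrame(
    {"hoge": ["0", "a", np.nan], "foo": [np.nan, "1.0", "b"]},
    dtype=object,
)

# Collect the Python type name of every cell, column by column.
hoge_types = [type(v).__name__ for v in data["hoge"]]
foo_types = [type(v).__name__ for v in data["foo"]]

print(hoge_types)  # ['str', 'str', 'float']
print(foo_types)   # ['float', 'str', 'str']
```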
math.isnan()
math.isnan() cannot be used on str values.
math_isnan.py
for i in range(len(data)):
    # math.isnan() raises TypeError as soon as it receives a str
    if math.isnan(data['hoge'][i]):
        data.loc[i, 'hoge'] = 'No data'
result
TypeError: must be real number, not str
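If math.isnan() has to be used anyway, one possible workaround is to check the type first. The helper name `is_float_nan` below is hypothetical and only illustrates the idea.

```python
import math


def is_float_nan(value):
    """Return True only for a float NaN; str values never reach math.isnan()."""
    return isinstance(value, float) and math.isnan(value)


# Same mix of values as column 'hoge': two strings and one NaN.
values = ["0", "a", float("nan")]
flags = [is_float_nan(v) for v in values]
print(flags)  # [False, False, True]
```

The isinstance check short-circuits, so math.isnan() is only called on float values.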
np.isnan()
np.isnan() also cannot be used on str values.
np_isnan.py
for i in range(len(data)):
    # np.isnan() raises TypeError as soon as it receives a str
    if np.isnan(data['hoge'][i]):
        data.loc[i, 'hoge'] = 'No data'
result
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
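For comparison, np.isnan() works fine on a purely numeric array and only fails once str values are mixed in. A small sketch, using hand-made arrays rather than the data from test.csv:

```python
import numpy as np

# A float array is supported by np.isnan().
floats = np.array([1.0, np.nan])
float_mask = np.isnan(floats)
print(float_mask.tolist())  # [False, True]

# An object array mixing str and NaN is not supported.
mixed = np.array(["0", "a", np.nan], dtype=object)
try:
    np.isnan(mixed)
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```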
pd.isnull()
pd.isnull() returns True for NaN.
pd_isnull.py
print(pd.isnull(data['hoge'][2]))
result
True
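pd.isnull() (an alias of pd.isna()) accepts str values as well, and it also works on a whole Series at once. A short sketch with hand-made values standing in for column 'hoge':

```python
import numpy as np
import pandas as pd

# Scalars: no TypeError for str, True for missing values.
print(pd.isnull("a"))     # False
print(pd.isnull(np.nan))  # True
print(pd.isnull(None))    # True

# A whole Series: returns a boolean mask of the same length.
hoge = pd.Series(["0", "a", np.nan], dtype=object)
mask = pd.isnull(hoge)
print(mask.tolist())      # [False, False, True]
```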
If you use pd.isnull() to replace the NaN in column 'hoge', which mixes str and float values, it runs without error.
pd_isnull.py
for i in range(len(data)):
    if pd.isnull(data['hoge'][i]):
        # assign via .loc rather than chained indexing
        data.loc[i, 'hoge'] = 'No data'
The NaN in column 'hoge' has been replaced.
| | hoge | foo |
|---|---|---|
| 0 | 0 | NaN |
| 1 | a | 1.0 |
| 2 | No data | b |
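The same replacement can probably be done without a loop by using fillna() on the column. This is a sketch rather than the method used in this note, and the DataFrame is rebuilt by hand as a stand-in for test.csv (an assumption).

```python
import numpy as np
import pandas as pd

# Stand-in for the data read from test.csv (assumed values, object dtype).
data = pd.DataFrame(
    {"hoge": ["0", "a", np.nan], "foo": [np.nan, "1.0", "b"]},
    dtype=object,
)

# Replace every NaN in column 'hoge' in one call; 'foo' is left untouched.
data["hoge"] = data["hoge"].fillna("No data")
print(data["hoge"].tolist())  # ['0', 'a', 'No data']
```

Because only column 'hoge' is reassigned, the NaN in column 'foo' stays as it is, matching the table above.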