Previously, a pandas Series could not hold integer data that contained missing values.
pd.Series([1, 2, None], dtype=int)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
When numeric data containing missing values is read without specifying a dtype, it is cast to float64.
pd.Series([1, 2, None])
0 1.0
1 2.0
2 NaN
dtype: float64
This behavior arises because numpy.nan is a float value. Ideally, though, missing values would be handled separately and would not have to be numpy.nan.
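The following is a minimal sketch of why the upcast matters, assuming a reasonably recent pandas and NumPy. The variable names (`s`, `big`, `as_float`) are only illustrative. It checks that numpy.nan is a plain float and that an integer above 2**53 no longer round-trips once the Series has been cast to float64.

```python
import numpy as np
import pandas as pd

# numpy.nan is an ordinary float, so it cannot be stored in an integer array.
nan_is_float = isinstance(np.nan, float)
print(nan_is_float)

# Without an explicit dtype, the missing value forces the Series to float64.
s = pd.Series([1, 2, None])
print(s.dtype)

# float64 has a 53-bit significand, so integers above 2**53 may be rounded.
big = 2**53 + 1
as_float = pd.Series([big, None])
round_trips = int(as_float.iloc[0]) == big
print(round_trips)
```

In other words, the float64 fallback is not just cosmetic: identifiers or counters stored as large integers can silently change value.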
In response, pandas v0.24.0 added a nullable integer data type. It addresses this problem by representing missing values with pandas.NA (introduced in pandas 1.0.0) instead of numpy.nan.
pd.Series([1, 2, None], dtype=pd.Int64Dtype())
0 1
1 2
2 <NA>
dtype: Int64
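As a rough illustration of how such a Series behaves in practice, the sketch below assumes pandas 1.0 or later. It shows that missing-value detection, reductions, and arithmetic all work without falling back to float64. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, None], dtype=pd.Int64Dtype())

# Missing values are detected in the usual way.
mask = s.isna().tolist()
print(mask)

# Reductions skip the missing value by default (skipna=True).
total = s.sum()
print(total)

# Arithmetic keeps the nullable integer dtype and propagates the missing value.
shifted = s + 1
print(shifted.dtype)
print(shifted.isna().tolist())
```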
For dtype, the string "Int64" works the same as pd.Int64Dtype(). (Note that the `I` is uppercase.)
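A short sketch of the difference the capital letter makes, assuming a recent pandas: "Int64" selects the nullable extension dtype, while lowercase "int64" selects NumPy's ordinary integer dtype, which rejects None. The exception type may vary between versions, so both TypeError and ValueError are caught here.

```python
import pandas as pd

a = pd.Series([1, 2, None], dtype=pd.Int64Dtype())
b = pd.Series([1, 2, None], dtype="Int64")

# The string alias and the dtype object produce the same dtype.
same_dtype = a.dtype == b.dtype
print(same_dtype)

# Lowercase "int64" is the non-nullable NumPy dtype, so None is rejected.
raised = False
try:
    pd.Series([1, 2, None], dtype="int64")
except (TypeError, ValueError) as exc:
    raised = True
    print(type(exc).__name__)
```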
The documentation also notes:
IntegerArray is currently experimental.
As this says, the feature is still experimental, so use it with care.