This is the fourth installment of my notes from a hands-on session where everyone takes on Kaggle's famous "House Prices" problem. It's more of a memo than a commentary, but I hope it helps someone somewhere. Now that this is the fourth time, I feel like things are gradually starting to sink in.
Up to last time, I got as far as "getting the index that contains missing values as an array". (By the way, I keep getting confused because Python has several different array-like concepts, which is annoying.)
```python
# Fill in missing values according to data type:
#   float64 -> 0.0
#   object  -> 'NA'
na_float_cols = alldata[na_col_list].dtypes[alldata[na_col_list].dtypes=='float64'].index.tolist() # float64 columns
na_obj_cols = alldata[na_col_list].dtypes[alldata[na_col_list].dtypes=='object'].index.tolist() # object columns
# Substitute 0.0 where a float64 column is missing
for na_float_col in na_float_cols:
    alldata.loc[alldata[na_float_col].isnull(),na_float_col] = 0.0
# Substitute 'NA' where an object column is missing
for na_obj_col in na_obj_cols:
    alldata.loc[alldata[na_obj_col].isnull(),na_obj_col] = 'NA'
```
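To try the snippet end to end without the Kaggle files, here is a minimal sketch on a toy DataFrame. The toy values and the way `na_col_list` is built here are my own assumptions for illustration (the real list comes from the previous installment); only the fill logic mirrors the code above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for alldata (hypothetical values, not the Kaggle data).
# dtype=object is set explicitly because newer pandas versions may infer
# a dedicated string dtype instead of object.
alldata = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0],                                # float64, one missing
    "GarageType": pd.Series(["Attchd", None, "Detchd"], dtype=object),  # object, one missing
    "YrSold": [2008, 2007, 2010],                                       # int64, nothing missing
})

# Assumption: na_col_list holds the names of columns with at least one missing value
na_col_list = alldata.columns[alldata.isnull().any()].tolist()

# Split those columns by dtype
na_dtypes = alldata[na_col_list].dtypes
na_float_cols = na_dtypes[na_dtypes == "float64"].index.tolist()
na_obj_cols = na_dtypes[na_dtypes == "object"].index.tolist()

# Same fill logic as in the article
for na_float_col in na_float_cols:
    alldata.loc[alldata[na_float_col].isnull(), na_float_col] = 0.0
for na_obj_col in na_obj_cols:
    alldata.loc[alldata[na_obj_col].isnull(), na_obj_col] = "NA"

# Count the missing values that are left
remaining = int(alldata.isnull().sum().sum())
print(remaining)
```

If the count printed at the end is zero, every missing value in the listed columns has been filled.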
```python
alldata[na_col_list].dtypes[alldata[na_col_list].dtypes=='float64'].index.tolist()
alldata[na_col_list].dtypes[alldata[na_col_list].dtypes=='object'].index.tolist()
```
```python
for na_float_col in na_float_cols:
    alldata.loc[alldata[na_float_col].isnull(),na_float_col] = 0.0
```
For now, I'll read through the `for` part. The loop variable and the collection come in the opposite order from how you would write it in PHP (I don't know if I have that right).
Let's try printing `na_float_col` and `alldata[na_float_col]`. Printing the loop variable is the tried-and-true way to check what an iteration is doing, so let's start there.
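As a rough sketch of that check, the loop below prints the loop variable and the column it selects on each pass. The DataFrame and column names are made up for illustration. Note the order: Python writes `for item in collection`, whereas PHP's `foreach ($collection as $item)` puts the collection first.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for alldata
alldata = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0],
    "MasVnrArea": [np.nan, 0.0, 162.0],
})
na_float_cols = ["LotFrontage", "MasVnrArea"]

seen = []  # records what each pass looked at
for na_float_col in na_float_cols:
    print(na_float_col)           # the loop variable: a column name (a plain string)
    print(alldata[na_float_col])  # the values of that column (a pandas Series)
    seen.append((na_float_col, type(alldata[na_float_col]).__name__))
```

On each pass, `na_float_col` is just a string, and `alldata[na_float_col]` uses that string to pull out one column.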
`na_float_col` on the first iteration
You can see a name: one of the "indexes containing missing values" (in other words, a column name).
`alldata[na_float_col]` on the first iteration
You will see the array of values stored under that "index containing missing values".
Result of `.isnull()`
`isnull()` is used to determine whether each value is null. Here is the output of `alldata[na_float_col].isnull()`.
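To make the shape of that result concrete, here is a small sketch with a made-up column (the values are hypothetical). `isnull()` returns a Series of booleans with the same length as the input, one flag per value.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
col = pd.Series([65.0, np.nan, 80.0], name="LotFrontage")

mask = col.isnull()     # True where the value is missing
print(mask)
print(mask.tolist())    # [False, True, False]
print(int(mask.sum()))  # 1 (True counts as 1, so the sum is the number of missing values)
```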
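`alldata.loc[alldata[na_float_col].isnull(),na_float_col]`
`.loc` is indexed like a matrix: rows first, then columns. Here the rows are the ones where `isnull()` is True and the column is `na_float_col`, so "0.0" is entered only where a value is missing.
```python
alldata.loc[alldata[na_float_col].isnull(),na_float_col] = 0.0
```
Each item is too fine-grained to check by eye, but this should do the job.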
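Since the real data is too large to check by eye, here is a minimal sketch of the same selection and assignment on a three-row toy column (hypothetical values). For comparison it also calls `fillna`, which is a different approach from the one used in this article but gives the same result in this simple case.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for alldata
alldata = pd.DataFrame({"LotFrontage": [65.0, np.nan, 80.0]})
na_float_col = "LotFrontage"

before = alldata[na_float_col].copy()  # keep the original for comparison
mask = alldata[na_float_col].isnull()

# Rows: where mask is True. Column: na_float_col.
selected = alldata.loc[mask, na_float_col]
print(selected)  # only the missing entries

# Assign 0.0 to exactly those cells
alldata.loc[mask, na_float_col] = 0.0
print(alldata[na_float_col].tolist())  # [65.0, 0.0, 80.0]

# Alternative (not the approach used in this article): fillna on the original column
via_fillna = before.fillna(0.0)
```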
Output of `alldata`
I meant to show this too, but I ran out of time, so I'll stop here and treat this as the preparation for "turning categorical variables into dummy variables". Is that something like converting them into numbers so they can be analyzed...??
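As far as I understand it, that is roughly the idea. A minimal sketch, assuming pandas' `get_dummies` (this article does not say which tool will be used next time) and a made-up column: each category becomes its own indicator column, so each row is flagged in exactly one of them.

```python
import pandas as pd

# Hypothetical toy column; "NA" is the placeholder string filled in for missing values
df = pd.DataFrame({"GarageType": ["Attchd", "NA", "Detchd"]})

# One indicator column per category
dummies = pd.get_dummies(df, columns=["GarageType"])
print(dummies)
print(sorted(dummies.columns.tolist()))
```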
Filling in the missing values took longer than I expected. I wonder if this is the Python trap of packing everything into one line (not that it's really a trap or anything).
We're finally getting close to the actual processing, and I'm getting excited.