This is Part 5 of my notes on the hands-on material for Kaggle's famous "House Prices" problem, the one everyone seems to take on. It's more of a memo than an explanation, but I hope it helps someone somewhere. I'd like to think the end is finally in sight.
This time the work is basically replacing strings with numbers.
import pandas as pd  # alldata is the combined DataFrame built in the earlier parts
# List the categorical features (object-dtype columns)
cat_cols = alldata.dtypes[alldata.dtypes=='object'].index.tolist()
# List the numerical features (every non-object column)
num_cols = alldata.dtypes[alldata.dtypes!='object'].index.tolist()
# List the columns needed for splitting the data and for the submission
other_cols = ['Id','WhatIsData']
# Remove the extra elements from the lists
cat_cols.remove('WhatIsData') # Remove the training/test distinction flag
num_cols.remove('Id') # Remove Id
# Convert the categorical variables to dummy variables
alldata_cat = pd.get_dummies(alldata[cat_cols])
# Combine the data
all_data = pd.concat([alldata[other_cols],alldata[num_cols],alldata_cat],axis=1)
Whoa, that's a lot packed into one block. Let me go through it piece by piece and look at what each line produces. The first line puts the index labels (column names) of only the object-dtype columns into a list.
cat_cols = alldata.dtypes[alldata.dtypes=='object'].index.tolist()
num_cols = alldata.dtypes[alldata.dtypes!='object'].index.tolist()
The num_cols line works the same way as the categorical one, only with the condition flipped to `!=`, so I'll skip the details.
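To see what the dtype filter actually picks up, here is a minimal sketch on a toy table. The DataFrame `toy` and its values are made up for illustration and are not the Kaggle data; the string columns are given object dtype explicitly, on the assumption that `alldata` holds its strings that way.

```python
import pandas as pd

# Hypothetical stand-in for alldata (values are invented)
toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "LotArea": [8450, 9600, 11250],
    "Street": pd.Series(["Pave", "Pave", "Grvl"], dtype="object"),
    "WhatIsData": pd.Series(["Train", "Train", "Test"], dtype="object"),
})

# toy.dtypes is a Series indexed by column name, so a boolean mask
# keeps only the matching columns and .index gives their names
toy_cat_cols = toy.dtypes[toy.dtypes == "object"].index.tolist()
toy_num_cols = toy.dtypes[toy.dtypes != "object"].index.tolist()

print(toy_cat_cols)
print(toy_num_cols)
```

Note that the flag column lands in the categorical list and `Id` in the numerical one, which is why both get removed later.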
other_cols = ['Id','WhatIsData']
As you can see, the column added in Part 2 is stored in this list. These names come up again in the next step, where the extra elements are removed from the lists.
This removes the unneeded elements from the lists. The earlier output also confirms that cat_cols contained an item called WhatIsData.
cat_cols.remove('WhatIsData') # Remove the training/test distinction flag
num_cols.remove('Id') # Remove Id
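As a small aside on how `list.remove` behaves, here is a sketch with a made-up list (`cols` is hypothetical). It changes the list in place, and it raises `ValueError` when the value is not there, which is worth remembering if a notebook cell gets run twice.

```python
# Hypothetical list of column names
cols = ["Street", "WhatIsData", "Alley"]

# remove() edits the list in place and returns None
cols.remove("WhatIsData")
print(cols)

# A second removal of the same value fails because it is already gone
already_removed = False
try:
    cols.remove("WhatIsData")
except ValueError:
    already_removed = True
print(already_removed)
```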
alldata_cat = pd.get_dummies(alldata[cat_cols])
An unusual feeling. It's so convenient: just pass the data to a function and it does everything for you ... This is the kind of thing I like about Python.
The output of `alldata_cat = pd.get_dummies(alldata[cat_cols])`. It's amazing, the data really has changed.
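To make the change easier to see on a small scale, here is a minimal sketch of `pd.get_dummies` on an invented two-column table (`toy_cat` and its values are assumptions for illustration). Each distinct string becomes its own column named `<column>_<value>`.

```python
import pandas as pd

# Hypothetical categorical data (values are invented)
toy_cat = pd.DataFrame({
    "Street": pd.Series(["Pave", "Grvl", "Pave"], dtype="object"),
    "Alley": pd.Series(["None", "Grvl", "Pave"], dtype="object"),
})

# One indicator column per (column, value) pair
toy_dummies = pd.get_dummies(toy_cat)

print(toy_dummies.columns.tolist())
print(toy_dummies.shape)
```

Depending on the pandas version, the indicator values are shown as True/False or as 1/0, but either way each row is marked in exactly one column per original feature.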
all_data = pd.concat([alldata[other_cols],alldata[num_cols],alldata_cat],axis=1)
This does exactly what it looks like: it combines `alldata[other_cols]`, `alldata[num_cols]`, and `alldata_cat` side by side with `pd.concat` (`axis=1` means column-wise). (I've reached the point where I can say it reads at a glance.)
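One detail worth a quick sketch: with `axis=1`, `pd.concat` lines rows up by index label, not by position. The three small frames below are made up for illustration. In the walkthrough all pieces come from the same `alldata`, so their indexes match, but a shifted index shows what would happen otherwise.

```python
import pandas as pd

# Hypothetical pieces that share the default index 0, 1, 2
left = pd.DataFrame({"Id": [1, 2, 3]})
mid = pd.DataFrame({"LotArea": [8450, 9600, 11250]})
right = pd.DataFrame({"Street_Pave": [1, 1, 0]})

# Same index everywhere: columns are simply placed side by side
combined = pd.concat([left, mid, right], axis=1)
print(combined.shape)

# Different index labels: rows are aligned by label, gaps become NaN
shifted = right.copy()
shifted.index = [1, 2, 3]
misaligned = pd.concat([left, shifted], axis=1)
print(misaligned.shape)
```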
Did this one move along at a good pace? Reading and understanding the code took less time than I expected, so it feels like I'm getting used to it. I'll keep at it. Now that the data has been formatted, it's finally time for the analysis. I'm looking forward to it.