This is the 7th installment of my notes on a hands-on session in which everyone takes on Kaggle's well-known "House Prices" problem. It is more of a memo than an explanation, but I hope it helps someone somewhere. The preparation was finished last time, so we finally move on to the analysis stage.
# Split the merged data back into training data and test data
train_ = all_data[all_data['WhatIsData']=='Train'].drop(['WhatIsData','Id'], axis=1).reset_index(drop=True)
test_ = all_data[all_data['WhatIsData']=='Test'].drop(['WhatIsData','SalePrice'], axis=1).reset_index(drop=True)
# Split the training data into explanatory variables and the objective variable
train_x = train_.drop('SalePrice',axis=1)
train_y = np.log(train_['SalePrice'])
# Split the test data into the Id column and the features
test_id = test_['Id']
test_data = test_.drop('Id',axis=1)
Check on the train side.
all_data[all_data['WhatIsData']=='Train'].drop(['WhatIsData','Id'], axis=1).reset_index(drop=True)
First, check the contents of `all_data[all_data['WhatIsData']=='Train']`. Only the Train rows of all_data are fetched.
Next, check the contents of `all_data[all_data['WhatIsData']=='Train'].drop(['WhatIsData','Id'], axis=1)`. The WhatIsData and Id columns have been dropped.
Finally, check the contents of `all_data[all_data['WhatIsData']=='Train'].drop(['WhatIsData','Id'], axis=1).reset_index(drop=True)`. The index is reset. (In a screenshot you cannot really see the difference ...)
(By the way, train and test seem to have been deliberately put into a single table earlier ... I felt I need to go back and review the overall picture of that.)
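To make the three steps above easier to follow, here is a minimal sketch on a toy table. The column names follow this article, but the four rows and the intermediate names `step1` and `step2` are made up for illustration and are not the real Kaggle data.

```python
import pandas as pd

# Toy stand-in for all_data: two training rows and two test rows stacked together.
# The values are invented; only the column names follow the article.
all_data = pd.DataFrame({
    'Id': [1, 2, 1461, 1462],
    'LotArea': [8450, 9600, 11622, 14267],
    'SalePrice': [208500.0, 181500.0, float('nan'), float('nan')],
    'WhatIsData': ['Train', 'Train', 'Test', 'Test'],
})

# Step 1: a boolean mask keeps only the rows flagged as Train.
step1 = all_data[all_data['WhatIsData'] == 'Train']

# Step 2: drop the helper columns that are not features.
step2 = step1.drop(['WhatIsData', 'Id'], axis=1)

# Step 3: renumber the rows from 0 and discard the old index.
train_ = step2.reset_index(drop=True)

# The test side keeps Id and drops the empty SalePrice column instead.
test_before_reset = all_data[all_data['WhatIsData'] == 'Test'].drop(
    ['WhatIsData', 'SalePrice'], axis=1)
test_ = test_before_reset.reset_index(drop=True)

print(train_.columns.tolist())
print(test_before_reset.index.tolist(), test_.index.tolist())
```

In this toy table the Train rows come first, so their index already starts at 0 and `reset_index(drop=True)` changes nothing you can see. On the Test rows the index visibly moves from `[2, 3]` to `[0, 1]`.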
train_x = train_.drop('SalePrice',axis=1)
train_y = np.log(train_['SalePrice'])
With `train_x = train_.drop('SalePrice', axis=1)`, the columns other than SalePrice are used as explanatory variables.
The objective variable is prepared with `train_y = np.log(train_['SalePrice'])`. (Don't forget the final logarithmic conversion.)
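Because the objective variable is on a log scale, anything a model predicts from it will also be on a log scale. The following is a small sketch, using three made-up prices rather than the real data, showing that `np.exp` brings log-scale values back to the original price scale. The names `pred_log` and `pred_price` are hypothetical, and reusing `train_y` as the "prediction" is only a stand-in for a real model.

```python
import numpy as np
import pandas as pd

# Made-up training table with one feature and the target column.
train_ = pd.DataFrame({
    'LotArea': [8450, 9600, 11250],
    'SalePrice': [208500, 181500, 223500],
})

# Explanatory variables: everything except SalePrice.
train_x = train_.drop('SalePrice', axis=1)

# Objective variable: natural log of SalePrice.
train_y = np.log(train_['SalePrice'])

# Stand-in for model output on the log scale (here simply train_y itself).
pred_log = train_y.to_numpy()

# np.exp is the inverse of np.log, so this returns to the price scale.
pred_price = np.exp(pred_log)

print(train_x.columns.tolist())
print(pred_price)
```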
test_id = test_['Id']
test_data = test_.drop('Id',axis=1)
Still checking things? ... As expected, I will skip confirming test_id and test_data here.
I meant to start the analysis, but I am getting overwhelmed by things I don't understand, so I will do my best to prepare before starting. Mostly by looking up terms.
StandardScaler() # scaling
[0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0] # parameter grid
make_pipeline(scaler, ls) # pipeline generation
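As a way of seeing how these three pieces might fit together, here is a rough sketch on synthetic data. It assumes that `ls` is a scikit-learn `Lasso` model and that the list of numbers is a grid of `alpha` values searched with `GridSearchCV`. Neither is confirmed by the snippets above, and the data, `max_iter`, and `cv=5` are arbitrary choices for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: three features on very different scales (invented for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3)) * np.array([1.0, 100.0, 0.01])
y = 2.0 * X[:, 0] + 0.03 * X[:, 1] + rng.normal(scale=0.1, size=60)

# Scaling: standardize each feature to mean 0 and variance 1.
scaler = StandardScaler()

# Assumption: ls is a Lasso regression model.
ls = Lasso(max_iter=10000)

# Pipeline generation: scaling runs first, then the model.
pipe = make_pipeline(scaler, ls)

# Parameter grid: make_pipeline names each step after its lowercased class,
# so the Lasso parameter alpha is addressed as 'lasso__alpha'.
param_grid = {'lasso__alpha': [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]}

# Try every alpha with 5-fold cross-validation and keep the best one.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
```

Putting the scaler inside the pipeline means it is refit on the training part of each cross-validation fold, so the validation part does not influence the scaling.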
Is the point to read through all of this homework first? May I say what I thought? I thought this was "the end of the game", but he said that everything we had done so far was just preprocessing.