Theme

This is the 7th installment of my notes on a hands-on session that takes on Kaggle's famous "House Prices" problem. It's more of a memo than a commentary, but I hope it helps someone somewhere. The preparation was completed last time, so we are finally at the analysis stage.

Original theme: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
Referenced article: https://yolo-kiyoshi.com/2018/12/17/post-1003/

Today's work

Building a predictive model

import numpy as np

# all_data is the merged train + test DataFrame prepared in the previous installments
# Divide the merged data into training data and test data
train_ = all_data[all_data['WhatIsData']=='Train'].drop(['WhatIsData','Id'], axis=1).reset_index(drop=True)
test_ = all_data[all_data['WhatIsData']=='Test'].drop(['WhatIsData','SalePrice'], axis=1).reset_index(drop=True)
# Division within training data
train_x = train_.drop('SalePrice', axis=1)
train_y = np.log(train_['SalePrice'])  # log-transformed target
# Split in test data
test_id = test_['Id']  # Ids of the test rows
test_data = test_.drop('Id', axis=1)
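Since all_data comes from earlier installments, here is a minimal, self-contained sketch that runs the same split on a tiny hypothetical stand-in for all_data (the column values below are made up, not taken from the real dataset), just to confirm which columns end up where.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for all_data: train rows carry SalePrice,
# test rows have it missing, and WhatIsData marks which is which.
all_data = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "LotArea": [8000, 9500, 11000, 10000, 12000],
    "SalePrice": [200000.0, 180000.0, 220000.0, np.nan, np.nan],
    "WhatIsData": ["Train", "Train", "Train", "Test", "Test"],
})

# Divide the merged data into training data and test data
train_ = all_data[all_data["WhatIsData"] == "Train"].drop(["WhatIsData", "Id"], axis=1).reset_index(drop=True)
test_ = all_data[all_data["WhatIsData"] == "Test"].drop(["WhatIsData", "SalePrice"], axis=1).reset_index(drop=True)

# Division within training data
train_x = train_.drop("SalePrice", axis=1)
train_y = np.log(train_["SalePrice"])

# Split in test data
test_id = test_["Id"]
test_data = test_.drop("Id", axis=1)

print(train_x.shape, train_y.shape, test_data.shape)  # (3, 1) (3,) (2, 1)
print(test_id.tolist())  # [4, 5]
```

Only LotArea is left as an explanatory variable here because the toy frame has no other features; on the real data every remaining column would be kept.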

Divide the merged data into training data and test data

Let's check the train side.

all_data[all_data['WhatIsData']=='Train'].drop(['WhatIsData','Id'], axis=1).reset_index(drop=True)

First, check the contents of `all_data[all_data['WhatIsData']=='Train']`. Only the rows of all_data labeled Train are selected.

Next, check the contents of `all_data[all_data['WhatIsData']=='Train'].drop(['WhatIsData','Id'], axis=1)`. The WhatIsData and Id columns have been dropped.

Then check the contents of `all_data[all_data['WhatIsData']=='Train'].drop(['WhatIsData','Id'], axis=1).reset_index(drop=True)`. The index is reset so that it runs from 0. (In a screenshot, though, you can't see any difference from the previous step...)
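The effect of `reset_index(drop=True)` is easier to see on rows whose index doesn't already start at 0. A minimal sketch on a hypothetical toy frame, where the Test rows sit at index 3 and 4:

```python
import pandas as pd

# Hypothetical merged frame: the Test rows are at index 3 and 4
df = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 10000, 12000],
    "WhatIsData": ["Train", "Train", "Train", "Test", "Test"],
})

test_rows = df[df["WhatIsData"] == "Test"]
print(test_rows.index.tolist())  # [3, 4]: boolean filtering keeps the original labels

# drop=True throws the old index away and renumbers from 0
test_rows = test_rows.reset_index(drop=True)
print(test_rows.index.tolist())  # [0, 1]

# With the default drop=False, the old index is kept as a new "index" column
kept = df[df["WhatIsData"] == "Test"].reset_index()
print(kept.columns.tolist())  # ['index', 'LotArea', 'WhatIsData']
```

That is why `drop=True` is passed: without it, an extra "index" column would sneak into the explanatory variables.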

(By the way, train and test were apparently merged into one dataset on purpose earlier, presumably so the same preprocessing could be applied to both, only to be split apart again here... I think I need to review the overall flow.)
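My guess about why the merge helps: preprocessing such as dummy encoding can produce different columns if train and test are encoded separately. A minimal sketch with a hypothetical toy column (the hands-on itself uses the WhatIsData flag to split, whereas this sketch splits by row position for brevity):

```python
import pandas as pd

# Hypothetical toy column: the category "FV" appears only in the test part
train = pd.DataFrame({"MSZoning": ["RL", "RM", "RL"]})
test = pd.DataFrame({"MSZoning": ["RL", "FV"]})

# Encoding each part separately produces different column sets
print(pd.get_dummies(train).columns.tolist())  # ['MSZoning_RL', 'MSZoning_RM']
print(pd.get_dummies(test).columns.tolist())   # ['MSZoning_FV', 'MSZoning_RL']

# Encoding the merged frame once, then splitting, keeps the columns aligned
merged = pd.concat([train, test], ignore_index=True)
dummies = pd.get_dummies(merged)
train_enc = dummies.iloc[: len(train)]
test_enc = dummies.iloc[len(train):]
print(train_enc.columns.equals(test_enc.columns))  # True
```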

Division within training data

train_x = train_.drop('SalePrice',axis=1)
train_y = np.log(train_['SalePrice'])

`train_x = train_.drop('SalePrice', axis=1)` keeps every column except SalePrice as the explanatory variables.

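`train_y = np.log(train_['SalePrice'])` prepares the objective (target) variable. (Don't forget this final log transformation: the competition scores submissions by the RMSE between the logarithms of the predicted and actual prices, so the model is trained on log prices and its predictions have to be converted back with np.exp.)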
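A small sketch of the log round trip and of RMSE on the log scale, using hypothetical prices and pretend predictions rather than real model output:

```python
import numpy as np

# Hypothetical observed prices
sale_price = np.array([200000.0, 180000.0, 220000.0])
train_y = np.log(sale_price)  # target used for fitting

# Pretend model output on the log scale (not from a real model)
pred_log = train_y + np.array([0.1, -0.1, 0.0])

# np.exp undoes np.log, giving prices back in the original units
pred_price = np.exp(pred_log)
print(np.allclose(np.exp(train_y), sale_price))  # True

# RMSE between log predictions and log actuals
rmse_log = np.sqrt(np.mean((pred_log - train_y) ** 2))
print(round(rmse_log, 4))  # 0.0816
```

Because the error is measured on logs, a 10% miss on a cheap house counts about as much as a 10% miss on an expensive one.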

Split in test data

test_id = test_['Id']
test_data = test_.drop('Id',axis=1)

Still going to check this too?... As you'd expect, I'll skip checking test_id and test_data here.
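test_id is presumably set aside so that each prediction can be matched back to its Id when writing the submission file. A minimal sketch, assuming a two-column Id/SalePrice submission and using hypothetical Ids and log-scale predictions:

```python
import numpy as np
import pandas as pd

# Hypothetical test Ids and log-scale predictions
test_id = pd.Series([1461, 1462], name="Id")
pred_log = np.array([11.7, 12.1])

# Undo the log transform and pair each prediction with its Id
submission = pd.DataFrame({"Id": test_id, "SalePrice": np.exp(pred_log)})

# to_csv with no path returns the CSV text instead of writing a file
csv_text = submission.to_csv(index=False)
print(csv_text.splitlines()[0])  # Id,SalePrice
```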

Building a predictive model

I was going to start on it, but I'm getting overwhelmed by things I don't understand, so this time I'll do my best just to prepare, without actually building the model yet. This mostly means looking up terms.

`StandardScaler()  # scaling`

About scale conversion: https://aizine.ai/preprocessing0614/
About the scale conversion class of Scikit-learn: https://helve-python.hatenablog.jp/entry/scikit-learn-scale-conversion
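A minimal sketch of what StandardScaler does, on a hypothetical two-column feature matrix whose columns have very different scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features: lot area in the thousands, room count in single digits
X = np.array([[8000.0, 3.0], [9500.0, 4.0], [11000.0, 2.0], [12500.0, 3.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # (x - mean) / std, column by column

print(scaler.mean_)   # per-column means learned by fit
print(scaler.scale_)  # per-column standard deviations learned by fit
print(np.allclose(X_scaled.mean(axis=0), 0.0), np.allclose(X_scaled.std(axis=0), 1.0))  # True True
```

On real data the scaler would be fit on the training data only and then applied to the test data with `transform`, so that the test set doesn't influence the learned means and standard deviations.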

`[0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]  # parameter grid`

About grid search: https://www.case-k.jp/entry/2018/09/03/211016
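Grid search simply tries every candidate value, scores each one with cross-validation, and keeps the best. A hand-written sketch, assuming the grid holds values for a Lasso model's `alpha` (the estimator is my assumption) and using synthetic data from `make_regression` in place of train_x / train_y:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Synthetic regression data as a stand-in for train_x / train_y
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

param_grid = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]

# Grid search by hand: score every candidate alpha with 5-fold CV
scores = {}
for alpha in param_grid:
    model = Lasso(alpha=alpha, max_iter=10000)
    # scikit-learn maximizes scores, so MSE is reported as a negative number
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    scores[alpha] = mse

# The best alpha is the one with the lowest cross-validated MSE
best_alpha = min(scores, key=scores.get)
print(best_alpha, round(scores[best_alpha], 2))
```

The values step by factors of 10 because a good regularization strength is usually unknown even to within an order of magnitude, so a log-spaced grid covers a wide range with few candidates.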

`make_pipeline(scaler, ls)  # pipeline generation`

About the pipeline: https://qiita.com/colako/items/b4f4159b77c0a87e978f
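A sketch of how the scaler, the model, and the parameter grid might fit together. This assumes `ls` is a Lasso model, which the snippets above don't confirm, and uses synthetic data. `make_pipeline` names each step after its class in lowercase, which is why the grid key is `lasso__alpha`:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data as a stand-in for train_x / train_y
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

scaler = StandardScaler()
ls = Lasso(max_iter=10000)  # assumption: "ls" stands for a Lasso model
pipe = make_pipeline(scaler, ls)
print(list(pipe.named_steps))  # ['standardscaler', 'lasso']

# Parameters of a step are addressed as "<step name>__<parameter>"
param_grid = {"lasso__alpha": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")

# The scaler is re-fit inside each CV fold, so validation folds never leak into it
grid.fit(X, y)
print(grid.best_params_)
```

Putting the scaler inside the pipeline, rather than scaling all of the data up front, keeps each validation fold out of the mean and standard deviation the scaler learns.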

That's it.

Does the real work only start now, after all this homework? Can I say what I really think? I thought we were already near the end, but I was told that everything done so far was just preprocessing.

[Hands-on for beginners] Read kaggle's "Predicting Home Prices" line by line (7th: Preparing to build a prediction model)
