At work, where we operate a real estate system as a service, the idea came up that some hands-on practice at the field level could not hurt. That's why we decided to tackle Kaggle's famous "House Prices" problem together. I decided to post what I learned from reading the code line by line to Qiita, because writing it down properly will probably be useful later. It's more of a memo than a commentary, but I hope it helps someone somewhere.
I will explain each library one by one as it comes up in the work, so for now I simply copied the following imports like a magic spell.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import (
    LinearRegression,
    Ridge,
    Lasso,
)
%matplotlib inline
The actual work starts here. First, read the CSV files and get them into shape. For the time being, just copy the code below; I will explain it line by line.
# Read the data
train = pd.read_csv('train.csv')  # Training data
test = pd.read_csv('test.csv')  # Test data
# Merge training data and test data
train['WhatIsData'] = 'Train'
test['WhatIsData'] = 'Test'
test['SalePrice'] = 9999999999  # Placeholder: test.csv has no SalePrice column
alldata = pd.concat([train, test], axis=0).reset_index(drop=True)
print('The size of train is : ' + str(train.shape))
print('The size of test is : ' + str(test.shape))
Applicable source: train = pd.read_csv('train.csv')  # Training data
Description: Using pandas, which was imported with "import pandas as pd", read the CSV file and store the resulting DataFrame in the variable "train". In my personal understanding, pandas is the go-to library for easily working with tabular, spreadsheet-like data.
Reference: https://dividable.net/programming/python-pandas/
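To make the read_csv step concrete, here is a minimal sketch. It assumes a tiny made-up CSV held in memory instead of the real train.csv, so the column values and the variable names csv_text and sample are hypothetical and only serve the illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv: three made-up rows, not real Kaggle data.
csv_text = """Id,LotArea,SalePrice
1,8450,208500
2,9600,181500
3,11250,223500
"""

# pd.read_csv accepts a file path or a file-like object and returns a DataFrame.
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample).__name__)  # class name of the returned object
print(sample.shape)           # (number of rows, number of columns)
print(sample.head())          # first rows, for a quick visual check
```

With the real file, the call would simply take the path, as in the post: pd.read_csv('train.csv').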
train['WhatIsData'] = 'Train'
print('The size of train is : ' + str(train.shape))
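As a rough sketch of what these two lines do, the snippet below repeats the merge on two tiny made-up frames. One common reason for a marker column like WhatIsData is that the combined table can be split back into its training and test parts after shared preprocessing; that motivation is my assumption, not something stated in the original code. The names train_demo, test_demo, alldata_demo, train_back and test_back are hypothetical.

```python
import pandas as pd

# Hypothetical miniature frames; the values are made up for illustration.
train_demo = pd.DataFrame(
    {"Id": [1, 2], "LotArea": [8450, 9600], "SalePrice": [208500, 181500]}
)
test_demo = pd.DataFrame({"Id": [3], "LotArea": [11250]})

# Tag each row with its origin, as in the post.
train_demo["WhatIsData"] = "Train"
test_demo["WhatIsData"] = "Test"
test_demo["SalePrice"] = 9999999999  # same placeholder value as in the post

# Stack the rows and renumber the index from 0.
alldata_demo = pd.concat([train_demo, test_demo], axis=0).reset_index(drop=True)

# Assumed follow-up step: split back using the marker column.
is_train = alldata_demo["WhatIsData"] == "Train"
train_back = alldata_demo[is_train].drop(columns="WhatIsData")
test_back = alldata_demo[~is_train].drop(columns=["WhatIsData", "SalePrice"])

# .shape is a (rows, columns) tuple, so str() is needed before concatenating.
print("The size of alldata is : " + str(alldata_demo.shape))
print("The size of train is : " + str(train_back.shape))
print("The size of test is : " + str(test_back.shape))
```

Dropping the placeholder SalePrice when splitting back keeps the dummy value from being mistaken for a real price.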
That's all for today. I only spend about an hour a week putting this together, so progress will be at a turtle's pace, but thank you for bearing with me.