I entered a competition to predict the sale price of a house from features such as lot area and building age.
import pandas as pd
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
Split the data into the target variable "SalePrice" and the remaining explanatory variables.
train_x = train.drop(['Id', 'SalePrice'], axis=1)
train_y = train['SalePrice']
test_x = test.drop(['Id'], axis=1)
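# Convert every column to integer codes so the model receives numeric input only.
# pd.factorize assigns -1 to missing values.
for column in train_x.columns:
    labels, uniques = pd.factorize(train_x[column])
    train_x[column] = labels
for column in test_x.columns:
    labels, uniques = pd.factorize(test_x[column])
    test_x[column] = labels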
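One caveat about the two loops above: `pd.factorize` numbers values in order of first appearance, so the same category can receive different codes in the training and test data. The sketch below shows one possible way to keep the codes consistent by building a shared set of categories from both frames. The helper name `encode_together` and the toy data are hypothetical and only for illustration. Unlike the loops above, this version leaves numeric columns unchanged, which is an assumption on my part and not what the original code does.

```python
import pandas as pd

# Toy stand-ins for train_x and test_x (hypothetical data, not the competition files).
train_x = pd.DataFrame({'Street': ['Pave', 'Grvl', 'Pave'],
                        'LotArea': [8450, 9600, 11250]})
test_x = pd.DataFrame({'Street': ['Grvl', 'Pave'],
                       'LotArea': [11622, 14267]})

def encode_together(train_df, test_df):
    """Integer-encode non-numeric columns with categories shared by both frames."""
    train_out, test_out = train_df.copy(), test_df.copy()
    for column in train_df.columns:
        if pd.api.types.is_numeric_dtype(train_df[column]):
            continue  # assumption: numeric features are left as they are
        # Build one vocabulary so a value maps to the same code in both frames.
        combined = pd.concat([train_df[column], test_df[column]], ignore_index=True)
        categories = pd.Index(combined.dropna().unique())
        # Values outside the categories (including missing values) become -1.
        train_out[column] = pd.Categorical(train_df[column], categories=categories).codes
        test_out[column] = pd.Categorical(test_df[column], categories=categories).codes
    return train_out, test_out

train_enc, test_enc = encode_together(train_x, test_x)
```

With this approach, numeric columns that contain missing values would still need to be filled before fitting, because `LinearRegression` does not accept NaN.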
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(train_x, train_y)
pred_y = regressor.predict(test_x)
submission = pd.DataFrame({'Id':test['Id'], 'SalePrice':pred_y})
submission.to_csv('submission.csv', index=False)