This article is a review of what I learned in the Udemy course at https://www.udemy.com/share/1013lqB0AedFdUR34=0 It covers:

- A light derivation of least squares
- Converting a pandas DataFrame to a numpy array and computing the fit
Suppose you are given a dataset [x, y], where x is the explanatory variable and y is the objective (target) variable. For example, taller people tend to weigh more, so in that case x = height and y = weight.
We want to predict y from the given data x. Call the predicted value $\hat{y}$ and assume the following relationship:
$$
\hat{y} = ax + b
$$
Here, the goal is to make $\hat{y}$ as close as possible to the true value $y$. The error is
$$
\text{Error} = y - \hat{y} = y - (ax + b)
$$
and the task is to find the a and b that make this error (summed over all data points as squared errors) as small as possible. See the following link for a detailed explanation: http://arduinopid.web.fc2.com/P7.html
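For reference, a brief sketch of the derivation (my own summary; the link above goes into more detail). With $n$ data points, least squares chooses $a$ and $b$ to minimize the sum of squared errors:

$$
S(a, b) = \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2
$$

Setting the partial derivatives $\partial S/\partial a$ and $\partial S/\partial b$ to zero gives the closed-form solution

$$
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b = \bar{y} - a \bar{x}
$$

where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$.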
First, import the required modules:
```python
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
```
Next, load the dataset used this time:
```python
from sklearn.datasets import load_boston
boston = load_boston()
```
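Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the lines above only work on older versions. If it is unavailable in your environment, here is a sketch of loading the same data directly from its original source (this builds `boston_df` directly, so you can skip the DataFrame construction below):

```python
# Fallback sketch (assumption: scikit-learn >= 1.2, where load_boston is gone).
# Loads the Boston housing data from its original source and builds the same
# DataFrame used in the rest of this article.
import numpy as np
import pandas as pd

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
# Each record is spread over two physical lines in the raw file.
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
boston_df = pd.DataFrame(data, columns=feature_names)
boston_df['Price'] = target
```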
This time we will use the RM column (average number of rooms per dwelling) and the target (house price) from this dataset. Normally you would use sns.pairplot or sns.jointplot to look for pairs of variables with a roughly linear (proportional) relationship, but here we assume in advance that these two variables are proportional (a quick exploration sketch follows the next code block).
```python
boston_df = DataFrame(boston.data)
# Give the columns their names
boston_df.columns = boston.feature_names
# Copy the target into a new 'Price' column, which is easier to read than 'target'
boston_df['Price'] = boston.target
# Scatter plot with a fitted regression line
sns.lmplot(x='RM', y='Price', data=boston_df)
```
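As mentioned above, you would normally explore the data before picking variables. A minimal sketch of that step (my addition, not part of the original walkthrough):

```python
# Exploratory look at the RM-Price relationship: scatter plot with
# marginal distributions. 'kind' can also be 'hex' or 'reg'.
sns.jointplot(x='RM', y='Price', data=boston_df, kind='scatter')
```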
Now let's compute this regression line ourselves using np.linalg.lstsq(X, Y). lstsq expects the coefficient matrix X in a particular shape, so we first reshape the data accordingly.
```python
X = boston_df.RM
Y = boston_df.Price
# Reshape each x into [x, 1] so that lstsq can also fit the intercept b
X = np.array([[value, 1] for value in X])
# Convert to floating point
X = X.astype(np.float64)
# The fitted slope a and intercept b are in the first return value
a, b = np.linalg.lstsq(X, Y, rcond=None)[0]
```
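As a quick sanity check (my addition, not part of the original article), a degree-1 np.polyfit should give the same slope and intercept:

```python
# np.polyfit fits y = a*x + b when the degree is 1 and returns [a, b]
a_check, b_check = np.polyfit(boston_df.RM, boston_df.Price, 1)
print(a_check, b_check)  # should match a and b from lstsq up to rounding
```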
That completes the calculation. Let's plot the result:
```python
# Plot the data points
plt.plot(boston_df.RM, boston_df.Price, 'o')
# Plot the fitted line in red
x = boston_df.RM
plt.plot(x, a*x + b, 'r')
```
The official documentation is here: https://numpy.org/doc/stable/reference/generated/numpy.linalg.lstsq.html#numpy.linalg.lstsq
`numpy.linalg.lstsq(a, b, rcond='warn')`

Parameters:

- `a`: coefficient matrix, shape (M, N)
- `b`: dependent-variable ("ordinate") values, shape (M,) or (M, K)
- `rcond`: cutoff ratio for small singular values of `a`
Indexing the result with [1], i.e. np.linalg.lstsq(X, Y)[1], gives the sum of squared residuals.
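For example, a minimal sketch of turning that residual into a root-mean-square error (the variable names here are my own):

```python
# result[0] holds [a, b]; result[1] holds the sum of squared residuals
# (a length-1 array here because Y is a single column).
result = np.linalg.lstsq(X, Y, rcond=None)
squared_error = result[1][0]
rmse = np.sqrt(squared_error / len(X))  # root-mean-square error
print(rmse)
```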