This article introduces how to draw a robust linear regression line with scikit-learn, Python's machine learning library. The chart object is created with altair, a Python plotting library, and displayed in the browser with the application framework [Streamlit](https://qiita.com/keisuke-ota/items/a18f158389f1585a9aa0).
Robust regression is less susceptible to outliers than linear regression fitted by the least squares method. Here, the robust regression line is created with `HuberRegressor`.
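`HuberRegressor` resists outliers because of the loss it minimizes. The following is a minimal NumPy sketch of the Huber loss in the form used in the scikit-learn documentation. For a scaled residual z = r/σ, the loss is z² when |z| < ε and 2ε|z| − ε² otherwise, so large residuals grow only linearly instead of quadratically. The function name `huber_loss` is our own and not part of scikit-learn. The sketch also leaves out the scale estimate and the small L2 penalty that `HuberRegressor` optimizes alongside the loss.

```python
import numpy as np

def huber_loss(z, epsilon=1.35):
    """Huber loss H_epsilon(z), applied elementwise to scaled residuals z."""
    z = np.asarray(z, dtype=float)
    abs_z = np.abs(z)
    quadratic = z ** 2                            # used when |z| < epsilon
    linear = 2 * epsilon * abs_z - epsilon ** 2   # used when |z| >= epsilon
    return np.where(abs_z < epsilon, quadratic, linear)

residuals = np.array([0.5, 1.0, 3.0, 30.0])
print(huber_loss(residuals))  # Huber loss
print(residuals ** 2)         # squared loss, for comparison
```

Under the squared loss, a residual of 30 costs 900. Under the Huber loss with ε = 1.35 it costs about 79, so a single outlier cannot dominate the fit.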
Note that a Streamlit app is launched with `streamlit run filename.py`.
streamlit_robust_linear.py

```python
import streamlit as st
import numpy as np
import pandas as pd
import altair as alt
from sklearn.linear_model import HuberRegressor
from sklearn.datasets import make_regression

# Demo data generation: 200 points around a noisy line
rng = np.random.RandomState(0)
x, y, coef = make_regression(
    n_samples=200, n_features=1, noise=4.0, coef=True, random_state=0)
# Overwrite the first 4 points with outliers
x[:4] = rng.uniform(10, 20, (4, 1))
y[:4] = rng.uniform(10, 20, 4)
df = pd.DataFrame({
    'x_axis': x.reshape(-1),
    'y_axis': y,
})

# Set the robust regression parameter with a slider
epsilon = st.slider('Select epsilon',
                    min_value=1.00, max_value=10.00, step=0.01, value=1.35)

# Robust regression: X must be 2-D, y should be 1-D
# (a column-vector y only triggers a DataConversionWarning)
huber = HuberRegressor(epsilon=epsilon).fit(
    df['x_axis'].values.reshape(-1, 1),
    df['y_axis'].values,
)

# Scatter plot generation
plot = alt.Chart(df).mark_circle(size=40).encode(
    x='x_axis',
    y='y_axis',
    tooltip=['x_axis', 'y_axis'],
).properties(
    width=500,
    height=500,
).interactive()

# Slope and intercept of the robust regression line
a1 = huber.coef_[0]
b1 = huber.intercept_

# Domain of the regression line: the range of the data
x_min = df['x_axis'].min()
x_max = df['x_axis'].max()

# Regression line drawn through its two end points
points = pd.DataFrame({
    'x_axis': [x_min, x_max],
    'y_axis': [a1 * x_min + b1, a1 * x_max + b1],
})
line = alt.Chart(points).mark_line(color='steelblue').encode(
    x='x_axis',
    y='y_axis',
).properties(
    width=500,
    height=500,
).interactive()

# Display the layered chart (scatter plot + regression line)
st.write(plot + line)
```
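The script builds the regression line by reading `coef_` and `intercept_` and evaluating `a1*x_min+b1` and `a1*x_max+b1`. The estimator's `predict` method gives the same two points and keeps working if the model is swapped for another linear estimator. Below is a minimal sketch that can be checked without Streamlit. The helper name `regression_line_points` and the toy data are hypothetical and for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import HuberRegressor

def regression_line_points(model, x):
    """Return a two-row DataFrame with the fitted line at min(x) and max(x)."""
    x_ends = np.array([[x.min()], [x.max()]])  # shape (2, 1), as predict expects
    return pd.DataFrame({
        'x_axis': x_ends.ravel(),
        'y_axis': model.predict(x_ends),
    })

# Hypothetical toy data: points on y = 2x + 1 with small Gaussian noise
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.size)

huber = HuberRegressor(max_iter=1000).fit(x.reshape(-1, 1), y)
points = regression_line_points(huber, x)
print(points)
```

The returned DataFrame has the same column names as `points` in the script, so it can be passed straight to `alt.Chart(...).mark_line()`.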
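`epsilon` is a real number greater than or equal to 1 that sets how strongly outliers influence the fit. A sample whose residual exceeds `epsilon` times the estimated scale is treated as an outlier and penalized only linearly. The default is 1.35.
The larger `epsilon` is, the more the outliers affect the regression line. (The image shows `epsilon = 10`.)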
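The effect of `epsilon` can also be checked outside Streamlit. The sketch below fits `HuberRegressor` for a few values and compares the slopes and the number of samples flagged in the fitted `outliers_` mask. It uses hypothetical synthetic data (the line y = 2x + 1 with five large vertical outliers), not the article's dataset.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

# Hypothetical data: y = 2x + 1 with Gaussian noise, plus 5 large outliers
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)
y[-5:] += 30  # push the last five points far above the line

X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
models, slopes = {}, {}
for epsilon in [1.35, 3.0, 10.0]:
    huber = HuberRegressor(epsilon=epsilon, max_iter=1000).fit(X, y)
    models[epsilon] = huber
    slopes[epsilon] = huber.coef_[0]
    # outliers_ is True for samples whose residual exceeds epsilon * scale_
    print(f"epsilon={epsilon:5.2f}  slope={huber.coef_[0]:.3f}  "
          f"flagged={int(huber.outliers_.sum())}")
```

With the default 1.35, the slope stays close to the true value 2. With `epsilon=10`, the five outliers are no longer down-weighted and the slope is pulled upward toward the least-squares result.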
Replacing `HuberRegressor` with `LinearRegression` (imported from `sklearn.linear_model`, and without the `epsilon` argument, which `LinearRegression` does not accept) gives an ordinary least-squares regression line.
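The sketch below is a quick side-by-side check of that swap. It fits both estimators on the same hypothetical data (y = 3x with four points pulled far below the line) and compares their slopes. `LinearRegression` is constructed without `epsilon`.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# Hypothetical data: y = 3x with Gaussian noise, first 4 points pulled far down
rng = np.random.RandomState(1)
x = np.linspace(-5, 5, 80)
y = 3 * x + rng.normal(scale=0.5, size=x.size)
y[:4] -= 40

X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
huber = HuberRegressor(max_iter=1000).fit(X, y)  # robust fit, default epsilon=1.35
ols = LinearRegression().fit(X, y)               # least squares, no epsilon argument

print(f"true slope 3.0, Huber {huber.coef_[0]:.3f}, "
      f"least squares {ols.coef_[0]:.3f}")
```

The least-squares slope is pulled noticeably toward the four outliers, while the Huber slope stays near 3.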