Robust linear regression with scikit-learn


This article shows how to draw a robust linear regression line using Python's machine learning library scikit-learn. The chart object is created with the Python plotting library altair and displayed in the browser using the application framework Streamlit.

Features of robust linear regression

It is less susceptible to outliers than linear regression using the least squares method.
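
The reason becomes visible when comparing the penalty each method assigns to a residual. The following is a minimal sketch, not part of the original app, that contrasts the squared penalty of least squares with the Huber penalty as described in the scikit-learn documentation. The function names `squared_penalty` and `huber_penalty` and the sample residuals are made up for illustration, and the scale estimate and regularisation term that HuberRegressor also uses are left out.

```python
import numpy as np


def squared_penalty(z):
    """Least-squares penalty: grows quadratically with the residual."""
    z = np.asarray(z, dtype=float)
    return z ** 2


def huber_penalty(z, epsilon=1.35):
    """Huber penalty: quadratic for |z| < epsilon, linear beyond it."""
    z = np.asarray(z, dtype=float)
    abs_z = np.abs(z)
    return np.where(abs_z < epsilon, z ** 2, 2 * epsilon * abs_z - epsilon ** 2)


# Scaled residuals: two ordinary points and two outliers (illustrative values).
residuals = np.array([0.5, 1.0, 5.0, 20.0])
for z, sq, hu in zip(residuals, squared_penalty(residuals), huber_penalty(residuals)):
    print(f"residual={z:5.1f}  squared={sq:7.2f}  huber={hu:6.2f}")
```

For small residuals the two penalties coincide, while for large residuals the Huber penalty grows only linearly, so a single outlier contributes far less to the total loss.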

Creating a robust linear regression

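Create a robust regression line using HuberRegressor. Note that a Streamlit app is not started with the python command; it is launched with the streamlit run command followed by the script's file name.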

import streamlit as st
import numpy as np
import pandas as pd
import altair as alt
from sklearn.linear_model import HuberRegressor
from sklearn.datasets import make_regression

# Generate demo data: 200 noisy samples along a line
rng = np.random.RandomState(0)
x, y, coef = make_regression(n_samples=200, n_features=1, noise=4.0, coef=True, random_state=0)
# Overwrite the first four samples with outliers
x[:4] = rng.uniform(10, 20, (4, 1))
y[:4] = rng.uniform(10, 20, 4)
df = pd.DataFrame({
    'x_axis': x.reshape(-1,),
    'y_axis': y
})

# Let the user choose epsilon for the robust regression
epsilon = st.slider('Select epsilon',
                    min_value=1.00, max_value=10.00, step=0.01, value=1.35)

# Run the robust regression (fitting is required before coef_ and intercept_ exist)
huber = HuberRegressor(epsilon=epsilon).fit(x, y)

# Generate the scatter plot
plot = alt.Chart(df).mark_circle(size=40).encode(
    x='x_axis',
    y='y_axis',
    tooltip=['x_axis', 'y_axis']
)

# Get the slope and intercept of the robust linear regression
a1 = huber.coef_[0]
b1 = huber.intercept_

# Use the range of the data as the domain of the regression line
x_min = df['x_axis'].min()
x_max = df['x_axis'].max()

# Create the regression line from its two end points
points = pd.DataFrame({
    'x_axis': [x_min, x_max],
    'y_axis': [a1*x_min+b1, a1*x_max+b1],
})

line = alt.Chart(points).mark_line(color='steelblue').encode(
    x='x_axis',
    y='y_axis'
)

# Display the scatter plot and the regression line as one layered chart
st.altair_chart(plot + line)


About parameters

epsilon is a real number greater than or equal to 1 that controls how many samples are treated as outliers, and therefore how strongly outliers influence the fit. The default is 1.35.

The larger epsilon is, the more the regression line is affected by outliers. (The accompanying screenshot shows `epsilon = 10`.)
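
The effect of epsilon can also be checked numerically without the slider. The sketch below is an illustration under stated assumptions rather than the article's app: it uses hypothetical synthetic data with a known slope (`TRUE_SLOPE`, chosen here for illustration) plus four outliers placed as in the demo data above, and fits HuberRegressor for a few epsilon values.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

TRUE_SLOPE = 3.0  # assumed slope of the synthetic inliers (illustrative)

rng = np.random.RandomState(0)
x = rng.normal(size=(200, 1))
y = TRUE_SLOPE * x.ravel() + rng.normal(scale=0.5, size=200)
# Overwrite four samples with outliers, as in the demo data above.
x[:4] = rng.uniform(10, 20, (4, 1))
y[:4] = rng.uniform(10, 20, 4)

# Fit once per epsilon and record the estimated slope.
slopes = {}
for epsilon in (1.35, 3.0, 10.0):
    model = HuberRegressor(epsilon=epsilon).fit(x, y)
    slopes[epsilon] = model.coef_[0]
    print(f"epsilon={epsilon:5.2f}  slope={slopes[epsilon]:.3f}")
```

With data like this, the slope estimated at the default epsilon should stay much closer to the slope of the inliers than the one estimated at epsilon = 10.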

Creating a linear regression line by the least squares method

Replacing HuberRegressor with LinearRegression (also imported from sklearn.linear_model, and taking no epsilon argument) produces a linear regression line fitted by the ordinary least squares method.
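
To see the difference between the two estimators side by side, the following sketch fits both on the same data. It is a hedged illustration, not the article's code: the data are hypothetical, with an assumed known slope `TRUE_SLOPE` and four outliers placed like those in the demo data, so that the two slopes can be compared against a reference.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

TRUE_SLOPE = 3.0  # assumed slope of the synthetic inliers (illustrative)

rng = np.random.RandomState(0)
x = rng.normal(size=(200, 1))
y = TRUE_SLOPE * x.ravel() + rng.normal(scale=0.5, size=200)
# Four outliers far to the right of, and well below, the true line.
x[:4] = rng.uniform(10, 20, (4, 1))
y[:4] = rng.uniform(10, 20, 4)

huber = HuberRegressor(epsilon=1.35).fit(x, y)
ols = LinearRegression().fit(x, y)  # ordinary least squares, no epsilon

print(f"true slope : {TRUE_SLOPE:.3f}")
print(f"Huber slope: {huber.coef_[0]:.3f}")
print(f"OLS slope  : {ols.coef_[0]:.3f}")
```

In the Streamlit script itself, the same swap only changes the line that builds and fits the model; the code that reads coef_ and intercept_ and draws the line can stay as it is.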
