Python scikit-learn: a collection of predictive model tips often used in the field

Conditions

1. Data and features

・ Uses one year (2019) of stock price data for a certain entertainment stock
・ Uses the Nikkei 225 inverse index for the same period
・ Whether this is the optimal combination of features is not validated here

2. Model

・ The goal is to show how to implement the models, so parameter tuning is not pursued. Underfitting, overfitting, and evaluation metrics such as the accuracy of the predicted values are not optimized.

Support vector regression

1. Linear regression

Look at the correlation between volume and stock price.

・ Compare the slope of the linear regression line with that of the SVR line
・ Check how the data are distributed within the margin
・ Compare the mean squared error of linear regression and SVR
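As background for the margin check above, here is a minimal sketch of the two loss functions involved. Linear regression penalizes every error by its square, while SVR uses an epsilon-insensitive loss that ignores errors smaller than epsilon. The residual values below are hypothetical and chosen only for illustration; epsilon = 250 matches the setting used in this article.

```python
import numpy as np

def squared_loss(residual):
    """Loss minimized by least squares: every error is penalized."""
    return residual ** 2

def epsilon_insensitive_loss(residual, epsilon):
    """Loss used by SVR: errors inside the +/- epsilon margin cost nothing."""
    return np.maximum(0.0, np.abs(residual) - epsilon)

# Hypothetical residuals in yen (actual price minus predicted price)
residuals = np.array([-400.0, -250.0, -100.0, 0.0, 100.0, 250.0, 400.0])
epsilon = 250.0  # same margin as the SVR in this article

sq = squared_loss(residuals)
eps = epsilon_insensitive_loss(residuals, epsilon)
for r, s, e in zip(residuals, sq, eps):
    print(f"residual={r:7.1f}  squared={s:9.1f}  eps_insensitive={e:6.1f}")
```

Only the residuals of -400 and 400 lie outside the margin, so only they contribute to the SVR loss. Every nonzero residual contributes to the squared loss.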

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

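# Load the CSV, skipping the header row
npArray = np.loadtxt("stock.csv", delimiter=",", dtype="float", skiprows=1)

# Feature: trading volume (column index 2), kept 2-D for scikit-learn
x = npArray[:, 2:3]

# Target: stock price (column index 3), flattened to 1-D
y = npArray[:, 3:4].ravel()

# Split into training data and evaluation data.
# random_state is left unset, so every run produces a different split.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)  # , random_state=0)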

# Standardize the features
sc = StandardScaler()

# Fit the scaler on the training data and standardize it
x_train_std = sc.fit_transform(x_train)
# Standardize the test data with the scaler fitted on the training data
x_test_std = sc.transform(x_test)

# Create the linear regression model
mod = LinearRegression()
# Create the SVR model (linear kernel, margin epsilon = 250)
mod2 = SVR(kernel='linear', C=10000.0, epsilon=250.0)

# Train the linear regression model
mod.fit(x_train_std, y_train)
# Train the SVR model
mod2.fit(x_train_std, y_train)

# Create the figure for the plot
plt.figure(figsize=(8,5))
# Evenly spaced standardized volume values from the minimum to the maximum, in steps of 0.1
x_ndar = np.arange(x_train_std.min(), x_train_std.max(), 0.1)[:, np.newaxis]

# Linear regression prediction over the volume grid
y_ndar_prd = mod.predict(x_ndar)
# SVR prediction over the volume grid
y_ndar_svr = mod2.predict(x_ndar)

## MSE (mean squared error)
# Predictions on the training and test data for both models
pred_train_lin = mod.predict(x_train_std)
pred_test_lin = mod.predict(x_test_std)
pred_train_svr = mod2.predict(x_train_std)
pred_test_svr = mod2.predict(x_test_std)
# Linear regression MSE
print('Linear regression MSE training = %.1f, test = %.1f' % (mean_squared_error(y_train, pred_train_lin), mean_squared_error(y_test, pred_test_lin)))
# SVR MSE
print('SVR MSE training = %.1f, test = %.1f' % (mean_squared_error(y_train, pred_train_svr), mean_squared_error(y_test, pred_test_svr)))

Running this several times without specifying random_state gives a different split each time, but the MSE of linear regression is consistently smaller than that of SVR. This is expected: least squares minimizes the squared error directly, whereas SVR ignores errors that fall within the epsilon margin.

1st run
Linear regression MSE training = 38153.4, test = 33161.9
SVR MSE training = 52439.9, test = 56707.7

2nd run
Linear regression MSE training = 37836.4, test = 33841.3
SVR MSE training = 54044.5, test = 51083.7

3rd run
Linear regression MSE training = 37381.3, test = 35616.6
SVR MSE training = 53499.2, test = 53619.4
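To make repeated runs like these reproducible, one option is to loop over fixed random_state values instead of leaving the split unseeded. The sketch below is not the article's experiment: it uses synthetic volume and price data as a stand-in for stock.csv, and the seeds 0 to 2 are arbitrary. It also illustrates a general property: on the training data, least squares can never have a larger MSE than a linear-kernel SVR, because both fit a straight line and least squares picks the one with the smallest squared error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for stock.csv (assumption: NOT the article's data)
rng = np.random.default_rng(0)
volume = rng.uniform(1e5, 1e6, size=240).reshape(-1, 1)
price = 3000.0 + 0.001 * volume.ravel() + rng.normal(0.0, 200.0, size=240)

results = []
for seed in range(3):  # arbitrary fixed seeds so each split is reproducible
    x_tr, x_te, y_tr, y_te = train_test_split(
        volume, price, test_size=0.2, random_state=seed)

    # Fit the scaler on the training data only, then apply it to the test data
    sc = StandardScaler()
    x_tr_std = sc.fit_transform(x_tr)
    x_te_std = sc.transform(x_te)

    lin = LinearRegression().fit(x_tr_std, y_tr)
    svr = SVR(kernel='linear', C=10000.0, epsilon=250.0).fit(x_tr_std, y_tr)

    results.append((
        seed,
        mean_squared_error(y_tr, lin.predict(x_tr_std)),
        mean_squared_error(y_te, lin.predict(x_te_std)),
        mean_squared_error(y_tr, svr.predict(x_tr_std)),
        mean_squared_error(y_te, svr.predict(x_te_std)),
    ))

for seed, lin_tr, lin_te, svr_tr, svr_te in results:
    print('seed=%d  linear: training = %.1f, test = %.1f  SVR: training = %.1f, test = %.1f'
          % (seed, lin_tr, lin_te, svr_tr, svr_te))
```

The guarantee applies only to the training MSE. On the test data either model can come out ahead, depending on the split.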

Next, plot the results as a scatter plot.

# Scatter plot of standardized volume against stock price
plt.scatter(x_train_std, y_train, color='blue', label='data')
# Linear regression line
plt.plot(x_ndar, y_ndar_prd, color='green', linestyle='-', label='LinearRegression')
# SVR regression line
plt.plot(x_ndar, y_ndar_svr, color='red', linestyle='-', label='SVR')
# Margin lines (SVR prediction +/- epsilon)
plt.plot(x_ndar, y_ndar_svr + mod2.epsilon, color='orange', linestyle='-.', label='margin')
plt.plot(x_ndar, y_ndar_svr - mod2.epsilon, color='orange', linestyle='-.')
# Axis labels and title
plt.ylabel('Closing price')
plt.xlabel('Volume (standardized)')
plt.title('SVR Regression')
# Legend
plt.legend(loc='lower right')

plt.show()

The SVR line has a gentler slope than the linear regression line. With epsilon set to a margin of 250 yen, it seems fair to say that the stock price does not swing noticeably with trading volume and that it is generally on an upward trend.
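The slope comparison and the distribution within the margin can also be checked numerically rather than by eye. The following is a minimal sketch under stated assumptions: it again uses synthetic data in place of stock.csv, so the printed values will not match the article's plot. It relies on the coef_ attribute, which scikit-learn exposes for LinearRegression and, when the kernel is linear, for SVR.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for stock.csv (assumption: NOT the article's data)
rng = np.random.default_rng(1)
volume = rng.uniform(1e5, 1e6, size=200).reshape(-1, 1)
price = 3000.0 + 0.001 * volume.ravel() + rng.normal(0.0, 200.0, size=200)

x_std = StandardScaler().fit_transform(volume)
lin = LinearRegression().fit(x_std, price)
svr = SVR(kernel='linear', C=10000.0, epsilon=250.0).fit(x_std, price)

# Slopes in yen per one standard deviation of volume
lin_slope = lin.coef_[0]
svr_slope = svr.coef_[0, 0]  # coef_ is only available with kernel='linear'

# A point is inside the margin when its SVR residual is at most epsilon
residuals = price - svr.predict(x_std)
inside = np.abs(residuals) <= svr.epsilon

print('Linear regression slope = %.1f' % lin_slope)
print('SVR slope               = %.1f' % svr_slope)
print('Share of points inside the margin = %.1f%%' % (100.0 * inside.mean()))
```

With the article's models, the same check would use mod.coef_, mod2.coef_, and the residuals of mod2 on x_train_std.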