Python scikit-learn: a collection of predictive model tips often used in the field

Conditions

1. Data, features

・ Uses one year (2019) of stock price data for a certain entertainment stock
・ Uses the Nikkei 225 inverse index for the same period
・ Does not cover how to validate whether this is the optimal combination of features

2. Model

・ The aim is to show how to implement the models, so parameter tuning against evaluation criteria such as underfitting, overfitting, and prediction accuracy is not pursued.
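
The listings in this article read stock.csv with np.loadtxt, skip one header row, and slice columns 1 to 3. The real file is not included here, so the sketch below writes a synthetic stand-in for trying the code out. The column layout (0: a numeric day counter, 1: inverse index, 2: volume, 3: closing price), the function name make_sample_csv, the file name stock_sample.csv, and every generated value are assumptions for illustration, not the author's data.

```python
import os
import tempfile

import numpy as np


def make_sample_csv(path, n_days=245, seed=0):
    """Write a synthetic CSV in the column layout assumed above (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    day = np.arange(1, n_days + 1)                                # column 0: numeric day counter (assumed)
    inverse_index = 2000 - 1.5 * day + rng.normal(0, 30, n_days)  # column 1: inverse index (synthetic)
    volume = rng.uniform(1e5, 1e6, n_days).round()                # column 2: volume (synthetic)
    price = 300 + 3.0 * day + rng.normal(0, 40, n_days)           # column 3: closing price (synthetic)
    table = np.column_stack([day, inverse_index, volume, price])
    # comments="" keeps the header line free of a leading '#'
    np.savetxt(path, table, delimiter=",", fmt="%.2f",
               header="day,inverse_index,volume,close", comments="")
    return table


# Write the file to a temporary directory and read it back the same way the listings do
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stock_sample.csv")
    make_sample_csv(path)
    data = np.loadtxt(path, delimiter=",", dtype="float", skiprows=1)

print(data.shape)
```

Note that np.loadtxt with dtype="float" only works if every column is numeric, which is why the sketch uses a day counter rather than a date string in column 0.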

Linear regression

1. Simple regression

See the correlation between the inverse index and the stock price

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
npArray = np.loadtxt("stock.csv", delimiter = ",", dtype = "float",skiprows=1)

#Feature (inverse index)
x = npArray[:,1:2]
#Feature (volume); the code below uses z, swap in x to regress on the inverse index
z = npArray[:,2:3]
#Target to predict (stock price)
y = npArray[:, 3:4].ravel()

#Simple regression model creation
model = LinearRegression()

#Training
model.fit(z,y)
print('Tilt:', model.coef_)
print('Intercept:', model.intercept_)

#Forecast
y_pred = model.predict(z)

#Scatter plot of INDEX and stock prices and plot of linear function
plt.figure(figsize=(8,4))
plt.scatter(z,y, color='blue', label='Stock price')
plt.plot(z,y_pred, color='green', linestyle='-', label='LinearRegression')

#Axis labels, title, and legend (the x-axis is volume here)
plt.ylabel('Closing price')
plt.xlabel('Volume')
plt.title('Regression Analysis')
plt.legend(loc='lower right')
plt.show()

With the inverse index as the feature, the output is:

Tilt: [-2.27391593] Intercept: 4795.89427740762

The slope (Tilt) is negative, so the inverse index and the stock price are negatively correlated: they do not move together.

Next, let's look at the correlation between volume and stock price. This time the correlation is positive, and the plot can be read as an upward trend that stayed almost stable throughout the year.

2. Multiple regression

Predict the stock price from the inverse index and volume, then look at the MSE (Mean Squared Error) and the residuals of the predicted stock price
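
Before combining the two features, it can help to put numbers on the correlations described above. The sketch below computes a correlation matrix with np.corrcoef and checks that, for simple regression, the fitted slope equals r * std(y) / std(x), so the slope and the correlation coefficient always share a sign. The arrays are synthetic stand-ins with invented trends, not the article's stock.csv.

```python
import numpy as np

# Synthetic stand-in data (NOT the article's stock.csv); the trends are invented for illustration
rng = np.random.default_rng(0)
n = 245
inverse_index = np.linspace(2000, 1600, n) + rng.normal(0, 30, n)
volume = np.linspace(2e5, 8e5, n) + rng.normal(0, 5e4, n)
price = np.linspace(300, 1000, n) + rng.normal(0, 40, n)

# Rows are observations and columns are variables, hence rowvar=False
corr = np.corrcoef(np.column_stack([inverse_index, volume, price]), rowvar=False)
print(np.round(corr, 2))

# Simple regression of price on the inverse index: slope = r * std(y) / std(x)
r = corr[0, 2]
slope_from_r = r * price.std() / inverse_index.std()
slope_fit = np.polyfit(inverse_index, price, 1)[0]  # least-squares slope
print(slope_from_r, slope_fit)
```

The entry at row 0, column 1 of the matrix is the correlation between the two features themselves, which is worth a glance before feeding both into one model.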

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
npArray = np.loadtxt("stock.csv", delimiter = ",", dtype = "float",skiprows=1)
#Features (inverse index & volume)
x = npArray[:,1:3]
#Target to predict (stock price)
y = npArray[:, 3:4].ravel()
#Multiple regression model creation
model = LinearRegression()
#Feature standardization
sc = StandardScaler()

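#Split into training and test data (this step is missing from the original listing; test_size and random_state are assumed values)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

#Fit the standardization converter on the training data (INDEX, volume) and transform it
x_train_std = sc.fit_transform(x_train)
#Standardize the test data (INDEX, volume) with the converter trained on the training data
x_test_std = sc.transform(x_test)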

#Model learning with training data
model.fit(x_train_std, y_train)

#Predict stock prices with training data and test data
y_train_prd = model.predict(x_train_std)
y_test_prd = model.predict(x_test_std)

#Calculate the MSE of actual and forecast stock prices by hand
mse_train = np.mean((y_train - y_train_prd) ** 2)
mse_test = np.mean((y_test - y_test_prd) ** 2)

#The same MSE (mean squared error) via scikit-learn
print('MSE ', mean_squared_error(y_train, y_train_prd), mean_squared_error(y_test, y_test_prd))

#Plot of forecast stock price residuals (forecast minus actual)
plt.figure(figsize=(7,5))
plt.scatter(y_train_prd, y_train_prd - y_train,
            c='orange', marker='s', edgecolor='white',
            label='Training')
plt.scatter(y_test_prd, y_test_prd - y_test,
            c='blue', marker='s', edgecolor='white',
            label='Test')

plt.xlabel('Stock price')
plt.ylabel('Residual error')
plt.legend(loc='upper left')
plt.hlines(y=0, xmin=0, xmax=1200, color='green', ls='dashed',lw=2)
plt.xlim([220,1200])
plt.tight_layout()
plt.show()
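
The listing fits the StandardScaler on the training data only and reuses it for the test data. As a sketch of an alternative (not the author's code), the same pattern can be expressed with scikit-learn's make_pipeline, which chains the scaler and the model so that fit and predict apply the steps in the right order. The data below is synthetic, and test_size and random_state are assumed values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two features and the price (NOT the article's data)
rng = np.random.default_rng(0)
n = 245
x = np.column_stack([
    np.linspace(2000, 1600, n) + rng.normal(0, 30, n),  # inverse index (synthetic)
    np.linspace(2e5, 8e5, n) + rng.normal(0, 5e4, n),   # volume (synthetic)
])
y = np.linspace(300, 1000, n) + rng.normal(0, 40, n)    # closing price (synthetic)

# test_size and random_state are assumed values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# Manual version: fit the scaler on training data only, then reuse it on the test data
sc = StandardScaler()
manual = LinearRegression().fit(sc.fit_transform(x_train), y_train)
manual_pred = manual.predict(sc.transform(x_test))

# Pipeline version: fit() trains scaler and model on the training data,
# predict() applies the already fitted scaler before the model
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(x_train, y_train)
pipe_pred = pipe.predict(x_test)

# Both routes should give the same predictions
print(np.allclose(manual_pred, pipe_pred))
```

Keeping the scaler inside the pipeline makes it harder to fit it on the test data by accident.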

The MSE is 17349.4 for the training data and 23046.2 for the test data. As mentioned above, the inverse index moves opposite to the stock price; the MSE is high, and the test value differs considerably from the training value.

In the first half of 2019, when the stock price was around 300 yen, the residuals are relatively small. However, the price is negatively correlated with the inverse index, and once it exceeds 500 yen the residuals scatter widely and the error relative to the training data becomes large. This model therefore cannot serve as a forecast of stock prices.
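
MSE is expressed in squared yen, which makes the two figures hard to judge against the price itself. As a small worked example using only the two MSE values reported above, taking the square root gives the RMSE, which is back in yen. The variable names are illustrative.

```python
import math

# MSE values reported in the text (units: yen squared)
mse_train = 17349.4
mse_test = 23046.2

# RMSE is the square root of the MSE, so it is expressed in yen
rmse_train = math.sqrt(mse_train)
rmse_test = math.sqrt(mse_test)

print(round(rmse_train, 1), round(rmse_test, 1))  # 131.7 151.8
```

A typical error of roughly 130 to 150 yen is large for a stock that the text places at around 300 yen early in the year.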

To be continued