Python scikit-learn A collection of predictive model tips often used in the field

Conditions
1. Data, features

・ Uses one year (2019) of stock price data for a certain entertainment stock
・ Uses the Nikkei 225 inverse index for the same period
・ Does not address how to validate whether this is the optimal combination of features

2. Model

・ The focus is on how to implement the models, so parameter tuning against evaluation criteria such as underfitting, overfitting, and the accuracy of predicted values is not pursued.

Linear regression

    1. Simple regression: see the correlation between the inverse index and the stock price
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
npArray = np.loadtxt("stock.csv", delimiter = ",", dtype = "float",skiprows=1)

#Feature (inverse index)
#To regress on volume instead, use z = npArray[:,2:3] and change the x-axis label to 'Volume'
z = npArray[:,1:2]
#Target data (stock price)
y = npArray[:, 3:4].ravel()

#Simple regression model creation
model = LinearRegression()

#Training
model.fit(z,y)
print('Tilt:', model.coef_)
print('Intercept:', model.intercept_)

#Forecast
y_pred = model.predict(z)

#Scatter plot of the feature against stock prices, with the fitted line
plt.figure(figsize=(8,4))
plt.scatter(z,y, color='blue', label='Stock price')
plt.plot(z,y_pred, color='green', linestyle='-', label='LinearRegression')

#Axis labels, title, and legend
plt.ylabel('Closing price')
plt.xlabel('Inverse index')
plt.title('Regression Analysis')
plt.legend(loc='lower right')
plt.show()
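
The direction of the relationship that the fitted line shows can also be checked numerically with the Pearson correlation coefficient. The following is a minimal sketch on synthetic data, not on the article's stock.csv: the names `inverse_index` and `price` and all numeric values are made up for illustration. It only demonstrates that the sign of the correlation coefficient agrees with the sign of the least-squares slope.

```python
import numpy as np

# Synthetic stand-in data (NOT the article's stock.csv): a price series that
# falls as a hypothetical inverse index rises, plus Gaussian noise.
rng = np.random.default_rng(0)
inverse_index = rng.uniform(1000.0, 1500.0, size=240)
price = 4800.0 - 2.3 * inverse_index + rng.normal(0.0, 50.0, size=240)

# Pearson correlation coefficient: the off-diagonal entry of the 2x2 matrix
r = np.corrcoef(inverse_index, price)[0, 1]

# Least-squares line for comparison (np.polyfit returns [slope, intercept])
slope, intercept = np.polyfit(inverse_index, price, deg=1)

print("correlation:", r)
print("slope:", slope)
```

A coefficient close to -1 or +1 indicates a strong linear relationship, while a value near 0 indicates a weak one, which the slope alone does not tell you because it depends on the units of the feature.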

Tilt: [-2.27391593] Intercept: 4795.89427740762

The slope is negative: the inverse index and the stock price are negatively correlated, so they move in opposite directions rather than together.

Next, let's look at the correlation between volume and stock price. This time the correlation is positive, and the plot suggests a fairly steady upward trend throughout the year.

    2. Multiple regression: see the MSE (Mean Squared Error) and the residuals of the stock price predicted from the inverse index and volume

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
npArray = np.loadtxt("stock.csv", delimiter = ",", dtype = "float",skiprows=1)
#Features (inverse index & volume)
x = npArray[:,1:3]
#Forecast data (stock price)
y = npArray[:, 3:4].ravel()
#Multiple regression model creation
model = LinearRegression()
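#Split into training data and test data
#(this step is missing from the original listing; test_size and random_state are assumed values)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

#Feature standardization
sc = StandardScaler()

#Fit the standardization converter on the training data (INDEX, volume) and transform it
x_train_std = sc.fit_transform(x_train)
#Standardize the test data (INDEX, volume) with the converter fitted on the training data
x_test_std = sc.transform(x_test)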

#Model learning with training data
model.fit(x_train_std, y_train)

#Predict stock prices with training data and test data
y_train_prd = model.predict(x_train_std)
y_test_prd = model.predict(x_test_std)

#Calculate MSE of actual and forecast stock prices by hand
mse_train = np.mean((y_train - y_train_prd) ** 2)
mse_test = np.mean((y_test - y_test_prd) ** 2)
print('MSE (manual) ', mse_train, mse_test)

#MSE (mean squared error) via scikit-learn; should match the manual values
print('MSE ', mean_squared_error(y_train, y_train_prd),mean_squared_error(y_test, y_test_prd))

#Plot of forecast stock price residuals (forecast-correct answer)
plt.figure(figsize=(7,5)) 
plt.scatter(y_train_prd,  y_train_prd - y_train,
c='orange', marker='s', edgecolor='white',
label='Training')
plt.scatter(y_test_prd,  y_test_prd - y_test,
c='blue', marker='s', edgecolor='white',
label='Test')

plt.xlabel('Predicted stock price')
plt.ylabel('Residual error')
plt.legend(loc='upper left')
plt.hlines(y=0, xmin=0, xmax=1200, color='green', ls='dashed',lw=2)
plt.xlim([220,1200])
plt.tight_layout()
plt.show()
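
The split, standardize, fit, and evaluate steps can also be written with a scikit-learn pipeline, which keeps the scaler fitted on the training data only. This is a sketch under stated assumptions rather than the article's exact procedure: the data is synthetic (not stock.csv), and the 80/20 split, the random seed, and the coefficients used to generate `y` are all hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two features (NOT the article's stock.csv)
rng = np.random.default_rng(0)
x = np.column_stack([
    rng.uniform(1000.0, 1500.0, size=240),  # hypothetical inverse index
    rng.uniform(1e5, 1e6, size=240),        # hypothetical volume
])
# Hypothetical target: linear in both features plus Gaussian noise
y = 4800.0 - 2.3 * x[:, 0] + 0.0005 * x[:, 1] + rng.normal(0.0, 50.0, size=240)

# Assumed 80/20 split with a fixed seed for reproducibility
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# The pipeline fits the scaler on the training data only, then the regressor
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(x_train, y_train)

mse_train = mean_squared_error(y_train, pipe.predict(x_train))
mse_test = mean_squared_error(y_test, pipe.predict(x_test))
print("MSE train/test:", mse_train, mse_test)
```

Note that a random split shuffles the rows. For daily stock prices, a chronological split (training on earlier dates, testing on later ones) may be the more realistic choice; which one the article used is not stated.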

The MSE is 17349.4 on the training data and 23046.2 on the test data. As noted above, the stock price moves opposite to the inverse index; a linear model on the inverse index and volume does not track the price well, so the MSE is high on both sets and noticeably higher on the test data than on the training data.

In the residual plot, the residuals are relatively small in the first half of 2019, when the stock price was around 300 yen. Once the price exceeds 500 yen, the residuals scatter widely and the errors become large. The model therefore cannot serve as a forecast of the stock price.
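
Because MSE is expressed in squared yen, it is hard to compare directly with the price levels mentioned above. Taking the square root gives the RMSE, which is in yen. The sketch below simply applies that conversion to the two MSE values reported in this article; the variable names are illustrative.

```python
import math

# MSE values reported in the article (units: yen squared)
mse_train = 17349.4
mse_test = 23046.2

# RMSE is the square root of MSE, so it is in the same unit as the price (yen)
rmse_train = math.sqrt(mse_train)
rmse_test = math.sqrt(mse_test)

print(f"RMSE train: {rmse_train:.1f} yen, test: {rmse_test:.1f} yen")
```

A typical error on the order of 130 to 150 yen is large relative to a stock that traded around 300 yen in the first half of the year, which is consistent with the conclusion that the model is not usable as a forecast.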

To be continued.
