This article is the 22nd day article of SFC Advent Calendar 2019. I was wondering what to write about, but this time I decided to rewrite the code I wrote recently and make it an article.
In recent years, with the development of various software and libraries, stock price data has come to be used by individuals. However, the stock price data of Japanese stocks is difficult to handle due to copyright law issues (see here for details), so a large amount of stock price data is currently available. There is no service that allows you to download all of them for free.
However, there are some services that allow you to download daily data for each issue for one year. For example, Stock Investment Memo and Buffet Code.
These services are very useful when you want to analyze stock prices for one year, but they are a little inconvenient because you have to process a large number of files when you want to analyze stock prices for multiple years.
Therefore, this time, I will make it possible to use stock price data for multiple years by creating a code that combines multiple CSVs downloaded from Stock Investment Memo. Let's do it. Also, by using the DataFrame of Pandas, which is a Python library, it is designed to be easy to use when performing analysis with Python as it is.
If you read the above, you may be wondering, "Is there a code for the download part?" In that regard, of course, it is possible to automate, but since it is unclear whether scraping and automation are permitted on the stock investment memo's site, I will not introduce the code in this article.
Instead, Longing for freelance is introduced this code You can fully automate the download by using.
In addition, there are two points that need to be changed at that time, so they are described below.
It is recommended to replace download_dir ='C: \\ Python \\'
with download_dir ='./csv'
. By doing this, you can place the downloaded file in the folder directly under the current directory.
If you want to get the data up to 2019 where it is for i in range (year, 2019):
, get the data up to for i in range (year, 2020):
, 2020 If you want to, you need to increase it by one year like for i in range (year, 2021):
.
First of all, download the stock price data of a specific stock from Stock Investment Memo for multiple years using the method described above and save it.
-Be sure to take time to download __data and be careful not to put a load on the server. __ ・ If there is a leak in __ data, an error will occur, so be sure to download all the data for the required number of years. __ -Do not change the file name. __ - Collect all files in one folder, and do not put data of multiple brands in the same folder. __
Create the code to read the downloaded csv.
import pandas as pd
import codecs
def read_csv(file_name: str) -> pd.core.frame.DataFrame:
'''
kabuoji3.File name of csv downloaded from com[file_name]Specify and read, format and DataFrame[df]Returns as.
'''
with codecs.open(file_name, 'r', 'Shift_JIS', 'ignore') as f:
df = pd.read_csv(f)
df.columns = df.iloc[0]
df.drop(df.index[[0,1]],inplace=True)
df.index = pd.to_datetime(df.index, format='%Y-%m-%d')
df.index.name = "date"
return df
The CSV character code that can be downloaded in the stock investment memo is Shift_JIS
, so if you read the CSV in the usual way, garbled characters may occur. To prevent this, here, codecs
is used to specify the character code to open the file, and pandas
function read_csv ()
is used to read the CSV as DataFrame
.
In addition, the CSV file downloaded in the stock investment memo contains brand information in the header part, and columns
and ʻindexare not recognized correctly, so it includes the process of deleting the header after directly specifying these. I will. Furthermore, by performing
to_datetime processing on the
date column of ʻindex
, the date of ʻindex can be treated as it is as
datetimetype. The final output is the
DataFrame of
pandas`, which contains the stock price data for one year.
Create a list of CSV filenames in a single folder, as well as recognize and retrieve how many years stock price data each file contains.
from glob import glob
def get_price_data_by_year(year: int) -> pd.core.frame.DataFrame:
'''
File list in the folder[FILES_DICT]Designated year from[year]Refer to the file name of the stock price csv of, read and DataFrame[df]Returns as.
'''
file_name = FILES_DICT[str(year)]
df = read_csv(file_name)
return df
if __name__ == "__main__":
#Specify the directory of the folder containing the downloaded CSV (relative path or absolute path is acceptable, after the folder name/*Don't forget)
CSV_FOLDER_DIRECTORY = './csv/*'
#CSV above_FOLDER_CSV file dictionary FILES based on DIRECTORY_Part that automatically creates DICT (no editing required)
FILES_DICT = {}
files = glob(CSV_FOLDER_DIRECTORY)
files.sort()
for file_name in files:
FILES_DICT[file_name[-8:-4]] = file_name
Specify the folder where CSV is collected in the part of CSV_FOLDER_DIRECTORY ='./csv/*'
. You can specify either a relative path or an absolute path, but be sure to add / * after the folder name.
In the rest of the process, we will create a CSV file dictionary FILES_DICT
. This should be defined as a global variable so that it can be referenced from within the function.
The CSV that can be downloaded in the stock investment memo is saved with a name such as 7203_2012.csv
. By extracting the 2012
(years) part from this, creating aDictionary
with the key
and the file name value
, a CSV file containing stock price data for a specific year You will be able to easily refer to the name.
Stock price data for several years from the specified year to the specified year are read in order, combined, and output as one data.
def create_historical_data(open: int,last: int) -> pd.core.frame.DataFrame:
'''
Designated year[open]Designated year from[last]Read stock price data up to and combine them into one DataFrame[df]Returns as.
'''
df = get_price_data_by_year(open)
for i in range(int(open) + 1,int(last) + 1):
df = pd.concat([df, get_price_data_by_year(i)])
return df
Stock price data from the specified year ʻopen to the specified year
lastare read in order, and the obtained
DataFrame is combined using
pd.concat () . Finally, a
DataFrame` with all the data combined is output. If the file of the specified year does not exist, an error will occur. Please make sure that you have downloaded the data for the specified number of years in advance before executing.
Creates stock price data for the specified number of years retroactively from the execution date and outputs it as one data. This is output in units of dates, but if the date that is exactly the specified number of years retroactively is not the business day of the exchange, the DataFrame
containing the stock price data from the business day immediately after that to the present is output. Will be done. Similarly, if the execution date is not the business day of the exchange, a DataFrame
containing the data up to the previous business day will be output.
import datetime as dt
from dateutil import relativedelta
def create_historical_data_by_date(years: int) -> pd.core.frame.DataFrame:
'''
Just the specified number of years from the execution date[years]Create minute stock price data and one DataFrame[df]Returns as.
'''
this_year = int(dt.datetime.now().year)
df = create_historical_data(this_year - years,this_year)
open = dt.datetime.now() - relativedelta.relativedelta(years=years)
df = df[df.index >= open]
return df
First, use datetime
to get the current year, and then usecreate_historical_data ()
above to get the stock price data for the specified number of years years
.
Next, using relativedelta
, we get the date that is exactly the specified number of years from the present in datetime
type. Since the index
of the DataFrame
obtained in the previous step has already been converted to the datetime
type, you can easily filter the DataFrame
by using the comparison operator > =
. You can retrieve data for the specified number of years. The final output is a DataFrame
that contains stock price data for the specified number of years.
I was able to output the data I wanted to use in the above section, but in order to handle this data externally, I need to save it in a file. Here, as an example, we will introduce how to save to CSV.
if __name__ == "__main__":
#When you want to save stock price data from the specified year to the specified year
df = create_historical_data(2015,2019)
df.to_csv("2015-2019.csv")
#When you want to save stock price data for the specified number of years retroactively from the execution date
df = create_historical_data_by_date(5)
df.to_csv("5years_price.csv")
First, use the functions such as create_historical_data ()
and create_historical_data_by_date ()
created in the above section, and save the obtained result in the variable df
. This df
can be easily exported as CSV usingto_csv ()
, which is a function of pandas
. At this time, it is necessary to specify the file name to be saved, and pass the character string enclosed in " "
as the argument of to_csv ()
. Be sure to include .csv
at the end of this.
The character code of the file saved at this time is ʻUTF-8, and the delimiter is
,. When importing with Excel, select
Data> Text File` and specify the character code and delimiter.
The code used this time is published on GitHub. If you are interested, please check here [https://github.com/shota4/jp_stock_price_data/blob/master/edit_price_data.py).
Until the end Thank you for reading. We hope that your stock analysis and Christmas will be enriched.
Recommended Posts