pandas-datareader is handy for fetching stock price data, but unfortunately some rows may be missing. For example, if you fetch "1357NF Nikkei Double Inverse" (ticker 1357.JP) from Stooq,
import pandas_datareader.stooq as web
from datetime import datetime
start_date = datetime(2016,6,10)
end_date = datetime(2016,6,17)
dr = web.StooqDailyReader('1357.JP', start=start_date, end=end_date)
df = dr.read()
df.to_csv('1357.csv')
you get a CSV file like the one below.
Date,Open,High,Low,Close,Volume
2016-06-17,3330,3380,3290,3370,8019724
2016-06-16,3270,3465,3250,3450,10403857
2016-06-14,3220,3315,3185,3270,9910736
2016-06-13,3105,3205,3100,3200,8193928
2016-06-10,2981,3040,2977,3000,4247241
The row for 2016-06-15 is missing. I wondered whether the fund had not traded that day because of a system problem or something similar, but when I checked the time series on Yahoo! Finance, the data for that day was there.
When I fetched the Nikkei 225 index in the same way,
import pandas_datareader.stooq as web
from datetime import datetime
start_date = datetime(2016,6,10)
end_date = datetime(2016,6,17)
dr = web.StooqDailyReader('^NKX', start=start_date, end=end_date)
df = dr.read()
df.to_csv('NIKKEI225.csv')
no rows were missing.
Date,Open,High,Low,Close,Volume
2016-06-17,15631.79,15774.87,15582.94,15599.66,1671723008
2016-06-16,15871.22,15913.08,15395.98,15434.14,1542472064
2016-06-15,15799.07,15997.3,15752.01,15919.58,1367727744
2016-06-14,16001.19,16082.5,15762.09,15859.0,1316932864
2016-06-13,16319.11,16335.38,16019.18,16019.18,1261788416
2016-06-10,16637.51,16643.36,16496.11,16601.36,1549976064
So the gap seems to be specific to the ticker.
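One way to confirm exactly which dates are missing is to compare the date indexes of the two tables. The following is a minimal sketch: it uses only the Date and Close columns copied from the two CSV listings above, embedded as strings so it runs without network access. The variable names `nikkei_csv`, `n1357_csv`, and `missing_dates` are made up for this illustration.

```python
import io
import pandas as pd

# Rows copied from the two CSV listings above (Date and Close only)
nikkei_csv = """Date,Close
2016-06-10,16601.36
2016-06-13,16019.18
2016-06-14,15859.0
2016-06-15,15919.58
2016-06-16,15434.14
2016-06-17,15599.66
"""
n1357_csv = """Date,Close
2016-06-10,3000
2016-06-13,3200
2016-06-14,3270
2016-06-16,3450
2016-06-17,3370
"""

nikkei225 = pd.read_csv(io.StringIO(nikkei_csv), index_col='Date', parse_dates=True)
n1357 = pd.read_csv(io.StringIO(n1357_csv), index_col='Date', parse_dates=True)

# Dates present in the index data but absent from the 1357 data
missing_dates = nikkei225.index.difference(n1357.index)
print(missing_dates)
```

This treats the Nikkei 225 index as the reference calendar, which is an assumption: it only reveals dates that the reference has and the other ticker lacks.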
Some tickers have missing rows and others do not. If you leave the data as it is, you risk making serious mistakes when comparing prices across tickers, because the rows no longer line up by date. You therefore need to either remove the dates that are missing or interpolate them. Both can be done by merging the two tables with pandas.
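import pandas as pd
# Load both CSV files, indexed by date and sorted oldest first
nikkei225 = pd.read_csv('NIKKEI225.csv').set_index('Date').sort_index()
n1357 = pd.read_csv('1357.csv').set_index('Date').sort_index()
# Inner join: keep only the dates present in both tables
merged = nikkei225.merge(n1357, on='Date', how='inner')
# Outer join: keep every date; values absent from one table become NaN
merged2 = nikkei225.merge(n1357, on='Date', how='outer')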
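merged contains only the dates present in both tables, so the date missing from 1357 is dropped.
merged2 keeps every date, and the values missing from 1357 are filled with NaN.
To interpolate the missing rows, first produce the NaN values with the outer merge, then replace them with interpolated values.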
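The outer merge followed by filling can be sketched as below. This is only an illustration under stated assumptions: it uses the Close prices copied from the two CSV listings in this article, joins on the date index with `left_index`/`right_index`, and the `suffixes` values and the variable names `interpolated` and `forward_filled` are chosen here for readability. Linear interpolation and forward fill are two common choices, not necessarily the method used in the linked article, and either one produces a synthetic value rather than the actual close for that day.

```python
import pandas as pd

# Close prices copied from the two CSV listings in this article
dates = pd.to_datetime(['2016-06-10', '2016-06-13', '2016-06-14',
                        '2016-06-15', '2016-06-16', '2016-06-17'])
nikkei225 = pd.DataFrame(
    {'Close': [16601.36, 16019.18, 15859.0, 15919.58, 15434.14, 15599.66]},
    index=dates)
# 1357 has no row for 2016-06-15, so drop that date (position 3)
n1357 = pd.DataFrame({'Close': [3000, 3200, 3270, 3450, 3370]},
                     index=dates.delete(3))

# Outer join on the date index: the missing 1357 row becomes NaN
merged2 = nikkei225.merge(n1357, left_index=True, right_index=True,
                          how='outer', suffixes=('_nikkei225', '_1357'))

# Option 1: linear interpolation between the neighbouring closes
interpolated = merged2.interpolate(method='linear')
# Option 2: carry the previous day's close forward
forward_filled = merged2.ffill()

day = pd.Timestamp('2016-06-15')
print(interpolated.loc[day, 'Close_1357'])    # midpoint of 3270 and 3450
print(forward_filled.loc[day, 'Close_1357'])  # previous close, 3270
```

Note that `method='linear'` treats rows as equally spaced and ignores the weekend between 2016-06-10 and 2016-06-13; for a single missing day between two adjacent trading days that makes no difference.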
How to fill in the gaps is explained in the following article: "I tried to predict the stock price by data analysis" https://qiita.com/kazama0119/items/c838114f8687518ba58e