I wanted to analyze stock prices and look up the price at the time of purchase, so I scraped them from the Nikkei newspaper site. Several sites offer stock prices for free, such as Stock Investment Memo, but many of them are updated irregularly. The Nikkei site is updated daily.
First of all, scraping a site can be illegal if you do not follow its rules. As far as I can tell from Nikkei's [robots.txt](https://www.nikkei.com/robots.txt) and Terms of Service, there seems to be no problem within the scope of personal use (?). (Please let me know if I am wrong.)
With the read_html() function of the pandas module, it takes only a few seconds.
nikkei_scrape.py
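import io

import pandas as pd
import requests


def get_stock_prices(stock_number):
    """Return the daily price table for a stock code, or None if not found."""
    url = "https://www.nikkei.com/nkd/company/history/dprice/?scode={}&ba=1".format(stock_number)
    headers = {
        "User-Agent": "User-Agent information"  # replace with your browser's User-Agent
    }
    # Send the header with the request, then let pandas parse every table in the page.
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    dfs = pd.read_html(io.StringIO(response.text))
    # Keep only the table that mentions "date" (on the live Japanese page this may be "日付").
    for df in dfs:
        if "date" in str(df):
            return df
    return None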
For the User-Agent information, you can copy and paste the character string that appears when you open this site. For example, in my case it was `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36`. For more information, see [here](https://qiita.com/nightyknite/items/b2590a69f2e0135756dc).
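Here is a minimal sketch of attaching a User-Agent header to a request, using only the standard library's urllib.request. The helper names `build_request` and `fetch_html` are my own for illustration, and the User-Agent value is just the example string above, so substitute your own. `fetch_html` is defined but not called, so nothing is downloaded when the snippet runs.

```python
import urllib.request

# Placeholder value: substitute the User-Agent string of your own browser.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    """Create a request object that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=USER_AGENT, timeout=10):
    """Download the page and return its HTML as text (this performs a network call)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Build the request only, to check that the header is attached.
request = build_request("https://www.nikkei.com/nkd/company/history/dprice/?scode=1301&ba=1")
print(request.get_header("User-agent"))  # urllib stores the key as "User-agent"
```

The text returned by `fetch_html` could then be wrapped in `io.StringIO` and passed to `pd.read_html()`.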
Pass the 4-digit stock code as stock_number. For example, if you pass 1301, it scrapes [https://www.nikkei.com/nkd/company/history/dprice/?scode=1301&ba=1](https://www.nikkei.com/nkd/company/history/dprice/?scode=1301&ba=1), the stock price page of the company Kyokuyo. You can easily find a stock code by googling it, and the TSE website also lists them in an Excel file that you can download.
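As a small sketch of how the stock code ends up in the URL, the snippet below builds the same address with urllib.parse. The function name `build_price_url` and the strict 4-digit check are my own assumptions for illustration, not something the Nikkei site requires.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.nikkei.com/nkd/company/history/dprice/"


def build_price_url(stock_number):
    """Return the daily price history URL for a 4-digit stock code."""
    code = str(stock_number)
    # Assumption: codes are exactly four ASCII digits, as described in this post.
    if not (len(code) == 4 and code.isascii() and code.isdigit()):
        raise ValueError("expected a 4-digit stock code, got {!r}".format(stock_number))
    return BASE_URL + "?" + urlencode({"scode": code, "ba": 1})


print(build_price_url(1301))
```

Validating the code before building the URL avoids sending a request for an obviously mistyped value.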
The stock price page may contain other tables as well, so to get only the stock price table, the for loop returns the first table that contains the word "date".
That's all.
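The same filtering idea can be tried without touching the network. This is only a sketch: `pick_table` is a hypothetical helper, the two DataFrames are made-up stand-ins for the list that `pd.read_html()` returns, and it checks the column labels rather than the whole printed table, which is a slightly stricter variant of the check in the script.

```python
import pandas as pd


def pick_table(tables, keyword):
    """Return the first DataFrame whose column labels mention keyword, else None."""
    for table in tables:
        if any(keyword in str(label) for label in table.columns):
            return table
    return None


# Stand-ins for the list returned by pd.read_html() (made-up values).
extra = pd.DataFrame({"item": ["a", "b"], "value": [1, 2]})
prices = pd.DataFrame({"date": ["2020/5/1", "2020/5/2"], "close": [100, 101]})

selected = pick_table([extra, prices], "date")
print(selected)
```

Checking the column labels avoids a false match when the keyword only happens to appear in a cell of an unrelated table.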