Summary
I wrote a script to download stock price information from Stock Investment Memo without scraping.
python ./stockDownload.py -c 7203
Running this command downloads the 2019 daily data for code 7203 (Toyota Motor Corporation) as a CSV file.
On success the script prints "Code: 7203 download finished."; on failure it prints "Code: 7203 not valid."
Yahoo! Finance prohibits scraping. A method for scraping stock prices from Stock Investment Memo has been published [^ 1], but the site's page layout may change and break the parser. The site also has a download button, so I looked into whether I could make use of it instead.
I pressed the download button and inspected the request in the Network tab of Google Chrome's developer tools.
It appears that the form data is sent by POST to https://kabuoji3.com/stock/file.php.
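As a rough illustration, the request body is an ordinary URL-encoded form. The field names code, year and csv are taken from the script's data dict; the helper name build_form is hypothetical. This sketch only builds the string with the standard library's urlencode and does not contact the server. It should match what requests puts in the body when given data=.

```python
from urllib.parse import urlencode


def build_form(code: str, year: str = "2019") -> str:
    """Return the URL-encoded form body for one stock code (hypothetical helper)."""
    # Same three fields as the script's `data` dict; "csv" is sent with an empty value
    return urlencode({"code": code, "year": year, "csv": ""})


print(build_form("7203"))  # code=7203&year=2019&csv=
```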
- fullName in the script is the save destination, so change it as appropriate.
- Without a User-Agent header the server returns 403 Forbidden, so copy your browser's user-agent string from the developer tools. [^ 2]
- sleep(3) is included to avoid putting excessive load on the server.
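The script pauses with sleep(3) after each request. If you drive many downloads from a single Python process instead, one way to keep the same spacing is a small generator that sleeps between items. This is a sketch, not part of the original script: the name throttled and the injectable sleep parameter are my own, the latter so the delay can be replaced in tests.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled(items: Iterable[str], delay: float = 3.0,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield items one by one, pausing `delay` seconds between consecutive items."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)  # no pause before the first request
        yield item


# Usage: the print stands in for one download request per code
for code in throttled(["7203", "6758"], delay=0.1):
    print("would download", code)
```

Unlike the script, this skips the wait after the last item, since there is no following request to space out.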
{stockDownload.py}
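#!/usr/bin/env python
import os
import re
from time import sleep

import click
import requests


@click.command()
@click.option("--code", "-c", "code", required=True,
              help="Stock code to download.")
def main(code):
    year = "2019"
    session = requests.Session()
    # Without a browser-like User-Agent the server answers 403 Forbidden
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36"
    }
    # The same form fields the site's download button POSTs
    data = {
        "code": code,
        "year": year,
        "csv": ""
    }
    url = "https://kabuoji3.com/stock/file.php"
    res = session.post(url, data=data, headers=headers)
    try:
        # A valid code returns the CSV as an attachment; the file name is
        # the first quoted string in Content-Disposition
        contentDisposition = res.headers['Content-Disposition']
        fileName = re.findall(r'\"(.+?)\"', contentDisposition)[0]
        # Save destination: change as appropriate (open() does not expand "~")
        fullName = os.path.expanduser(
            "~/Documents/projects/ipo/data/stock/{}".format(fileName))
        with open(fullName, "wb") as saveFile:
            saveFile.write(res.content)
        print("Code: {} download finished.".format(code))
    except KeyError:
        # No Content-Disposition header: the code was not accepted
        print("Code: {} not valid.".format(code))
    # Wait between requests to avoid putting excessive load on the server
    sleep(3)


if __name__ == '__main__':
    main()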
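The script assumes Content-Disposition is present and contains a quoted name: a missing header raises KeyError (which is handled), but a header without quotes would raise an IndexError from the [0] lookup. Below is a slightly more defensive variant. It assumes the server sends a header of the form attachment; filename="..."; I have not confirmed the exact file names kabuoji3 uses, so the 7203_2019.csv in the tests is made up, and filename_from_disposition is a hypothetical helper name. It returns None in both failure cases so the caller can print the "not valid" message.

```python
import os
import re
from typing import Optional


def filename_from_disposition(value: Optional[str]) -> Optional[str]:
    """Return the quoted filename in a Content-Disposition value, or None."""
    if not value:
        return None  # header missing: the script treats this as an invalid code
    match = re.search(r'filename="([^"]+)"', value)
    if match is None:
        return None  # header present but no quoted filename
    # Keep only the last path component so a server-supplied name cannot
    # point outside the save directory
    return os.path.basename(match.group(1))
```

In the script it could be called as filename_from_disposition(res.headers.get("Content-Disposition")), replacing the try/except KeyError with a check for None.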
This was my first time writing a CLI with click. I find it easier to read than handling sys.argv directly.
After that, all you have to do is list the codes one per line in a file named code and loop over it in the shell:
cat code | while read line; do python ./stockDownload.py -c "$line"; done
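If you would rather stay in Python than use a shell loop, a minimal equivalent reads the same code file and runs the script once per line. This sketch assumes the file holds one stock code per line and that stockDownload.py is in the current directory; read_codes and download_all are hypothetical helper names. Stripping each line also avoids passing a trailing carriage return if the file was saved with Windows line endings.

```python
import subprocess
import sys
from typing import Iterable, List


def read_codes(lines: Iterable[str]) -> List[str]:
    """Return one stock code per non-blank line, with surrounding whitespace removed."""
    return [line.strip() for line in lines if line.strip()]


def download_all(codes: Iterable[str]) -> None:
    """Run stockDownload.py once per code, like the shell loop."""
    for code in codes:
        # The script itself sleeps 3 seconds after each request,
        # so no extra delay is added here
        subprocess.run([sys.executable, "./stockDownload.py", "-c", code],
                       check=False)
```

Usage: with open("code") as f: download_all(read_codes(f))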
The downloaded CSV is encoded in cp932 (Shift_JIS), so convert it to UTF-8 with nkf (for example, nkf -w) if needed.
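If nkf is not available, Python's built-in cp932 codec can do the same conversion. A minimal sketch with hypothetical function names; it reads the whole file into memory, which should be fine for one year of daily data.

```python
from pathlib import Path


def cp932_to_utf8(data: bytes) -> bytes:
    """Re-encode cp932 (Shift_JIS) bytes as UTF-8."""
    return data.decode("cp932").encode("utf-8")


def convert_file(src: Path, dst: Path) -> None:
    """Write a UTF-8 copy of a cp932-encoded CSV file."""
    dst.write_bytes(cp932_to_utf8(src.read_bytes()))
```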
References:
- [Python] Pseudo-click the button with requests
- How to Write Python Command-Line Interfaces like a Pro
[^ 1]: [Python] Get stock price data by scraping
[^ 2]: [Python] What to do when scraping 403 Forbidden: You don't have permission to access on this server