Here is the base price (NAV) history page we want to scrape.
## Write down the location of the data you want as an XPath
XPath is a notation for describing the location of arbitrary content in an HTML / XML document. In Chrome you can get the XPath of any element by right-clicking it and choosing Copy → Copy XPath (see below). Being able to grab the XPath of any node with just Chrome feels almost revolutionary.
This time I want all the td elements of the table under the div element with id="main", so I ended up with the following:
```
//*[@id="main"]/div/table//td
```
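Jumping ahead to Python for a moment, here is a tiny self-contained check of what that expression matches. This is a minimal sketch: the markup is made up and only mimics the structure of the page, and it assumes lxml is installed.

```python
import lxml.html

# Made-up markup mimicking the structure the XPath targets (not the real page)
html = '''
<div id="main">
  <div>
    <table>
      <tr><td>2015年12月18日</td><td>10,123</td><td>123,456</td></tr>
    </table>
  </div>
</div>
'''

root = lxml.html.fromstring(html)
# //*[@id="main"] : any element whose id is "main"
# /div/table      : a table reached through a direct child div
# //td            : every td anywhere under that table
print([td.text for td in root.xpath('//*[@id="main"]/div/table//td')])
# prints the three cell texts: date, base price, total net assets
```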
## Get the HTML with parse() and extract the required elements with XPath
From here on, everything is done in Python. Pass the URL to lxml.html.parse() to get an ElementTree, and pull out the elements specified by the XPath. Finally, shape the result and return it as a list of [date, base price, total net assets] rows, with the date as a yyyymmdd string.
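The only slightly fiddly part is the date: the page shows dates in a Japanese format such as 2015年12月18日, and strptime can parse them directly by embedding the Japanese characters in the format string. A quick illustration (the literal date below is just an example):

```python
import datetime

# Convert a Japanese-formatted date string into a yyyymmdd string
s = '2015年12月18日'
print(datetime.datetime.strptime(s, '%Y年%m月%d日').strftime('%Y%m%d'))  # 20151218
```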
getNAV.py
```python
# -*- coding: utf-8 -*-
# python 2.7
import lxml.html
import datetime


def getNAV(fundcode, sy, sm, sd, ey, em, ed):
    # Pack the arguments into a dict
    d = dict(fundcode=fundcode, sy=sy, sm=sm, sd=sd, ey=ey, em=em, ed=ed)
    # Unpack the dict to build the URL
    url = ('http://info.finance.yahoo.co.jp/history/?code={fundcode}'
           '&sy={sy}&sm={sm}&sd={sd}&ey={ey}&em={em}&ed={ed}&tm=d').format(**d)
    # Get the ElementTree
    tree = lxml.html.parse(url)
    # Grab every cell (date, base price, total net assets),
    # converting to UTF-8 and stripping the thousands commas
    contents = map(lambda td: td.text.encode('utf-8').replace(',', ''),
                   tree.xpath('//*[@id="main"]/div/table//td'))
    # The cells come back as one flat list, so regroup them
    # into [[date, price, cap], [date, price, cap], ...]
    res = []
    for i in range(0, len(contents) - 1, 3):
        date = datetime.datetime.strptime(contents[i], '%Y年%m月%d日').strftime('%Y%m%d')
        price = int(contents[i + 1])
        cap = int(contents[i + 2])
        res.append([date, price, cap])
    return res


if __name__ == '__main__':
    # Pack the parameters into a dict
    args = dict(fundcode='64311104', sy='2015', sm='12', sd='1',
                ey='2015', em='12', ed='20')
    # Unpack the dict into keyword arguments
    print getNAV(**args)
```
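For reference, here is roughly what the same logic looks like on Python 3. This is a minimal sketch, not the original script: it assumes the page layout, the XPath, and the Japanese date format are unchanged, and that element text no longer needs an explicit UTF-8 encode.

```python
# Rough Python 3 sketch of the same logic (assumes the page structure is unchanged)
import datetime
import lxml.html


def get_nav(fundcode, sy, sm, sd, ey, em, ed):
    url = ('http://info.finance.yahoo.co.jp/history/?code={fundcode}'
           '&sy={sy}&sm={sm}&sd={sd}&ey={ey}&em={em}&ed={ed}&tm=d').format(
               fundcode=fundcode, sy=sy, sm=sm, sd=sd, ey=ey, em=em, ed=ed)
    tree = lxml.html.parse(url)
    # Element text is already str in Python 3, so only the commas need stripping
    cells = [td.text.replace(',', '') for td in tree.xpath('//*[@id="main"]/div/table//td')]
    rows = []
    for i in range(0, len(cells) - 1, 3):
        date = datetime.datetime.strptime(cells[i], '%Y年%m月%d日').strftime('%Y%m%d')
        rows.append([date, int(cells[i + 1]), int(cells[i + 2])])
    return rows


if __name__ == '__main__':
    print(get_nav(fundcode='64311104', sy='2015', sm='12', sd='1',
                  ey='2015', em='12', ed='20'))
```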