Here is the base price (NAV) history page we want to scrape.
## Write down the location of the data you want as an XPath
XPath is a notation for describing the location of arbitrary content in an HTML / XML document. In Chrome you can get the XPath of any element by right-clicking it and choosing Copy → Copy XPath (see below). Being able to grab the XPath of any node with just Chrome feels almost revolutionary.
This time I want all the td elements of the table under the div element with id="main", so I ended up with the following:
```
//*[@id="main"]/div/table//td
```
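Jumping ahead to Python for a moment, here is a tiny self-contained check of what that expression matches. This is a minimal sketch: the markup is made up and only mimics the structure of the page, and it assumes lxml is installed.

```python
import lxml.html

# Made-up markup mimicking the structure the XPath targets (not the real page)
html = '''
<div id="main">
  <div>
    <table>
      <tr><td>2015年12月18日</td><td>10,123</td><td>123,456</td></tr>
    </table>
  </div>
</div>
'''

root = lxml.html.fromstring(html)
# //*[@id="main"] : any element whose id is "main"
# /div/table      : a table reached through a direct child div
# //td            : every td anywhere under that table
print([td.text for td in root.xpath('//*[@id="main"]/div/table//td')])
# prints the three cell texts: date, base price, total net assets
```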
## Get the HTML with parse() and extract the required elements with XPath
From here on, everything is done in Python. Pass the URL to lxml.html.parse() to get an ElementTree, and pull out the elements specified by the XPath. Finally, shape the result and return it as a list of [date, base price, total net assets] rows, with the date as a yyyymmdd string.
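The only slightly fiddly part is the date: the page shows dates in a Japanese format such as 2015年12月18日, and strptime can parse them directly by embedding the Japanese characters in the format string. A quick illustration (the literal date below is just an example):

```python
import datetime

# Convert a Japanese-formatted date string into a yyyymmdd string
s = '2015年12月18日'
print(datetime.datetime.strptime(s, '%Y年%m月%d日').strftime('%Y%m%d'))  # 20151218
```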
getNAV.py
```python
# -*- coding: utf-8 -*-
# python 2.7
import lxml.html
import datetime


def getNAV(fundcode, sy, sm, sd, ey, em, ed):
    # Pack the arguments into a dict
    d = dict(fundcode=fundcode, sy=sy, sm=sm, sd=sd, ey=ey, em=em, ed=ed)
    # Unpack the dict to build the URL
    url = ('http://info.finance.yahoo.co.jp/history/?code={fundcode}'
           '&sy={sy}&sm={sm}&sd={sd}&ey={ey}&em={em}&ed={ed}&tm=d').format(**d)
    # Get the ElementTree
    tree = lxml.html.parse(url)
    # Grab every cell (date, base price, total net assets),
    # converting to UTF-8 and stripping the thousands commas
    contents = map(lambda td: td.text.encode('utf-8').replace(',', ''),
                   tree.xpath('//*[@id="main"]/div/table//td'))
    # The cells come back as one flat list, so regroup them
    # into [[date, price, cap], [date, price, cap], ...]
    res = []
    for i in range(0, len(contents) - 1, 3):
        date = datetime.datetime.strptime(contents[i], '%Y年%m月%d日').strftime('%Y%m%d')
        price = int(contents[i + 1])
        cap = int(contents[i + 2])
        res.append([date, price, cap])
    return res


if __name__ == '__main__':
    # Pack the parameters into a dict
    args = dict(fundcode='64311104', sy='2015', sm='12', sd='1',
                ey='2015', em='12', ed='20')
    # Unpack the dict into keyword arguments
    print getNAV(**args)
```
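For reference, here is roughly what the same logic looks like on Python 3. This is a minimal sketch, not the original script: it assumes the page layout, the XPath, and the Japanese date format are unchanged, and that element text no longer needs an explicit UTF-8 encode.

```python
# Rough Python 3 sketch of the same logic (assumes the page structure is unchanged)
import datetime
import lxml.html


def get_nav(fundcode, sy, sm, sd, ey, em, ed):
    url = ('http://info.finance.yahoo.co.jp/history/?code={fundcode}'
           '&sy={sy}&sm={sm}&sd={sd}&ey={ey}&em={em}&ed={ed}&tm=d').format(
               fundcode=fundcode, sy=sy, sm=sm, sd=sd, ey=ey, em=em, ed=ed)
    tree = lxml.html.parse(url)
    # Element text is already str in Python 3, so only the commas need stripping
    cells = [td.text.replace(',', '') for td in tree.xpath('//*[@id="main"]/div/table//td')]
    rows = []
    for i in range(0, len(cells) - 1, 3):
        date = datetime.datetime.strptime(cells[i], '%Y年%m月%d日').strftime('%Y%m%d')
        rows.append([date, int(cells[i + 1]), int(cells[i + 2])])
    return rows


if __name__ == '__main__':
    print(get_nav(fundcode='64311104', sy='2015', sm='12', sd='1',
                  ey='2015', em='12', ed='20'))
```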