In this 4th post of the Advent Calendar, I will programmatically collect the XBRL data disclosed on EDINET.
(The program in this article is provided as is, without any warranty. XBRL Japan accepts no responsibility for any disadvantage or problem caused by using this program, regardless of the cause.)
The EDINET API lets a program retrieve data (XBRL) directly from the EDINET database rather than through the EDINET screen, so EDINET users can acquire disclosure information efficiently. Before using the API, please check the **Terms of Service** on the EDINET page.
This Python program downloads, via the EDINET API, the XBRL data of the securities reports disclosed on EDINET during the collection period. (The full code is given in "3. Source code".) For the **detailed specifications of the EDINET API**, please check the EDINET site. Quarterly reports, semi-annual reports, and corrected securities reports are not supported.
Please take the following actions before executing the program. The requests library must also be installed in advance (datetime is part of the Python standard library).
The endpoint is as of December 2019. Please check the latest version (e.g. v1) each time.
https://disclosure.edinet-fsa.go.jp/api/v1/documents.json
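Set the proxy based on your network environment. If you don't need a proxy, remove proxies=proxies from the requests.get call. Note that requests selects a proxy by URL scheme, so the dictionary keys are "http" and "https".
"http" : "http://username:[email protected]:8080"
"https" : "https://username:[email protected]:8080"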
Decide where to download the XBRL data.
C://Users//xxx//Desktop//xbrlReport//SR//
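The download folder has to exist before any file is written into it. The following is a minimal sketch of one way to prepare it with pathlib. The helper name build_zip_path is hypothetical, and a temporary folder stands in for the real download location.

```python
import pathlib
import tempfile


def build_zip_path(doc_id, download_dir):
    """Return the zip path for one docID, creating download_dir if it is missing."""
    # Hypothetical helper; the article itself builds the path by string concatenation.
    folder = pathlib.Path(download_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / (doc_id + ".zip")


# A temporary folder is used here so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = build_zip_path("S100H8TT", pathlib.Path(tmp) / "xbrlReport" / "SR")
    folder_created = zip_path.parent.is_dir()
    print(zip_path.name, folder_created)
```

Using pathlib also avoids having to choose between the // and \\ separators in the path string.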
The day_list is created by setting the collection start date and collection end date.
Code1
start_date = datetime.date(2019, 11, 1)
end_date = datetime.date(2019, 11, 30)
Result1
day_list [datetime.date(2019, 11, 1), datetime.date(2019, 11, 2), datetime.date(2019, 11, 3), datetime.date(2019, 11, 4), datetime.date(2019, 11, 5), datetime.date(2019, 11, 6), datetime.date(2019, 11, 7), datetime.date(2019, 11, 8), datetime.date(2019, 11, 9), datetime.date(2019, 11, 10), datetime.date(2019, 11, 11), datetime.date(2019, 11, 12), datetime.date(2019, 11, 13), datetime.date(2019, 11, 14), datetime.date(2019, 11, 15), datetime.date(2019, 11, 16), datetime.date(2019, 11, 17), datetime.date(2019, 11, 18), datetime.date(2019, 11, 19), datetime.date(2019, 11, 20), datetime.date(2019, 11, 21), datetime.date(2019, 11, 22), datetime.date(2019, 11, 23), datetime.date(2019, 11, 24), datetime.date(2019, 11, 25), datetime.date(2019, 11, 26), datetime.date(2019, 11, 27), datetime.date(2019, 11, 28), datetime.date(2019, 11, 29), datetime.date(2019, 11, 30)]
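The same list can also be built in a single expression. This is a minimal sketch rather than the article's own implementation, and the helper name date_range_inclusive is hypothetical.

```python
import datetime


def date_range_inclusive(start_date, end_date):
    """Return every date from start_date to end_date, both included."""
    # Hypothetical helper name; it produces the same list as day_list above.
    days = (end_date - start_date).days
    return [start_date + datetime.timedelta(days=d) for d in range(days + 1)]


day_list = date_range_inclusive(datetime.date(2019, 11, 1), datetime.date(2019, 11, 30))
print(len(day_list))  # 30 dates for November 2019
```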
Loop through day_list and, for each date, set url (the endpoint), params (date information), and optionally proxies (proxy information), then call requests.get(url, params=params, proxies=proxies) to obtain res (a Response object). url specifies the endpoint of the document list API. "type": 2 is specified in params so that the list of submitted documents is returned, which is needed to identify the securities reports in the subsequent processing.
Code2
for index, day in enumerate(day_list):
    url = "https://disclosure.edinet-fsa.go.jp/api/v1/documents.json"
    params = {"date": day, "type": 2}
    # requests selects a proxy by URL scheme, so the keys are "http" and "https".
    proxies = {
        "http": "http://username:[email protected]:8080/",
        "https": "https://username:[email protected]:8080/"
    }
    res = requests.get(url, params=params, proxies=proxies)
The structure of res is defined in section 2-1-2-1, Document List API (metadata), of the EDINET API specification PDF. The following is the content of res for the day 2019-11-01.
Result2
2019-11-01
{
    "metadata": {
        "title": "API for grasping submitted documents",
        "parameter": {
            "date": "2019-11-01",
            "type": "2"
        },
        "resultset": {
            "count": 315
        },
        "processDateTime": "2019-12-05 00:00",
        "status": "200",
        "message": "OK"
    },
    "results": [
        {
            "seqNumber": 1,
            "docID": "S100H5LU",
            "edinetCode": "E12422",
            "secCode": null,
            "JCN": "4010001046310",
            "filerName": "Shinkin Asset Management Investment Trust Co., Ltd.",
            "fundCode": "G03385",
            "ordinanceCode": "030",
            "formCode": "07A000",
            "docTypeCode": "120",
            "periodStart": "2018-08-07",
            "periodEnd": "2019-08-06",
            "submitDateTime": "2019-11-01 09:00",
            "docDescription": "Securities Report (Domestic Investment Trust Beneficiary Securities) -17th Term(August 7, 2018-August 6, 2018-Reiwa 1)",
            "issuerEdinetCode": null,
            "subjectEdinetCode": null,
            "subsidiaryEdinetCode": null,
            "currentReportReason": null,
            "parentDocID": null,
            "opeDateTime": null,
            "withdrawalStatus": "0",
            "docInfoEditStatus": "0",
            "disclosureStatus": "0",
            "xbrlFlag": "1",
            "pdfFlag": "1",
            "attachDocFlag": "1",
            "englishDocFlag": "0"
        },
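Before looping over results, it can help to look at metadata first. The sketch below works on a trimmed sample shaped like Result2. The helper name extract_results is hypothetical, and treating any status other than "200" as "no usable results" is an assumption, since only the successful response is shown here.

```python
import json

# Trimmed sample shaped like Result2; only the fields used here are kept.
sample_text = """
{
  "metadata": {
    "resultset": {"count": 1},
    "status": "200",
    "message": "OK"
  },
  "results": [
    {"docID": "S100H5LU", "ordinanceCode": "030", "formCode": "07A000"}
  ]
}
"""


def extract_results(json_data):
    """Return the results list, or an empty list when metadata does not report "200"."""
    metadata = json_data.get("metadata", {})
    # In Result2 the status is the string "200", not an integer.
    if metadata.get("status") != "200":
        return []  # Assumption: other statuses carry no usable results.
    return json_data.get("results", [])


json_data = json.loads(sample_text)
results = extract_results(json_data)
print(json_data["metadata"]["resultset"]["count"], len(results))
```

In the real program, json_data would come from res.json() instead of the sample string.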
Since the submitted documents are listed in results of res, loop over results. For each submitted document, obtain ordinance_code (Cabinet Office Ordinance code) and form_code (form code). Since this article targets securities reports, only the submitted documents with an ordinance_code of 010 and a form_code of 030000 are processed. The docID (document management number) of each matching document is stored in securities_report_doc_list (the list of securities reports).
Code3
# Decode the JSON body of the document list response.
json_data = res.json()
for num in range(len(json_data["results"])):
    ordinance_code = json_data["results"][num]["ordinanceCode"]
    form_code = json_data["results"][num]["formCode"]
    # Keep only securities reports.
    if ordinance_code == "010" and form_code == "030000":
        securities_report_doc_list.append(json_data["results"][num]["docID"])
This creates a list of the docIDs corresponding to securities reports.
Result3
number_of_lists: 77
get_list: ['S100H8TT', 'S100HE9U', 'S100HC6W', 'S100HFA0', 'S100HFBC', 'S100HFB3', 'S100HG9S', 'S100HG62', 'S100HGJL', 'S100HFMG', 'S100HGM1', 'S100HGMZ', 'S100HGFM', 'S100HFC2', 'S100HGNQ', 'S100HGS3', 'S100HGYR', 'S100HGMB', 'S100HGKE', 'S100HFJG', 'S100HGTC', 'S100HH1G', 'S100HH9I', 'S100HGTF', 'S100HHAL', 'S100HHC0', 'S100HFIB', 'S100HH1I', 'S100HH36', 'S100HHDF', 'S100HH9L', 'S100HHGB', 'S100HHGJ', 'S100HHCR', 'S100HHJJ', 'S100HHH0', 'S100HHLH', 'S100HHL6', 'S100HHD4', 'S100HHM7', 'S100HHL9', 'S100HHN6', 'S100HHO8', 'S100HHHV', 'S100HHE3', 'S100HGB5', 'S100HHQ0', 'S100HHP5', 'S100HHMK', 'S100HHE6', 'S100HHPR', 'S100HHDA', 'S100HHR7', 'S100HHSB', 'S100HHML', 'S100HH9H', 'S100HH2F', 'S100H8W1', 'S100HHRP', 'S100HHTM', 'S100HHAF', 'S100HHUD', 'S100HHK9', 'S100HHT4', 'S100HHCI', 'S100HHXQ', 'S100HHO8', 'S100HHSS', 'S100HHRL', 'S100HI19', 'S100HHXS', 'S100HI1W', 'S100HHSP', 'S100HHN4', 'S100HI3J', 'S100HI3K', 'S100HI4G']
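The filtering step can also be written as a small function that works on any results list. The sketch below is one possible shape, with a hypothetical helper name select_doc_ids and a hand-made sample. Skipping a docID that was already collected is an addition not present in the article's code; in Result3, S100HHO8 appears twice.

```python
def select_doc_ids(results, ordinance_code="010", form_code="030000"):
    """Collect docID values of documents matching both codes, skipping repeats."""
    doc_ids = []
    for doc in results:
        if doc.get("ordinanceCode") == ordinance_code and doc.get("formCode") == form_code:
            # Addition to the article's logic: keep each docID only once.
            if doc["docID"] not in doc_ids:
                doc_ids.append(doc["docID"])
    return doc_ids


# Hand-made sample: one investment trust report and a repeated securities report.
sample_results = [
    {"docID": "S100H5LU", "ordinanceCode": "030", "formCode": "07A000"},
    {"docID": "S100H8TT", "ordinanceCode": "010", "formCode": "030000"},
    {"docID": "S100H8TT", "ordinanceCode": "010", "formCode": "030000"},
]
print(select_doc_ids(sample_results))
```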
The following code downloads the XBRL data using this list. Loop over securities_report_doc_list. url specifies the endpoint of the document acquisition API (note that it is not the document list API). Specifying "type": 1 in params retrieves the submitted document and the audit report. The XBRL data is downloaded only when the status code of res is 200 (the request succeeded). Note that if you specify a date outside the period covered by EDINET, such as five years ago, a 404 status code (resource does not exist) may be returned.
Code4
for index, doc_id in enumerate(securities_report_doc_list):
    url = "https://disclosure.edinet-fsa.go.jp/api/v1/documents/" + doc_id
    params = {"type": 1}
    filename = "C:\\Users\\XXX\\Desktop\\xbrlReport\\SR\\" + doc_id + ".zip"
    res = requests.get(url, params=params, stream=True)
    if res.status_code == 200:
        with open(filename, 'wb') as file:
            for chunk in res.iter_content(chunk_size=1024):
                file.write(chunk)
After execution, 77 zip files were downloaded to the specified folder. Unzip the zip file and you will see the familiar AuditDoc and PublicDoc folders. This completes the download of XBRL data.
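The zip files can also be inspected from Python without unzipping them by hand. The following is a minimal sketch using the standard zipfile module. The helper name list_public_xbrl is hypothetical, and the member names in the in-memory archive (including the top-level XBRL folder) are made up for the demonstration.

```python
import io
import zipfile


def list_public_xbrl(zip_file):
    """Return the names of .xbrl members stored under a PublicDoc folder."""
    with zipfile.ZipFile(zip_file) as archive:
        return [
            name for name in archive.namelist()
            if "PublicDoc/" in name and name.endswith(".xbrl")
        ]


# Build a tiny in-memory archive with made-up member names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("XBRL/PublicDoc/sample.xbrl", "<xbrl/>")
    archive.writestr("XBRL/AuditDoc/audit.xbrl", "<xbrl/>")
    archive.writestr("XBRL/PublicDoc/sample.xsd", "<schema/>")
buffer.seek(0)

print(list_public_xbrl(buffer))
```

For a downloaded file, the path of the zip file can be passed in place of the in-memory buffer.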
# -*- coding: utf-8 -*-
import requests
import datetime


def make_day_list(start_date, end_date):
    """Return every date from start_date to end_date, inclusive."""
    print("start_date:", start_date)
    print("end_date:", end_date)

    period = end_date - start_date
    period = int(period.days)
    day_list = []
    for d in range(period):
        day = start_date + datetime.timedelta(days=d)
        day_list.append(day)
    # range(period) stops one day short, so add the end date itself.
    day_list.append(end_date)

    return day_list


def make_doc_id_list(day_list):
    """Collect the docID of every securities report submitted on the given days."""
    securities_report_doc_list = []
    for index, day in enumerate(day_list):
        url = "https://disclosure.edinet-fsa.go.jp/api/v1/documents.json"
        params = {"date": day, "type": 2}

        # requests selects a proxy by URL scheme, so the keys are "http" and "https".
        proxies = {
            "http": "http://username:[email protected]:8080",
            "https": "https://username:[email protected]:8080"
        }

        res = requests.get(url, params=params, proxies=proxies)
        json_data = res.json()
        print(day)

        for num in range(len(json_data["results"])):
            ordinance_code = json_data["results"][num]["ordinanceCode"]
            form_code = json_data["results"][num]["formCode"]

            # Keep only securities reports.
            if ordinance_code == "010" and form_code == "030000":
                print(json_data["results"][num]["filerName"], json_data["results"][num]["docDescription"],
                      json_data["results"][num]["docID"])
                securities_report_doc_list.append(json_data["results"][num]["docID"])

    return securities_report_doc_list


def download_xbrl_in_zip(securities_report_doc_list, number_of_lists):
    """Download the zip file of each docID into the download folder."""
    for index, doc_id in enumerate(securities_report_doc_list):
        print(doc_id, ":", index + 1, "/", number_of_lists)
        url = "https://disclosure.edinet-fsa.go.jp/api/v1/documents/" + doc_id
        params = {"type": 1}
        filename = "C://Users//xxx//Desktop//xbrlReport//SR//" + doc_id + ".zip"
        res = requests.get(url, params=params, stream=True)

        # Write the file only when the request succeeded.
        if res.status_code == 200:
            with open(filename, 'wb') as file:
                for chunk in res.iter_content(chunk_size=1024):
                    file.write(chunk)


def main():
    start_date = datetime.date(2019, 11, 1)
    end_date = datetime.date(2019, 11, 30)
    day_list = make_day_list(start_date, end_date)

    securities_report_doc_list = make_doc_id_list(day_list)
    number_of_lists = len(securities_report_doc_list)
    print("number_of_lists:", len(securities_report_doc_list))
    print("get_list:", securities_report_doc_list)

    download_xbrl_in_zip(securities_report_doc_list, number_of_lists)
    print("download finish")


if __name__ == "__main__":
    main()
This time we targeted securities reports, but by changing ordinance_code and form_code you can automatically collect other reports written in XBRL. For example, for quarterly reports, change the condition so that only the submitted documents with an ordinance_code of 010 and a form_code of 043000 are processed. The following is a brief summary, including corrected reports.
〇 Securities report
if ordinance_code == "010" and form_code == "030000":
〇 Corrected securities report
if ordinance_code == "010" and form_code == "030001":
〇 Quarterly report
if ordinance_code == "010" and form_code == "043000":
〇 Corrected quarterly report
if ordinance_code == "010" and form_code == "043001":
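Rather than editing the if statement each time, the four code pairs above can be kept in a lookup table. This is a minimal sketch; the names FORM_FILTERS and matches, and the report type labels used as keys, are hypothetical.

```python
# (ordinance code, form code) pairs taken from the summary above.
FORM_FILTERS = {
    "securities_report": ("010", "030000"),
    "corrected_securities_report": ("010", "030001"),
    "quarterly_report": ("010", "043000"),
    "corrected_quarterly_report": ("010", "043001"),
}


def matches(doc, report_type):
    """Return True when a document from results has the codes of report_type."""
    ordinance_code, form_code = FORM_FILTERS[report_type]
    return doc.get("ordinanceCode") == ordinance_code and doc.get("formCode") == form_code


# Hand-made sample document with the codes of a quarterly report.
sample_doc = {"docID": "SAMPLE01", "ordinanceCode": "010", "formCode": "043000"}
print(matches(sample_doc, "quarterly_report"))
print(matches(sample_doc, "securities_report"))
```

With this table, switching the target report only means passing a different key.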
For the Cabinet Office Ordinance codes and form codes of all forms, see the EDINET API-related materials (released on March 17, 2019). Please refer to Attachment 1_Form Code List.xlsx, which is included in the zip file downloaded from [API Specification].
For inquiries regarding this article, please contact the following e-mail address. e-mail:[email protected] (Of course, comments on Qiita are also welcome.)
This e-mail address is the contact point for the Development Committee of XBRL Japan, which writes these Qiita articles. Depending on the content, we therefore cannot answer general inquiries about the organization, but please feel free to contact us with any technical questions, opinions, requests, or advice regarding XBRL. Please note that responses may take some time because the committee members are volunteers.