Download the past exam question PDFs of the Fundamental Information Technology Engineer Examination (FE) using Python's urllib package.
The past exam questions for the Fundamental Information Technology Engineer Examination are published on the official IPA website (https://www.jitec.ipa.go.jp/1_04hanni_sukiru/_index_mondai.html). However, the questions and answers are posted per exam session, so you normally have to visit each session's page and download the files one by one. To save that hassle, this article downloads all of the questions and answers at once using Python's urllib package.
Looking at the pages of past exams, the URLs for, say, the 2015 Spring Examination are as follows.

- Morning exam
  - Questions: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_am_qs.pdf
  - Answers: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_am_ans.pdf
- Afternoon exam
  - Questions: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_pm_qs.pdf
  - Answers: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_pm_ans.pdf
  - Commentary: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_pm_cmnt.pdf
From this we can see that the past exam URLs consist of the common prefix

https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_

followed by

[Western year][Japanese era year]_[1 OR 2]/[Western year][Japanese era year][h OR a]_fe_[am OR pm]_[qs OR ans OR cmnt].pdf

where 1/h denotes the spring exam and 2/a the autumn exam.
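As a quick sanity check, a minimal snippet like the one below (just a sketch, hard-coding the 2015 spring "morning questions" case from the list above) rebuilds one URL from that pattern:

```python
# Sketch: reconstruct the 2015 spring morning-questions URL from the pattern above.
year = 2015          # Western calendar year
era = year - 1988    # Heisei era year (2015 -> 27)
session = 1          # 1 = spring, 2 = autumn
season = "h"         # "h" = spring, "a" = autumn

url = (
    "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_"
    f"{year}h{era}_{session}/{year}h{era}{season}_fe_am_qs.pdf"
)
print(url)
# https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_am_qs.pdf
```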
Without thinking too hard about it, I wrote the code below with a pile of nested for loops.
kakomon.py
```python
import urllib.error
import urllib.request


def download():
    # Common (first-half) part of the URL
    urlbase = "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_"
    # Exam session: 1 = spring ("h"), 2 = autumn ("a")
    season = {1: "h", 2: "a"}
    # Download the 2009-2019 PDFs (questions / answers / commentary)
    for y in range(2009, 2020):
        nendo = str(y) + "h" + str(y - 1988)  # e.g. 2009h21 (Heisei 21)
        for s in range(1, 3):
            for t in ["am", "pm"]:
                if t == "pm":  # commentary exists only for the afternoon exam
                    try:
                        url = urlbase + nendo + "_" + str(s) + "/" + nendo + season[s] + "_fe_" + t + "_cmnt.pdf"
                        filename = nendo + season[s] + "_fe_" + t + "_cmnt.pdf"
                        urllib.request.urlretrieve(url, filename)
                    except urllib.error.HTTPError:
                        print("Error: " + filename)  # show file names that could not be downloaded
                for qa in ["qs", "ans"]:
                    try:
                        url = urlbase + nendo + "_" + str(s) + "/" + nendo + season[s] + "_fe_" + t + "_" + qa + ".pdf"
                        filename = nendo + season[s] + "_fe_" + t + "_" + qa + ".pdf"
                        urllib.request.urlretrieve(url, filename)
                    except urllib.error.HTTPError:
                        print("Error: " + filename)  # show file names that could not be downloaded


if __name__ == "__main__":
    download()
```
Running the above downloads the PDF files, but it also prints the following error messages (as of December 30, 2019).
```
Error: 2011h23h_fe_am_qs.pdf
Error: 2011h23h_fe_am_ans.pdf
Error: 2011h23h_fe_pm_cmnt.pdf
Error: 2011h23h_fe_pm_qs.pdf
Error: 2011h23h_fe_pm_ans.pdf
Error: 2019h31a_fe_am_qs.pdf
Error: 2019h31a_fe_am_ans.pdf
Error: 2019h31a_fe_pm_cmnt.pdf
Error: 2019h31a_fe_pm_qs.pdf
Error: 2019h31a_fe_pm_ans.pdf
```
This is due to the following two facts.
- Due to the Great East Japan Earthquake, no spring exam was held in 2011; a **special exam** was held instead, and its files use "tokubetsu" in their names.
- Because of the change of era name, the autumn 2019 exam is labeled with the **Reiwa** era (2019r01) rather than Heisei (2019h31) in its file names.
For the sessions that raised errors, i.e. the exceptions to the naming pattern, you either have to download the files manually or adapt the program. For example, I used the following program.
kakomon_revised.py
```python
import urllib.error
import urllib.request


def download():
    # Common (first-half) parts of the URLs for the 2011 special exam and the
    # autumn 2019 exam, mapped to the common part of the corresponding file names
    urlbase = {
        "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2011h23_1/2011h23tokubetsu_fe_": "2011h23tokubetsu_fe_",
        "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2019h31_2/2019r01a_fe_": "2019r01a_fe_",
    }
    # Download the PDFs (questions / answers / commentary) for the two exceptional sessions
    for u in urlbase:
        for t in ["am", "pm"]:
            if t == "pm":  # commentary exists only for the afternoon exam
                try:
                    url = u + t + "_cmnt.pdf"
                    filename = urlbase[u] + t + "_cmnt.pdf"
                    urllib.request.urlretrieve(url, filename)
                except urllib.error.HTTPError:
                    print("Error: " + filename)  # show file names that could not be downloaded
            for qa in ["qs", "ans"]:
                try:
                    url = u + t + "_" + qa + ".pdf"
                    filename = urlbase[u] + t + "_" + qa + ".pdf"
                    urllib.request.urlretrieve(url, filename)
                except urllib.error.HTTPError:
                    print("Error: " + filename)  # show file names that could not be downloaded


if __name__ == "__main__":
    download()
```
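For reference, the two scripts could also be merged into a single loop that first builds the list of (url, filename) pairs, with overrides for the two exceptional sessions, and then downloads them. This is only a sketch of one possible refactoring, not the script I actually used; the names `BASE` and `build_targets` are mine, and it assumes the same URL patterns and exceptional sessions described above.

```python
import urllib.error
import urllib.request

# Common (first-half) part of the URL, same as in the scripts above
BASE = "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_"


def build_targets():
    """Return (url, filename) pairs for 2009-2019, including the two exceptional sessions."""
    targets = []
    for y in range(2009, 2020):
        for s, season in {1: "h", 2: "a"}.items():
            nendo = f"{y}h{y - 1988}"          # e.g. 2009h21
            stem = f"{nendo}{season}_fe_"      # regular file-name stem
            # Overrides for the two sessions that break the regular pattern
            if (y, s) == (2011, 1):
                stem = "2011h23tokubetsu_fe_"  # 2011 special exam
            elif (y, s) == (2019, 2):
                stem = "2019r01a_fe_"          # autumn 2019 (Reiwa 1)
            for t in ["am", "pm"]:
                # Commentary PDFs exist only for the afternoon exam
                suffixes = ["qs", "ans"] + (["cmnt"] if t == "pm" else [])
                for suffix in suffixes:
                    filename = f"{stem}{t}_{suffix}.pdf"
                    targets.append((f"{BASE}{nendo}_{s}/{filename}", filename))
    return targets


def download():
    for url, filename in build_targets():
        try:
            urllib.request.urlretrieve(url, filename)
        except urllib.error.HTTPError:
            print("Error: " + filename)  # show file names that could not be downloaded


if __name__ == "__main__":
    download()
```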
From fiscal 2020 (Reiwa 2), the afternoon exam changes: COBOL is apparently being dropped from the programming-language choices and Python is being added, and the number of questions, the number of questions to be answered, the point allocation, and so on will also change.