Download the past exam question PDFs of the Fundamental Information Technology Engineer Examination (FE) using Python's urllib package.
The past exam questions for the Fundamental Information Technology Engineer Examination are published on the official IPA website (https://www.jitec.ipa.go.jp/1_04hanni_sukiru/_index_mondai.html). However, the questions and answers are posted per exam session, so you normally have to visit each session's page and download the files one by one. To save that hassle, this article downloads all of the questions and answers at once using Python's urllib package.
Looking at the pages of past exams, the URLs for, say, the 2015 Spring Examination are as follows.

- Morning exam
  - Questions: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_am_qs.pdf
  - Answers: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_am_ans.pdf
- Afternoon exam
  - Questions: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_pm_qs.pdf
  - Answers: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_pm_ans.pdf
  - Commentary: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_pm_cmnt.pdf
From this we can see that the past exam URLs consist of the common prefix

https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_

followed by

[Western year][Japanese era year]_[1 OR 2]/[Western year][Japanese era year][h OR a]_fe_[am OR pm]_[qs OR ans OR cmnt].pdf

where 1/h denotes the spring exam and 2/a the autumn exam.
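As a quick sanity check, a minimal snippet like the one below (just a sketch, hard-coding the 2015 spring "morning questions" case from the list above) rebuilds one URL from that pattern:

```python
# Sketch: reconstruct the 2015 spring morning-questions URL from the pattern above.
year = 2015          # Western calendar year
era = year - 1988    # Heisei era year (2015 -> 27)
session = 1          # 1 = spring, 2 = autumn
season = "h"         # "h" = spring, "a" = autumn

url = (
    "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_"
    f"{year}h{era}_{session}/{year}h{era}{season}_fe_am_qs.pdf"
)
print(url)
# https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_am_qs.pdf
```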
Without thinking too hard about it, I wrote the code below with a pile of nested for loops.
kakomon.py
```python
import urllib.error
import urllib.request


def download():
    # Common (first-half) part of the URL
    urlbase = "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_"
    # Exam session: 1 = spring ("h"), 2 = autumn ("a")
    season = {1: "h", 2: "a"}
    # Download the 2009-2019 PDFs (questions / answers / commentary)
    for y in range(2009, 2020):
        nendo = str(y) + "h" + str(y - 1988)  # e.g. 2009h21 (Heisei 21)
        for s in range(1, 3):
            for t in ["am", "pm"]:
                if t == "pm":  # commentary exists only for the afternoon exam
                    try:
                        url = urlbase + nendo + "_" + str(s) + "/" + nendo + season[s] + "_fe_" + t + "_cmnt.pdf"
                        filename = nendo + season[s] + "_fe_" + t + "_cmnt.pdf"
                        urllib.request.urlretrieve(url, filename)
                    except urllib.error.HTTPError:
                        print("Error: " + filename)  # show file names that could not be downloaded
                for qa in ["qs", "ans"]:
                    try:
                        url = urlbase + nendo + "_" + str(s) + "/" + nendo + season[s] + "_fe_" + t + "_" + qa + ".pdf"
                        filename = nendo + season[s] + "_fe_" + t + "_" + qa + ".pdf"
                        urllib.request.urlretrieve(url, filename)
                    except urllib.error.HTTPError:
                        print("Error: " + filename)  # show file names that could not be downloaded


if __name__ == "__main__":
    download()
```
Running the above downloads the PDF files, but it also prints the following error messages (as of December 30, 2019).
```
Error: 2011h23h_fe_am_qs.pdf
Error: 2011h23h_fe_am_ans.pdf
Error: 2011h23h_fe_pm_cmnt.pdf
Error: 2011h23h_fe_pm_qs.pdf
Error: 2011h23h_fe_pm_ans.pdf
Error: 2019h31a_fe_am_qs.pdf
Error: 2019h31a_fe_am_ans.pdf
Error: 2019h31a_fe_pm_cmnt.pdf
Error: 2019h31a_fe_pm_qs.pdf
Error: 2019h31a_fe_pm_ans.pdf
```
This is due to the following two facts.
- Due to the Great East Japan Earthquake, no spring exam was held in 2011; a **special exam** was held instead, and its files use "tokubetsu" in their names.
- Because of the change of era name, the autumn 2019 exam is labeled with the **Reiwa** era (2019r01) rather than Heisei (2019h31) in its file names.
For the sessions that raised errors, i.e. the exceptions to the naming pattern, you either have to download the files manually or adapt the program. For example, I used the following program.
kakomon_revised.py
```python
import urllib.error
import urllib.request


def download():
    # Common (first-half) parts of the URLs for the 2011 special exam and the
    # autumn 2019 exam, mapped to the common part of the corresponding file names
    urlbase = {
        "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2011h23_1/2011h23tokubetsu_fe_": "2011h23tokubetsu_fe_",
        "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2019h31_2/2019r01a_fe_": "2019r01a_fe_",
    }
    # Download the PDFs (questions / answers / commentary) for the two exceptional sessions
    for u in urlbase:
        for t in ["am", "pm"]:
            if t == "pm":  # commentary exists only for the afternoon exam
                try:
                    url = u + t + "_cmnt.pdf"
                    filename = urlbase[u] + t + "_cmnt.pdf"
                    urllib.request.urlretrieve(url, filename)
                except urllib.error.HTTPError:
                    print("Error: " + filename)  # show file names that could not be downloaded
            for qa in ["qs", "ans"]:
                try:
                    url = u + t + "_" + qa + ".pdf"
                    filename = urlbase[u] + t + "_" + qa + ".pdf"
                    urllib.request.urlretrieve(url, filename)
                except urllib.error.HTTPError:
                    print("Error: " + filename)  # show file names that could not be downloaded


if __name__ == "__main__":
    download()
```
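For reference, the two scripts could also be merged into a single loop that first builds the list of (url, filename) pairs, with overrides for the two exceptional sessions, and then downloads them. This is only a sketch of one possible refactoring, not the script I actually used; the names `BASE` and `build_targets` are mine, and it assumes the same URL patterns and exceptional sessions described above.

```python
import urllib.error
import urllib.request

# Common (first-half) part of the URL, same as in the scripts above
BASE = "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_"


def build_targets():
    """Return (url, filename) pairs for 2009-2019, including the two exceptional sessions."""
    targets = []
    for y in range(2009, 2020):
        for s, season in {1: "h", 2: "a"}.items():
            nendo = f"{y}h{y - 1988}"          # e.g. 2009h21
            stem = f"{nendo}{season}_fe_"      # regular file-name stem
            # Overrides for the two sessions that break the regular pattern
            if (y, s) == (2011, 1):
                stem = "2011h23tokubetsu_fe_"  # 2011 special exam
            elif (y, s) == (2019, 2):
                stem = "2019r01a_fe_"          # autumn 2019 (Reiwa 1)
            for t in ["am", "pm"]:
                # Commentary PDFs exist only for the afternoon exam
                suffixes = ["qs", "ans"] + (["cmnt"] if t == "pm" else [])
                for suffix in suffixes:
                    filename = f"{stem}{t}_{suffix}.pdf"
                    targets.append((f"{BASE}{nendo}_{s}/{filename}", filename))
    return targets


def download():
    for url, filename in build_targets():
        try:
            urllib.request.urlretrieve(url, filename)
        except urllib.error.HTTPError:
            print("Error: " + filename)  # show file names that could not be downloaded


if __name__ == "__main__":
    download()
```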
From fiscal 2020 (Reiwa 2), the afternoon exam changes: COBOL is apparently being dropped from the programming-language choices and Python is being added, and the number of questions, the number of questions to be answered, the point allocation, and so on will also change.