While looking to change jobs, I wanted statistics on the number of job openings by region and by occupation, so I wrote a script to collect them.

Overview

The script sends a query and a region to Indeed.com, extracts the number of search results from the response, and prints it. It uses urllib, re, and bs4.

Code

`jobcounter.py`


import urllib.request, urllib.parse
from bs4 import BeautifulSoup
import re, getopt, sys

def jobcounter(query, location):
    query = urllib.parse.quote_plus(query)
    location = urllib.parse.quote_plus(location)
    url = "https://jp.indeed.com/%E6%B1%82%E4%BA%BA?q={}&l={}&radius=0".format(query, location)
            
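    request = urllib.request.urlopen(url)
    soup = BeautifulSoup(request.read(), 'html.parser')
    # The element with id "searchCount" holds the text containing the result count.
    result = soup.find_all(id="searchCount")[0].get_text()
    result = result.replace(",", "")  # drop thousands separators, e.g. "1,740" -> "1740"
    # Keep only the number; the pattern must match the wording shown on the page.
    result = re.sub(r'Job search results([0-9]+) .*$', r'\1', result)
    return result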

def main():

    try:
        # Long options take an argument, so they need a trailing "=".
        opts, args = getopt.getopt(sys.argv[1:], "q:l:", ["query=", "location="])
    except getopt.GetoptError as err:
        print(err, file=sys.stderr)
        sys.exit(2)

    query = ""
    location = ""
    for o, a in opts:
        if o in ("-q", "--query"):
            query = a
        elif o in ("-l", "--location"):
            location = a

    print(jobcounter(query, location))

if __name__ == "__main__":
    main()

Try it from the CLI

Execute the following command.

$ python jobcounter.py -q programmer -l Shibuya

The execution result is as follows.

`result`

1740

This result means that searching for jobs containing "programmer" in the "Shibuya" area found 1740 postings.

How to use jobcounter

The main use is gathering statistics such as "how many jobs are available for each occupation in a specific area" and "how many jobs are available for a specific occupation in each area".

jobcounter(query, location)

The logic is wrapped in a simple function, so all you have to do is call it in a loop, passing queries and regions taken from an array or a YAML file. The return value is the number of matching jobs, as a string.
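As a minimal sketch of such a loop: the helper name `collect_counts` is hypothetical and not part of the script. It takes the counting function as an argument, so it can be tried with a stand-in before sending real requests, and it assumes the function returns a plain number as a string.

```python
from typing import Callable, Dict, Iterable, Tuple


def collect_counts(
    counter: Callable[[str, str], str],
    queries: Iterable[str],
    locations: Iterable[str],
) -> Dict[Tuple[str, str], int]:
    """Call counter(query, location) for every pair and collect the counts."""
    locations = list(locations)  # reused for every query, so materialize it once
    counts: Dict[Tuple[str, str], int] = {}
    for query in queries:
        for location in locations:
            # int() raises ValueError if the counter did not return a plain number
            counts[(query, location)] = int(counter(query, location))
    return counts


# Usage with the real function (sends one HTTP request per pair):
#   from jobcounter import jobcounter
#   counts = collect_counts(jobcounter, ["programmer", "designer"], ["Shibuya", "Shinjuku"])
#   for (query, location), n in counts.items():
#       print(query, location, n)
```

Because every pair triggers a request to indeed.com, it is probably wise to keep the lists short or to pause between calls.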

Notes

urllib and re are part of the standard library, but bs4 has to be installed with pip.

# pip install bs4

Also, if indeed.com changes its appearance, wording, or HTML, the script may break. Specifically, it relies on an HTML element with the id "searchCount"; if that id is renamed, the count cannot be obtained. Likewise, the text inside searchCount is trimmed with re, so if the text no longer matches the regular expression, the result will not be reduced to a bare number.
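One way to make both failure modes visible is to parse the text defensively. The sketch below is not what the script does: the helper `parse_search_count` is hypothetical, and it swaps the script's wording-specific pattern for a looser rule that takes the first run of digits. It expects the text of the searchCount element, or None when the element was not found.

```python
import re
from typing import Optional


def parse_search_count(text: Optional[str]) -> Optional[int]:
    """Return the first integer in the searchCount text, or None if unavailable."""
    if text is None:
        # The element with id "searchCount" was not found on the page.
        return None
    # Remove thousands separators, then look for the first run of digits.
    match = re.search(r"[0-9]+", text.replace(",", ""))
    if match is None:
        # The wording changed and no number is present.
        return None
    return int(match.group(0))
```

With BeautifulSoup, `soup.find(id="searchCount")` returns None when no element matches, so the caller can pass None in that case and the element's `get_text()` otherwise. Taking the first number is itself an assumption about the page wording.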

Web scraping and Unix philosophy

Web scraping is the extraction of information from websites, and this script is one example of it. The UNIX philosophy says to "do one thing well", and the script above broadly follows that idea.

It has no spectacular features, but it is good enough as a function for gathering job vacancy statistics. The script itself is simple and easy to follow.

A python script that gets the number of jobs for a specified condition from indeed.com