I wanted to output statistics on the number of job vacancies by region and job vacancies in job change activities, so I created a script to use for the statistics.
Indeed, send a query and region to .com, extract the number of search results from the received results, and display them. Use urllib, re and bs4.
jobcounter.py
import urllib.request, urllib.parse
from bs4 import BeautifulSoup
import re, getopt, sys
def jobcounter(query, location):
query = urllib.parse.quote_plus(query)
location = urllib.parse.quote_plus(location)
url = "https://jp.indeed.com/%E6%B1%82%E4%BA%BA?q={}&l={}&radius=0".format(query, location)
request = urllib.request.urlopen(url);
soup = BeautifulSoup(request.read(), 'html.parser')
result = soup.find_all(id="searchCount")[0].get_text()
result = result.replace(",", "");
result = re.sub(r'Job search results([0-9]+) .*$', r'\1', result);
return(result)
def main():
try:
opts, args = getopt.getopt(sys.argv[1:],"q:l:", ["query", "location"]);
except getout.GetoptError as err:
#usage()
sys.exit(2)
query = ""
location = ""
for o, a in opts:
if o == "-q":
query = a
elif o == "-l":
location = a
print(jobcounter(query, location))
if __name__ == "__main__":
main()
Execute the following command.
$ python jobcounter.py -q programmer-l Shibuya
The execution result is as follows.
result.
1740
This result means that "1740" were found as a result of searching for jobs including "programmer" in the area "Shibuya".
The main uses are to obtain statistics such as "how many jobs are available for each occupation in a specific area" and "how many jobs are available for each occupation in a specific area". Can be used.
jobcounter(query, location)
I made an easy-to-understand function, so all you have to do is pass the query and region in a loop with an array or yaml. The return value is the number of cases.
urllib and re should be included originally, but bs4 needs to be included with pip.
# pip install bs4
Also, if you change the appearance, wording or html on indeed.com, it may get stuck. Specifically, there is an html element defined by the id "searchCount", but if this id name is changed, it cannot be obtained. Alternatively, since the text in searchCount is formatted with re, it will not be formatted properly if the text does not match the regular expression.
Web scraping is the extraction of information from a website, and this script is also a type of web scraping. There is a UNIX philosophy of "doing one thing well", and the above script is generally based on this idea.
It doesn't have a spectacular feature, but it's good enough as a function to get job vacancies statistics. The script itself is not complicated and anyone can understand it.
Recommended Posts