Indeed.com search results include an item that lists salary levels together with the number of job postings at each level. This item can be used to calculate an average salary weighted by the number of postings.
avgsalary.py
import urllib.request, urllib.parse
from bs4 import BeautifulSoup
import re, getopt, sys
import numpy as np

def avgsalary(query, location):
    # URL-encode the search terms before building the search URL
    query = urllib.parse.quote_plus(query)
    location = urllib.parse.quote_plus(location)
    url = "https://jp.indeed.com/%E6%B1%82%E4%BA%BA?q={}&l={}&radius=0".format(query, location)
    request = urllib.request.urlopen(url)
    soup = BeautifulSoup(request.read(), 'html.parser')
    # The salary item contains one <li> per salary level
    result = soup.find(id="SALARY_rbo")
    results = result.find_all("li")
    salaries = []
    num_salaries = []
    for result in results:
        tmp = result.a["title"]
        tmp = re.sub(',', '', tmp)
        # Keep only the first two numbers: salary level and number of postings
        tmp = re.sub(r'([0-9]+)[^\d]+([0-9]+).*$', r'\1,\2', tmp)
        tmp = tmp.split(",")
        salaries.append(tmp[0])
        num_salaries.append(tmp[1])
    # np.float was removed from recent NumPy releases; the built-in float works
    salaries = np.array(salaries).astype(float)
    salaries *= 10000
    num_salaries = np.array(num_salaries).astype(float)
    # Average salary weighted by the number of postings at each level
    return np.sum(salaries * num_salaries) / np.sum(num_salaries)

def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "q:l:", ["query=", "location="])
    except getopt.GetoptError as err:
        # usage()
        sys.exit(2)
    query = ""
    location = ""
    for o, a in opts:
        if o in ("-q", "--query"):
            query = a
        elif o in ("-l", "--location"):
            location = a
    print(avgsalary(query, location))

if __name__ == "__main__":
    main()
$ python avgsalary.py -l Gotemba
2312722.94887
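This code does the following: it fetches the Indeed search results page for the given query and location, reads each salary level and its number of postings from the SALARY_rbo element, and returns the average of the salary levels weighted by the number of postings.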
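The parsing and weighting step can be tried in isolation, without any network access. The following is a minimal sketch, not the script itself: the helper names `parse_title` and `weighted_average` are hypothetical, the sample titles are made up for illustration (the real title format on Indeed is an assumption here), and `re.findall` is used in place of the script's `re.sub` rewrite. As in the script, the first number is treated as a salary level that is multiplied by 10000, and the second as the number of postings.

```python
import re

def parse_title(title):
    """Return (salary, number of postings) from one salary-level title.

    Assumes the title contains the salary level first and the posting
    count second, which is what the script's regular expression relies on.
    """
    cleaned = title.replace(",", "")        # drop thousands separators
    numbers = re.findall(r"\d+", cleaned)   # all runs of digits, in order
    salary = float(numbers[0]) * 10000      # same scaling as the script
    count = float(numbers[1])
    return salary, count

def weighted_average(titles):
    """Average salary weighted by the number of postings at each level."""
    pairs = [parse_title(t) for t in titles]
    total = sum(salary * count for salary, count in pairs)
    postings = sum(count for _, count in pairs)
    return total / postings

# Made-up titles, only to exercise the arithmetic
sample_titles = ["200万円 (300)", "400万円 (1,000)", "600万円 (200)"]
print(weighted_average(sample_titles))
```

With these sample values the result is roughly 3.87 million: the middle level dominates because it has the most postings.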
For example, it is interesting to compare the English query "programmer" with the same word written in Japanese.
$ python avgsalary.py -q programmer
4469298.24561
$ python avgsalary.py -q プログラマー
3116876.47306
This comparison roughly shows the difference in annual income between "English jobs" and "Japanese jobs". English-language postings pay more than 1 million yen more per year on average, which shows how important English is. By the way, if you use the US version of indeed.com, you can see that the average salary of American programmers is over 7 million yen.