Whenever an engineer looking for work visits our website, the first place they stop by is the [job listings page](https://ritsuan.com/job/). It is updated very rarely, and even old projects are left there gathering dust, unmaintained. This is no good... I have to do something, quickly...
While mulling that over and wandering around the internet for a while, I came across this.
[Visualization of data by prefecture](https://qiita.com/SaitoTsutomu/items/6d17889ba47357e44131)
Oh, that gave me an idea.
**Let's visualize the number of projects by prefecture on the job listings page and encourage updates!** Our business has reportedly been expanding lately, so there should be plenty of projects around major cities all over Japan! (foreshadowing)
Here is what I ended up with after some trial and error.
Python3
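import requests
from bs4 import BeautifulSoup
from japanmap import pref_names, picture
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Job count per prefecture; index i corresponds to pref_names[i + 1]
plis = [0] * 47

# Scrape the first 50 pages of the job list
for z in range(50):
    r = requests.get('https://ritsuan.com/job/page/{}'.format(z + 1))
    bs = BeautifulSoup(r.text, 'html.parser')
    # Collect the job headings once per page
    titles = [j.text for j in bs.select("div[class=main_container] h2")]
    for i in range(47):
        pnlist = pref_names[i + 1]
        for pc in titles:
            # Count a job when the prefecture name appears in its heading
            if pnlist in pc:
                plis[i] += 1

# Map prefecture name -> job count
dic = {}
for k in range(47):
    pdic = pref_names[k + 1]
    dic[pdic] = plis[k]

cmap = plt.get_cmap("Reds")
df = pd.DataFrame.from_dict(dic, orient="index", columns=["Number of cases"])
# The column name contains spaces, so use bracket access instead of df.<name>
cases = df["Number of cases"]
norm = plt.Normalize(vmin=cases.min(), vmax=cases.max())
# Convert a count to a '#rrggbb' color string via the colormap
fcol = lambda x: '#' + bytes(cmap(norm(x), bytes=True)[:3]).hex()
plt.rcParams['figure.figsize'] = 20, 20
# Pass ax explicitly; newer matplotlib versions need it for a standalone ScalarMappable
plt.colorbar(plt.cm.ScalarMappable(norm, cmap), ax=plt.gca())
plt.imshow(picture(cases.apply(fcol)))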
And this is the result.
<img width="400" src="https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/472694/355a928c-4c83-cbea-4acb-0f6d7be07b3e.png">
What is this? Only the Kanto and Tokai areas are extremely dark. With the head office in Shizuoka Prefecture I expected something like this, but I thought the business had expanded further than that. And because a few areas stand out so strongly, the others are washed out and look pale.
Let's check right away by listing the counts.
Python3
df = pd.DataFrame.from_dict(dic, orient="index", columns=["Number of cases"])
df.to_excel("job_list.xlsx", sheet_name="job_list")
Prefecture | Number of cases | Prefecture | Number of cases | Prefecture | Number of cases |
---|---|---|---|---|---|
Hokkaido | 0 | Ishikawa Prefecture | 0 | Okayama Prefecture | 0 |
Aomori Prefecture | 1 | Fukui Prefecture | 0 | Hiroshima Prefecture | 2 |
Iwate Prefecture | 0 | Yamanashi Prefecture | 2 | Yamaguchi Prefecture | 0 |
Miyagi Prefecture | 1 | Nagano Prefecture | 0 | Tokushima Prefecture | 0 |
Akita Prefecture | 0 | Gifu Prefecture | 3 | Kagawa Prefecture | 0 |
Yamagata Prefecture | 0 | Shizuoka Prefecture | 62 | Ehime Prefecture | 0 |
Fukushima Prefecture | 0 | Aichi Prefecture | 90 | Kochi Prefecture | 0 |
Ibaraki Prefecture | 12 | Mie Prefecture | 7 | Fukuoka Prefecture | 2 |
Tochigi Prefecture | 1 | Shiga Prefecture | 4 | Saga Prefecture | 0 |
Gunma Prefecture | 7 | Kyoto Prefecture | 0 | Nagasaki Prefecture | 0 |
Saitama Prefecture | 9 | Osaka Prefecture | 3 | Kumamoto Prefecture | 0 |
Chiba Prefecture | 2 | Hyogo Prefecture | 1 | Oita Prefecture | 1 |
Tokyo | 133 | Nara Prefecture | 0 | Miyazaki Prefecture | 0 |
Kanagawa Prefecture | 60 | Wakayama Prefecture | 0 | Kagoshima Prefecture | 0 |
Niigata Prefecture | 0 | Tottori Prefecture | 0 | Okinawa Prefecture | 2 |
Toyama Prefecture | 0 | Shimane Prefecture | 0 | | |
Oh... There are areas where the number of projects is in the single digits. Is this for real? If a job seeker from one of those areas sees this, you can almost hear the line: "Wow... are there really this few projects in my area...?"
With numbers like these, most prefectures barely show up even when visualized as a choropleth map.
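One possible way to keep the low-count prefectures visible is to place the counts on a logarithmic scale before picking a color. The sketch below is not what the script above does; it only compares the linear position (what `plt.Normalize` gives when the minimum is 0) with a `log1p`-based position for a few counts from the table. The helper names `linear_scale` and `log_scale` are made up for this illustration.

```python
import math


def linear_scale(count, max_count):
    """Linear position in [0, 1], matching a 0..max normalization."""
    return count / max_count


def log_scale(count, max_count):
    """Position in [0, 1] on a log1p scale; a count of 0 stays at exactly 0."""
    return math.log1p(count) / math.log1p(max_count)


# A few counts taken from the table above
counts = {"Tokyo": 133, "Aichi": 90, "Shizuoka": 62, "Gifu": 3, "Hyogo": 1, "Hokkaido": 0}
top = max(counts.values())

for name, n in counts.items():
    print(f"{name:10s} linear={linear_scale(n, top):.3f}  log={log_scale(n, top):.3f}")
```

On the linear scale a count of 3 sits at about 2% of the color range, while on the log scale it moves to roughly 28%, so it would no longer be indistinguishable from zero. The value from `log_scale` could be passed to `cmap(...)` in place of `norm(x)` inside `fcol`, at the cost of the colorbar no longer reading as raw counts.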
By the way, I had assumed there would be plenty of projects around major cities all over Japan, so it was a surprise to see that assumption so thoroughly overturned. (payoff) I have heard that there is an office in Kansai, but its listings do not seem to have been updated. Now that we have evidence (?), we can push for updates.
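To put a rough number on how concentrated the listings are, here is a small sketch that uses only the nonzero counts from the table above (the "Prefecture" suffixes are dropped for brevity, and the variable names are mine). Note that the script counts a heading once for every prefecture it mentions, so the total is a number of matches rather than of distinct jobs.

```python
# Nonzero counts copied from the table above
counts = {
    "Aomori": 1, "Miyagi": 1, "Ibaraki": 12, "Tochigi": 1, "Gunma": 7,
    "Saitama": 9, "Chiba": 2, "Tokyo": 133, "Kanagawa": 60, "Yamanashi": 2,
    "Gifu": 3, "Shizuoka": 62, "Aichi": 90, "Mie": 7, "Shiga": 4,
    "Osaka": 3, "Hyogo": 1, "Hiroshima": 2, "Fukuoka": 2, "Oita": 1,
    "Okinawa": 2,
}

total = sum(counts.values())

# Rank prefectures by count, largest first, and take the top four
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
top4 = ranked[:4]
top4_total = sum(n for _, n in top4)

print("total matches:", total)
for name, n in top4:
    print(f"{name}: {n} ({n / total:.1%})")
print(f"top 4 share: {top4_total / total:.1%}")
```

With these numbers, Tokyo, Aichi, Shizuoka and Kanagawa account for 345 of the 405 matches, roughly 85%.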
This time I searched for the [○○ prefecture ~] part of each heading in the job list, but some listings are written in a different pattern, with a company name and multiple area names, and those are not counted. Such inconsistent notation would not go over well with the public. ~~I can't search them, so~~ It should be fixed right away. I also considered searching the "address" field instead, but gave up because the way it is written in the HTML looked time-consuming to parse.
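To find which listings slip through the search, one option is to keep the headings that match no prefecture name at all and list them afterwards. The following is only a sketch under assumptions: `count_prefectures` is a hypothetical helper, and the sample headings and the shortened prefecture list are invented for illustration (the real script would pass the scraped `h2` texts and the names from `pref_names`).

```python
from collections import Counter


def count_prefectures(headings, pref_names):
    """Count headings per prefecture and collect headings that match none."""
    counts = Counter({name: 0 for name in pref_names})
    unmatched = []
    for heading in headings:
        # A heading may mention several prefectures; each one is counted
        hits = [name for name in pref_names if name in heading]
        if hits:
            counts.update(hits)
        else:
            unmatched.append(heading)
    return counts, unmatched


# Invented sample data, not taken from the actual site
sample_prefs = ["Tokyo", "Kanagawa", "Shizuoka", "Aichi"]
sample_headings = [
    "[Aichi] Embedded control software development",
    "[Tokyo, Kanagawa] Web application maintenance",
    "Example Corp. / Nagoya, Hamamatsu: test engineer",
]

counts, unmatched = count_prefectures(sample_headings, sample_prefs)
print(dict(counts))
print("needs fixing:", unmatched)
```

The `unmatched` list is the part that matters here: it gives a concrete set of headings to hand to whoever maintains the listings.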
matplotlib has many colormaps, and I tried all of them this time, so I am posting them here. This is purely for fun, so please take a leisurely look.
viridis
plasma
inferno
magma
cividis
Greys
Purples
Blues
Greens
Oranges
Reds
YlOrBr
YlOrRd
OrRd
PuRd
RdPu
BuPu
GnBu
PuBu
YlGnBu
PuBuGn
BuGn
YlGn
binary
gist_yarg
gist_gray
gray
bone
pink
spring
summer
autumn
winter
cool
Wistia
hot
afmhot
gist_heat
copper
PiYG
PrGn
BrBG
PuOr
RdGy
RdBu
RdYlBu
RdYlGn
Spectral
coolwarm
bwr
seismic
twilight
twilight_shifted
hsv
Pastel1
Pastel2
Paired
Accent
Dark2
Set1
Set2
Set3
tab10
tab20
tab20b
tab20c
flag
prism
ocean
gist_earth
terrain
gist_stern
gnuplot
gnuplot2
CMRmap
cubehelix
brg
gist_rainbow
rainbow
jet
nipy_spectral
gist_ncar
If you scrolled all the way down here, you must be quite the enthusiast. This kind of playfulness matters for engineers now and then.
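For anyone who wants to repeat the colormap tour, the color conversion from the script can be wrapped in a function and looped over colormap names. This is a minimal sketch, assuming matplotlib is installed; `hex_palette` is a name I made up, the sample counts are a few values from the table, and the `Agg` backend is selected only so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt


def hex_palette(values, cmap_name):
    """Return a '#rrggbb' string per value, using the named matplotlib colormap."""
    cmap = plt.get_cmap(cmap_name)
    norm = plt.Normalize(vmin=min(values), vmax=max(values))
    return ['#' + bytes(cmap(norm(v), bytes=True)[:3]).hex() for v in values]


# A few counts from the table, from lowest to highest
sample = [0, 3, 62, 133]

# Any names from the list above can be used here
for name in ["viridis", "Reds", "jet"]:
    print(name, hex_palette(sample, name))
```

Replacing the short list with `plt.colormaps()` would iterate over every registered colormap name, including the reversed `_r` variants.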