Masashi Sada (Massan) has written many songs about specific regions and places.
They are all wonderful songs (^o^). Tobiume in particular is a super-masterpiece that plays in my head every time I visit Dazaifu Tenmangu Shrine.
Which region appears most often in Massan's songs, these included? Nagasaki, after all? Tokyo? Or ...? To solve this mystery, I picked up my keyboard and wrote some code.
By the way, "From the North Country" is very famous and has been featured many times in this Advent calendar, but it has no lyrics, so it is outside the scope of this survey.
I scraped a lyrics site. Because of copyright, I will not reproduce the lyrics here.
I used janome, a morphological analysis library for Python, for part-of-speech decomposition.
tokenize_sample.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from janome.tokenizer import Tokenizer


def main():
    # Input text: "Three red bridges over Shinji Pond"
    for token in Tokenizer().tokenize('心字池にかかる三つの赤い橋'):
        print(token)


if __name__ == '__main__':
    main()
If you write and run code like the above ...
心字池	名詞,一般,*,*,*,*,心字池,シンジイケ,シンジイケ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
かかる	動詞,自立,*,*,五段・ラ行,基本形,かかる,カカル,カカル
三つ	名詞,一般,*,*,*,*,三つ,ミッツ,ミッツ
の	助詞,格助詞,一般,*,*,*,の,ノ,ノ
赤い	形容詞,自立,*,*,形容詞・アウオ段,基本形,赤い,アカイ,アカイ
橋	名詞,一般,*,*,*,*,橋,ハシ,ハシ
Oh! Shinji Pond is recognized properly!! My excitement went up.
For the detailed mechanism and history of janome, please refer to the author's slides, "Pyconjp2015 - Morphological analysis made with Python". (janome was born in 2015! Thank you!)
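To make the printed format easier to read, here is a minimal sketch that splits one printed token line into named fields. It assumes the layout janome uses when a token is printed: the surface form, a tab, then nine comma-separated features (four part-of-speech levels, inflection type, inflection form, base form, reading, phonetic). `ParsedToken` and `parse_token_line` are names made up for this illustration; with real janome tokens you would normally read attributes such as `token.surface` and `token.part_of_speech` directly.

```python
from typing import NamedTuple


class ParsedToken(NamedTuple):
    """Fields of one printed token line (hypothetical helper type)."""
    surface: str
    part_of_speech: str  # four comma-separated POS levels
    infl_type: str
    infl_form: str
    base_form: str
    reading: str
    phonetic: str


def parse_token_line(line: str) -> ParsedToken:
    """Split 'surface<TAB>feature,feature,...' into named fields."""
    surface, features = line.split('\t', 1)
    fields = features.split(',')
    # The first four features form the part-of-speech hierarchy.
    pos = ','.join(fields[:4])
    infl_type, infl_form, base_form, reading, phonetic = fields[4:9]
    return ParsedToken(surface, pos, infl_type, infl_form,
                       base_form, reading, phonetic)


# Sample line for the word "bridge".
token = parse_token_line('橋\t名詞,一般,*,*,*,*,橋,ハシ,ハシ')
print(token.surface, token.part_of_speech, token.reading)
```

The part-of-speech string is what the region extraction later in this post matches against.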
The lyrics scraped in advance are analyzed with janome, and the words tagged as a "region" are picked out. Since I want the density of the heat map to reflect how often each word appears, I count the occurrences at the same time.
import csv

from janome.tokenizer import Tokenizer


def count_place():
    place_count_dict = {}
    # Build the tokenizer once; loading the dictionary for every row is slow.
    t = Tokenizer(udic='sada_dict.csv', udic_enc='utf8')
    with open('sada_lyrics.csv', 'r') as lyrics:
        reader = csv.reader(lyrics)
        for row in reader:
            # The second column is passed to the tokenizer as the lyrics text.
            for token in t.tokenize(row[1]):
                # '名詞,固有名詞,地域' = noun, proper noun, region
                if '名詞,固有名詞,地域' in token.part_of_speech:
                    place_name = token.surface
                    place_count_dict[place_name] = place_count_dict.get(place_name, 0) + 1
    return place_count_dict
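As a side note, the counting step can be written more compactly with `collections.Counter`. The sketch below is only an illustration, not the code used for this post. janome reports part-of-speech tags in Japanese, and region names carry the prefix 名詞,固有名詞,地域 (noun, proper noun, region). `FakeToken` is a stand-in invented here so the example runs without janome; real tokens expose the same `surface` and `part_of_speech` attributes.

```python
from collections import Counter, namedtuple

# Part-of-speech prefix for region names: noun, proper noun, region.
REGION_POS = '名詞,固有名詞,地域'


def count_regions(tokens):
    """Count the surface forms of tokens tagged as region names."""
    return Counter(
        token.surface
        for token in tokens
        if token.part_of_speech.startswith(REGION_POS)
    )


# Hypothetical stand-in for janome tokens, for illustration only.
FakeToken = namedtuple('FakeToken', ['surface', 'part_of_speech'])
tokens = [
    FakeToken('長崎', '名詞,固有名詞,地域,一般'),
    FakeToken('の', '助詞,連体化,*,*'),
    FakeToken('東京', '名詞,固有名詞,地域,一般'),
    FakeToken('長崎', '名詞,固有名詞,地域,一般'),
]
counts = count_regions(tokens)
print(counts.most_common())
```

`Counter` removes the need for the "already in the dictionary?" branch, and `most_common()` gives a ranking for free.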
janome can also load a custom dictionary. According to the janome documentation, the dictionary format is the same as MeCab's.
sada_dict.csv
湯島聖堂,1288,1288,5000,名詞,固有名詞,一般,*,*,*,湯島聖堂,ユシマセイドウ,ユシマセイドウ
スカイツリー,1288,1288,5001,名詞,固有名詞,一般,*,*,*,スカイツリー,スカイツリー,スカイツリー
The example in the janome documentation registers "Tokyo Sky Tree", but since it is sung as "Sky Tree" in "Kasutira", which everyone knows, I recommend registering it as "Sky Tree".
Actually, I wanted the heat map to reflect the geocoded locations of these proper nouns too, but I ran out of time and gave up. So the final script does not use the dictionary at all. Sorry for bringing it up.
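If the dictionary grows, typing 13 columns per entry by hand gets error-prone. Here is a minimal sketch that generates rows in the same column layout as the two entries above: surface form, left and right context IDs, cost, four part-of-speech levels, inflection type, inflection form, base form, reading, phonetic. `make_user_dict_row` is a made-up helper, the IDs 1288 are simply copied from the entries above, and the Japanese surface forms and readings are illustrative.

```python
import csv
import io


def make_user_dict_row(surface, reading, cost, left_id=1288, right_id=1288):
    """Build one 13-column row for a proper noun (hypothetical helper)."""
    return [
        surface, left_id, right_id, cost,
        '名詞', '固有名詞', '一般', '*',   # noun, proper noun, general
        '*', '*',                          # no inflection type or form
        surface, reading, reading,         # base form, reading, phonetic
    ]


rows = [
    make_user_dict_row('湯島聖堂', 'ユシマセイドウ', 5000),
    make_user_dict_row('スカイツリー', 'スカイツリー', 5001),
]

# Written to an in-memory buffer here; a real run would write sada_dict.csv.
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerows(rows)
print(buffer.getvalue(), end='')
```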
For geocoding, I again used a Python library, googlemaps.
Since it calls the Google Maps API, you need to specify an API key. How to obtain one is described on the googlemaps GitHub page.
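import googlemaps


def geocode(place_name):
    gmaps = googlemaps.Client(key='write your API key')
    geocode_result = gmaps.geocode(place_name)
    # Note: this takes the northeast corner of the result's viewport,
    # not the representative point in ['geometry']['location'].
    coord = geocode_result[0]['geometry']['viewport']['northeast']
    return coord['lat'], coord['lng']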
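The function above reads the northeast corner of the viewport and assumes at least one result comes back. As a hedged alternative (not what was used to produce the results in this post), the sketch below reads the representative point from `['geometry']['location']` and returns `None` when the result list is empty. `extract_latlng` is a made-up name, and the sample response is a hand-written illustration of the response shape, not real API output.

```python
def extract_latlng(geocode_result):
    """Return (lat, lng) of the first geocoding result, or None if empty."""
    if not geocode_result:
        return None
    location = geocode_result[0]['geometry']['location']
    return location['lat'], location['lng']


# Hand-written sample in the shape of a geocoding response (illustrative values).
sample_result = [
    {
        'geometry': {
            'location': {'lat': 36.2, 'lng': 138.25},
            'viewport': {
                'northeast': {'lat': 45.5227719, 'lng': 145.8175503},
                'southwest': {'lat': 24.0, 'lng': 122.9},
            },
        }
    }
]

print(extract_latlng(sample_result))
print(extract_latlng([]))
```

For a large area such as a whole country, the viewport corner can be far from where a reader would expect the point to be, so the choice between the two fields visibly changes the heat map.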
The following code connects the work so far (part-of-speech decomposition of the lyrics, extraction of region names, and geocoding).
sada_place_geocoder.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv

import googlemaps
from janome.tokenizer import Tokenizer


def main():
    place_count_dict = count_place()
    gmaps = googlemaps.Client(key='write your API key')
    # Use a with-block so the output file is flushed and closed.
    with open('sada_places.csv', 'w') as out:
        writer = csv.writer(out, delimiter=',')
        for place_name, place_count in place_count_dict.items():
            lat, lon = geocode(gmaps, place_name)
            writer.writerow([place_name, place_count, lat, lon])


def count_place():
    place_count_dict = {}
    # Build the tokenizer once instead of once per row.
    t = Tokenizer()
    with open('sada_lyrics.csv', 'r') as lyrics:
        reader = csv.reader(lyrics)
        for row in reader:
            for token in t.tokenize(row[1]):
                # '名詞,固有名詞,地域' = noun, proper noun, region
                if '名詞,固有名詞,地域' in token.part_of_speech:
                    place_name = token.surface
                    place_count_dict[place_name] = place_count_dict.get(place_name, 0) + 1
    return place_count_dict


def geocode(gmaps, place_name):
    geocode_result = gmaps.geocode(place_name)
    # Northeast corner of the viewport, not ['geometry']['location'].
    coord = geocode_result[0]['geometry']['viewport']['northeast']
    return coord['lat'], coord['lng']


if __name__ == '__main__':
    main()
As a result, the region name, the number of appearances in the lyrics, the latitude, and the longitude are written out. Since I did not use a custom dictionary, there are garbage records here and there, but I will ignore them this time.
sada_places.csv
America,1,49.38,-66.94
Bermuda,1,14.5192371802915,121.0361231302915
Akita,1,39.86527460000001,140.5154199
Kasugayama,1,37.1489639802915,138.2363259802915
Victoria,1,48.450518,-123.322346
Mimiya,1,36.4073904302915,136.4570957
Minase,1,34.8791869802915,135.6691649802915
Kyo,3,30.5403905,120.3877692
spring,1,33.8689809,130.8083576
Asuka,3,38.8972965,139.9375578
Day,2,50.68819,5.675110099999999
Kamakura,2,35.3682478,139.5933376
Bathhouse,1,34.93531738029149,135.7610285302915
Yamami,1,36.5698502,136.9701007
Jerusalem,1,31.8829601,35.2652869
West Kyo,1,34.67190798029149,135.7844679802915
Addition,1,36.5431863,-6.255334599999999
Berlin,1,52.6754542,13.7611176
Kiraku,1,35.1904253,136.7319704
Mitsuke,3,37.5933274,139.0009869
Nagasaki,5,35.7377658,139.6976565
Urashima,1,35.4839466,139.6447166
Atago,1,35.9737504,139.6042941
Happy,1,34.4654479,135.5854033
Akishino,1,34.7155978,135.7837222
Heiankyo,1,44.5883529,127.1930004
Hong Kong,1,14.4904672802915,121.0242180302915
Karuizawa,2,36.4240846,138.6571307
Han,2,32.555258,114.2922103
Inasa,1,32.7592694,129.8647033
Kyoto,1,35.0542,135.8236
Musashi Koganei,1,35.70241118029149,139.5080892802915
Hakuhagi,1,38.2529733,140.9109412
Chino,3,34.047811,-117.5995851
Under the slope,1,35.3120498,139.5356368
Yukon,1,69.646498,-123.8009179
Home,1,34.0886418,132.9547384
Nanjing,1,32.3940135,119.050169
France,3,51.0891658,9.5597934
Mimomi,1,35.6882069802915,140.0695889802915
Welcome,1,37.9205189,112.7839926
Sound money,1,37.2102144,139.9250478
Yabu,1,35.3875492,140.1588221
Kano,1,35.2016331,135.4969237
Ebina,1,35.4774536,139.4364727
Renge,5,48.02912,8.027220699999999
Magellan,1,31.8199301,76.95342
Michinoku,1,35.5030142302915,139.6870448302915
Pearl Harbor,1,21.3885713,-157.9335744
Wharf,1,34.6863148,135.1933421
Harumi,1,35.6634906,139.7897775
Japan,2,34.6687571,135.5100311
Sophia,2,42.7877752,23.4569049
Alaska,4,71.3868712,-129.9945562
Yangtze River,1,36.4361024802915,139.8532846302915
Kitamae,1,26.3027021,127.7615069
Tsugaru,1,35.0117177302915,135.7573022302915
Ginza,1,35.6760255,139.7724941
Dew,1,51.2964846,22.6735312
United States,1,49.38,-66.94
Casablanca,2,33.6486015,-7.4582757
Tokyo,31,35.817813,139.910202
Pharmacy,1,35.0155830302915,135.7545184802915
Far north,1,12.9797045,15.683687
France orchid west,1,35.17525588029149,139.6558066802915
Gojo,1,39.5593820302915,115.7611693
Baghdad,2,33.4350586,44.5558261
Kanzeonji Temple,1,33.5222913,130.5254343
Akasaka,1,35.6782744,139.7459391
Buddha,4,34.9489952,136.9632495
New York,1,40.91525559999999,-73.70027209999999
Nishiki,1,32.2516958,130.9134777
Tigris,1,-15.4044999,-42.8735213
Hisakata,1,35.10910000000001,136.9854947
Rippling,1,33.9145777,130.8043569
Yushima,2,35.711327,139.7724702
Narayama,1,34.71184798029149,135.8116589802915
Koshien,1,34.7234607,135.3633836
Shinjuku,2,35.7298963,139.7451654
Kasumi,1,24.0234098,82.02101979999999
Fuji,1,35.3539032,138.8118555
curry,1,50.9818821,1.9320691
Nagasaki,24,32.9686469,129.9938174
Minamiyamate,2,32.7361422,129.8708733
Kutchan,1,43.015163,140.9243102
Alley,1,36.1243706,139.5655411
Sakamoto,1,37.9298369802915,140.9141139802915
Shijo,1,35.0044451802915,135.7580809302915
Mediterranean,1,45.7927967,36.215244
Akebono,1,26.2435843,127.6904124
Kagura,1,34.6626033,135.1513682
Azumi,2,36.3649943,137.8106765
Hiroshima,1,31.9163645,131.4305945
Mt. Emei,1,29.7169085,103.6231299
Yokohama,1,35.5113,139.674
Ueno,1,36.1325774,138.8291853
Chile,6,-17.4983293,-66.4169643
Yoga,1,35.62797998029149,139.6354899802915
Kilimanjaro,2,-3.0562826,37.3716347
Lifting feathers,1,35.1848028,136.9673238
Hiroshima,6,34.4426,132.4865
Nairobi,1,-1.164744,37.0493746
Tanifu,1,35.5452879,136.6135764
Nerima,2,35.779946,139.6811359
Namba,1,43.648665,-116.48121
Asakusa,1,35.7233639,139.8055923
Oshiage,2,35.71155898029149,139.8137769802915
Shinsaibashi,2,30.6801709802915,114.2062109802915
Japan,9,45.5227719,145.8175503
Tokyo,1,36.0447089,139.3743599
Rokuto,1,36.0028345,140.1105419
Tree root,2,36.9430004,137.4747414
Tateyama,1,36.5847934,137.6343407
Arakawa,1,36.1415564,139.8589857
Germany,5,41.2296285,141.0143767
Kimikage,1,34.7205324,135.1428907
Nara,3,34.70489999999999,135.8384
Shanghai,2,31.6688967,122.1137989
Yunnan,4,29.2233272,106.1977228
Yue,1,36.7995957,138.4063989
Gion,1,34.4529231,132.4693298
Shinano,2,36.8707572,138.2803909
Higashiyama,1,35.010837,135.7914226
Yotsuya,2,35.6726745,139.4551008
Nagano,1,36.835842,138.3190722
Planting,1,33.6152803,130.5166492
The sada_places.csv generated earlier records the coordinates and the number of appearances for each region, so I use that information to draw the heat map.
Since I used the Google Maps API for geocoding, I tried the Google Maps API for the heat map as well.
I wrote both the style and the script inline in the HTML, and the amount of code is only about this much.
<!DOCTYPE html>
<html>
  <head>
    <style>
      #map {
        width: 1200px;
        height: 600px;
      }
    </style>
    <script
      src="https://maps.googleapis.com/maps/api/js?key='write your API key'&libraries=geometry,visualization">
    </script>
    <script>
      function initialize() {
        var mapCanvas = document.getElementById('map');
        var mapOptions = {
          center: new google.maps.LatLng(36.83566824724438, 138.372802734375),
          zoom: 6,
          mapTypeId: google.maps.MapTypeId.ROADMAP
        };
        var map = new google.maps.Map(mapCanvas, mapOptions);
        var heatmapData = [
          // One object per coordinate pair. Omitted here because the list is long.
          // It would definitely be better to load this from an external file.
          { weight: 2, location: new google.maps.LatLng(32.7361422, 129.8708733) },
          { weight: 4, location: new google.maps.LatLng(71.3868712, -129.9945562) },
          { weight: 3, location: new google.maps.LatLng(37.5933274, 139.0009869) }
        ];
        var heatmap = new google.maps.visualization.HeatmapLayer({
          data: heatmapData,
          radius: 50,
          map: map
        });
      }
      google.maps.event.addDomListener(window, 'load', initialize);
    </script>
  </head>
  <body>
    <div id="map"></div>
  </body>
</html>
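Writing the `heatmapData` entries by hand does not scale, so here is a minimal sketch of generating them from rows in the sada_places.csv layout (name, count, latitude, longitude). `heatmap_entries` is a name invented for this illustration, and the two sample rows are taken from the CSV listing earlier in the post; in practice you would pass in the contents of the file.

```python
import csv
import io


def heatmap_entries(csv_text):
    """Turn 'name,count,lat,lng' rows into a JavaScript heatmapData array."""
    entries = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        name, count, lat, lng = row
        # Validate the numbers, but keep the original text to avoid rounding.
        float(lat)
        float(lng)
        entries.append(
            '{ weight: %d, location: new google.maps.LatLng(%s, %s) }'
            % (int(count), lat, lng)
        )
    return 'var heatmapData = [\n  ' + ',\n  '.join(entries) + '\n];'


sample_csv = (
    'Minamiyamate,2,32.7361422,129.8708733\n'
    'Alaska,4,71.3868712,-129.9945562\n'
)
js_source = heatmap_entries(sample_csv)
print(js_source)
```

The printed text can be pasted into the script block or saved as a separate .js file that the page loads.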
Referenced page
Looking at the results, as expected, Nagasaki and Tokyo, the places most closely tied to Massan, are the brightest. Hiroshima from "Hiroshima no Sora", and Kyoto and Nara, which appear in "Yesterday / Kyo / Nara, Asuka / Tomorrow" and "Shuni-e", are also bright.
When I wondered, "Why is the area off the upper right of Hokkaido so bright?", it turned out that the geocoding result for "Japan" was there ... Presumably this is because the script takes the northeast corner of the viewport returned by the geocoder.
If you zoom out and look at the world map, you can see that many places outside Japan are sung about as well. As you would expect from a veteran with 42 years as a singer, Massan is a fearsomely global talent.
The white snow of Kilimanjaro from "Lion Standing in the Wind", which makes me cry every time I hear it live, and the Alaska that appears in "Aurora" and "Byakuya no Tasogare no Hikari", songs about real-life photographers, are also faintly colored.
This is a bit off topic, but if you listen to "Aurora" and "Byakuya no Tasogare no Hikari" while looking at the winter night sky, they will really make you cry, so if you have never heard them, please take this opportunity to get to know them.
"Aurora" and "Byakuya no Tasogare no Hikari": Masashi Sada, Mitsuho Agishi, and Michio Hoshino
The area around France and Germany is bright too, but brighter than I expected. This was because the words "Buddha" and "Germany" were interpreted as regions during part-of-speech decomposition, so I think accuracy could be improved by adding a dictionary.
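Besides adding a dictionary, a quick stopgap would be an exclusion list applied to the counts before geocoding. The sketch below only illustrates that idea; I did not actually do this. `drop_excluded` and `EXCLUDED` are made-up names, and the counts are copied from the CSV listing above.

```python
def drop_excluded(place_counts, excluded):
    """Return a copy of the counts without the excluded names."""
    return {
        name: count
        for name, count in place_counts.items()
        if name not in excluded
    }


# Words that were tagged as regions but are not places in the lyrics (example).
EXCLUDED = {'Buddha'}

place_counts = {'Nagasaki': 24, 'Buddha': 4, 'Tokyo': 31}
cleaned = drop_excluded(place_counts, EXCLUDED)
print(cleaned)
```

An exclusion list only hides known false positives, so a proper dictionary is still the better fix for words the tokenizer splits or tags incorrectly.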
The organizer of the Masashi Sada x IT Advent Calendar wrote "Easy part-of-speech decomposition of Masashi Sada using kuromoji" on day 4. Compared with that post, both my heat map results and parts of the content here are thin, which is embarrassing.
I would like to actually use the custom dictionary so that Sky Tree and Shinji Pond show up on the map, and some of the geocoding results from the Google Maps API are questionable ("?"), so I want to take time to prepare, study, and try making the heat map again.
If I get an interesting result, I'd like to send a postcard to the live show ...