After reading https://qiita.com/dely13/items/5e949a384161c961d8ce I tried it myself for practice (~~play~~) and got different results. That article is from 2017, so I re-ran it on the latest data (as of 10:00 on June 29, 2020).
I will use @dely13's code as it is.
dely13.py
import pandas as pd
import requests
import numpy as np
import seaborn as sns
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline
url = "http://api.syosetu.com/novelapi/api/"
# Specify the API parameters as a dictionary
# With these settings, the data is returned as JSON, ordered by overall evaluation points
payload = {'of': 't-gp-gf', 'order': 'hyoka','out':'json'}
st = 1
lim = 500
data = []
while st < 2000:
    payload = {'of': 't-gp-gf-n', 'order': 'hyoka',
               'out':'json','lim':lim,'st':st}
    r = requests.get(url,params=payload)
    x = r.json()
    data.extend(x[1:])
    st = st + lim
df = pd.DataFrame(data)
# Preprocessing: add the 'year' and 'title_len' columns
df['general_firstup'] = pd.to_datetime(df['general_firstup'])
df['year'] = df['general_firstup'].apply(lambda x:x.year)
df['title_len'] = df['title'].apply(len)
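As a minimal sketch, the paging loop above can be pulled into a function that takes the page fetcher as an argument, so the paging logic can be checked without calling the API. The names `fetch_page` and `fetch_all`, the timeout, and the early stop on an empty page are my own additions and not part of the original article; the request parameters are copied from the code above.

```python
from typing import Callable, List


def fetch_page(st: int, lim: int) -> List[dict]:
    """Fetch one page from the API with the same parameters as the code above."""
    import requests  # imported lazily so the paging logic can be tested offline

    url = "http://api.syosetu.com/novelapi/api/"
    payload = {'of': 't-gp-gf-n', 'order': 'hyoka',
               'out': 'json', 'lim': lim, 'st': st}
    r = requests.get(url, params=payload, timeout=30)
    r.raise_for_status()
    # The first element is not a novel record; the original code skips it with x[1:].
    return r.json()[1:]


def fetch_all(fetch: Callable[[int, int], List[dict]],
              total: int = 2000, lim: int = 500) -> List[dict]:
    """Collect up to `total` items in pages of `lim`, starting at st=1."""
    data: List[dict] = []
    st = 1
    while st <= total:
        page = fetch(st, lim)
        if not page:  # assumption: an empty page means there is nothing left
            break
        data.extend(page)
        st += lim
    return data
```

With the defaults this requests `st` = 1, 501, 1001 and 1501, the same as the original loop, and `pd.DataFrame(fetch_all(fetch_page))` would build the same kind of DataFrame.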
The code really is used unchanged, so please read the original article for details.
In 2017, the original article said:
"Interesting numbers. The mean is 17 characters, the same count as a haiku. In other words, Narou titles were haiku! An old pond, a frog jumps in, the sound of water ..."
So how about 2020?
df['title_len'].hist()
df['title_len'].describe()
Histogram: df['title_len'].hist()
Data: df['title_len'].describe()
count    2000.000000
mean       24.179500
std        15.528356
min         2.000000
25%        12.000000
50%        21.000000
75%        32.000000
max       100.000000
Name: title_len, dtype: float64
The mean has gone up by about 7 characters, lol.
per_year.py
title_by_year = df.groupby('year')['title_len'].agg(['mean','count','std']).reset_index()
#plot
title_by_year.plot(x='year',y='mean')
#data
title_by_year
Plot: title_by_year.plot(x='year', y='mean')
* mean = average
Aggregate: title_by_year
year | mean | count | std |
---|---|---|---|
2008 | 7.500000 | 2 | 2.121320 |
2009 | 12.428571 | 7 | 8.182443 |
2010 | 10.882353 | 17 | 5.278285 |
2011 | 10.180000 | 50 | 4.684712 |
2012 | 13.294737 | 95 | 6.963237 |
2013 | 14.115942 | 138 | 8.541930 |
2014 | 16.065476 | 168 | 8.780176 |
2015 | 18.218009 | 211 | 9.701245 |
2016 | 21.577358 | 265 | 12.326472 |
2017 | 24.476015 | 271 | 11.750113 |
2018 | 29.425856 | 263 | 13.890288 |
2019 | 31.327327 | 333 | 15.861156 |
2020 | 40.483333 | 180 | 22.348053 |
**Titles in 2019 will be tanka.** The person who predicted this in the 2017 article is amazing: the 2019 mean is 31.3 characters, against the 31 of a tanka. Spot on.
While I'm at it, let's also look at the maximum and minimum.
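To make the haiku and tanka comparison concrete, here is a small sketch that finds the first year whose mean title length reached 17 characters (haiku) and 31 characters (tanka). The yearly means are copied from the table above; the helper name `first_year_reaching` is hypothetical.

```python
# Yearly mean title lengths, copied from the table above.
mean_by_year = {
    2008: 7.500000, 2009: 12.428571, 2010: 10.882353, 2011: 10.180000,
    2012: 13.294737, 2013: 14.115942, 2014: 16.065476, 2015: 18.218009,
    2016: 21.577358, 2017: 24.476015, 2018: 29.425856, 2019: 31.327327,
    2020: 40.483333,
}

HAIKU = 17  # 5-7-5
TANKA = 31  # 5-7-5-7-7


def first_year_reaching(means, threshold):
    """Return the earliest year whose mean is >= threshold, or None."""
    for year in sorted(means):
        if means[year] >= threshold:
            return year
    return None


print(first_year_reaching(mean_by_year, HAIKU))  # 2015
print(first_year_reaching(mean_by_year, TANKA))  # 2019
```

Note that these are means over the top 2,000 works grouped by first publication year, so they describe this ranking snapshot rather than every work on the site.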
title_by_year = df.groupby('year')['title_len'].agg(['mean','min','max']).reset_index()
#plot
title_by_year.plot(x='year')
#data
title_by_year
Plot: title_by_year.plot(x='year')
Data: title_by_year
year | mean | min | max |
---|---|---|---|
2008 | 7.500000 | 6 | 9 |
2009 | 12.428571 | 5 | 25 |
2010 | 10.882353 | 2 | 23 |
2011 | 10.180000 | 4 | 26 |
2012 | 13.294737 | 3 | 40 |
2013 | 14.115942 | 3 | 54 |
2014 | 16.065476 | 4 | 63 |
2015 | 18.218009 | 3 | 59 |
2016 | 21.577358 | 2 | 77 |
2017 | 24.476015 | 4 | 69 |
2018 | 29.425856 | 5 | 74 |
2019 | 31.327327 | 4 | 100 |
2020 | 40.483333 | 4 | 100 |
Aren't these 100-character titles running past some character limit?
max_100.py
df[['ncode','title','year','title_len']].set_index('ncode').query('title_len==100')
ncode | title | year | title_len |
---|---|---|---|
N7855GF | I was treated as incompetent and was banished from my childhood friend party. I made full use of the gift "Translation".... | 2020 | 100 |
N6203GE | A blacksmith who was exiled from the dictatorship, in fact, with the protection of "Blacksmith Goddess", suddenly with "Super Legendary" armor full equipment... | 2020 | 100 |
N0533FS | [Series version] I witnessed the chasing idol walking with a handsome guy, so I bought a part-time job... | 2019 | 100 |
N4571GF | In the 7th week of the loop, I learned that I was fitted with my believing friends, so I actively partyed on the 8th lap.... | 2020 | 100 |
... aren't these longer than 100 characters ...?
When I checked after writing the article, they were exactly 100 characters.
So is there a character limit? If so, these authors are fighting right at the edge of it.
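One thing worth keeping in mind here: Python's `len()` on a `str` counts Unicode characters, not bytes, so `title_len` is a character count even for Japanese titles. The sketch below shows that, plus a hypothetical helper `at_cap` that picks out titles sitting exactly at a given length. The cap of 100 is an assumption based on the maximum observed above, not a documented limit.

```python
def at_cap(titles, cap=100):
    """Return the titles whose character count equals `cap` (an assumed limit)."""
    return [t for t in titles if len(t) == cap]


name = "小説家になろう"
print(len(name))                  # 7 characters
print(len(name.encode("utf-8")))  # 21 bytes in UTF-8

# Toy input, not real data: one string at the cap, one just below it.
sample = ["あ" * 100, "あ" * 99]
print(len(at_cap(sample)))        # 1
```

On the real data, `at_cap(df['title'])` should return the same rows as the `title_len==100` query above.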
Conversely, I also got curious about short titles.
mini_len.py
df.groupby('title_len')['title_len'].agg(['count']).head(9).T
Number of works for each title length. The table got long, so it is transposed and laid out horizontally.
title_len | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|
count | 2 | 8 | 18 | 35 | 41 | 38 | 64 | 75 | 89 |
title2_4.py
df[['title','year','title_len']].set_index('title').sort_values('title_len').query('title_len<5')
The 4-character titles are only an excerpt.
title | year | title_len |
---|---|---|
letter | 2016 | 2 |
dawn | 2010 | 2 |
Bow and sword | 2013 | 3 |
The reason for water | 2012 | 3 |
Tomb King! | 2013 | 3 |
Childhood friend | 2016 | 3 |
Searcher | 2013 | 3 |
The shadow of the tower | 2012 | 3 |
Extermination person | 2015 | 3 |
Cat and dragon | 2013 | 3 |
Oblivion saint | 2020 | 4 |
J/53 | 2012 | 4 |
Black Demon King | 2011 | 4 |
My servant | 2019 | 4 |
Mob love | 2015 | 4 |
Wise man's grandson | 2015 | 4 |
Seventh | 2014 | 4 |
Even among titles only a few characters long, there are famous works. As a former Moba user, I was impressed to find a proper "title" among the four-character ones.
Is it the influence of anime adaptations of mobile novels that so many newcomers are coming in, even if not as many as on Moba (now Ebu)? I was trained on Moba, so I will read something that is a little hard to read as long as the content is interesting, but even so, these titles are long. That said, I am hooked on this one and this one, both of which have rather long titles. (On Ebu, it's this one. *Stealth marketing)
The Narou API lets you narrow down the search conditions, so I would like to try various things. But what do you do if you want to extract more than 2,000 items ...?
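One possible approach, sketched here under assumptions, is to split the query into several narrower ones, fetch each separately, and merge the results while dropping duplicates by `ncode`. The function name `fetch_partitions` and the idea of passing extra filter parameters per partition are my own; which filter parameters the API actually accepts has to be checked against its documentation, so no concrete filters are given here.

```python
from typing import Callable, Dict, Iterable, List


def fetch_partitions(fetch: Callable[[Dict[str, str]], List[dict]],
                     partitions: Iterable[Dict[str, str]]) -> List[dict]:
    """Run `fetch` once per partition and merge the results, de-duplicating by 'ncode'.

    `fetch` is a caller-supplied function (hypothetical here) that takes extra
    query parameters and returns a list of records, each with an 'ncode' key.
    """
    seen = set()
    merged: List[dict] = []
    for extra_params in partitions:
        for item in fetch(extra_params):
            ncode = item['ncode']
            if ncode in seen:  # the same work may match more than one partition
                continue
            seen.add(ncode)
            merged.append(item)
    return merged
```

For this to work, 'n' has to stay in the `of` parameter (as in `'t-gp-gf-n'` above) so that each record carries its `ncode`. Each call to `fetch` would run the same paging loop as before with the extra parameters merged into `payload`.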