I found an interesting parameter while reading the Narou Novel API documentation, so I will introduce it and run a small analysis with it.
|Parameter|Value|Description|
|:--|:--|:--|
|kaiwaritu|int string|Specifies the conversation rate of the novels to extract, in units of %. To specify a range, separate the minimum and maximum values with a hyphen (-).|
I see. The conversation rate... I suppose it is the share of dialogue as opposed to narration.
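As a minimal sketch of the range syntax described in the table, the snippet below builds a hyphen-separated value for `kaiwaritu`. The helper name `range_param` and the 30-50 range are hypothetical choices for illustration only; the request itself is not sent here.

```python
def range_param(minimum: int, maximum: int) -> str:
    """Join a minimum and a maximum into the hyphen-separated range form."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return f"{minimum}-{maximum}"


# Hypothetical filter: novels whose conversation rate is between 30% and 50%.
payload = {"out": "json", "of": "t-ka", "kaiwaritu": range_param(30, 50)}
print(payload["kaiwaritu"])  # 30-50
```

Such a dictionary would be passed as `params` to `requests.get`, in the same way as the loading code in this post.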
So let's try it right away.
First, import the libraries and prepare for loading.
before_load.py
import pandas as pd
import requests
import numpy as np
import seaborn as sns
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline
url = "http://api.syosetu.com/novelapi/api/"
narou_load.py
st = 1
lim = 500
data = []
while st < 2000:
    payload = {'of': 't-gp-gf-n-ka', 'order': 'hyoka',
               'out': 'json', 'lim': lim, 'st': st}
    r = requests.get(url, params=payload)
    x = r.json()
    # Skip the first element of the response and keep the novel records
    data.extend(x[1:])
    st = st + lim
df = pd.DataFrame(data)
df.head()
You can load the conversation rate by adding `ka` to the `'of': 't-gp-gf-n'` part of `payload`. (It is already added in the code above.)
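The paging loop can also be wrapped in a function. The following is only a sketch under assumptions: `fetch_all` and `fake_page` are hypothetical names, and the fake page assumes that the first element of each response is a count record (here called `allcount`), which is why the first element is skipped. The fake fetcher stands in for `requests.get(url, params=payload).json()` so the sketch runs offline.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[[Dict], List[Dict]],
              total: int = 2000, lim: int = 500) -> List[Dict]:
    """Collect `total` records by requesting `lim` records per page."""
    records: List[Dict] = []
    st = 1  # start position, as in the loop in this post
    while st <= total:
        payload = {"of": "t-gp-gf-n-ka", "order": "hyoka",
                   "out": "json", "lim": lim, "st": st}
        page = fetch_page(payload)
        records.extend(page[1:])  # skip the leading (assumed) count record
        st += lim
    return records


def fake_page(payload: Dict) -> List[Dict]:
    """Offline stand-in for the real HTTP request (made-up records)."""
    start = payload["st"]
    rows = [{"title": f"novel {i}", "kaiwaritu": 38}
            for i in range(start, start + payload["lim"])]
    return [{"allcount": 2000}] + rows


data = fetch_all(fake_page)
print(len(data))  # 2000
```

Passing the fetcher as an argument keeps the paging logic testable without touching the network.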
And here is the data that comes out:
title | kaiwaritu(%) |
---|---|
When I was reincarnated, it was slime | 14 |
The strongest in the world in a common profession | 40 |
Wandering in another world with ridiculous skill | 36 |
Mushoku Tensei-If you go to another world, you will get serious- | 22 |
Another world fantasy song starting from Death March (web version) | 38 |
I see. These are quite high (a fan's impression).
However, I don't know how high this really is in the first place, so let's try `describe()`.
statistic | kaiwaritu |
---|---|
count | 2000.00000 |
mean | 38.00800 |
std | 10.66831 |
min | 0.00000 |
25% | 31.00000 |
50% | 38.00000 |
75% | 45.00000 |
max | 96.00000 |
I see. With a mean of 38%, I suppose values around that are typical? Or is it rather that these works have so many characters that the rate settles at an ordinary value?
Let's narrow things down a little by the number of characters.
I deliberately use the reading time instead of specifying the number of characters. But what is the reading time?
|Parameter|Value|Description|
|:--|:--|:--|
|time|int string|Specifies the reading time of the novels to extract. The reading time is the number of characters in the novel divided by 500. To specify a range, separate the minimum and maximum values with a hyphen (-).|
As you can see, the reading time is proportional to the number of characters, so there should be no problem apart from the numbers being smaller.
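To make the proportionality concrete, here is a small sketch that converts a reading time back into an approximate character count. The function name `approx_characters` is hypothetical, and because the description does not say how the division is rounded, the result is only an estimate.

```python
CHARS_PER_UNIT = 500  # divisor given in the parameter description


def approx_characters(reading_time: int) -> int:
    """Estimate the character count from the reading time (rounding unknown)."""
    return reading_time * CHARS_PER_UNIT


for t in (10, 1000):
    print(t, approx_characters(t))
```

A reading time of 10 therefore corresponds to roughly 5,000 characters, and 1000 to roughly 500,000.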
Add `ti` to the `of` value of `payload` and load the data again right away.
While we are at it, let's also try `describe()` on `time`.
statistic | time |
---|---|
count | 2000.000000 |
mean | 1395.985500 |
std | 1823.680635 |
min | 11.000000 |
25% | 434.750000 |
50% | 889.500000 |
75% | 1608.250000 |
max | 26130.000000 |
Since the minimum is 11, it seems that every work has at least 5,001 characters (if the division by 500 is rounded up).
(... The max isn't "Summoner", is it?)
df[['title','time']].sort_values('time').tail()
title | time |
---|---|
Magi Craft Meister | 14868 |
Boundary Labyrinth and the Wizard of the Other World | 16410 |
Cooking with Wild Game | 17653 |
Summoner goes | 25536 |
legend | 26130 |
**It wasn't.**
doku_kai.py
# Split the works into quartiles by reading time (D = shortest, A = longest)
df['part'] = pd.qcut(df.time, 4, labels=['D', 'C', 'B', 'A'])
# Mean conversation rate for each quartile
df.groupby('part').agg({'kaiwaritu': ['mean']})
part | kaiwaritu(average:%) |
---|---|
D | 36.990 |
C | 38.180 |
B | 38.322 |
A | 38.540 |
This was a surprise. The conversation rate hardly changes, regardless of whether a work is long or short.
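For readers unfamiliar with `pd.qcut`, the following toy sketch shows the same quartile split and per-group mean on a tiny made-up table. The eight rows are invented purely for illustration and are not taken from the API.

```python
import pandas as pd

# Made-up data: eight works with increasing reading time.
toy = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7, 8],
    "kaiwaritu": [30, 40, 36, 44, 20, 60, 50, 34],
})

# Four equal-sized bins by reading time: D is the shortest quarter, A the longest.
toy["part"] = pd.qcut(toy["time"], 4, labels=["D", "C", "B", "A"])

# Mean conversation rate per bin.
means = toy.groupby("part", observed=True)["kaiwaritu"].mean()
print(means.to_dict())
```

Each bin receives two rows here, so the means are simply the averages of neighbouring pairs.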
I was disappointed, so I tried another parameter, the writing style. It seems to be still experimental, and there are cases where the data does not come out cleanly (the categories are ambiguous to begin with). Since it cannot be set in `of`, I will load a separate data frame for each style.
|Parameter|Value|Description|
|:--|:--|:--|
|buntai|int string|Specifies the writing style. Separate values with a hyphen (-) to perform an OR search. 1: works that are not indented and have many consecutive line breaks; 2: works that are not indented but have an average number of line breaks; 4: works that are properly indented but have many consecutive line breaks; 6: works that are properly indented and have an average number of line breaks|
First, load the data into `df1`, `df2`, `df4`, and `df6`, respectively.
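One way to prepare the four requests is sketched below. This is an assumption about how the per-style loading could be organised, not the code actually used in this post: `style_payload` is a hypothetical helper, and the `of` and `order` values are simply reused from the earlier loading code.

```python
STYLES = (1, 2, 4, 6)  # buntai values listed in the parameter table


def style_payload(buntai: int, lim: int = 500) -> dict:
    """Build the query parameters for one writing style."""
    return {"of": "t-ka", "order": "hyoka", "out": "json",
            "lim": lim, "buntai": buntai}


# One payload per target data frame name.
payloads = {f"df{b}": style_payload(b) for b in STYLES}
for name, payload in payloads.items():
    print(name, payload["buntai"])
```

Each payload would then be sent with `requests.get(url, params=payload)` and the records turned into a `pd.DataFrame`, as in the loading code earlier.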
df1 | df2 | df4 | df6 |
---|---|---|---|
The strongest sage of disqualification crest-The strongest sage in the world has reincarnated to become stronger- | Isekai Shokudo | The strongest in the world in a common profession | When I was reincarnated, it was slime |
Duke's daughter's taste | Someone please explain this situation | Mushoku Tensei-I'm serious when I go to another world- | Wandering in another world with ridiculous skill |
Another world life of a reincarnated sage-I got a second profession and became the strongest in the world- | Hariko Maiden | Another world fantasy song starting from Death March (web version) | I said that the ability is an average value! |
I have reincarnated as a villain daughter who has only the ruin flag of the maiden game ... | I will quietly disappear | Re: Life in a different world starting from zero | It's a spider, but what is it? |
Live dungeon! | Mid-career (middle-aged) office worker relaxing different world industrial revolution | I want to be a powerful person in the shadow![Web version] | The magical power of the saint is versatile |
Some of these classifications puzzle me, but I will let that pass here.
statistic | df1 | df2 | df4 | df6 |
---|---|---|---|---|
count | 500.000000 | 500.000000 | 500.00000 | 500.000000 |
mean | 36.506000 | 35.246000 | 38.74200 | 37.668000 |
std | 11.489211 | 14.927396 | 9.70091 | 13.106691 |
min | 1.000000 | 0.000000 | 6.00000 | 0.000000 |
25% | 28.000000 | 25.000000 | 32.75000 | 30.000000 |
50% | 36.000000 | 35.000000 | 39.00000 | 38.000000 |
75% | 44.000000 | 44.250000 | 45.00000 | 46.000000 |
max | 70.000000 | 98.000000 | 71.00000 | 96.000000 |
Looking at this result, there was no big difference, although df2 came out lower overall and df6 higher. The sample size is 500 for each style so that the total matches the initial 2,000; when I loaded 2,000 works per style instead, the mean for df2 dropped further to 34%.
Judging from this, the conversation rate does not seem to be related to the writing style. ~~I wonder if it's the genre~~
The analysis did not turn out very well, but I consider it practice for future work. If I come up with an interesting data analysis, I would like to try it. Reading this back, I was surprised at the low conversation rate of Tensura. Is it because so many of the conversations happen inside the protagonist's head?