How is everyone enjoying life under stay-at-home restrictions? I can't get onto campus, so I spend my time at home studying programming that has nothing to do with my research. ~~More fun than research...~~
I also have more time to read, although a large share of that reading happens on Shōsetsuka ni Narō.
So, to make this stay-at-home life as comfortable as possible, I built a Web API that
**lets you read several episodes of a Shōsetsuka ni Narō work in vertical writing**.
The Web API built this time processes a request in the following flow. The two steps between the dashed markers go through the Narou Novel API:
-------- Narou Novel API --------
**Search for novels matching the specified words**
↓
**Select the work with the highest total points and get its NCODE**
-------- Narou Novel API --------
↓
**Scrape the episodes in the specified range**
↓
**Format them as novel-like HTML in vertical writing**
↓
**Serve the result**
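The first two steps of this flow can be tried without touching the network. This is a minimal sketch, assuming the response shape implied by the code in this post (a JSON array whose first element is a count record, followed by works already sorted by `order=hyoka`). `build_search_url` and `pick_top_work` are hypothetical helper names, not part of the Narou Novel API, and the sample data is hand-written.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.syosetu.com/novelapi/api/"


def build_search_url(keyword):
    """Build the search URL with the same parameters used in this post."""
    query = urlencode({
        "out": "json",     # response format
        "of": "t-n",       # return only title and ncode
        "lim": 1,          # only the top hit is needed
        "order": "hyoka",  # sort by total points, highest first
        "word": keyword,   # search word (percent-encoded by urlencode)
    })
    return f"{API_ENDPOINT}?{query}"


def pick_top_work(payload):
    """Return the first work in a decoded response, or None when nothing matched.

    Assumption: payload[0] is a count record and works start at index 1.
    """
    works = payload[1:]
    return works[0] if works else None


# Hand-written sample data, not a real API response.
sample = [{"allcount": 1}, {"title": "Example title", "ncode": "N0000AA"}]
print(build_search_url("example"))
print(pick_top_work(sample)["ncode"])
```

Returning `None` for an empty result avoids the `IndexError` that a bare `[1]` would raise when the search word matches nothing.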
Accessed from a smartphone, it looks like the following. You read it by scrolling horizontally. (From episode 10 of Mushoku Tensei.)
Viewed from a PC:
Python
First, the Python version.
The scraping gets by with just `requests` and `re`, and the server part uses `Flask`.
# -*- coding: utf-8 -*-
from flask import Flask
from requests import get
import re
app = Flask(__name__)
def narou_html(keyword, num=1, pivot='e'):
    honbun_ = ""
    # Ask the Narou Novel API for the single top-rated hit (index 0 holds the count).
    item = get(
        f"https://api.syosetu.com/novelapi/api/?out=json&of=t-n&lim=1&order=hyoka&word={keyword}"
    ).json()[1]
    #item = max(items[1:], key=lambda x: x["global_point"])  # Left over from before I knew the API could return sorted results
    url = "https://ncode.syosetu.com/"
    ncode = item["ncode"]
    headers = {
        'User-Agent':
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
    }
    # Fetch the table of contents and collect one entry per episode.
    t = get(url + ncode, headers=headers).text
    slist = []
    sl = re.findall('<dl class="novel_sublist2">(.+?)</dl>',
                    t.replace("\n", ""))
    # Pick the range: 'e' = latest num episodes, 's' = first num episodes,
    # a number = num episodes starting from that (1-based) episode.
    if pivot == 'e':
        selected = sl[-num:]
    elif pivot == 's':
        selected = sl[:num]
    else:
        start = int(pivot) - 1
        selected = sl[start:start + num]
    for i in selected:
        groups = re.search(
            '<a href="/(.+?)/">(.+?)</a></dd><dt class="long_update">(.+?)(?:<span|</dt>)',
            i)
        slist.append({
            "title": groups.group(2),
            "href": groups.group(1),
            "long_update": groups.group(3)
        })
    for s in range(len(slist)):
        # Fetch one episode and cut out the subtitle plus the body text.
        t = get(url + slist[s]["href"], headers=headers).text
        honbun = re.search(
            r'<p class="novel_subtitle">(?:.|\s)+?<div id="novel_honbun"(?:.|\s)+?</p>\n</div>',
            t).group(0)
        for j, i in enumerate(honbun.splitlines()):
            if i[:2] == "<p":
                groups = re.search("(<p.*?>)(.+?)(</p>)", i)
                # Wrap short digit runs so they are set upright as one unit.
                i = re.sub(r'(\d{1,4})',
                           r'<span class="text-combine">\1</span>',
                           groups.group(2))
                # Put blank lines around quoted or bracketed lines.
                if i[0] in ('(', '「', '(', '『',
                            '【') and honbun_[-9] not in (')', ')', '」', '』',
                                                         '】'):
                    i = '<br><br>' + i
                if i[-1] in (')', ')', '」', '』', '】'):
                    i += '<br><br>'
                if i[0] == '・': i = '<br>' + i + '<br>'
                elif j > 1 and i[0] != '<' and honbun_[-1] != '>':
                    # Normalize the leading space of ordinary paragraphs.
                    if re.match(r"\s", i):
                        i = i[1:]
                    elif not re.match(r'\s', i) and i[0] not in ('(', '「', '(', '『',
                                                                  '【'):
                        i = " " + i
                if j == 0:
                    # The first line is the episode subtitle.
                    i = '<h3>' + i + '</h3>\n<br><br>'
            honbun_ += i
        # Spacer between episodes.
        honbun_ += "<br>" * 25 + "\n"
    # Trim the trailing spacer down to two line breaks, then swap arrows and
    # dashes for glyphs that read correctly in vertical text.
    # NOTE: if these are plain ASCII hyphens, this also rewrites the hyphen in
    # class="text-combine" inside the generated markup.
    body = honbun_[:-93].replace('→', '↓').replace('-', '|').replace("-", "|")
    honbun_ = '''<html>
<head>
<meta name="viewport" content="initial-scale=1.3,minimum-scale=1.3">
<style>
body {
margin-top: 3.5%;
margin-bottom: 3%;
word-break: break-all;
-ms-writing-mode: tb-rl;
writing-mode: vertical-rl;
text-orientation: upright;
font-family: "Noto Serif JP", serif;
font-size: 85%;
}
.text-combine {
-webkit-text-combine: horizontal;
-ms-text-combine-horizontal: all;
text-combine-upright: all;
}
</style>
<title>''' + item["title"] + '''</title>
<link href="https://fonts.googleapis.com/css?family=Noto+Serif+JP&display=swap&subset=japanese" rel="stylesheet">
</head>
<body>
''' + body + '''</body>
</html>'''
    return honbun_
@app.route("/")
def hello_world():
    return "Usage: http://ip-address:port/keyword/pivot/num  pivot - s: from the first episode. e: from the latest episode. a number: from that episode. num - how many episodes to return."


@app.route('/<keyword>/<pivot>/<num>')
def html_(keyword, pivot, num):
    html = narou_html(keyword, int(num), pivot)
    return html


if __name__ == "__main__":
    # Listen on all interfaces so other devices on the LAN (e.g. a phone) can connect.
    app.run(host="0.0.0.0")
(When you use a lot of regular expressions, some backslashes break Qiita's syntax highlighting, which is sad.) The formatting rules were tuned by feel until the result looked good, so occasionally sentences that should be separated by a line break end up joined.
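As one concrete piece of that formatting, the digit handling can be isolated as below. A minimal sketch using the same regular expression and `text-combine` class as the code above; `wrap_digits` is a hypothetical helper name.

```python
import re

# Runs of 1 to 4 digits, the same pattern the code above uses.
# A longer run is split into chunks of at most four digits.
DIGIT_RUN = re.compile(r"\d{1,4}")


def wrap_digits(text):
    """Wrap each short digit run in a span so CSS can set it upright as one unit."""
    return DIGIT_RUN.sub(
        lambda m: f'<span class="text-combine">{m.group(0)}</span>', text)


print(wrap_digits("第10話"))
```

The span only has an effect together with the `.text-combine` CSS rule, which asks the browser to lay the digits out horizontally inside the vertical line.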
After starting it, you can read at
http://localhost:5000/search word/(described below)/(described below)
(only from the machine running it) or
http://local IP address of that machine:5000/search word/(described below)/(described below)
The two "described below" parts work as follows.
The first selects where to start: `s` for from the first episode, `e` for from the latest episode, or a number for from that episode.
The second specifies how many episodes to return.
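The start-position and count parameters amount to a list slice. A minimal sketch, assuming `s` is meant to return exactly `num` episodes; `select_episodes` is a hypothetical helper, not a function from the code above.

```python
def select_episodes(episodes, pivot, num):
    """Pick `num` episodes according to `pivot`.

    pivot == 's'  -> the first `num` episodes
    pivot == 'e'  -> the latest `num` episodes
    pivot == 'n'  -> `num` episodes starting at the 1-based episode number n
    """
    if num <= 0:
        return []  # guard: episodes[-0:] would otherwise return everything
    if pivot == "s":
        return episodes[:num]
    if pivot == "e":
        return episodes[-num:]
    start = int(pivot) - 1  # raises ValueError for anything that is not a number
    if start < 0:
        raise ValueError("episode numbers start at 1")
    return episodes[start:start + num]


episodes = [f"ep{n}" for n in range(1, 11)]  # stand-in for the scraped episode list
print(select_episodes(episodes, "s", 2))
print(select_episodes(episodes, "e", 2))
print(select_episodes(episodes, "4", 3))
```

Keeping the selection in one small function makes the off-by-one cases easy to check before any page is fetched.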
GAS(Google Apps Script)
I tried deploying it to Heroku to turn it into a LINE bot, but it seemed that Shōsetsuka ni Narō could not be scraped from Heroku's servers, so I rewrote it in GAS. I don't write much JavaScript to begin with, and GAS has its own functions on top of that, so although the code is short it took a lot of trial and error. In particular, I am used to Jupyter's run-it-now, see-the-result-now workflow, so not being able to display logs properly was painful.
To use it as a LINE bot, I made two projects: the first handles the LINE responses, and the second is a Web API equivalent to the Python code above.
When you send a message, the first project replies with a URL, and when you open that URL, the second project serves the formatted page of the novel. It is amazing that GAS lets you do this for free. It is slow, though.
The first project only returns a URL, so I will leave that to the sites that explain how to make a LINE bot with GAS and post only the code for the second project. It is also simplified compared with the Python version: for example, it only implements returning the specified number of episodes counting back from the latest one (or rather, the Python side was improved after I wrote this).
function doGet(e) {
  var keyword = e.parameter.keyword; // Search word
  var num = parseInt(e.parameter.num, 10); // Number of episodes
  // Ask the Narou Novel API for the single top-rated hit.
  var getUrl = "https://api.syosetu.com/novelapi/api/?out=json&of=t-n&lim=1&order=hyoka&word=" + encodeURIComponent(keyword);
  var response = UrlFetchApp.fetch(getUrl).getContentText('UTF-8');
  var json = JSON.parse(response)[1]; // Index 0 holds the hit count
  var title = json["title"];
  var ncode = json["ncode"];
  // Fetch the table of contents and keep the latest `num` episodes.
  getUrl = "https://ncode.syosetu.com/" + ncode;
  response = UrlFetchApp.fetch(getUrl).getContentText('UTF-8').replace(/[\r\n]+/g, "");
  var items = response.match(/<dl class="novel_sublist2">(.+?)<\/dl>/gm);
  items = items.slice(items.length - num, items.length);
  var slist = [];
  for (var i = 0; i < items.length; i++) {
    slist.push(items[i].match(/(<a href="\/)(.+?)(\/">)/)[2]); // Collect the hrefs of the selected episodes
  }
  var honbun = "";
  // Formatting starts here
  for (var s = 0; s < slist.length; s++) {
    getUrl = "https://ncode.syosetu.com/" + slist[s];
    response = UrlFetchApp.fetch(getUrl).getContentText('UTF-8');
    var honbun_ = response.match(/<p class="novel_subtitle">(?:.|\s)+?<div id="novel_honbun"(?:.|\s)+?<\/p>\n<\/div>/)[0];
    var sphon = honbun_.split(/[\r\n]+/g);
    for (var i = 0, len = sphon.length; i < len; i++) {
      if (sphon[i].slice(0, 2) == "<p") {
        var groups = sphon[i].match(/(<p.*?>)(.+?)(<\/p>)/);
        // Wrap 2 to 4 digit runs so they are set upright as one unit.
        var temp = groups[1] + groups[2].replace(/(\d{2,4})/g, '<span class="text-combine">$1<\/span>') + groups[3];
        if (i == 0) {
          // The first line is the episode subtitle.
          temp = '<h3>' + temp + '</h3>';
          Logger.log(temp);
        }
        honbun += temp + "\n";
      } else {
        honbun += sphon[i] + "\n";
      }
    }
    honbun += "<br><br>";
  }
  honbun = '<html>\n<head>\n <title>' + title + '</title>\n</head>\n<font size="5"><style>\n body {\n -ms-writing-mode: tb-rl;\n writing-mode: vertical-rl;\n text-orientation: upright;\n font-family: "Yu Mincho", YuMincho, "Hiragino Mincho ProN W3", "Hiragino Mincho ProN W3", "Hiragino Mincho ProN", "HG Mincho E", "MS P Mincho", "MS Mincho", serif;\n }\n\n .text-combine {\n -webkit-text-combine: horizontal;\n -ms-text-combine-horizontal: all;\n text-combine-upright: all;\n }\n</style>\n\n<body>\n' + honbun + ' \n</body>\n</font>\n</html>';
  var html = HtmlService.createHtmlOutput(honbun); // Wrap the HTML string so GAS serves it as a web page
  return html;
}
Publish → Deploy as web app
Then access the generated address + `?keyword=search word&num=number of episodes`
and a vertically formatted page is displayed. (The LINE bot in the first project replies with this address.)
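Because the search word is usually Japanese, it is safer to percent-encode the query string when building that address in code. A minimal sketch; the base URL below is a placeholder, not a real deployment, and `build_reader_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder for the address generated when the web app is deployed.
WEB_APP_URL = "https://script.google.com/macros/s/DEPLOYMENT_ID/exec"


def build_reader_url(keyword, num, base_url=WEB_APP_URL):
    """Append the keyword and num query parameters that doGet reads."""
    return f"{base_url}?{urlencode({'keyword': keyword, 'num': num})}"


print(build_reader_url("example", 3))
```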
My reading is making good progress... I am an amateur at HTML, so if you don't like the result, please modify it... When you use this, please do so in moderation so as not to put a heavy load on Shōsetsuka ni Narō's servers.