This script is for copying the text of a paper out of a PDF and running it through Google Translate. Open the output file (`<name>_conv.html`) in the browser and right-click to translate. I needed it in a hurry, so it is quick and dirty, and there are no plans to maintain it.
"""
Use this to copy text from a PDF paper and put it through Google Translate.
Pass the path of the text file to txt_conv. The result is written as HTML, so open it in Chrome and right-click to translate.
・What it does
If you select everything in a PDF with Ctrl+A and paste it, an unnecessary line break ends every visual line, so the text does not translate well.
This script removes those breaks.
"""
import html
import os
import sys

filepath = sys.argv[1]

# --- Process the text for Google Translate
with open(filepath, "r", encoding="utf-8") as f:
    lines = f.readlines()

newlines = []
lenbuf = []  # length of each processed line
for line in lines:
    text = line.rstrip("\r\n")  # also correct for a last line with no trailing newline
    if text:  # empty lines are skipped
        text = html.escape(text)  # keep "<", ">" and "&" from the paper as plain text
        if text.endswith("."):  # the line ends with a period: keep the line break
            text += "<br>"
        else:  # otherwise join it to the next line with a space
            text += " "
        newlines.append(text)
        lenbuf.append(len(text))

# --- Also break after lines that are clearly short (headings, paragraph ends);
# the threshold is 0.8 * the average line length
if lenbuf:
    ave_len = sum(lenbuf) / len(lenbuf)
    for n, linelen in enumerate(lenbuf):
        if linelen < ave_len * 0.8:
            newlines[n] += "<br>"

# --- Save as <original name>_conv.html
savepath = os.path.splitext(filepath)[0] + "_conv.html"
with open(savepath, "w", encoding="utf-8") as f:
    f.write('<meta charset="utf-8">\n')  # tell the browser the file is UTF-8
    f.writelines(newlines)
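Because the script works directly on `sys.argv[1]` and writes a file, it is awkward to try out. Here is a minimal sketch of the same joining rule (keep the break after a line ending in a period, otherwise join with a space, and also break after lines shorter than 0.8 × the average length) factored into a function. `convert_lines` and its `ratio` parameter are hypothetical names, not part of the script, and the `html.escape` call is an assumption I added so that symbols such as `<` in the paper are not read as tags.

```python
import html


def convert_lines(lines, ratio=0.8):
    """Hypothetical helper: join PDF-wrapped lines into one HTML string."""
    pieces = []
    for line in lines:
        text = line.rstrip("\r\n")
        if not text:
            continue  # empty lines are dropped
        text = html.escape(text)
        # A line ending in a period keeps its break; any other line is joined with a space.
        text += "<br>" if text.endswith(".") else " "
        pieces.append(text)
    if not pieces:
        return ""
    ave_len = sum(len(p) for p in pieces) / len(pieces)
    # Lines much shorter than average (headings, paragraph ends) also get a break.
    return "".join(p + "<br>" if len(p) < ave_len * ratio else p for p in pieces)


sample = [
    "1 Introduction\n",
    "Deep neural networks have achieved remarkable\n",
    "results on many benchmark tasks in vision and\n",
    "language. In this paper we study how they\n",
    "generalize when the training data is noisy.\n",
]
print(convert_lines(sample))
```

On these five sample lines, the heading `1 Introduction` is well under 0.8 × the average length and gets its own `<br>`, the four wrapped lines are joined with spaces into one run, and the last sentence keeps its `<br>` because it ends with a period. The rule is crude: a line that happens to end in an abbreviation such as "et al." or "Fig." will also get a break.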