Rakuten Kobo and iBooks support MathML and render it quite well, but the biggest platform, Amazon Kindle, does not (perhaps for cost reasons; I doubt it will be supported in the future). So to make a reflowable Kindle book that contains math, there is no choice, however reluctantly, but to turn the formulas into images.
If formulas have to become images, I would at least like to use SVG, which is a vector format, but I am not sure whether Kindle supports it (perhaps the iOS app does not?), so it seems safest to settle for PNG. However, it is not realistic to convert by hand every formula in a TeX document containing hundreds of them and then format the result as EPUB3, so the purpose of this article is to automate the process.
The environment is macOS. Please note that the variable naming and the handling of directories may look quite amateurish. I may rewrite it a little more cleanly later.
import subprocess, lxml.html, lxml.etree, imagesize, re, tempfile
def tex2html(source, target):  # Please change the pandoc options as needed
    subprocess.call(['pandoc',
                     '-s',
                     '-t', 'html5',
                     '--mathjax',
                     '--css=stylesheet.css',
                     '-o', target, source])
def get_html(filename, encoding='utf-8', xml=False):
    # Use lxml.etree when parsing XHTML and lxml.html when parsing HTML
    with open(filename, 'r', encoding=encoding) as f:
        if xml:
            html = lxml.etree.parse(f).getroot()
        else:
            html = lxml.html.parse(f).getroot()
    return html
def write_html(html, filename, encoding='utf-8', xml=False):
    # Use lxml.etree when writing XHTML and lxml.html when writing HTML
    if xml:
        src = lxml.etree.tostring(html,
                                  encoding=encoding,
                                  xml_declaration=True,
                                  doctype='<!DOCTYPE html>',
                                  method='xml',
                                  pretty_print=True).decode(encoding)
    else:
        src = lxml.html.tostring(html,
                                 encoding=encoding,
                                 doctype='<!DOCTYPE html>',
                                 pretty_print=True).decode(encoding)
    with open(filename, 'w', encoding=encoding) as f:
        f.write(src)
def convert2png(filename, newfilename, convert=True):
    html = get_html(filename)
    textemplate = r'''\documentclass[a0paper, uplatex]{jsarticle}
\usepackage[dvipdfmx]{graphicx}
\usepackage[margin=1cm]{geometry}
\usepackage{amsmath,amssymb}
\pagestyle{empty}
\begin{document}
\scalebox{4}{\parbox{.25\linewidth}{MATH}}
\end{document}
'''
    imgs = {}  # Maps TeX code to its image file so the same image is not made twice
    for span in html.xpath(r'//span[@class="math inline" or @class="math display"]'):
        tex = span.text
        if tex in imgs:
            imgsrc = imgs[tex]
        else:
            imgsrc = r'math{0:04d}.png'.format(len(imgs)+1)
            imgs[tex] = imgsrc
            if convert:
                with open('tmp.tex', 'w') as texf:
                    texf.write(textemplate.replace('MATH', tex))
                subprocess.call('uplatex tmp.tex', shell=True)
                subprocess.call('dvipdfmx tmp.dvi', shell=True)
                subprocess.call('convert -trim tmp.pdf '+imgsrc, shell=True)
        # Turn the span into an img element that points at the rendered PNG
        span.tag = 'img'
        span.text = None
        span.attrib['src'] = imgsrc
        span.attrib['alt'] = tex
        width, height = imagesize.get(imgsrc)
        span.attrib['height'] = str(height)
    write_html(html, newfilename)
def html2epub(source, target, css, cover):
    subprocess.call(['pandoc',
                     '-t', 'epub3',
                     '--toc',
                     '--epub-chapter-level=1',
                     '--epub-stylesheet', css,
                     '--epub-cover-image', cover,
                     '-o', target, source])
def extract_epub(filename, tmpdir):
    # -p: do not complain if the directory already exists
    subprocess.call('mkdir -p {0}'.format(tmpdir), shell=True)
    subprocess.call('unzip -d {0} {1}'.format(tmpdir, filename), shell=True)
def make_epub(filename, tmpdir):
    subprocess.os.chdir(tmpdir)
    # mimetype goes in first, uncompressed; everything else is added afterwards
    subprocess.call(r'zip -0 ../{filename} mimetype;zip -XrD ../{filename} *'.format(filename=filename), shell=True)
    subprocess.os.chdir('../')
def make_epub_for_kindle(name, css, cover, convert=True):
    tmpdir = name + '_tmpdir'
    if not subprocess.os.path.isdir(tmpdir):
        subprocess.os.mkdir(tmpdir)
    subprocess.os.chdir(tmpdir)
    tex2html('../'+name+'.tex', name+'.html')
    convert2png(name+'.html', name+'2.html', convert)
    html2epub(name+'2.html', name+'0.epub', '../'+css, '../'+cover)
    epubdir = tempfile.TemporaryDirectory(dir='./')
    extract_epub(name+'0.epub', epubdir.name)
    ns = {'xhtml': 'http://www.w3.org/1999/xhtml'}
    # Rework every chapter file (ch001.xhtml, ch002.xhtml, ...)
    for filename in [fn for fn in subprocess.os.listdir(epubdir.name) if re.match(r'ch\d{3}\.xhtml', fn)]:
        xhtml = get_html(epubdir.name+'/'+filename, xml=True)
        xhtml.attrib['{http://www.idpf.org/2007/ops}lang'] = 'ja'
        xhtml.attrib['lang'] = 'ja'
        # Replace the absolute height attribute with a relative height in em
        for img in xhtml.xpath('//xhtml:img', namespaces=ns):
            height = int(img.attrib['height'])
            img.attrib['style'] = 'height: ' + str(round(height/40, 2)) + 'em;'
            del img.attrib['height']
        write_html(xhtml, epubdir.name+'/'+filename, xml=True)
    make_epub(name+'.epub', epubdir.name)
    epubdir.cleanup()
    subprocess.os.chdir('../')
tex2html(source, target)
Uses pandoc to convert the TeX file to HTML5. With the --mathjax option, formulas are output in the form
<span class="math inline">\(e^{\pi i} = -1\)</span>
<span class="math display">\[e^{\pi i} = -1\]</span>
These parts are extracted with lxml and converted to images one by one.
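As a rough illustration of the extraction step, here is a minimal sketch that collects the math spans from pandoc-style HTML. It deliberately swaps in the standard library's html.parser instead of lxml so that it runs without extra packages; the names MathSpanCollector and collect_math are made up for this example and are not part of the script above. It also assumes the math spans contain no nested span elements.

```python
from html.parser import HTMLParser


class MathSpanCollector(HTMLParser):
    """Collect the text of <span class="math inline"> / "math display"> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.formulas = []   # list of (kind, tex) tuples in document order
        self._kind = None    # 'inline' or 'display' while inside a math span
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'span' and 'math' in classes:
            self._kind = 'display' if 'display' in classes else 'inline'
            self._buffer = []

    def handle_data(self, data):
        if self._kind is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Assumption: math spans are not nested, so the next </span> closes them.
        if tag == 'span' and self._kind is not None:
            self.formulas.append((self._kind, ''.join(self._buffer)))
            self._kind = None


def collect_math(html_text):
    """Return [(kind, tex), ...] for every math span in html_text."""
    parser = MathSpanCollector()
    parser.feed(html_text)
    parser.close()
    return parser.formulas


sample = (r'<p>Euler: <span class="math inline">\(e^{\pi i} = -1\)</span></p>'
          r'<p><span class="math display">\[e^{\pi i} = -1\]</span></p>')
for kind, tex in collect_math(sample):
    print(kind, tex)
```

Note that the collected text still carries the \( \) or \[ \] delimiters, which is convenient here because they are valid LaTeX math delimiters and can be pasted into a TeX template unchanged.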
get_html(filename, encoding='utf-8', xml=False)
Parses an HTML document with lxml and returns the root html element. HTML is assumed by default; to parse XHTML, pass the option xml=True.
write_html(html, filename, encoding='utf-8', xml=False)
Writes the html element passed as an argument out to an HTML file.
convert2png(filename, newfilename, convert=True)
In the HTML document, parts such as
<span class="math inline">\(e^{\pi i} = -1\)</span>
are converted to images with ImageMagick's convert and replaced with an img element of the form
<img src="math0001.png" class="math inline" alt="\(e^{\pi i} = -1\)" height="30">
The height attribute is a fairly important value that is used later to adjust the displayed height of the formula. It is obtained with a Python library called imagesize. If there are many formulas, the conversion takes time. Setting convert=False skips the image conversion step, so when the images do not need to be generated again, for example when redoing a later step, setting it to False saves time.
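The bookkeeping behind this step (one image per distinct formula, and nothing rendered when convert is False) can be sketched on its own as follows. The function name plan_images and its return shape are made up for this example; only the math{0:04d}.png naming pattern comes from the script.

```python
def plan_images(formulas, convert=True, pattern='math{0:04d}.png'):
    """Assign an image file name to each distinct TeX string.

    Returns (names, to_render): names maps TeX source to file name, and
    to_render lists the (tex, file name) pairs that still need rendering.
    """
    names = {}
    to_render = []
    for tex in formulas:
        if tex in names:
            continue  # same formula seen before: reuse its image
        names[tex] = pattern.format(len(names) + 1)
        if convert:
            to_render.append((tex, names[tex]))
    return names, to_render


formulas = [r'\(x\)', r'\(y\)', r'\(x\)']
names, to_render = plan_images(formulas)
print(names)
print(to_render)
```

With convert=False the same names are assigned but the render list stays empty, which matches the idea of reusing images that already exist on disk.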
Incidentally, the images are generated somewhat large (4 times the default size, so probably about 40pt). An image created at the default size ends up crushed and unreadable, so the approach is to render it large and scale it down when displaying it.
Incidentally, searching for ways to turn TeX into images brings up dvipng, but making it handle Japanese looked troublesome, so converting the PDF to PNG with ImageMagick's convert is the easier route.
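For reference, the three commands can also be run as argument lists without shell=True, stopping at the first failure. This is only a sketch: build_commands and render are names invented for this example, and the commands themselves are exactly the ones used in the script (uplatex, dvipdfmx, convert -trim).

```python
import subprocess


def build_commands(tex_file, png_file):
    """Return the three command lines from the article as argument lists."""
    base = tex_file[:-4] if tex_file.endswith('.tex') else tex_file
    return [
        ['uplatex', tex_file],                           # TeX -> DVI
        ['dvipdfmx', base + '.dvi'],                     # DVI -> PDF
        ['convert', '-trim', base + '.pdf', png_file],   # PDF -> trimmed PNG
    ]


def render(tex_file, png_file):
    """Run the pipeline; check=True raises if any step fails."""
    for argv in build_commands(tex_file, png_file):
        subprocess.run(argv, check=True)


# Only print the commands here, so the sketch runs without TeX installed.
for argv in build_commands('tmp.tex', 'math0001.png'):
    print(' '.join(argv))
```

Passing lists instead of a shell string avoids quoting problems if a file name ever contains spaces, and check=True makes a failed TeX run visible instead of silently producing no image.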
html2epub(source, target, css, cover)
Converts the HTML document, with the formulas replaced by images, to EPUB with pandoc. css is the file name of the stylesheet to embed, and cover is the file name of the cover image to embed.
extract_epub(filename, tmpdir)
Unzips the EPUB created by html2epub into tmpdir so that it can be modified. It is fairly well known that an EPUB3 file is really a zip archive.
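Because an EPUB is a zip archive, the same step can be sketched with Python's zipfile module instead of calling unzip. This is an alternative illustration, not what the script does; extract_epub_zipfile is a name made up for this example, and the tiny archive below is only a stand-in for a real EPUB.

```python
import os
import tempfile
import zipfile


def extract_epub_zipfile(filename, target_dir):
    """Unpack an EPUB (a zip archive) into target_dir and return the entry names."""
    with zipfile.ZipFile(filename) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Build a tiny stand-in archive so the sketch runs without a real EPUB.
with tempfile.TemporaryDirectory() as work:
    demo = os.path.join(work, 'demo.epub')
    with zipfile.ZipFile(demo, 'w') as archive:
        archive.writestr('mimetype', 'application/epub+zip')
        archive.writestr('ch001.xhtml', '<html xmlns="http://www.w3.org/1999/xhtml"/>')
    entries = extract_epub_zipfile(demo, os.path.join(work, 'unpacked'))
    unpacked = sorted(os.listdir(os.path.join(work, 'unpacked')))

print(entries)
print(unpacked)
```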
make_epub(filename, tmpdir)
Packs the reworked files back into an EPUB (I referred to this: Compress to EPUB using terminal).
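The point of the two zip calls is that the mimetype file has to be the first entry in the archive and stored uncompressed, with everything else added afterwards. A minimal sketch of the same idea with the zipfile module follows; pack_epub is a hypothetical name, and the file contents in the demo are placeholders rather than a valid book.

```python
import os
import tempfile
import zipfile


def pack_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with 'mimetype' first and uncompressed."""
    with zipfile.ZipFile(epub_path, 'w') as archive:
        archive.write(os.path.join(source_dir, 'mimetype'), 'mimetype',
                      compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, source_dir).replace(os.sep, '/')
                if arcname != 'mimetype':  # already written above
                    archive.write(full, arcname,
                                  compress_type=zipfile.ZIP_DEFLATED)


# Demo with placeholder files; epub_path is kept outside source_dir on purpose.
with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, 'book')
    os.makedirs(os.path.join(src, 'META-INF'))
    for rel, text in [('mimetype', 'application/epub+zip'),
                      ('META-INF/container.xml', '<container/>'),
                      ('ch001.xhtml', '<html/>')]:
        with open(os.path.join(src, rel), 'w', encoding='utf-8') as f:
            f.write(text)
    epub = os.path.join(work, 'book.epub')
    pack_epub(src, epub)
    with zipfile.ZipFile(epub) as archive:
        infos = archive.infolist()
        first_name = infos[0].filename
        first_is_stored = infos[0].compress_type == zipfile.ZIP_STORED
        all_names = sorted(info.filename for info in infos)

print(first_name, first_is_stored, all_names)
```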
make_epub_for_kindle(name, css, cover, convert=True)
This runs the whole series of steps in one go. The file name.tex in the current directory is used as the source, and name.epub is generated at the end. Since many files are generated along the way, a directory called name_tmpdir is created and the generated files are written there. Place the css and cover files in the same directory as name.tex.
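To make the file layout concrete, here is a small sketch that lists the paths involved for a given name, relative to the directory containing name.tex. The helper plan_paths and its dictionary keys are invented for this example; the file names follow the script, and, as far as I can tell from reading the code, the final name.epub also ends up inside name_tmpdir.

```python
def plan_paths(name):
    """Return the files make_epub_for_kindle reads and writes for `name`."""
    tmpdir = name + '_tmpdir'
    return {
        'source': name + '.tex',                           # input TeX file
        'tmpdir': tmpdir,                                  # working directory
        'html': tmpdir + '/' + name + '.html',             # pandoc output
        'html_with_images': tmpdir + '/' + name + '2.html',  # formulas as img
        'draft_epub': tmpdir + '/' + name + '0.epub',      # EPUB before rework
        'final_epub': tmpdir + '/' + name + '.epub',       # EPUB after rework
    }


for key, path in plan_paths('sample').items():
    print(key, path)
```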
I create the EPUB with pandoc once, unzip it, *rework* it, and pack it into an EPUB again. What this rework does is adjust the height of the formula images. If MathML were supported there would be no need to worry about this, but once formulas are images you have to set their width or height, and if you specify it in absolute units (or not at all), the size of the formulas does not change even when the reader changes the font size in the EPUB viewer. To avoid this, there seems to be no choice but to specify a relative size in the style attribute, as in
<img src="math001.png" style='height: 1.0em;'>
(please let me know if there is a better way). Here is an excerpt of the code that sets this:
for img in xhtml.xpath('//xhtml:img', namespaces=ns):
    height = int(img.attrib['height'])
    img.attrib['style'] = 'height: ' + str(round(height/40, 2)) + 'em;'
Specifically, the value is the height of the actual PNG image divided by 40, with the unit em appended. Adjust the divisor as needed.
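The arithmetic is easy to check in isolation. The following sketch applies the same rule to a small XHTML fragment using the standard library's xml.etree.ElementTree instead of lxml; height_to_style and relativize_heights are names made up for this example, and the divisor 40 is the value used in the script.

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'


def height_to_style(height_px, px_per_em=40):
    """Turn a pixel height into a relative CSS height, e.g. 30 -> 0.75em."""
    return 'height: ' + str(round(height_px / px_per_em, 2)) + 'em;'


def relativize_heights(root, px_per_em=40):
    """Replace each img's height attribute with an em-based style attribute."""
    for img in root.iter('{%s}img' % XHTML_NS):
        if 'height' in img.attrib:
            height_px = int(img.attrib.pop('height'))
            img.set('style', height_to_style(height_px, px_per_em))


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p><img src="math0001.png" class="math inline" height="30"/></p>'
    '</body></html>')
relativize_heights(doc)
img = doc.find('.//{%s}img' % XHTML_NS)
print(img.attrib)
```

A 30-pixel-high image therefore becomes 0.75em tall, and a 40-pixel-high one exactly 1.0em, so the formula scales with the reader's font size.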
You might think that setting the style attribute before the html2epub() stage would save the trouble of modifying the EPUB afterwards, but when pandoc converts HTML to EPUB, it deletes the style attribute of img elements.
That said, there are other things worth modifying in the EPUB generated by pandoc, so I do not think this unzip-and-rezip work is wasted. For example, this rework is the only opportunity to add the lang attribute and the epub:lang attribute to the html element. There are quite a few other things that (some people) may want to modify, such as the id attribute of section elements.
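As a small illustration of this kind of rework, the sketch below sets both language attributes on the root element of an XHTML document with the standard library's xml.etree.ElementTree rather than lxml. The function name set_language is made up for this example; the namespace URI and the value ja are the ones used in the script.

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'
EPUB_NS = 'http://www.idpf.org/2007/ops'

# Register prefixes so that serialization uses readable names.
ET.register_namespace('', XHTML_NS)
ET.register_namespace('epub', EPUB_NS)


def set_language(root, lang='ja'):
    """Set lang and the namespaced epub lang attribute on the root element."""
    root.set('lang', lang)
    root.set('{%s}lang' % EPUB_NS, lang)
    return root


root = set_language(ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>'))
print(ET.tostring(root, encoding='unicode'))
```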
stylesheet.css
If the class attribute is math display, display: block is required.
stylesheet.css
img.math.display {
    display: block;
    margin: .5em auto; /* centered */
}
img.math.inline {
    margin-left: .2em;
    margin-right: .2em;
}
The margins are a matter of taste. Please also adjust the headings and line spacing yourself.
sample.tex
\documentclass[uplatex]{jsbook}
\usepackage{amsmath,amssymb}
\begin{document}
\title{Systems of numbers}
\chapter{Natural numbers}
Let $\omega$ be the smallest set guaranteed by the axiom of infinity. Then $\emptyset\in\omega$, and for any $x\in\omega$,
\begin{align*}
\sigma: x\mapsto x\cup\{x\}
\end{align*}
defines a mapping from $\omega$ to $\omega$.
\section{Peano's axioms}
$(\omega, \emptyset, \sigma)$ satisfies the so-called Peano axioms.
\chapter{Integers}
The key idea in constructing the integers is to regard the pair $(m, n)$ of two natural numbers $m, n$ as the integer $m-n$.
\section{Equivalence relation}
Since $(1, 0)$ and $(2, 1)$ are regarded as the same integer, an equivalence relation is needed.
\end{document}
make_epub_for_kindle('sample', 'stylesheet.css', 'cover.png')
This time the source is a TeX file, but it looks like the same thing could be done starting from Markdown, which is popular these days. However, since TeX is used to render the images, a TeX environment is still required.