Convert markdown to PDF in Python

What i did

In Python, I scraped a web page containing markdown notation, output it to markdown, and then converted the markdown to HTML and then to PDF. (Complicated) I used Beautiful Soup for scraping, but I will omit it this time. I didn't have much information about Markdown → HTML → PDF, so I summarized it.

Markdown → HTML used Markdown, HTML → PDF used pdfkit.

environment

The environment is as follows.

Windows10
wkhtmltopdf
Python3.8
Markdown==3.2.1
pdfkit==0.6.1
Pygments==2.6.1

In case of Windows environment, it is necessary to install wkhtmltopdf before using pdfkit. Install the 64-bit version from this site. You can use the path, or you can specify the path of the executable file directly in the Python code as you do this time.

We also use Pygments to highlight the code blocks.

The folder structure is as follows.

   app
    |- file
    |   |- source
    |   |   └─ source.md ← Original markdown
    |└─ pdf ← PDF output destination
    |- app.py
    └─requirements.txt

Install each library.

`requirements.txt`


Markdown==3.2.1
pdfkit==0.6.1
Pygments==2.6.1

> pip install -r requirements.txt

The markdown that is the conversion source is as follows.

`source.md`


#title

##subtitle

Markdown text.

###list

1.Numbered list
    1.Nested numbered list
1.Numbered list

###text

this is*italic*It is a description of.
this is**Emphasis**It is a description of.
this is[Link](https://qiita.com/)It is a description of.

###Source code

- Java

\```java

class HelloWorld {

    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}

\```

- Python

\```python

class HelloWorld:
    def __init__(self, id, name):
        self.id = id
        self.name = name

if __name__=='__main__':
    hello = HelloWorld(1, 'python')

\```

###Ruled line

***

---

* * *

###table

| Table Header | Table Header | Table Header |
| :-- | :--: | --: |
| Body | Body | Body |
| Left | Center | Right |

Markdown → HTML

Use the Markdown library to convert Markdown to HTML. Enable the extension codehilite and highlight the source code with Pygments.

`app.py`


import markdown
from pygments import highlight
from pygments.formatters import HtmlFormatter

def mark_to_html():
    #Read markdown file
    f = open('file/source/source.md', mode='r', encoding='UTF-8')
    with f:
        text = f.read()
        #Create stylesheets for highlights with Pygments
        style = HtmlFormatter(style='solarized-dark').get_style_defs('.codehilite')
        # #Markdown → HTML conversion
        md = markdown.Markdown(extensions=['extra', 'codehilite'])
        body = md.convert(text)
        #Fit to HTML format
        html = '<html lang="ja"><meta charset="utf-8"><body>'
        #Import stylesheets created with Pygments
        html += '<style>{}</style>'.format(style)
        #Add style to add border to Table tag
        html += '''<style> table,th,td { 
            border-collapse: collapse;
            border:1px solid #333; 
            } </style>'''
        html += body + '</body></html>' 
        return html

Markdown has the following extensions that you can specify as a list when creating an object.

md = markdown.Markdown(extensions=["Extensions"])

Only the ones that can be used are excerpted. For all extensions below.

https://python-markdown.github.io/extensions/#officially-supported-extensions

Expansion	function
extra	Convert basic markdown notation such as abbreviation elements, lists, code blocks, citations, tables to HTML
admonition	A note can be output
codehilite	You can add syntax highlighting defined in Pygments to code blocks. Requires Pygments.
meta	You can get the meta information of the file
nl2br	One line feed code`<br>`Convert to tag
sane_lists	Supports lists with line breaks and numbered lists from specified numbers
smarty	`"`,`>`Supports HTML special characters such as
toc	Automatically create a table of contents from the composition of headings
wikilinks	`[[]]`Corresponds to the link notation in

You can also use third-party extensions.

https://github.com/Python-Markdown/markdown/wiki/Third-Party-Extensions

The Pygments that I'm trying to highlight in the code specifies the style and class names to apply.

style = HtmlFormatter(style="Style name").get_style_defs("name of the class")

Applicable styles can be obtained from the Pygments command line tool.

> pygmentize -L styles

The class name will be output as <code class =" codehilite "> when codehilite is enabled in markdown, so set it to codehilite.

HTML → PDF

We use a library called pdfkit to convert from HTML to PDF. Since this library uses wkhtmltopdf internally, you need to put the path in the environment variable or specify the path of the executable file at runtime.

`app.py`


import pdfkit

def html_to_pdf(html:str):
    """
    html : str HTML
    """
    #Specifying the output file
    outputfile = 'file/pdf/output.pdf'
    #Specifying the path of the wkhtmltopdf executable file
    path_wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
    config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
    #Perform HTML → PDF conversion
    pdfkit.from_string(html, outputfile, configuration=config)

This time, the MTML format string is converted to a PDF file, but you can also convert the website to PDF by specifying the URL. In that case, use the from_url function. Also, if you want to convert an HTML file to PDF, use from_file.

Source code

Overall source code

import markdown
import pdfkit
from pygments import highlight
from pygments.formatters import HtmlFormatter

def mark_to_html():
    #Read markdown file
    f = open('file/source/source.md', mode='r', encoding='UTF-8')
    with f:
        text = f.read()
        #Create stylesheets for highlights with Pygments
        style = HtmlFormatter(style='solarized-dark').get_style_defs('.codehilite')
        # #Markdown → HTML conversion
        md = markdown.Markdown(extensions=['extra', 'codehilite'])
        body = md.convert(text)
        #Fit to HTML format
        html = '<html lang="ja"><meta charset="utf-8"><body>'
        #Import stylesheets created with Pygments
        html += '<style>{}</style>'.format(style)
        #Add style to add border to Table tag
        html += '''<style> table,th,td { 
            border-collapse: collapse;
            border:1px solid #333; 
            } </style>'''
        html += body + '</body></html>' 
        return html

def html_to_pdf(html: str):
    #Specifying the output file
    outputfile = 'file/pdf/output.pdf'
    #Specifying the path of the wkhtmltopdf executable file
    path_wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
    config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
    #Perform HTML → PDF conversion
    pdfkit.from_string(html, outputfile, configuration=config)

if __name__=='__main__':
    html = mark_to_html()
    html_to_pdf(html)

Conversion result

The result of executing the above function and converting it is as follows.

The output result of markdown can be converted to PDF. The code block is also highlighted. I didn't include it in the example this time, but you can output the image without any problem.

Summary

I used the Python libraries Markdown and pdfkit to convert markdown to PDF. If you create a template, you can convert the minutes to PDF immediately, which is convenient. I hope you can use it as a material for improving work efficiency.