Convert from Markdown to HTML in Python

When developing web apps with Django or Bottle, you may want to render user-entered Markdown text as HTML. This article converts Markdown to HTML on the Python side using the open-source package mistune.

Simple usage

Following the sample in the mistune README gives the results below. For ordinary text, each paragraph is wrapped in a <p> tag, and emphasis such as bold and italics is expanded inside it. A bulleted list is wrapped in the corresponding <ul> and <li> tags.

In [1]: import mistune

In [2]: mistune.markdown('I am using **mistune markdown parser**')
Out[2]: '<p>I am using <strong>mistune markdown parser</strong></p>\n'

In [3]: txt = """Paragraph 1
   ...: 
   ...: Paragraph 2"""

In [4]: mistune.markdown(txt)
Out[4]: '<p>Paragraph 1</p>\n<p>Paragraph 2</p>\n'

In [5]: txt = """- Listing 1
   ...: - Listing 2
   ...: - Listing 3"""

In [6]: mistune.markdown(txt)
Out[6]: '<ul>\n<li>Listing 1</li>\n<li>Listing 2</li>\n<li>Listing 3</li>\n</ul>\n'

There are also options such as escape (True by default), which controls whether HTML tags in the input are escaped, and hard_wrap (False by default), which inserts a <br> tag at line breaks that are not followed by a blank line.

Use your own Lexer and Renderer

By default, mistune converts text according to the general Markdown rules, but it also lets you define rules for your own notations and symbols and convert them to HTML.

mistune automatically links http and https URLs out of the box, but it does not do so for other protocols such as ftp and smb. As an example, let's add a feature that automatically wraps such URLs in an <a> tag.

The approach is to add your own rule to the Lexer and define the corresponding conversion in the Renderer. Here, the Lexer extracts URLs for protocols such as ftp and smb with a regular expression, and the Renderer wraps them in <a> tags.

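import re

# Renderer and InlineLexer are part of the mistune 0.x API.
from mistune import Renderer, InlineLexer, escape


class DocumentLinkRenderer(Renderer):

    def document_link(self, link):
        # Escape the URL so quotes or ampersands cannot break out of the href attribute.
        safe = escape(link, quote=True)
        return '<a href="{l}">{l}</a>'.format(l=safe)


class DocumentLinkInlineLexer(InlineLexer):

    def enable_document_link(self):
        # Match a URL at the current position; trailing punctuation is left out of the link.
        self.rules.document_link = re.compile(r'''^((https?|smb|ftp|file):\/\/[^\s<]+[^<.,:;"')\]\s])''')
        # Register the rule by name so the lexer tries it during inline parsing.
        self.default_rules.insert(3, 'document_link')

    def output_document_link(self, m):
        # Called when the document_link rule matches; pass the URL to the renderer.
        text = m.group(1)
        return self.renderer.document_link(text)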
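
To see what the document_link rule above accepts, the regular expression can be tried on its own with only the standard library. This is a minimal sketch: the helper name match_document_link is made up for illustration and is not part of mistune.

```python
import re

# Same pattern as the document_link rule above, compiled on its own.
DOCUMENT_LINK = re.compile(r'''^((https?|smb|ftp|file):\/\/[^\s<]+[^<.,:;"')\]\s])''')


def match_document_link(text):
    """Return the URL at the start of text, or None if the rule does not match.

    Hypothetical helper for illustration; it is not a mistune function.
    """
    m = DOCUMENT_LINK.match(text)
    return m.group(1) if m else None


print(match_document_link("ftp://path/to/file"))      # ftp://path/to/file
print(match_document_link("smb://server/share."))     # smb://server/share
print(match_document_link("mailto:someone"))          # None
print(match_document_link("see ftp://path/to/file"))  # None
```

The trailing period in the second case is left out of the link by the final character class. The last case fails because the pattern is anchored with ^, so it only matches when the URL begins at the position being examined.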

To use these classes, pass the custom Renderer and InlineLexer to mistune.Markdown().

import mistune

renderer = DocumentLinkRenderer()
inline = DocumentLinkInlineLexer(renderer)
# Turn on the custom rule before building the Markdown object.
inline.enable_document_link()
markdown = mistune.Markdown(renderer, inline=inline)

markdown("markdown text")

In practice, a link is created automatically for an ftp URL, as shown below.

In [20]: markdown("ftp://path/to/file")
Out[20]: '<p><a href="ftp://path/to/file">ftp://path/to/file</a></p>\n'

With this mechanism, you can also attach links to specific words, as Hatena Keyword does, or map an ID to a URL, as with Redmine or Backlog tickets.
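
As a rough sketch of the ticket idea, the two pieces such a rule would need are a pattern for the reference and a function that builds the link. Everything here is an assumption for illustration: the #1234 notation, the tracker URL, and the names TICKET_RULE, render_ticket_link and try_ticket_link. The code uses only the standard library and does not cover how the rule would be wired into mistune.

```python
import re

# Hypothetical tracker address; replace with your own Redmine or Backlog URL.
TICKET_BASE_URL = "https://redmine.example.com/issues/"

# Anchored with ^ like the document_link rule: "#" followed by digits.
TICKET_RULE = re.compile(r'^#(\d+)')


def render_ticket_link(ticket_id):
    """Build the <a> tag for a ticket number (digits only, so nothing to escape)."""
    return '<a href="{base}{id}">#{id}</a>'.format(base=TICKET_BASE_URL, id=ticket_id)


def try_ticket_link(text):
    """Return the rendered link if text starts with a ticket reference, else None."""
    m = TICKET_RULE.match(text)
    return render_ticket_link(m.group(1)) if m else None


print(try_ticket_link("#1234 is fixed"))
# <a href="https://redmine.example.com/issues/1234">#1234</a>
print(try_ticket_link("no ticket here"))
# None
```

In the Lexer and Renderer structure shown earlier, the pattern would play the role of the rule registered in the Lexer and render_ticket_link the role of the Renderer method.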
