I can't handle left recursion, but PEG parsers are quite popular these days. For Python, there are https://github.com/erikrose/parsimonious and https://github.com/KuramitsuLab/pegpy.
I wanted to parse the formula used in the theorem proving support system called Coq https://coq.inria.fr/. So I took a quick look at some Python lexical / parsing tools.
The grammar of Coq expressions is quite complicated (according to the grammar definition, it is a left-recursive grammar [^ 1] and requires lookahead). Therefore, the following points were used when selecting. If you want to deal with other grammars, of course, it will be a different criterion.
--Do you want to handle left recursion grammar? -Is it a complicated grammar that requires look-ahead? --Whether to write the grammar in DSL or define it as a parser combinator
Below are the survey results.
[^ 1]: ʻexpr :: = expr A grammar with rules like "+" term`. It is theoretically possible to rewrite it into a grammar that does not have left recursion, but I don't want to do it with a huge grammar like Coq, and I don't want to make human mistakes.
pyparsing (2.2.0) 2017/03
Parsing library. A parser-based approach that creates parsing rules by combining classes such as ʻOneOf, ʻOptional
, and Group
. It is very easy to define "nested comments", which is troublesome in grammar design of programming languages. It is very easy to specify the language class, but the left recursion grammar cannot be handled as it is, so some ingenuity is required.
How to use Pyparsing: http://masato.github.io/2014/07/01/python27-etl-pyparsing-syntactic-analysis/
PLY (Python Lex-Yacc) (3.11) 2018/02 Python version of Lex / Yacc, a well-known C lexical / parsing tool. This means that similar grammar definition syntax can be used. LALR (1) parsing is basically used for the analysis. Headquarters: http://www.dabeaz.com/ply/ Reference: http://blog.livedoor.jp/shf0811/archives/7346881.html
parse (1.6.6; 2014/11)
A pattern matching library that works the opposite of String.format ()
.
Reference: http://coreblog.org/ats/python-parse/
Summary,
--Parse if simple pattern matching is enough --PLY if you want to code left recursive grammar / write rules as an independent document and don't need complicated look-ahead --Pyparsing if you want to define rules by coding
Seems to be good.
In addition, PLY only looks ahead for one token. If you want to write a complicated grammar, it is almost an option of ANTLR (Java tool) that uses LL (*) parsing. For example, ANTLR can easily parse the grammar of Coq, which is a support system for theorem proving. ANTLR v4 has the ability to generate a parser for Python 2/3. Also, ANTLR's Python wrapper seems to be on PyPI.
Recommended Posts