I wrote (something like) a C parser using PEG in Ruby

This article is about writing something like a C parser in Ruby. I call it a C parser, but it is not a strict, complete one like pycparser (implemented in Python); it is a rough implementation that took about three days to write.

Repository: github.com/hsssnow23/Captain

Sample input:

typedef struct {
    unsigned int id;
    float x;
    float y;
} Actor;

Output:

#<CTypedef:0x000000037809a8
 @from=
  #<CStruct:0x0000000376a068
   @body=
    [#<CVariable:0x000000034f0350
      @name="id",
      @type=
       #<CType:0x000000034f3780
        @const=false,
        @name="int",
        @pointer=false,
        @prefix="unsigned">,
      @value=nil>,
     #<CVariable:0x000000035b6ca8
      @name="x",
      @type=
       #<CType:0x000000035ad950
        @const=false,
        @name="float",
        @pointer=false,
        @prefix=nil>,
      @value=nil>,
     #<CVariable:0x000000036a0df8
      @name="y",
      @type=
       #<CType:0x000000036a3a30
        @const=false,
        @name="float",
        @pointer=false,
        @prefix=nil>,
      @value=nil>],
   @name=nil>,
 @to="Actor">

The parser was originally written for a tool that generates code automatically from C source annotated with extra information, but it is really slow. I think the main reason is that the PEG parser I implemented for it does not use packrat parsing. So, in this article, I would like to describe what it was like to actually use a PEG parser.

A rough summary first

Benefits of PEG

- In a language that supports operator overloading, such as Ruby, a grammar can be written like a DSL, which makes it easy to understand. (Parser generators such as lex and yacc have many quirks, so I find them a little hard to get used to.)
- A PEG parser itself is easy to implement if you settle for a simple one that ignores speed. (In the end, my PEG parser implementation came to 269 lines.)
- Lexical analysis and parsing are done in the same pass, which saves time and effort, so PEG can be used about as casually as a regular expression.
- Unlike regular expressions, it can parse nested parentheses.
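To give a feel for the first two points, here is a minimal sketch of PEG combinators built on operator overloading. It is not the code from the repository; the names `Parser`, `lit`, `rx`, `FIELD`, and `FIELDS` are all made up for illustration, and speed is ignored entirely.

```ruby
# Minimal PEG combinators (hypothetical sketch, not the Captain implementation).
class Parser
  def initialize(&block)
    @block = block
  end

  # Returns [value, next_pos] on success, nil on failure.
  def parse(input, pos = 0)
    @block.call(input, pos)
  end

  # Sequence: self followed by other.
  def >>(other)
    Parser.new do |input, pos|
      left = parse(input, pos)
      next nil unless left
      right = other.parse(input, left[1])
      next nil unless right
      [[left[0], right[0]], right[1]]
    end
  end

  # Ordered choice: other is tried only if self fails.
  def |(other)
    Parser.new { |input, pos| parse(input, pos) || other.parse(input, pos) }
  end

  # Zero or more repetitions; stops if no input is consumed.
  def many
    Parser.new do |input, pos|
      values = []
      while (r = parse(input, pos)) && r[1] > pos
        values << r[0]
        pos = r[1]
      end
      [values, pos]
    end
  end

  # Transform the parsed value with the given block.
  def map(&f)
    Parser.new do |input, pos|
      r = parse(input, pos)
      r && [f.call(r[0]), r[1]]
    end
  end
end

# Match a literal string at the current position.
def lit(str)
  Parser.new { |input, pos| input[pos, str.size] == str ? [str, pos + str.size] : nil }
end

# Match a regular expression anchored at the current position.
def rx(regex)
  anchored = /\G(?:#{regex.source})/
  Parser.new do |input, pos|
    m = anchored.match(input, pos)
    m && [m[0], m.end(0)]
  end
end

# Grammar for lines such as "float x;" (no separate lexer needed).
FIELD = ((lit("int") | lit("float")) >> rx(/\s+/) >> rx(/[A-Za-z_]\w*/) >>
         rx(/\s*/) >> lit(";") >> rx(/\s*/)).map do |v|
  type, _space, name = v.flatten
  { type: type, name: name }
end
FIELDS = FIELD.many

# values holds one hash per field; next_pos is where parsing stopped.
values, next_pos = FIELDS.parse("float x;\nfloat y;")
```

Because `>>` binds more tightly than `|` in Ruby, a rule written as `a >> b | c >> d` groups the way a PEG rule normally would, which is part of what makes the DSL read naturally.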

Disadvantages of PEG

- A naive implementation does not run in O(n) time.
- Left-recursive rules cannot be written directly.
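The first point can be illustrated with a small experiment. The sketch below is a hand-written recognizer for a toy grammar (`expr <- term "+" expr / term "-" expr / term`, `term <- "(" expr ")" / digit`); the class `TinyParser` and its `memoize:` flag are hypothetical and only exist for this demonstration. Without memoization, each alternative of `expr` re-parses the same `term`, so the work grows exponentially with nesting depth. With a packrat-style cache keyed on rule and position, each rule is evaluated at most once per position.

```ruby
# Toy recognizer comparing naive backtracking with packrat-style memoization.
# (Hypothetical sketch; the grammar and class are for illustration only.)
class TinyParser
  attr_reader :calls

  def initialize(input, memoize:)
    @input = input
    @memo = memoize ? {} : nil
    @calls = 0 # number of rule bodies actually evaluated
  end

  # expr <- term "+" expr / term "-" expr / term
  # Returns the position after the match, or nil on failure.
  def expr(pos)
    apply(:expr, pos) do
      seq(pos, "+") || seq(pos, "-") || term(pos)
    end
  end

  # term <- "(" expr ")" / digit
  def term(pos)
    apply(:term, pos) do
      if @input[pos] == "("
        e = expr(pos + 1)
        e && @input[e] == ")" ? e + 1 : nil
      elsif @input[pos]&.match?(/\d/)
        pos + 1
      end
    end
  end

  private

  # term op expr
  def seq(pos, op)
    t = term(pos)
    return nil unless t && @input[t] == op
    expr(t + 1)
  end

  # Evaluate a rule body, caching the result per (rule, position)
  # when memoization is enabled. Failures (nil) are cached too.
  def apply(rule, pos)
    key = [rule, pos]
    return @memo[key] if @memo&.key?(key)
    @calls += 1
    result = yield
    @memo[key] = result if @memo
    result
  end
end

input = "(" * 8 + "1" + ")" * 8
naive = TinyParser.new(input, memoize: false)
packrat = TinyParser.new(input, memoize: true)
naive.expr(0)
packrat.expr(0)
puts "naive: #{naive.calls} rule evaluations, packrat: #{packrat.calls}"
```

The trade-off is memory: the cache holds one entry per rule and input position, which is why a simple implementation may reasonably leave it out.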

Impressions

It's easy. Overwhelmingly easy. I think the big advantage over other kinds of parser is that you can start writing as soon as you have an idea. You can skip lexical analysis and write a parser that builds the syntax tree directly, so I think PEG is best suited to simple parsers that need to do more than regular expressions can. However, I felt it would be somewhat difficult to write a parser in PEG for a language whose specification already exists, as is the case with a C parser. Many existing programming languages are implemented with parser generators such as lex and yacc, and it is difficult to stay consistent with them. PEG is also still young, and it does not seem to be very clear how much it can parse. (Honestly, I'm not confident that my parser could handle C source that pokes at the corners of the language.)

On the other hand, I felt that PEG would make it easier to write a parser whose behavior changes depending on what it has parsed so far. (I think there are few situations where that is needed, though.)
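One case where this kind of context-dependent parsing comes up is C's `typedef`: whether an identifier is a type name depends on the declarations seen earlier. The following is only a sketch of the idea, written as a hand-rolled backtracking parser on top of Ruby's standard `StringScanner`; the class `DeclParser` and its tiny grammar are assumptions for illustration and are not taken from the repository.

```ruby
require "set"
require "strscan"

# Hypothetical sketch: the set of known type names grows while parsing,
# so later input is parsed differently depending on earlier input.
class DeclParser
  attr_reader :types

  def initialize
    @types = Set.new(%w[int float char])
  end

  # Parses statements of the form "typedef <type> <name>;" or "<type> <name>;".
  def parse(src)
    s = StringScanner.new(src)
    result = []
    s.skip(/\s*/)
    until s.eos?
      # Ordered choice, as in PEG: try typedef first, then variable.
      result << (typedef(s) || variable(s) ||
                 raise(ArgumentError, "parse error at position #{s.pos}"))
      s.skip(/\s*/)
    end
    result
  end

  private

  # typedef <- "typedef" type ident ";"  (registers ident as a new type)
  def typedef(s)
    start = s.pos
    if s.skip(/typedef\s+/) && (old = type(s)) && (alias_name = ident(s)) && s.skip(/;/)
      @types << alias_name
      return [:typedef, old, alias_name]
    end
    s.pos = start # backtrack
    nil
  end

  # variable <- type ident ";"
  def variable(s)
    start = s.pos
    if (t = type(s)) && (name = ident(s)) && s.skip(/;/)
      return [:var, t, name]
    end
    s.pos = start # backtrack
    nil
  end

  # An identifier only counts as a type if it is currently registered.
  def type(s)
    start = s.pos
    name = ident(s)
    return name if name && @types.include?(name)
    s.pos = start
    nil
  end

  # Scans an identifier and any whitespace after it.
  def ident(s)
    word = s.scan(/[A-Za-z_]\w*/)
    s.skip(/\s*/)
    word
  end
end

decls = DeclParser.new.parse("typedef int Id; Id player;")
```

Here `Id player;` is accepted only because the preceding `typedef` registered `Id`; parsing the same statement with a fresh `DeclParser` raises an `ArgumentError`.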

Summary

My final conclusion is that PEG is the parser I would recommend most. That said, while I'm sure it is a good choice for parsing a small format that you can specify yourself, I thought it might be a questionable choice for anything else.

Reference article: http://kmizu.hatenablog.com/entry/20100203/1265183754