Last time, Make a language! (JavaCC environment construction) has prepared the JavaCC development environment. This time, let's create a simple calculator and learn how to use JavaCC.
When you create a JJT file, JavaCC will generate a Java Parser for you. A JJT file is a file that defines lexical analysis and parsing.
This time, we will focus on lexical analysis, parsing, and semantic analysis (execution of syntactic trees).
First of all, before using JavaCC, we will prepare for the language. Think of ways to describe what the language is and analyze it properly.
BNF BNF is a language for defining the grammar of a language. It is described in a format in which the left side is defined by the right side. There are various ways to describe BNF, but this time we will describe BNF in a regular expression style according to JavaCC. The simplicity of the description seems to be about the extended BNF (extendes BNF).
The main symbols used are as follows.
symbol | meaning | Example |
---|---|---|
::= |
Define the left side with the right side | |
() |
() Grouping inside. |
("+" | "-") |
<xxx> |
Non-terminal symbol. Abstract expressions and sentences are non-terminal. | |
"xxx" |
Terminal symbol. When it comes to specific letters and numbers, it ends. | "1" |
* |
Repeat immediately before 0 times or more | ("1" | "2")* |
+ |
Repeat immediately before once or more | ("1" | "2")+ |
| | Represents or | "1" | "2" |
From now on, we will consider the simple formula below.
BNF with a simple formula
<formula> ::= <加算formula>
<Addition formula> ::= <Multiplication formula> ( <Addition operator> <Multiplication formula> )*
<Multiplication formula> ::= <Monomial> ( <Multiplication operator> <Monomial> )*
<Monomial> ::= <Open brackets> <formula> <Closing brackets> | <Decimal representation>
<Addition operator> ::= "+" | "-"
<Multiplication operator> ::= "*" | "/" | "%"
<Open brackets> ::= "("
<Closing brackets> ::= ")"
<Decimal representation> ::= (<Numbers>)+ ("." (<Numbers>)+)?
<Numbers> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "0"
Let's check this BNF.
For example, the formula 10 * 20 * (30 + 40)
is
<formula>
→ <Addition formula>
→ <Multiplication formula> ( <Addition operator> <Multiplication formula> )*
→ <Multiplication formula>
→ <Monomial> ( <Multiplication operator> <Monomial> )*
→ <Monomial> <Multiplication operator> <Monomial> <Multiplication operator> <Monomial>
→ <Decimal representation> <Multiplication operator> <Monomial> <Multiplication operator> <Monomial>
→ (<Numbers>)+ ("." (<Numbers>)+)? <Multiplication operator> <Monomial> <Multiplication operator> <Monomial>
→ <Numbers> <Numbers> <Multiplication operator> <Monomial> <Multiplication operator> <Monomial>
→ 10 <Multiplication operator> <Monomial> <Multiplication operator> <Monomial>
→ 10 * <Monomial> <Multiplication operator> <Monomial>
→ ...
→ 10 * 20 * <Monomial>
→ 10 * 20 * <Open brackets> <formula> <Closing brackets>
→ 10 * 20 * ( <formula> <Closing brackets>
→ 10 * 20 * ( <Addition formula> <Closing brackets>
→ 10 * 20 * ( <Multiplication formula> ( <Addition operator> <Multiplication formula> )* <Closing brackets>
→ 10 * 20 * ( <Multiplication formula> <Addition operator> <Multiplication formula> <Closing brackets>
→ ...
→ 10 * 20 * ( <Monomial> <Addition operator> <Multiplication formula> <Closing brackets>
→ ...
→ 10 * 20 * ( 30 <Addition operator> <Multiplication formula> <Closing brackets>
→ ...
→ 10 * 20 * ( 30 + 40 )
You can see that it can be derived as follows.
Lexical analysis is the process of separating a list of characters with appropriate delimiters. BNF will analyze the part closer to the terminal symbol.
BNF lexical analysis part of a simple formula
<Addition operator> ::= "+" | "-"
<Multiplication operator> ::= "*" | "/" | "%"
<Open brackets> ::= "("
<Closing brackets> ::= ")"
<Decimal representation> ::= (<Numbers>)+ ("." (<Numbers>)+)?
<Numbers> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "0"
At this time, numbers, decimal expressions, and addition operators are considered to be lexical types. As an image, it is the work of appropriately separating a series of character strings.
Parsing will handle the most meaningful parts of BNF (the parts that make sense to create a syntax tree).
BNF parsing part of a simple formula
<formula> ::= <加算formula>
<Addition formula> ::= <Multiplication formula> ( <Addition operator> <Multiplication formula> )*
<Multiplication formula> ::= <Monomial> ( <Multiplication operator> <Monomial> )*
<Monomial> ::= <Open brackets> <formula> <Closing brackets> | <Decimal representation>
It is a task to assemble a tree with the lexicals that are appropriately separated by lexical analysis as nodes. There are various ways to assemble this tree, such as upward parsing and downward parsing. This time JavaCC uses recursive descent parsing.
In fact, it is difficult to draw a clear line between parsing and lexical analysis. For example, when you can add +
or -
to the beginning of a monomial,
It depends on how you make it, whether you use + 10
to make one word or treat it as two words, -
and 10
.
(However, in this case, it is better to treat +
and -
as different lexical terms and let them judge at the stage of parsing because it depends on the context. It may be easy to handle.)
Let's put this BNF into a JavaCC jjt file. Make a language! (Making a simple calculator ②).
Make a language! I tried using bison and flex Make a language! (JavaCC environment construction) Make a language! (Making a simple calculator ②)
Recommended Posts