Implementation of a math parser with recursive descent parsing (Java)

It's been a long time since the internet was cut off at work. I tried to give the title a hard name that seems not to be well received in the current Qiita

Due to the performance required by the requirements, I created an analyzer that performs four arithmetic operations using the recursive descent parsing method. I'll leave it as it may be helpful to someone

Requirements

Since the formula is obtained as a character string in the batch application, it seems that I wanted to calculate it and return the result, but the number was huge. Originally, it seems that he intended to call JavaScript from Java and pass the formula as it is for calculation, but it seems that it did not finish in one day because the number of target formulas is tens of thousands and hundreds of thousands.

Moreover, it seems that the number will increase steadily in the future!

Of course, there is also the next job, so we have to make sure that the process is completed by the start time.

So I was talking about wanting me to do something about it. When I first heard about it, I thought that it was not possible to perform high-speed calculation processing with JavaScript by calling an external program.

You just have to create an analyzer that can perform high-speed four arithmetic operations. It seems that you also want to control the rounding error during division, so you need to think about that as well.

Well, I expected that if I implemented it all in Java, there would be no problem in terms of speed, so I decided to go with a reasonably powerful recursive descent parsing method instead of a simple implementation.

code

The following is implemented in Java 8

FormulaParser.java


import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormulaParser {
    private static final int HIGH_PRIORITY = 3;
    private static final int MID_PRIORITY = 2;
    private static final int LOW_PRIORITY = 1;
    private static final Pattern EMPTY_BRACKET = Pattern.compile("\\(\\)");
    private static final Pattern NOT_CLOSE_BRACKET = Pattern.compile("\\)\\(");
    private static final int NO_EXIST = -1;
    private static final String ZERO = "0";
    private static final String BRACKET_AND_ZERO = "(0";

    private String expression; //formula
    private FormulaParser left; //Left child node
    private FormulaParser right; //Right child node
    private int scale;

    /**
     *Constructor with scale
     ** Consider scale only in the case of division
     *
     * @param expression formula
     * @param scale Decimal point position when dividing
     */
    public FormulaParser(String expression, int scale) {
        this.expression = format(expression);
        this.scale = scale;
    }

    /**
     *Divide the formula into binary trees and calculate
     *
     * @throws FormulaParseException
     */
    public String execute() throws FormulaParseException {
        //Remove the outermost parentheses
        String exp = delMostOuterBrackets(this.expression);
        //Find the operator from the expression and get the position
        int opePos = getOpePos(exp);
        //If the expression does not contain an operator, it is considered as a term.
        if (opePos < 0) {
            left = null;
            right = null;
            return new BigDecimal(exp).toPlainString();
        }
        //Illegal treatment if operator position is at the beginning or end of an expression
        if (opePos == 0 || opePos == exp.length() - 1) {
            throw new FormulaParseException(exp);
        }
        //Create a node with the left side of the operator as the left subexpression
        left = new FormulaParser(exp.substring(0, opePos), scale);
        //Create a node with the right side of the operator as the right subexpression
        right = new FormulaParser(exp.substring(opePos + 1), scale);
        //Set the remaining operators in this note
        expression = exp.substring(opePos, opePos + 1);
        return calculate(left.execute(), OPERATOR.getEnum(expression), right.execute(), scale);
    }

    /**
     *Get the position of the operator
     *
     * @param exp expression
     * @return operator position(0 beginning)/If not-Returns 1
     */
    private static int getOpePos(String exp) {
        int opePos = NO_EXIST;
        int curPriority = Integer.MAX_VALUE;
        int nest = 0;
        for (int i = 0, len = exp.length(); i < len; i++) {
            int priority = 0;
            switch (OPERATOR.getEnum(String.valueOf(exp.charAt(i)))) {
                case PLUS:
                case MINUS:
                    priority = MID_PRIORITY;
                    break;
                case MULTIPLY:
                case DIVIDE:
                    priority = HIGH_PRIORITY;
                    break;
                case LEFT_BRACKET:
                    nest++;
                    continue;
                case RIGHT_BRACKET:
                    nest--;
                    continue;
                default:
                    continue;
            }
            if (nest == 0 && priority <= curPriority) {
                curPriority = priority;
                opePos = i;
            }
        }
        return opePos;
    }

    /**
     *Calculation process
     *
     * @param lExp Left term
     * @param ope operator
     * @param rExp Right term
     * @param scale scale
     */
    private String calculate(String lExp, OPERATOR ope, String rExp, int scale) throws FormulaParseException {
        BigDecimal left = new BigDecimal(lExp);
        BigDecimal right = new BigDecimal(rExp);
        switch (ope) {
            case PLUS:
                return left.add(right).toPlainString();
            case MINUS:
                return left.subtract(right).toPlainString();
            case MULTIPLY:
                return left.multiply(right).toPlainString();
            case DIVIDE:
                return left.divide(right, scale, RoundingMode.DOWN).toPlainString();
            default:
                throw new FormulaParseException(left + ope.val + rExp);
        }
    }

    /**
     *format
     *Half-width space is excluded
     *Since it is difficult to distinguish between positive and negative symbols and addition / subtraction operators, fill in 0 to make it easier.
     *
     * @param exp expression
     * @return formatted expression ex) -3 + 1 → 0-3+1
     */
    private static String format(String exp) {
        //Eliminate half-width spaces
        StringBuilder e = new StringBuilder(exp.replaceAll(" ", ""));
        int cur = 0;
        for (; ; ) {
            int plusIndex = e.indexOf(OPERATOR.PLUS.val, cur);
            int minusIndex = e.indexOf(OPERATOR.MINUS.val, cur);
            if (plusIndex == NO_EXIST && minusIndex == NO_EXIST) {
                break;
            }
            //Get index of which operator exists and precedes
            if (plusIndex < minusIndex) {
                cur = (plusIndex == NO_EXIST) ? minusIndex : plusIndex;
            } else {
                cur = (minusIndex == NO_EXIST) ? plusIndex : minusIndex;
            }
            if (cur == 0) {
                e.insert(cur, ZERO);
                cur = cur + ZERO.length() + 1;
                continue;
            }
            //Add 0 if the character immediately before the operator is not a number and is not a closing brace, multiplication, or division operator
            char preChar = e.charAt(cur - 1);
            if (!Character.isDigit(preChar)
                    && preChar != OPERATOR.RIGHT_BRACKET.cVal) {
                if (preChar == OPERATOR.MULTIPLY.cVal
                        || preChar == OPERATOR.DIVIDE.cVal
                        || preChar == OPERATOR.MINUS.cVal) {
                    int endDigits = 0;
                    for (int i = cur + 1, len = e.length(); i < len; i++) {
                        char c = e.charAt(i);
                        if (!Character.isDigit(c) && c != '.') {
                            break;
                        }
                        endDigits = i;
                    }
                    e.insert(cur, BRACKET_AND_ZERO).insert(endDigits + BRACKET_AND_ZERO.length() + 1, OPERATOR.RIGHT_BRACKET.cVal);
                    cur = endDigits + BRACKET_AND_ZERO.length();
                } else {
                    e.insert(cur, ZERO);
                    cur = cur + ZERO.length();
                }
            }
            cur++;
            if (cur > e.length()) break;
        }
        return e.toString();
    }

    /**
     *Parenthesis removal process
     *Remove the outermost parenthesis from the expression
     * (1+2)+(3+4)Returns an expression like
     *Throw an exception as an invalid expression if the parentheses are not closed
     *
     * @param exp expression
     * @return Expression ex with parentheses removed) (4*2) → 4*2,  ((3+4)) → 3+4, (1+2)+(3+4) → (1+2)+(3+4)
     * @throws FormulaParseException
     */
    private static String delMostOuterBrackets(String exp) throws FormulaParseException {
        if (matchFirst(exp, EMPTY_BRACKET).isPresent()) throw new FormulaParseException(exp);
        if (matchFirst(exp, NOT_CLOSE_BRACKET).isPresent()) throw new FormulaParseException(exp);
        if (exp.charAt(0) == OPERATOR.RIGHT_BRACKET.cVal) throw new FormulaParseException(exp);

        boolean hasMostOuterBrackets = false;
        int nest = 0;
        //Verify the first character
        if (exp.charAt(0) == OPERATOR.LEFT_BRACKET.cVal) {
            hasMostOuterBrackets = true;
            nest++;
        }
        //Verify the first and subsequent characters character by character
        for (int i = 1, len = exp.length(); i < len; i++) {
            if (exp.charAt(i) == OPERATOR.LEFT_BRACKET.cVal) {
                nest++;
            } else if (exp.charAt(i) == OPERATOR.RIGHT_BRACKET.cVal) {
                nest--;
                if (nest == 0 && i < len - 1) {
                    hasMostOuterBrackets = false;
                }
            }
        }
        //Illegal expression if parentheses are not closed
        if (nest != 0) throw new FormulaParseException(exp);
        //If there are no parentheses, return it as it is
        if (!hasMostOuterBrackets) return exp;
        //Remove the first and last parentheses
        String ret = exp.substring(1, exp.length() - 1);
        //If there are still parentheses, process again
        if (ret.charAt(0) == OPERATOR.LEFT_BRACKET.cVal
                && ret.charAt(ret.length() - 1) == OPERATOR.RIGHT_BRACKET.cVal) {
            return delMostOuterBrackets(ret);
        }
        return ret;
    }

    /**
     *Search
     *
     * @param s Search target
     * @param p regular expression
     * @return search results
     */
    private static Optional<String> matchFirst(String s, Pattern p) {
        Matcher m = p.matcher(s);
        return m.find() ? Optional.of(m.group(0)) : Optional.empty();
    }

    /**
     *operator
     */
    private enum OPERATOR {
        PLUS("+", '+'),
        MINUS("-", '-'),
        MULTIPLY("*", '*'),
        DIVIDE("/", '/'),
        LEFT_BRACKET("(", '('),
        RIGHT_BRACKET(")", ')'),
        NO_OPE(" ", ' ');

        public String val;
        public char cVal;
        private final static Map<String, OPERATOR> m = new HashMap<>();

        static {
            for (OPERATOR o : OPERATOR.values()) {
                m.put(o.val, o);
            }
        }

        private OPERATOR(String val, char cVal) {
            this.val = val;
            this.cVal = cVal;
        }

        public static OPERATOR getEnum(String val) {
            return m.getOrDefault(val, NO_OPE);
        }
    }
}

FormulaParseException.java


public class FormulaParseException extends Exception {
    public FormulaParseException(String msg) {
        super(msg);
    }
}

The exception implemented something like that Analysis by parser and calculation processing are not separated and are combined into one class. Since it is troublesome to distinguish between positive and negative symbols and addition / subtraction operators, it is filled with 0. It is also a point to add parentheses as needed

Since analysis also costs a certain amount, the original application has a certain amount of pre-checking. Also, I am making it possible to process Math functions and variables, but I omitted it because it is difficult to remember and write code. The matchFirst () method is a remnant of that I implemented other useful utilities and did various things

All you have to do is replace the variables appropriately As for the Math function, just to give you a hint, the Math function is treated as a term in the code above. So, in the code below, you should determine whether it is a Math function and call the method that processes it if it is a Math function. return new BigDecimal(exp).toPlainString();

The method that processes the Math function should be processed by checking the characters steadily like delMostOuterBrackets () and format (). In other words, you can apply the code written in the article, so if necessary, try making it yourself.

At the end

For the time being, I implemented it and tried it, and it met the requirements, so it ends there. It is suspicious if the number of cases goes from tens of millions to 100 millions, but it seems that it will not go that far, so it seems okay

I wrote a lot of tests, but I will omit it because it is difficult to remember and port Well, I don't even need JUnit because I just check the expected result of the expression.

I'm learning by myself, not by majoring in CS, so it was quite helpful to think carefully about this kind of design and implementation in practice.

Also, I hope that there will be projects that can not be matched without knowledge of algorithms and data structures, and the article is over.

Noshi

Recommended Posts

Implementation of a math parser with recursive descent parsing (Java)
[Java twig] Create a parser combinator for recursive descent parsing 2
[Java twig] Create a parser combinator for recursive descent parsing (also memoize)
Do you need a memory-aware implementation of Java?
Story of making a task management application with swing, java
[Java] Implementation of Faistel Network
Implementation of XLPagerTabStrip with TabBarController
[Java] Simplify the implementation of data history management with Reladomo
Summary of Java Math class
Implementation of gzip in java
HTML parsing with JAVA (scraping)
Implementation of tri-tree in Java
Let's create a TODO application in Java 4 Implementation of posting function
Let's create a TODO application in Java 6 Implementation of search function
Let's create a TODO application in Java 8 Implementation of editing function
[Details] Implementation of consumer applications with Kinesis Client Library for Java
A story about hitting the League Of Legends API with JAVA
[Illustration] Finding the sum of coins with a recursive function [Ruby]
Think of a Java update strategy
Build a Java project with Gradle
A brief description of JAVA dependencies
Implementation of like function in Java
Get a list of S3 files with ListObjectsV2Request (AWS SDK for Java)
A record of setting up a Java development environment with Visual Studio Code
The story of making a game launcher with automatic loading function [Java]
Let's express the result of analyzing Java bytecode with a class diagram
Implementation of clone method for Java Record
Implementation of DBlayer in Java (RDB, MySQL)
Name a group of regular expressions (Java)
[Swift 5] Implementation of membership registration with Firebase
Extract a part of a string with Ruby
Split a string with ". (Dot)" in Java
Java automated test implementation with JUnit 5 + Gradle
A story confirming the implementation of the SendGrid Java library when mail delivery fails
Validate the identity token of a user authenticated with AWS Cognito in Java