[JAVA] Let's implement Lexer (1)

Preface

Version: jdk-10
Environment: IntelliJ IDEA

It should work in the environment above.

Previous article: [Implement EAM](https://qiita.com/mirror11akii/items/c72c2d2a108a9b75fccb)

This time I create a Lexer that lexically analyzes the char[] returned by EAM. Yesterday's code had a lot of flaws, which made me keenly aware of my lack of skill, and I have reflected on that. I apologize for the trouble. If you find any problems, please do not hesitate to let me know.

What is Lexer

A Lexer is a lexical analyzer: it splits input text into tokens. It seems it is sometimes also called a Parser, but that name makes me think of Parser Combinators and gets confusing, so I will stick with "Lexer". Let's take a look at the implementation.

Implementation

Lexer.java


import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Lexer {
    static String s = "";
    static List<String> l= new ArrayList<>();

    //private static List<Character> brackets = Arrays.asList('(',')','{','}','[',']');
    private static List<Character> operators = Arrays.asList('=','+','-','*','/');

    public static Type getType(final Character c){
        if(Character.isLetter(c)) return Type.alphabet;
        else if(Character.isDigit(c)) return Type.digit;
        else if(Character.isWhitespace(c)) return Type.space;
        else if(operators.contains(c)) return Type.operator;
        //else if(brackets.contains(c)) return Type.brackets;
        return Type.other;
    }
    public static void analyze(char c){
        switch (getType(c)) {
            case other:
                System.out.println("This character cannot be accepted");
                break;
            case space:
                if(!s.isEmpty()){
                    l.add(s);
                    s = "";
                }
                break;
            //case bracket:
            case operator:
                // An operator ends the current token and is a token by itself.
                if(!s.isEmpty()){
                    l.add(s);
                    s = "";
                }
                l.add(""+c);
                break;
            case alphabet:
            case digit:
                s += c;
                break;
            default: break;
        }
    }
    public static void addString(){
        // Flush the last token; skip it when the input ended with whitespace.
        if(!s.isEmpty()){
            l.add(s);
            s = "";
        }
    }
    public static List<String> getList(){
        return l;
    }
}

Type.java


public enum Type{
    alphabet,
    digit,
    space,
    operator,
    //bracket,
    other
}

Implementation description

Honestly, I think this code is messy myself. I will clean it up from here on. If you have a good idea, please let me know. You can see the word "bracket" in the comments; it refers to parentheses and the other bracket characters. They are a hassle, so I have put them off for now.

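public static Type getType(final Character c)
This method determines what kind of character was read: a letter, a digit, whitespace, an operator, or something else.

public static void addString()
This method adds the last token to the list. analyze only stores a token once it sees the whitespace or operator that follows it, so the final token is still sitting in s when the input ends.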
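As one possible cleanup idea, here is a minimal sketch that follows similar rules (letters and digits accumulate, whitespace ends a token, an operator is a token by itself) but keeps the buffer and the token list local to a single method instead of in static fields. The names TokenizerSketch, tokenize, and flush are hypothetical and are not part of the Lexer above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the article's Lexer: similar rules, no static state.
public class TokenizerSketch {
    private static final String OPERATORS = "=+-*/";

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder(); // token being built

        for (char c : input.toCharArray()) {
            if (Character.isLetter(c) || Character.isDigit(c)) {
                current.append(c);
            } else if (Character.isWhitespace(c)) {
                flush(current, tokens);
            } else if (OPERATORS.indexOf(c) >= 0) {
                flush(current, tokens);
                tokens.add(String.valueOf(c)); // an operator is its own token
            } else {
                System.out.println("This character cannot be accepted: " + c);
            }
        }
        flush(current, tokens); // the last token has nothing after it
        return tokens;
    }

    // Moves the pending token, if any, into the list.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // prints [a, =, 2, +, 7, *, 3, log, a]
        System.out.println(tokenize("a = 2 + 7 * 3\nlog a"));
    }
}
```

Because every call starts with a fresh list, calling tokenize twice does not mix the tokens of two inputs, and the caller does not need a separate step to add the last token.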

Run

EAM.java


import java.util.*;
import java.io.*;

public class EAM{
    private final FileReader fr;

    private EAM(final String s) throws IOException{
        fr = new FileReader(s);
    }
    public static char[] use(final String s,final Use<EAM,char[],IOException> u) throws IOException{
        final EAM eam = new EAM(s);
        char[] c;
        try{
            c = u.apply(eam);
        }finally{
            eam.close();
        }
        return c;
    }
    private void close() throws IOException{
        System.out.println("close()");
        fr.close();
    }
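    public char[] read() throws IOException {
        // Reads at most 100 characters in a single call.
        char[] c = new char[100];
        int length = fr.read(c);
        // read() returns -1 at end of input (for example, an empty file).
        if(length < 0) return new char[0];
        return Arrays.copyOfRange(c, 0, length);
    }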
}

Use.java


@FunctionalInterface
public interface Use<T,R,X extends Throwable>{
	R apply(T t) throws X;
}

Main.java


import java.io.IOException;

public class Main{
    public static void main(String[] args) throws IOException{
        char[] array = EAM.use("lib\\test.txt", eam -> eam.read());
        System.out.println("\n--EAM");
        for(char c : array){
            System.out.print(c);
            Lexer.analyze(c);
        }
        Lexer.addString();
        System.out.println( "\n--Lexer\n" + Lexer.getList());
    }
}

I read the file using the EAM from the previous article. The method bodies differ here and there, because I wanted it to return a char[].
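Note that EAM.read() fills a 100-character buffer with a single call, so a longer file would be cut off. If that becomes a problem, one option is to loop until the reader reports end of input. The following is only a sketch under that assumption; ReadAllSketch and readAll are hypothetical names, and it takes any java.io.Reader so it can be tried without a file.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of EAM: reads every character from a Reader.
public class ReadAllSketch {
    public static char[] readAll(Reader reader) throws IOException {
        CharArrayWriter out = new CharArrayWriter();
        char[] buffer = new char[100];
        int length;
        // read() returns -1 once there is nothing left to read.
        while ((length = reader.read(buffer)) != -1) {
            out.write(buffer, 0, length);
        }
        return out.toCharArray();
    }

    public static void main(String[] args) throws IOException {
        char[] chars = readAll(new StringReader("a = 2 + 7 * 3\nlog a"));
        System.out.println(new String(chars));
    }
}
```

Inside EAM, the same loop could be run over the FileReader field, which is also a Reader.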

Result

close()

--EAM
a = 2 + 7 * 3
log a
--Lexer
[a, =, 2, +, 7, *, 3, log, a]

If the output looks like this, it worked. If you have any questions, please ask.
