A lot of notes on how to use regular expressions in Java.

Methods that use regular expressions in the String class

There are several methods in String that accept regular expressions.

matches(String)

package sample.regexp;

public class Main {

    public static void main(String[] args) {
        String text = "abc123";

        System.out.println(text.matches("[a-z0-9]+"));
        System.out.println(text.matches("[a-z]+"));
    }
}

`Execution result`


true
false

- Verifies that the string **matches the specified regular expression in its entirety**
- If only part of the string matches, the result is `false`
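To make the difference between a whole-string match and a partial match concrete, here is a minimal sketch that compares `String.matches` with `Matcher.find` on the same input. The class name `MatchesVsFind` and its helper methods are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helpers are illustrative only.
class MatchesVsFind {

    // Whole-string match: the regex must consume every character of the text.
    static boolean wholeMatch(String text, String regex) {
        return text.matches(regex);
    }

    // Partial match: true if the regex matches anywhere inside the text.
    static boolean partialMatch(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("abc123", "[a-z]+"));   // false
        System.out.println(partialMatch("abc123", "[a-z]+")); // true
    }
}
```

If a partial match is what you need, `find()` (or a pattern wrapped in `.*`) is the usual choice rather than `matches()`.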

replaceAll(String, String)

package sample.regexp;

public class Main {

    public static void main(String[] args) {
        String text = "abc123";

        System.out.println(text.replaceAll("[a-z]", "*"));
    }
}

`Execution result`


***123

- Pass a regular expression as the first argument; every matching part is replaced with the string given as the second argument

Referring to matched groups in the replacement string

package sample.regexp;

public class Main {

    public static void main(String[] args) {
        String text = "<<abc123>>";

        System.out.println(text.replaceAll("([a-z]+)([0-9]+)", "$0, $1, $2"));
    }
}

`Execution result`


<<abc123, abc, 123>>

- By including `$n` in the replacement string, a matched group can be reused in the result
- `n` starts at `0`
    - `$0` refers to the entire matched string
        - Here the part that matches `([a-z]+)([0-9]+)`, namely `abc123`, is the target
    - From `$1` onward, the groups enclosed in `()` are referenced in order
        - `$1` is `abc`, which matches `([a-z]+)`
        - `$2` is `123`, which matches `([0-9]+)`
    - If `n` is larger than the number of groups, `IndexOutOfBoundsException` is thrown
- To replace with a literal `$`, escape it with a backslash (`\`)
    - `text.replaceAll("[a-z]+", "\\$")`
    - If it is not escaped, `IllegalArgumentException` is thrown
- Groups can be referred to by name as well as by index
    - See "Give the group a name" later in this article for details
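When the replacement text comes from outside the program, escaping `$` by hand is easy to forget. The following sketch shows one possible approach using `Matcher.quoteReplacement`, a standard JDK method that escapes `$` and `\` in a replacement string. The class `DollarEscapeDemo` and its methods are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;

// Hypothetical demo class; names are illustrative only.
class DollarEscapeDemo {

    // Replaces every run of lowercase letters with the given text, taken literally.
    static String replaceLiterally(String text, String literal) {
        // quoteReplacement escapes '$' and '\' so they lose their special meaning.
        return text.replaceAll("[a-z]+", Matcher.quoteReplacement(literal));
    }

    // Returns true if replaceAll rejects the replacement string as written.
    static boolean isRejected(String text, String replacement) {
        try {
            text.replaceAll("[a-z]+", replacement);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceLiterally("abc123", "$"));       // $123
        System.out.println("abc123".replaceAll("[a-z]+", "\\$"));  // $123
        // "$ " is an invalid group reference because '$' is not followed by a digit or '{'.
        System.out.println(isRejected("abc123", "$ "));            // true
    }
}
```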

replaceFirst(String, String)

package sample.regexp;

public class Main {

    public static void main(String[] args) {
        String text = "abc123";

        System.out.println(text.replaceFirst("[a-z]", "*"));
    }
}

`Execution result`


*bc123

- Replaces only the first substring that matches the regular expression
- Groups can be referenced with `$n`, in the same way as with `replaceAll()`

split(String, int)

package sample.regexp;

import java.util.Arrays;

public class Main {

    public static void main(String[] args) {
        String text = "a1b2";
        
        for (int i=-1; i<5; i++) {
            String[] elements = text.split("[0-9]", i);
            System.out.println("limit=" + i + ",\telements=" + Arrays.toString(elements));
        }
    }
}

`Execution result`


limit=-1,	elements=[a, b, ]
limit=0,	elements=[a, b]
limit=1,	elements=[a1b2]
limit=2,	elements=[a, b2]
limit=3,	elements=[a, b, ]
limit=4,	elements=[a, b, ]

- Splits the string at each location that matches the regular expression given as the first argument
- The second argument, `limit`, sets the upper bound on the size of the returned array
- If `limit` is 1 or greater, the string is split at most `limit - 1` times
    - If `limit == 1`, then `limit - 1 => 0`, so no split is performed (the resulting array has size 1)
    - If `limit == 2`, then `limit - 1 => 1`, so the string is split once, at the `1` in `a1b2` that first matches `[0-9]`, and splitting stops there (the resulting array has size 2)
- If `limit` is 0 or less, there is no upper bound and the string is split all the way to the end
    - However, the handling of trailing empty strings in the result differs between `0` and negative numbers
    - For a negative number, trailing empty strings are kept as array elements
    - For `0`, trailing empty strings are discarded
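The effect of `limit` is easiest to notice with delimiter-separated data that ends in empty fields. The sketch below is only an illustration; the class `SplitLimitDemo` and the sample line `a,b,,` are assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical demo class; names and sample data are illustrative only.
class SplitLimitDemo {

    // Splits a comma-separated line with the given limit.
    static String[] splitCsv(String line, int limit) {
        return line.split(",", limit);
    }

    public static void main(String[] args) {
        String line = "a,b,,";

        // limit = 0: trailing empty strings are discarded.
        System.out.println(Arrays.toString(splitCsv(line, 0)));  // [a, b]
        // limit < 0: trailing empty strings are kept.
        System.out.println(Arrays.toString(splitCsv(line, -1))); // [a, b, , ]
        // limit = 2: at most one split, the rest stays in the last element.
        System.out.println(Arrays.toString(splitCsv(line, 2)));  // [a, b,,]
    }
}
```

When the number of columns matters (for example when parsing fixed-width records), a negative limit is usually the safer choice because no field is silently dropped.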

If the split produces an empty string at the beginning of the result, that empty string is kept as an element of the array.

package sample.regexp;

import java.util.Arrays;

public class Main {

    public static void main(String[] args) {
        String text = "0a1b2";
        
        String[] elements = text.split("[0-9]", 0);
        System.out.println(Arrays.toString(elements));
    }
}

`Execution result`


[, a, b]

split(String)

This behaves the same as calling `split(String, int)` with `0` as the second argument.

Pattern class

Differences from String methods

With some exceptions[^1], the methods of the String class that use regular expressions delegate the processing to the Pattern class behind the scenes. For example, the implementation of the `replaceAll()` method looks like this:

`String.replaceAll()`


    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }

This Pattern class (and Matcher) is in charge of processing regular expressions in Java.

The Pattern class interprets the string passed to `compile()` as a regular expression. If the regular expression is fixed, it is more efficient to run `compile()` only once and then reuse the Pattern instance. (Pattern is immutable, so the instance can safely be shared across threads.)

However, the String methods that take a regular expression run `compile()` on every call. Therefore, when the same fixed regular expression is applied over and over, the String methods are slower than reusing a Pattern instance.

`Example of reusing Pattern`


import java.util.regex.Pattern;

public class Hoge {
    // Reuse the compiled Pattern instance
    private static final Pattern HOGE_PATTERN = Pattern.compile("[0-9]+");

    public boolean test(String text) {
        return HOGE_PATTERN.matcher(text).matches(); // Behaves the same as text.matches("[0-9]+")
    }
}

Basic usage

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[0-9]+");
        
        Matcher abc = pattern.matcher("123abc");
        System.out.println(abc.matches());

        Matcher _123 = pattern.matcher("123");
        System.out.println(_123.matches());
    }
}

`Execution result`


false
true

- First, compile the regular expression with `Pattern.compile(String)` to obtain a `Pattern` instance
- Next, pass the string to be verified (the **input sequence**) to the `Pattern.matcher(CharSequence)` method to obtain a `Matcher` instance
- Use the obtained `Matcher` instance to verify whether the input matches
- `Matcher.matches()` verifies that the entire input sequence matches the regular expression and returns the result as a `boolean`

Split

package sample.regexp;

import java.util.Arrays;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-z]+");

        String[] elements = pattern.split("123abc456def789ghi");
        System.out.println(Arrays.toString(elements));

        elements = pattern.split("123abc456def789ghi", -1);
        System.out.println(Arrays.toString(elements));
    }
}

`Execution result`


[123, 456, 789]
[123, 456, 789, ]

- `Pattern.split(CharSequence)` splits the given string at the parts that match the regular expression
- The behavior is the same as `String.split(String)` and `String.split(String, int)`

Matcher

- Pattern is the class that interprets regular expressions, while Matcher does the following:
    - Verifies whether the input sequence matches the regular expression
    - Extracts the matched parts
    - Replaces the matched parts
- Matcher is used in the following steps:
    1. Perform a match operation
    2. Query the result of the match operation
    3. Repeat steps 1 and 2 if necessary
- The result of a match operation can be queried with the following methods:
    - `start()`: start index of the match in the input sequence
    - `end()`: end index of the match in the input sequence, plus 1
    - `group()`: the matched substring
- If these methods are called before a match operation has been performed, `IllegalStateException` is thrown
- Note that Matcher is **not thread-safe**
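The "match first, query afterwards" order described above can be sketched as follows. The class `MatcherStateDemo` and its two helpers are hypothetical; only `Pattern`, `Matcher`, and their methods come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class MatcherStateDemo {

    // Returns true if querying the result before any match operation fails.
    static boolean throwsBeforeMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        try {
            matcher.group(); // no match operation has been performed yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Step 1: perform a match operation. Step 2: query its result.
    static String describeFirstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return "no match";
        }
        return matcher.start() + ".." + matcher.end() + " " + matcher.group();
    }

    public static void main(String[] args) {
        System.out.println(throwsBeforeMatch("[0-9]+", "abc123def"));  // true
        System.out.println(describeFirstMatch("[0-9]+", "abc123def")); // 3..6 123
    }
}
```

Because each `Matcher` carries this match state, the sketch creates a new one per call instead of sharing a single instance, which also sidesteps the thread-safety issue.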

Match operation

There are three match operations in Matcher.

- `matches()`: verifies that the entire input sequence matches the regular expression
- `lookingAt()`: verifies that the regular expression matches starting at the beginning of the input sequence
- `find()`: searches the input sequence, in order, for parts that match the regular expression
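As a quick side-by-side comparison of the three operations, the sketch below runs each of them against the same input. `MatchOperationsDemo` and `compare` are names invented for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class MatchOperationsDemo {

    // Runs all three match operations, each on a fresh Matcher.
    static String compare(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        boolean matches = pattern.matcher(input).matches();
        boolean lookingAt = pattern.matcher(input).lookingAt();
        boolean find = pattern.matcher(input).find();
        return "matches=" + matches + ", lookingAt=" + lookingAt + ", find=" + find;
    }

    public static void main(String[] args) {
        System.out.println(compare("[a-z]+", "abc"));    // matches=true, lookingAt=true, find=true
        System.out.println(compare("[a-z]+", "abc123")); // matches=false, lookingAt=true, find=true
        System.out.println(compare("[a-z]+", "123abc")); // matches=false, lookingAt=false, find=true
    }
}
```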

matches()

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        test("abc");
        test("abc123");
    }

    private static void test(String text) {
        Pattern pattern = Pattern.compile("[a-z]+");
        Matcher matcher = pattern.matcher(text);

        System.out.println("[text=" + text + "]");
        if (matcher.matches()) {
            System.out.println("matches = true");
            System.out.println("start = " + matcher.start());
            System.out.println("end = " + matcher.end());
            System.out.println("group = " + matcher.group());
        } else {
            System.out.println("matches = false");
        }
    }
}

`Execution result`


[text=abc]
matches = true
start = 0
end = 3
group = abc

[text=abc123]
matches = false

- `matches()` verifies that the entire input sequence matches the regular expression
- It returns `true` if it matches

lookingAt()

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        test("abc");
        test("123abc");
        test("ab12");
    }

    private static void test(String text) {
        Pattern pattern = Pattern.compile("[a-z]+");
        Matcher matcher = pattern.matcher(text);

        System.out.println("[text=" + text + "]");
        if (matcher.lookingAt()) {
            System.out.println("lookingAt = true");
            System.out.println("start = " + matcher.start());
            System.out.println("end = " + matcher.end());
            System.out.println("group = " + matcher.group());
        } else {
            System.out.println("lookingAt = false");
        }
    }
}

`Execution result`


[text=abc]
lookingAt = true
start = 0
end = 3
group = abc

[text=123abc]
lookingAt = false

[text=ab12]
lookingAt = true
start = 0
end = 2
group = ab

- `lookingAt()` verifies that the regular expression matches starting at the beginning of the input sequence
- It returns `true` if the match succeeds from the beginning (the entire input does not have to match)

find()

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        test("abc");
        test("123abc456def789");
    }

    private static void test(String text) {
        Pattern pattern = Pattern.compile("[a-z]+");
        Matcher matcher = pattern.matcher(text);

        System.out.println("[text=" + text + "]");
        while (matcher.find()) {
            System.out.println("start = " + matcher.start());
            System.out.println("end = " + matcher.end());
            System.out.println("group = " + matcher.group());
        }
    }
}

`Execution result`


[text=abc]
start = 0
end = 3
group = abc

[text=123abc456def789]
start = 3
end = 6
group = abc

start = 9
end = 12
group = def

- `find()` scans the input sequence from the beginning for a substring that matches the regular expression
- It returns `true` if a matching substring is found
- Calling `find()` again resumes scanning from the end of the previous match
- Every matching substring can be extracted by calling it repeatedly
- `start()`, `end()`, and `group()` return the result of the most recent match
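A common use of the `find()` loop is to collect every match into a list. The following is a minimal sketch of that idea; `FindAllDemo` and `findAll` are hypothetical names, not JDK APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class FindAllDemo {

    // Collects every substring of the input that matches the pattern.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {      // each call continues after the previous match
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern letters = Pattern.compile("[a-z]+");
        System.out.println(findAll(letters, "123abc456def789")); // [abc, def]
    }
}
```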

Replacement

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-z]+");
        Matcher matcher = pattern.matcher("abc123def");

        System.out.println("replaceAll = " + matcher.replaceAll("*"));
        System.out.println("replaceFirst = " + matcher.replaceFirst("*"));
    }
}

`Execution result`


replaceAll = *123*
replaceFirst = *123def

- `Matcher.replaceAll(String)` replaces every matched substring
- `Matcher.replaceFirst(String)` replaces only the first matched substring

group

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("([a-z]+)([0-9]+)");
        Matcher matcher = pattern.matcher("abc123de45fg");

        int groupCount = matcher.groupCount();
        System.out.println("groupCount=" + groupCount);
        
        while (matcher.find()) {
            System.out.println("==========");
            String group = matcher.group();
            System.out.println("group=" + group);
            
            for (int i=0; i<=groupCount; i++) {
                String g = matcher.group(i);
                System.out.println("group(" + i + ")=" + g);
            }
        }
    }
}

`Execution result`


groupCount=2
==========
group=abc123
group(0)=abc123
group(1)=abc
group(2)=123
==========
group=de45
group(0)=de45
group(1)=de
group(2)=45

- The following methods are provided for referring to the groups defined in the regular expression (the parts enclosed in `()`):
- `groupCount()`
    - Returns the number of groups defined in the regular expression
- `group()`
    - Returns the entire string matched by the most recent match operation
- `group(int)`
    - Returns the group with the specified index from the most recent match operation
    - Index `0` refers to the entire matched string, so it returns the same result as `group()`
    - Indexes from `1` onward refer to the substrings matched by each group
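Numbered groups are often used to pull structured fields out of a string. The sketch below assumes a simple `yyyy-MM-dd` text format purely for illustration; the class `GroupDemo`, the `DATE` pattern, and `parseDate` are all hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the date format is an assumption for this example.
class GroupDemo {

    private static final Pattern DATE = Pattern.compile("([0-9]{4})-([0-9]{2})-([0-9]{2})");

    // Returns {year, month, day}, or null if the text is not in the expected format.
    static int[] parseDate(String text) {
        Matcher matcher = DATE.matcher(text);
        if (!matcher.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(matcher.group(1)), // first  (...) group: year
            Integer.parseInt(matcher.group(2)), // second (...) group: month
            Integer.parseInt(matcher.group(3))  // third  (...) group: day
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseDate("2024-01-15"))); // [2024, 1, 15]
        System.out.println(parseDate("15/01/2024"));                  // null
    }
}
```

Note that this only checks the shape of the text, not whether the values form a valid calendar date.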

Give the group a name

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(?<alphabets>[a-z]+)(?<numbers>[0-9]+)");
        Matcher matcher = pattern.matcher("abc123de45fg");
        
        while (matcher.find()) {
            System.out.println("==========");
            System.out.println("group(alphabets)=" + matcher.group("alphabets"));
            System.out.println("group(numbers)=" + matcher.group("numbers"));
        }
    }
}

`Execution result`


==========
group(alphabets)=abc
group(numbers)=123
==========
group(alphabets)=de
group(numbers)=45

- A group can be given a name by defining it as `(?<group name>pattern)`
- The substring matched by the group can be obtained by passing that name to the `group(String)` method

To refer to the group name in the replacement string, do the following:

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(?<alphabets>[a-z]+)(?<numbers>[0-9]+)");
        Matcher matcher = pattern.matcher("abc123def456");

        String replaced = matcher.replaceAll("${numbers}${alphabets}");
        System.out.println(replaced);
    }
}

`Execution result`

123abc456def

- A group can be referenced with `${group name}`
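The `${group name}` syntax also works in the replacement string of `String.replaceAll`, since that method delegates to `Matcher`. Here is a small sketch that reorders the parts of a date; the class `NamedGroupReplaceDemo`, the method `toDayFirst`, and the group names `y`, `m`, `d` are assumptions made for this example.

```java
// Hypothetical demo class; names and the date format are illustrative only.
class NamedGroupReplaceDemo {

    // Rewrites every yyyy-MM-dd occurrence in the text as dd/MM/yyyy.
    static String toDayFirst(String text) {
        return text.replaceAll(
                "(?<y>[0-9]{4})-(?<m>[0-9]{2})-(?<d>[0-9]{2})",
                "${d}/${m}/${y}");
    }

    public static void main(String[] args) {
        System.out.println(toDayFirst("2024-01-15"));              // 15/01/2024
        System.out.println(toDayFirst("Released on 2024-01-15.")); // Released on 15/01/2024.
    }
}
```

Compared with `$3/$2/$1`, the named form keeps working when groups are added to or reordered in the pattern.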

flag

- When creating a Pattern instance, **flags** can be used to adjust how the regular expression is interpreted

`Compile with flags`


Pattern pattern = Pattern.compile("[a-z]", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

- Flags are specified with the second argument of `compile(String, int)`
- The flags that can be specified are declared as static constants in the `Pattern` class
- Because the value is a bit mask, multiple flags are combined with `|`
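To show the bit mask in action, the following sketch combines two flags and then checks which ones are set via `Pattern.flags()`. The class `FlagsDemo`, the `WORD_LINE` pattern, and the helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class FlagsDemo {

    // Matches lines consisting only of letters, ignoring case, line by line.
    static final Pattern WORD_LINE =
            Pattern.compile("^[a-z]+$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Collects every line of the text that matches WORD_LINE.
    static List<String> wordLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher matcher = WORD_LINE.matcher(text);
        while (matcher.find()) {
            lines.add(matcher.group());
        }
        return lines;
    }

    // Tests a single flag bit in the pattern's flag mask.
    static boolean hasFlag(Pattern pattern, int flag) {
        return (pattern.flags() & flag) != 0;
    }

    public static void main(String[] args) {
        System.out.println(wordLines("ABC\ndef\n123"));                  // [ABC, def]
        System.out.println(hasFlag(WORD_LINE, Pattern.CASE_INSENSITIVE)); // true
        System.out.println(hasFlag(WORD_LINE, Pattern.DOTALL));           // false
    }
}
```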

CASE_INSENSITIVE (case insensitive)

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("ABC");
        System.out.println(matcher.matches());
    }
}

`Execution result`


true

- When `CASE_INSENSITIVE` is specified, matching is case-insensitive
- By default, case is ignored only for US-ASCII characters

UNICODE_CASE (Unicode case insensitive)

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-zＡ-Ｚ]+", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher("ABCａｂｃ");
        System.out.println(matcher.matches());
    }
}

`Execution result`


true

- Combining `UNICODE_CASE` with `CASE_INSENSITIVE` enables case-insensitive matching for Unicode characters
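The difference between the two flags can be observed with a single accented letter. In the sketch below, `\u00e9` is "é" and `\u00c9` is "É" (written as escapes to avoid source-encoding issues); the class `UnicodeCaseDemo` and its helper are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class UnicodeCaseDemo {

    // Matches the whole input case-insensitively, optionally with Unicode case folding.
    static boolean matchesIgnoringCase(String regex, String input, boolean unicode) {
        int flags = Pattern.CASE_INSENSITIVE;
        if (unicode) {
            flags |= Pattern.UNICODE_CASE;
        }
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // ASCII letters are folded with CASE_INSENSITIVE alone.
        System.out.println(matchesIgnoringCase("abc", "ABC", false));       // true
        // Non-ASCII letters need UNICODE_CASE as well.
        System.out.println(matchesIgnoringCase("\u00e9", "\u00c9", false)); // false
        System.out.println(matchesIgnoringCase("\u00e9", "\u00c9", true));  // true
    }
}
```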

LITERAL (treat regular expression metacharacters and escape sequences as ordinary characters)

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-z]+", Pattern.LITERAL);
        
        Matcher matcher = pattern.matcher("abc");
        System.out.println(matcher.matches());
        
        matcher = pattern.matcher("[a-z]+");
        System.out.println(matcher.matches());
    }
}

`Execution result`


false
true

- When `LITERAL` is specified, the string passed as the first argument of `compile(String, int)` is treated as a plain string
- Characters that have a special meaning in regular expressions, such as `[]` and `+`, are interpreted simply as the characters themselves
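Methods such as `String.split` take no flags argument, so `LITERAL` cannot be passed to them directly. One possible alternative is `Pattern.quote`, a JDK method that returns a literal pattern string for its argument. The sketch below shows both; `LiteralDemo` and its helpers are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class LiteralDemo {

    // Whole-string comparison with the pattern text taken literally.
    static boolean literalMatch(String literal, String input) {
        return Pattern.compile(literal, Pattern.LITERAL).matcher(input).matches();
    }

    // Splits on a delimiter that may contain regex metacharacters.
    static String[] splitOnLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(literalMatch("a.c", "abc")); // false ('.' is not a wildcard here)
        System.out.println(literalMatch("a.c", "a.c")); // true
        System.out.println(Arrays.toString(splitOnLiteral("1+2+3", "+"))); // [1, 2, 3]
    }
}
```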

MULTILINE (processes multi-line strings)

package sample.regexp;

import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        test("[default]", () -> Pattern.compile("^[a-z]+$"));
        test("[MULTILINE]", () -> Pattern.compile("^[a-z]+$", Pattern.MULTILINE));
    }
    
    private static void test(String label, Supplier<Pattern> patternSupplier) {
        System.out.println(label);
        Pattern pattern = patternSupplier.get();

        String text = "abc\n"
                    + "def\n";

        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            String group = matcher.group();
            System.out.println(group);
        }
    }
}

`Execution result`


[default]
[MULTILINE]
abc
def

- When `MULTILINE` is specified, the handling of `^` and `$`, which represent the beginning and end of a line, changes
- By default, `^` and `$` match only at the beginning and end of the entire input (`$` also matches just before a final line terminator)
- When `MULTILINE` is specified, `^` and `$` match at the beginning and end of each line

COMMENTS (Allows you to write comments)

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        String regexp = "#This line is ignored as a comment\n"
                      + "  [a-z]+  ";
        Pattern pattern = Pattern.compile(regexp, Pattern.COMMENTS);

        Matcher matcher = pattern.matcher("abc");

        System.out.println(matcher.matches());
    }
}

`Execution result`


true

- When `COMMENTS` is specified, the following parts of the pattern are ignored:
    - Everything from `#` to the end of the line
    - Whitespace
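`COMMENTS` is mostly useful for documenting each piece of a longer pattern. The sketch below annotates a simple `major.minor` version pattern; the format, the class `CommentsDemo`, and the `parse` method are assumptions for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the version format is an assumption for this example.
class CommentsDemo {

    // Whitespace and "# ..." comments inside the pattern are ignored.
    static final Pattern VERSION = Pattern.compile(
              "([0-9]+)   # major version\n"
            + "\\.        # literal dot\n"
            + "([0-9]+)   # minor version\n",
            Pattern.COMMENTS);

    // Returns {major, minor}, or null if the text is not in the expected format.
    static int[] parse(String version) {
        Matcher matcher = VERSION.matcher(version);
        if (!matcher.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(matcher.group(1)),
            Integer.parseInt(matcher.group(2))
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("1.8"))); // [1, 8]
        System.out.println(parse("1-8"));                  // null
    }
}
```

Because whitespace in the pattern is ignored in this mode, a space that should actually be matched has to be written in escaped form (for example `\\s` or `\\x20`).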

DOTALL (make `.` match line terminators as well)

package sample.regexp;

import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        test("[default1]", () -> Pattern.compile(".+"));
        test("[default2]", () -> Pattern.compile(".+$"));
        test("[DOTALL]", () -> Pattern.compile(".+", Pattern.DOTALL));
    }

    private static void test(String label, Supplier<Pattern> patternSupplier) {
        System.out.println(label);
        Pattern pattern = patternSupplier.get();

        String text = "abc\n"
                    + "def\n";

        Matcher matcher = pattern.matcher(text);

        if (matcher.find()) {
            String group = matcher.group();
            System.out.println(group);
        }
    }
}

`Execution result`


[default1]
abc
[default2]
def
[DOTALL]
abc
def

- When `DOTALL` is specified, `.` also matches line terminators
- By default, `.` does not match line terminators
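A typical situation where `DOTALL` matters is extracting a block of text that spans several lines. The sketch below is illustrative only; `DotallDemo`, `firstMatch`, and the sample `<b>` markup are assumptions for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class DotallDemo {

    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String text = "<b>line1\nline2</b>";

        // Default: '.' stops at the line terminator, so the closing tag is never reached.
        System.out.println(firstMatch("<b>.*</b>", 0, text)); // null
        // DOTALL: '.' also consumes the line terminator, so the whole block matches.
        System.out.println(firstMatch("<b>.*</b>", Pattern.DOTALL, text));
    }
}
```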

reference

[^1]: For example, the `split(String regex)` method splits the string without using Pattern when `regex` is a simple string that contains no regular expression metacharacters (in the OpenJDK implementation, for instance, a single character that is not a metacharacter).
