Notes on how to use regular expressions in Java

A collection of notes on using regular expressions in Java.

Methods that use regular expressions in the String class

There are several methods in String that accept regular expressions.

matches(String)

package sample.regexp;

public class Main {

    public static void main(String[] args) {
        String text = "abc123";

        System.out.println(text.matches("[a-z0-9]+"));
        System.out.println(text.matches("[a-z]+"));
    }
}

Execution result


true
false

- Checks whether the entire string matches the specified regular expression.
  - If only part of the string matches, the result is false.
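To make the whole-string rule concrete, here is a minimal sketch. The class and method names (MatchesDemo, wholeMatch) are illustrative and not part of the original. The sketch relies on the documented equivalence between str.matches(regex) and Pattern.matches(regex, str).

```java
import java.util.regex.Pattern;

final class MatchesDemo {
    // text.matches(regex) behaves like Pattern.matches(regex, text):
    // the regular expression must cover the whole string, not just part of it.
    static boolean wholeMatch(String text, String regex) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        String text = "abc123";
        System.out.println(wholeMatch(text, "[a-z]+"));       // false: "123" is not covered
        System.out.println(wholeMatch(text, "[a-z]+[0-9]+")); // true: the whole string is covered
        System.out.println(Pattern.matches("[a-z]+", text));  // false: same result as text.matches("[a-z]+")
    }
}
```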

replaceAll(String, String)

package sample.regexp;

public class Main {

    public static void main(String[] args) {
        String text = "abc123";

        System.out.println(text.replaceAll("[a-z]", "*"));
    }
}

Execution result


***123

- Replaces every part that matches the regular expression given as the first argument with the string given as the second argument.

Using matched groups in the replacement string

package sample.regexp;

public class Main {

    public static void main(String[] args) {
        String text = "<<abc123>>";

        System.out.println(text.replaceAll("([a-z]+)([0-9]+)", "$0, $1, $2"));
    }
}

Execution result


<<abc123, abc, 123>>

- Writing $n in the replacement string lets you reuse a matched group in the result.
  - n starts at 0.
  - $0 refers to the entire match. Here, the whole match of ([a-z]+)([0-9]+) is abc123.
  - From $1 onward, the groups enclosed in () are referenced in order.
    - $1 is abc, which matched ([a-z]+).
    - $2 is 123, which matched ([0-9]+).
  - If n is larger than the number of groups, IndexOutOfBoundsException is thrown.
- To replace with a literal $, escape it with a backslash (\).
  - text.replaceAll("[a-z]+", "\\$")
  - If it is not escaped, IllegalArgumentException is thrown.
- Groups can also be referenced by name instead of by index.
  - See the section on named groups under the Pattern class for details.
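The escaping and error cases above can be checked with a small sketch. ReplacementDemo and its helper methods are illustrative names. Matcher.quoteReplacement is a real JDK method that escapes $ and \ so that a string is inserted literally.

```java
import java.util.regex.Matcher;

final class ReplacementDemo {
    // "\\$" in Java source is \$ in the replacement string: a literal dollar sign
    static String literalDollar(String text) {
        return text.replaceAll("[a-z]+", "\\$");
    }

    // Matcher.quoteReplacement escapes $ and \ so any string can be used literally
    static String quoted(String text, String replacement) {
        return text.replaceAll("[a-z]+", Matcher.quoteReplacement(replacement));
    }

    // Returns the simple name of the exception thrown for a bad replacement, or "none"
    static String failure(String text, String regex, String replacement) {
        try {
            text.replaceAll(regex, replacement);
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(literalDollar("abc123"));                     // $123
        System.out.println(quoted("abc123", "$1"));                      // $1123
        System.out.println(failure("abc123", "[a-z]+", "$"));            // IllegalArgumentException
        System.out.println(failure("abc123", "([a-z]+)([0-9]+)", "$3")); // IndexOutOfBoundsException
    }
}
```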

replaceFirst(String, String)

package sample.regexp;

public class Main {

    public static void main(String[] args) {
        String text = "abc123";

        System.out.println(text.replaceFirst("[a-z]", "*"));
    }
}

Execution result


*bc123

- Replaces only the first substring that matches the regular expression.
- As with replaceAll(), matched groups can be referenced with $n.
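A short sketch showing $n with replaceFirst() next to replaceAll(). The class and method names are illustrative. The pattern and input mirror the earlier examples.

```java
final class ReplaceFirstDemo {
    // Swap letters and digits, but only in the first "letters+digits" run
    static String swapFirst(String text) {
        return text.replaceFirst("([a-z]+)([0-9]+)", "$2$1");
    }

    // Same replacement applied to every run, for comparison
    static String swapAll(String text) {
        return text.replaceAll("([a-z]+)([0-9]+)", "$2$1");
    }

    public static void main(String[] args) {
        System.out.println(swapFirst("abc123def456")); // 123abcdef456
        System.out.println(swapAll("abc123def456"));   // 123abc456def
    }
}
```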

split(String, int)

package sample.regexp;

import java.util.Arrays;

public class Main {

    public static void main(String[] args) {
        String text = "a1b2";
        
        for (int i=-1; i<5; i++) {
            String[] elements = text.split("[0-9]", i);
            System.out.println("limit=" + i + ",\telements=" + Arrays.toString(elements));
        }
    }
}

Execution result


limit=-1,	elements=[a, b, ]
limit=0,	elements=[a, b]
limit=1,	elements=[a1b2]
limit=2,	elements=[a, b2]
limit=3,	elements=[a, b, ]
limit=4,	elements=[a, b, ]

- Splits the string at each location that matches the regular expression given as the first argument.
- The second argument, limit, sets the maximum length of the returned array.
  - If limit is 1 or greater, the string is split at most limit - 1 times.
    - With limit == 1, limit - 1 = 0, so no split is performed (the array has length 1).
    - With limit == 2, limit - 1 = 1, so the string is split only at the 1 in a1b2, the first match of [0-9], and splitting stops there (the array has length 2).
  - If limit is 0 or less, there is no cap and the string is split all the way to the end.
    - However, 0 and negative values differ in how they handle empty strings left at the end of the result.
      - With a negative value, trailing empty strings are kept as array elements.
      - With 0, trailing empty strings are discarded.

If splitting produces an empty string at the beginning, that empty string is kept as the first element of the array.

package sample.regexp;

import java.util.Arrays;

public class Main {

    public static void main(String[] args) {
        String text = "0a1b2";
        
        String[] elements = text.split("[0-9]", 0);
        System.out.println(Arrays.toString(elements));
    }
}

Execution result


[, a, b]

split(String)

This behaves the same as split(String, int) with the second argument set to 0.

Pattern class

Differences from String methods

With some exceptions[^1], the String methods that take regular expressions delegate the work to the Pattern class internally. For example, the implementation of replaceAll() looks like this:

String.replaceAll()


    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }

The Pattern class (together with Matcher) is responsible for regular expression processing in Java.

Pattern interprets the string passed to compile() as a regular expression. If the regular expression is fixed, it is more efficient to call compile() only once and then reuse the resulting Pattern instance. (Pattern is immutable, so an instance can be safely shared across threads.)

However, apart from the exceptions mentioned above, the regular expression methods of String call compile() on every invocation. So when the same fixed regular expression is applied over and over, using the String methods is slower than reusing a Pattern instance.

Example of reusing Pattern


import java.util.regex.Pattern;

public class Hoge {
    // Reuse the compiled Pattern instance
    private static final Pattern HOGE_PATTERN = Pattern.compile("[0-9]+");

    public boolean test(String text) {
        return HOGE_PATTERN.matcher(text).matches(); // Same behavior as text.matches("[0-9]+")
    }
}

Basic usage

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[0-9]+");
        
        Matcher abc = pattern.matcher("123abc");
        System.out.println(abc.matches());

        Matcher _123 = pattern.matcher("123");
        System.out.println(_123.matches());
    }
}

Execution result


false
true

- First, compile the regular expression with Pattern.compile(String) to get a Pattern instance.
- Next, pass the string you want to check (the "input sequence") to Pattern.matcher(CharSequence) to get a Matcher instance.
- Use that Matcher instance to check whether the input matches.
  - Matcher.matches() checks whether the entire input sequence matches the regular expression and returns the result as a boolean.
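Pattern.matcher is declared to take a CharSequence, so the input sequence does not have to be a String. The sketch below compiles one Pattern and creates a fresh Matcher per input, including a StringBuilder. InputSequenceDemo and allDigits are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InputSequenceDemo {
    // One compiled Pattern, many Matchers: each input gets its own Matcher
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    static boolean allDigits(CharSequence input) {
        Matcher matcher = DIGITS.matcher(input); // matcher() accepts any CharSequence
        return matcher.matches();
    }

    public static void main(String[] args) {
        System.out.println(allDigits("123"));                       // true
        System.out.println(allDigits(new StringBuilder("123abc"))); // false
    }
}
```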

Split

package sample.regexp;

import java.util.Arrays;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-z]+");

        String[] elements = pattern.split("123abc456def789ghi");
        System.out.println(Arrays.toString(elements));

        elements = pattern.split("123abc456def789ghi", -1);
        System.out.println(Arrays.toString(elements));
    }
}

Execution result


[123, 456, 789]
[123, 456, 789, ]

- Pattern.split(CharSequence) splits the input at every part that matches the regular expression.
- It behaves the same as String.split(String), and Pattern.split(CharSequence, int) likewise corresponds to String.split(String, int).
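As a sketch of the two-argument form, the following passes a limit to Pattern.split and compares the result with String.split under the same regular expression and limit. PatternSplitDemo is an illustrative name.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class PatternSplitDemo {
    private static final Pattern LETTERS = Pattern.compile("[a-z]+");

    // Pattern.split(CharSequence, int) takes the same limit argument as String.split(String, int)
    static String[] split(String text, int limit) {
        return LETTERS.split(text, limit);
    }

    public static void main(String[] args) {
        String text = "123abc456def789ghi";
        System.out.println(Arrays.toString(split(text, 2)));                       // [123, 456def789ghi]
        System.out.println(Arrays.equals(split(text, 2), text.split("[a-z]+", 2))); // true
    }
}
```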

Matcher

- Pattern interprets the regular expression, while Matcher performs the following:
  - Checking whether the input sequence matches the regular expression
  - Extracting the matched parts
  - Replacing the matched parts
- Matcher is used in the following steps:

  1. Perform a match operation
  2. Query the result of the match operation
  3. Repeat steps 1 and 2 as needed

- The result of a match operation can be queried with the following methods:
  - start(): the start index of the match in the input sequence
  - end(): the end index of the match in the input sequence + 1
  - group(): the matched substring
- Calling these methods before a match operation has been performed throws IllegalStateException.
- Note that Matcher is not thread-safe.
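The three steps can be sketched as follows. The sketch also shows the case where no result is available. According to the Matcher Javadoc, IllegalStateException is thrown both when no match has been attempted and when the previous match operation failed. MatcherStateDemo and noResultAvailable are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatcherStateDemo {
    // True if asking for the match result currently throws IllegalStateException
    static boolean noResultAvailable(Matcher matcher) {
        try {
            matcher.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Matcher matcher = Pattern.compile("[a-z]+").matcher("123abc");

        System.out.println(noResultAvailable(matcher)); // true: no match operation yet

        // Step 1: perform a match operation; step 2: query its result
        if (matcher.find()) {
            System.out.println(matcher.start() + " " + matcher.end() + " " + matcher.group()); // 3 6 abc
        }

        // Step 3: repeat; this find() fails, so there is again no result to query
        System.out.println(matcher.find());             // false
        System.out.println(noResultAvailable(matcher)); // true
    }
}
```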

Match operation

There are three match operations in Matcher.

matches()

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        test("abc");
        test("abc123");
    }

    private static void test(String text) {
        Pattern pattern = Pattern.compile("[a-z]+");
        Matcher matcher = pattern.matcher(text);

        System.out.println("[text=" + text + "]");
        if (matcher.matches()) {
            System.out.println("matches = true");
            System.out.println("start = " + matcher.start());
            System.out.println("end = " + matcher.end());
            System.out.println("group = " + matcher.group());
        } else {
            System.out.println("matches = false");
        }
    }
}

Execution result


[text=abc]
matches = true
start = 0
end = 3
group = abc

[text=abc123]
matches = false

- matches() checks whether the entire input sequence matches the regular expression.
  - Returns true if it does.

lookingAt()

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        test("abc");
        test("123abc");
        test("ab12");
    }

    private static void test(String text) {
        Pattern pattern = Pattern.compile("[a-z]+");
        Matcher matcher = pattern.matcher(text);

        System.out.println("[text=" + text + "]");
        if (matcher.lookingAt()) {
            System.out.println("lookingAt = true");
            System.out.println("start = " + matcher.start());
            System.out.println("end = " + matcher.end());
            System.out.println("group = " + matcher.group());
        } else {
            System.out.println("lookingAt = false");
        }
    }
}

Execution result


[text=abc]
lookingAt = true
start = 0
end = 3
group = abc

[text=123abc]
lookingAt = false

[text=ab12]
lookingAt = true
start = 0
end = 2
group = ab

- lookingAt() checks whether the regular expression matches starting at the beginning of the input sequence.
  - Returns true if a match starting at the beginning is found (the entire input does not have to match).

find()

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        test("abc");
        test("123abc456def789");
    }

    private static void test(String text) {
        Pattern pattern = Pattern.compile("[a-z]+");
        Matcher matcher = pattern.matcher(text);

        System.out.println("[text=" + text + "]");
        while (matcher.find()) {
            System.out.println("start = " + matcher.start());
            System.out.println("end = " + matcher.end());
            System.out.println("group = " + matcher.group());
        }
    }
}

Execution result


[text=abc]
start = 0
end = 3
group = abc

[text=123abc456def789]
start = 3
end = 6
group = abc

start = 9
end = 12
group = def

- find() scans the input sequence from the beginning for a substring that matches the regular expression.
  - Returns true if a matching substring is found.
- Calling find() again resumes scanning from where the previous match ended.
  - Calling it repeatedly extracts every matching substring in turn.
- start(), end(), and group() return the result of the most recent match.
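To compare the three match operations side by side, the sketch below runs each one on a fresh Matcher for the same inputs. MatchOperationsDemo and compare are illustrative names.

```java
import java.util.regex.Pattern;

final class MatchOperationsDemo {
    private static final Pattern LETTERS = Pattern.compile("[a-z]+");

    // Runs each match operation on a fresh Matcher so the results do not affect each other
    static String compare(String text) {
        boolean matches = LETTERS.matcher(text).matches();     // whole input
        boolean lookingAt = LETTERS.matcher(text).lookingAt(); // from the beginning of the input
        boolean find = LETTERS.matcher(text).find();           // anywhere in the input
        return "matches=" + matches + ", lookingAt=" + lookingAt + ", find=" + find;
    }

    public static void main(String[] args) {
        System.out.println(compare("abc"));  // matches=true, lookingAt=true, find=true
        System.out.println(compare("ab12")); // matches=false, lookingAt=true, find=true
        System.out.println(compare("12ab")); // matches=false, lookingAt=false, find=true
        System.out.println(compare("123"));  // matches=false, lookingAt=false, find=false
    }
}
```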

Replacement

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-z]+");
        Matcher matcher = pattern.matcher("abc123def");

        System.out.println("replaceAll = " + matcher.replaceAll("*"));
        System.out.println("replaceFirst = " + matcher.replaceFirst("*"));
    }
}

Execution result


replaceAll = *123*
replaceFirst = *123def

- Matcher.replaceAll(String) replaces every matched substring.
- Matcher.replaceFirst(String) replaces only the first matched substring.
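The example above can call replaceAll() and then replaceFirst() on the same Matcher because, per the Javadoc, both methods reset the matcher before scanning. The sketch below makes this visible by calling find() first. It also uses $0 in the replacement. MatcherReplaceDemo is an illustrative name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatcherReplaceDemo {
    static String replaceAfterFind(String text) {
        Matcher matcher = Pattern.compile("[a-z]+").matcher(text);
        matcher.find(); // consumes the first match ("abc" for "abc123def")
        // replaceAll() resets the matcher first, so the earlier find() does not skip anything
        return matcher.replaceAll("<$0>");
    }

    public static void main(String[] args) {
        System.out.println(replaceAfterFind("abc123def")); // <abc>123<def>
    }
}
```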

group

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("([a-z]+)([0-9]+)");
        Matcher matcher = pattern.matcher("abc123de45fg");

        int groupCount = matcher.groupCount();
        System.out.println("groupCount=" + groupCount);
        
        while (matcher.find()) {
            System.out.println("==========");
            String group = matcher.group();
            System.out.println("group=" + group);
            
            for (int i=0; i<=groupCount; i++) {
                String g = matcher.group(i);
                System.out.println("group(" + i + ")=" + g);
            }
        }
    }
}

Execution result


groupCount=2
==========
group=abc123
group(0)=abc123
group(1)=abc
group(2)=123
==========
group=de45
group(0)=de45
group(1)=de
group(2)=45

- The following methods let you refer to the groups defined in the regular expression (the parts enclosed in ()):
  - groupCount(): returns the number of groups defined in the regular expression (group 0, the whole match, is not counted)
  - group(): returns the entire substring matched by the most recent match operation
  - group(int): returns the substring captured by the group with the given index in the most recent match operation
    - Index 0 is the entire match, so it returns the same result as group()
    - Indexes 1 and above refer to the groups in order
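A related detail from the Matcher Javadoc is worth a sketch: if the match succeeds but a group did not take part in it, group(int) returns null. The sketch makes group 2 optional to show this. OptionalGroupDemo and describe are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionalGroupDemo {
    // Group 2 is optional, so it may not take part in the match
    private static final Pattern PATTERN = Pattern.compile("([a-z]+)([0-9]+)?");

    static String describe(String text) {
        Matcher matcher = PATTERN.matcher(text);
        if (!matcher.matches()) {
            return "no match";
        }
        // groupCount() counts only the () groups; group 0 is not included
        return "groupCount=" + matcher.groupCount()
                + ", group(1)=" + matcher.group(1)
                + ", group(2)=" + matcher.group(2);
    }

    public static void main(String[] args) {
        System.out.println(describe("abc123")); // groupCount=2, group(1)=abc, group(2)=123
        System.out.println(describe("abc"));    // groupCount=2, group(1)=abc, group(2)=null
    }
}
```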

Naming groups

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(?<alphabets>[a-z]+)(?<numbers>[0-9]+)");
        Matcher matcher = pattern.matcher("abc123de45fg");
        
        while (matcher.find()) {
            System.out.println("==========");
            System.out.println("group(alphabets)=" + matcher.group("alphabets"));
            System.out.println("group(numbers)=" + matcher.group("numbers"));
        }
    }
}

Execution result


==========
group(alphabets)=abc
group(numbers)=123
==========
group(alphabets)=de
group(numbers)=45

- A group can be given a name by writing it as (?<group name>pattern).
- Passing that name to group(String) returns the substring that the group matched.
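Named groups are still capturing groups numbered from left to right, so they can also be read by index. The sketch checks that both lookups agree for the pattern used above. NamedGroupIndexDemo is an illustrative name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroupIndexDemo {
    private static final Pattern PATTERN =
            Pattern.compile("(?<alphabets>[a-z]+)(?<numbers>[0-9]+)");

    // Looks up the first match both by name and by index and compares the results
    static boolean sameByNameAndIndex(String text) {
        Matcher matcher = PATTERN.matcher(text);
        if (!matcher.find()) {
            return false;
        }
        return matcher.group("alphabets").equals(matcher.group(1))
                && matcher.group("numbers").equals(matcher.group(2));
    }

    public static void main(String[] args) {
        System.out.println(sameByNameAndIndex("abc123de45fg")); // true
    }
}
```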

To refer to the group name in the replacement string, do the following:

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(?<alphabets>[a-z]+)(?<numbers>[0-9]+)");
        Matcher matcher = pattern.matcher("abc123def456");

        String replaced = matcher.replaceAll("${numbers}${alphabets}");
        System.out.println(replaced);
    }
}

Execution result


123abc456def

- In the replacement string, a named group is referenced with ${group name}.
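Because String.replaceAll delegates to Matcher.replaceAll (see the implementation quoted earlier), ${group name} should also work directly with the String method. A minimal sketch with the same pattern and input. NamedReplaceDemo is an illustrative name.

```java
final class NamedReplaceDemo {
    // String.replaceAll delegates to Matcher.replaceAll, so ${name} references work here too
    static String swap(String text) {
        return text.replaceAll("(?<alphabets>[a-z]+)(?<numbers>[0-9]+)", "${numbers}${alphabets}");
    }

    public static void main(String[] args) {
        System.out.println(swap("abc123def456")); // 123abc456def
    }
}
```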

flag

- When creating a Pattern instance, you can adjust how the regular expression is interpreted by specifying flags.

Compile with flags


Pattern pattern = Pattern.compile("[a-z]", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

- Flags are passed as the second argument of compile(String, int).
- Use the constants declared as static fields in the Pattern class.
- Flags are bit masks, so combine multiple flags with |.
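A sketch of how combined flags change the result. It counts matches of ^[a-z]+$ in a two-line string with no flags, with MULTILINE only, and with CASE_INSENSITIVE | MULTILINE. It then tests one bit of the mask returned by Pattern.flags(). FlagsDemo and count are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlagsDemo {
    // Counts how many times the pattern is found in the text
    static int count(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "ABC\ndef";
        String regex = "^[a-z]+$";
        int both = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;

        System.out.println(count(Pattern.compile(regex), text));                    // 0
        System.out.println(count(Pattern.compile(regex, Pattern.MULTILINE), text)); // 1 (only "def")
        System.out.println(count(Pattern.compile(regex, both), text));              // 2
        // flags() returns the bit mask; test an individual flag with &
        System.out.println((Pattern.compile(regex, both).flags() & Pattern.MULTILINE) != 0); // true
    }
}
```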

CASE_INSENSITIVE (case insensitive)

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("ABC");
        System.out.println(matcher.matches());
    }
}

Execution result


true

- With CASE_INSENSITIVE, matching ignores case.
  - By default, this applies only to US-ASCII characters.

UNICODE_CASE (Unicode case insensitive)

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-zA-Z]+", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher("ABCabc");
        System.out.println(matcher.matches());
    }
}

Execution result


true

- Combining UNICODE_CASE with CASE_INSENSITIVE enables case-insensitive matching for Unicode characters as well.
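The example above uses only ASCII letters, which CASE_INSENSITIVE already handles on its own. To see what UNICODE_CASE adds, this sketch uses a non-ASCII letter, written as \u escapes to avoid source-encoding issues. UnicodeCaseDemo and matchesWith are illustrative names.

```java
import java.util.regex.Pattern;

final class UnicodeCaseDemo {
    static final String LOWER_A_UMLAUT = "\u00e4"; // ä
    static final String UPPER_A_UMLAUT = "\u00c4"; // Ä

    // Does the pattern "ä" compiled with the given flags match the input "Ä"?
    static boolean matchesWith(int flags) {
        return Pattern.compile(LOWER_A_UMLAUT, flags).matcher(UPPER_A_UMLAUT).matches();
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE alone only folds US-ASCII letters, so ä and Ä differ
        System.out.println(matchesWith(Pattern.CASE_INSENSITIVE));                        // false
        // Adding UNICODE_CASE extends case folding to non-ASCII letters
        System.out.println(matchesWith(Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)); // true
    }
}
```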

LITERAL (treat metacharacters and escape characters as ordinary characters)

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-z]+", Pattern.LITERAL);
        
        Matcher matcher = pattern.matcher("abc");
        System.out.println(matcher.matches());
        
        matcher = pattern.matcher("[a-z]+");
        System.out.println(matcher.matches());
    }
}

Execution result


false
true

- With LITERAL, the string passed as the first argument of compile(String, int) is treated as plain text.
  - Characters with special meaning in regular expressions, such as [] and +, are interpreted as the characters themselves.
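A small sketch of the difference when searching for text that contains a metacharacter. The pattern "1+1" is searched for in "1+1=2" with and without LITERAL. LiteralDemo and its methods are illustrative names.

```java
import java.util.regex.Pattern;

final class LiteralDemo {
    static boolean containsLiteral(String pattern, String text) {
        return Pattern.compile(pattern, Pattern.LITERAL).matcher(text).find();
    }

    static boolean containsRegex(String pattern, String text) {
        return Pattern.compile(pattern).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "1+1=2";
        // With LITERAL, "1+1" means the three characters '1', '+', '1'
        System.out.println(containsLiteral("1+1", text)); // true
        // Without it, "1+1" means "one or more 1s followed by 1", i.e. at least "11"
        System.out.println(containsRegex("1+1", text));   // false
    }
}
```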

MULTILINE (processes multi-line strings)

package sample.regexp;

import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        test("[default]", () -> Pattern.compile("^[a-z]+$"));
        test("[MULTILINE]", () -> Pattern.compile("^[a-z]+$", Pattern.MULTILINE));
    }
    
    private static void test(String label, Supplier<Pattern> patternSupplier) {
        System.out.println(label);
        Pattern pattern = patternSupplier.get();

        String text = "abc\n"
                    + "def\n";

        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            String group = matcher.group();
            System.out.println(group);
        }
    }
}

Execution result


[default]
[MULTILINE]
abc
def

- With MULTILINE, the behavior of ^ and $, which mark the beginning and end of a line, changes.
  - Without the flag, ^ and $ match only at the beginning and end of the entire input ($ can also match just before a line terminator at the very end of the input).
  - With MULTILINE, the input is treated as a series of lines separated by line terminators, so ^ and $ match at the beginning and end of each line.

COMMENTS (Allows you to write comments)

package sample.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        String regexp = "#This line is ignored as a comment\n"
                      + "  [a-z]+  ";
        Pattern pattern = Pattern.compile(regexp, Pattern.COMMENTS);

        Matcher matcher = pattern.matcher("abc");

        System.out.println(matcher.matches());
    }
}

Execution result


true

- With COMMENTS, the following are ignored as comments:
  - Everything from # to the end of the line
  - Whitespace
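COMMENTS makes it practical to split a longer regular expression over several annotated lines. The sketch below writes a date pattern this way. The lines are joined with "\n" so that each # comment ends at a line terminator. CommentsDemo and month are illustrative names, and the date format is only an example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentsDemo {
    // Whitespace and "# ..." comments are ignored in COMMENTS mode, so this is equivalent to
    // (?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})
    private static final Pattern DATE = Pattern.compile(String.join("\n",
            "(?<year>[0-9]{4})   # year",
            "-",
            "(?<month>[0-9]{2})  # month",
            "-",
            "(?<day>[0-9]{2})    # day"), Pattern.COMMENTS);

    // Returns the month part of a yyyy-MM-dd string, or null if the whole string does not match
    static String month(String text) {
        Matcher matcher = DATE.matcher(text);
        return matcher.matches() ? matcher.group("month") : null;
    }

    public static void main(String[] args) {
        System.out.println(month("2024-01-31")); // 01
        System.out.println(month("2024/01/31")); // null
    }
}
```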

DOTALL (make . match line terminators)

package sample.regexp;

import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        test("[default1]", () -> Pattern.compile(".+"));
        test("[default2]", () -> Pattern.compile(".+$"));
        test("[DOTALL]", () -> Pattern.compile(".+", Pattern.DOTALL));
    }

    private static void test(String label, Supplier<Pattern> patternSupplier) {
        System.out.println(label);
        Pattern pattern = patternSupplier.get();

        String text = "abc\n"
                    + "def\n";

        Matcher matcher = pattern.matcher(text);

        if (matcher.find()) {
            String group = matcher.group();
            System.out.println(group);
        }
    }
}

Execution result


[default1]
abc
[default2]
def
[DOTALL]
abc
def

- With DOTALL, . also matches line terminators such as \n.
  - By default, . does not match line terminators.

reference

[^1]: For example, in the OpenJDK implementation, split(String regex) splits the string without using Pattern when regex is a single character that is not a regular expression metacharacter (or a backslash followed by one non-alphanumeric character).
