StackOverflowError occurs when processing a long sentence that satisfies a specific regular expression.
Test.java
public class Test {
public static void main(final String[] args) {
final StringBuilder input = new StringBuilder();
for (int i = 0; i < 5000; i++) {
input.append("a ");
}
System.out.println(input.toString().replaceAll("(a|\\s)+", ""));
}
}
Exception in thread "main" java.lang.StackOverflowError
at java.base/java.util.regex.Pattern$GroupHead.match(Pattern.java:4786)
at java.base/java.util.regex.Pattern$Loop.match(Pattern.java:4923)
at java.base/java.util.regex.Pattern$GroupTail.match(Pattern.java:4845)
at java.base/java.util.regex.Pattern$BranchConn.match(Pattern.java:4695)
...
First, the regular expression given as a string is the Pattern of the inner class of [Pattern](https://docs.oracle.com/javase/jp/9/docs/api/java/util/regex/Pattern.html). It is decomposed into subclasses of $ Node
. This is for processing Pattern.compile Equivalent. For example, the above-mentioned regular expression (a | \\ s) +
has the following structure.
Start
---> Prolog
---> Loop "+"
---> GroupHead "("
---> Branch "|"
---> BmpCharProperty "a"
---> BranchConn
---> GroupTail
---> BmpCharProperty "\\s"
---> BranchConn
---> GroupTail
---> GroupTail ")"
---> Loop
Then, according to the interpreted structure, it is confirmed whether the condition is satisfied one by one from the beginning of the character string to be processed, and the judgment method is called recursively as long as the condition is satisfied.
Reproduced with the following combinations.
* + {2,}
, etc.) with no upper limit after the parentheses|
or ?
For conditions in parenthesesFor example, regular expressions like (a \\ s?) +
Have the same problem.
Pattern $ Loop
If you don't need to group by parentheses, create an OR condition with the character class (square brackets).
In the case of the example, if you set [a \\ s] +
, no error will occur.
For patterns that just replace with blanks, the unit to match is not particularly important, so even if you simply ʻa | \ swithout specifying
*or
+` (longest match number quantum), all You can replace characters (processing time increases by a few milliseconds).
Or the shortest match number quantum(a|\\s)+?
Or a greedy number quantum(a|\\s)++
If you use, it will be interpreted as a different structure, so no error will occur.
Since it is a structural problem that represents a regular expression, a similar problem occurs in the processing that uses a regular expression.
String
Matcher
The Android app did not have this issue. Java implementation is based on OpenJDK and regular expressions are based on icu native code It's implemented and it's for a completely different thing.
Recommended Posts