reference: https://stackoverflow.com/questions/17971466/java-regex-overlapping-matches
Normally, when a regular expression is matched repeatedly against a string, the part of the string consumed by one match is not reused by another match.
For example, if you match `/_[^_]+_/` against the string `"_apple_banana_cherry_"`, you get only two matches, `_apple_` and `_cherry_`. The `_` that closes `_apple_` has already been consumed, so `_banana_` is skipped.
If you want the shared `_` to be reused so that you get all three of `_apple_`, `_banana_`, and `_cherry_`, you have to request overlapping matches explicitly.
In Python, an easy way is to use the third-party regular expression package `regex` and match with the option `overlapped=True`.
{.python}
>>> import regex as re
>>> re.findall("_[^_]+_", "_apple_banana_cherry_")
['_apple_', '_cherry_']
>>> re.findall("_[^_]+_", "_apple_banana_cherry_", overlapped=True)
['_apple_', '_banana_', '_cherry_']
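If installing `regex` is not an option, a different technique works with the standard `re` module: wrap the whole pattern in a lookahead with a capturing group. This is an alternative to `overlapped=True`, not what that option does internally. Because the lookahead consumes no characters, the search advances one position at a time and the captured text may overlap. A minimal sketch:

```python
import re

text = "_apple_banana_cherry_"

# The lookahead (?=...) matches without consuming any characters, so the
# closing "_" of one word is still available as the opening "_" of the next.
# findall returns the contents of capture group 1 for each position that matches.
overlapping = re.findall(r"(?=(_[^_]+_))", text)
print(overlapping)
```
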
In Java, it is easy to do this with the standard regular expression package (`java.util.regex`) by shifting the start index of each search.
{.java}
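import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 captures the trailing "_" so that the next search can restart from it.
Matcher m = Pattern.compile("_[^_]+(_)").matcher("_apple_banana_cherry_");
if (m.find()) {
    do {
        System.out.println(m.group());
    } while (m.find(m.start(1))); // restart at the trailing "_" of this match
}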
Commentary:
In `/_[^_]+(_)/`, the second `_` is enclosed in `()` so that it can be retrieved as group 1.
`m.start(1)` returns the start index of group 1.
`m.find(N)` resets the matcher and starts searching from index N.
So `m.find(m.start(1))` restarts the search from the start index of group 1.
In other words, from the second iteration onward, each search begins at the `_` that ends the previously matched string.
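The same start-index shifting can be sketched in Python with the standard `re` module, where `Pattern.search(text, pos)` plays the role of `m.find(N)`. The helper name `find_overlapping` is made up for this illustration, and the `max(...)` guard is an extra safety measure that the Java loop above does not have.

```python
import re

def find_overlapping(pattern, text):
    """Collect matches, restarting each search at the start of group 1."""
    compiled = re.compile(pattern)
    matches = []
    pos = 0
    while True:
        m = compiled.search(text, pos)
        if m is None:
            break
        matches.append(m.group())
        # Restart from the start of group 1 (the trailing "_").
        # The max() guard forces progress if group 1 did not advance.
        pos = max(m.start(1), m.start() + 1)
    return matches

result = find_overlapping(r"_[^_]+(_)", "_apple_banana_cherry_")
print(result)
```
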
Note: if the pattern contains several groups, using a named group may make it easier to specify which group's start index the next search should begin from.
{.example}
(?<name>PATTERN)
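As a rough illustration in Python: the standard `re` module spells a named group `(?P<name>PATTERN)` rather than `(?<name>PATTERN)`. In the sketch below the group name `next` and the helper `words_between_underscores` are hypothetical. Group 1 captures the word itself, so referring to the restart point by name avoids having to count group numbers.

```python
import re

def words_between_underscores(text):
    """Return each word delimited by "_", reusing the shared underscores."""
    # Group 1 is the word; the named group "next" marks where to restart.
    compiled = re.compile(r"_([^_]+)(?P<next>_)")
    words = []
    pos = 0
    while True:
        m = compiled.search(text, pos)
        if m is None:
            break
        words.append(m.group(1))
        # Restart from the named group instead of a numbered one.
        pos = m.start("next")
    return words

words = words_between_underscores("_apple_banana_cherry_")
print(words)
```
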
For more complex patterns, you may also need a non-capturing group, which groups part of the pattern without changing the group numbering.
{.example}
(?:PATTERN)
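A small Python sketch of why this matters: the alternation below has to be grouped, but making it non-capturing keeps the trailing `_` as group 1, so the restart logic is unchanged. The pattern and the input string are invented for the example.

```python
import re

# (?:...) groups the alternation without creating a numbered group,
# so the trailing "_" is still group 1.
compiled = re.compile(r"_(?:[a-z]+|\d+)(_)")
text = "_apple_42_cherry_"

tokens = []
pos = 0
while True:
    m = compiled.search(text, pos)
    if m is None:
        break
    tokens.append(m.group())
    # Restart at the trailing "_" captured by group 1.
    pos = m.start(1)

print(tokens)
```
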