StringUtils in Apache Commons Lang provides many variations of the split method. Some of them are hard to understand: neither the explanations and examples in the API documentation nor the information I found on the net made the differences clear to me, so I checked them myself.
The latest version at the time of writing is 3.7.
1-1. split(String str)
Splits the string str on whitespace (spaces, tabs, newlines). Full-width (double-byte) spaces are also treated as whitespace.
1-2. split(String str, char separatorChar)
Splits the string str on the separator character separatorChar. It differs from 1-1 in that the separator is specified explicitly.
1-3. split(String str, String separatorChars)
Splits the string str on the set of separator characters separatorChars. It differs from 1-2 in that you can specify **multiple separator characters**.
At first glance it looks as if a whole string can be used as the separator, but in fact **each separator is a single character**, and you can specify several of them. Because the parameter type is String, this is easy to misunderstand. And because adjacent separators are treated as one separator, the method often appears to split on the whole string, and then at some point it does not. Be careful, this is an easy trap to fall into.
For example,
StringUtils.split("xxxxxandyyyyyandzzzzz", "and");
returns
{"xxxxx", "yyyyy", "zzzzz"}
so you think, "Oh, it can split on `and`. Great!" But for
StringUtils.split("xxaxxandyynyyandzzdzz", "and");
you expect
{"xxaxx", "yynyy", "zzdzz"}
whereas the actual result is
{"xx", "xx", "yy", "yy", "zz", "zz"}
and you are left going "Huh!?"
If you want to use a whole string as the separator, use splitByWholeSeparator(), described later.
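The "set of single characters" behaviour can be reproduced with the JDK alone, which may make it easier to see what is going on. The sketch below is not the Commons Lang implementation. It uses java.util.StringTokenizer, which also treats every character of its delimiter string as a separate separator and collapses adjacent separators. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class CharSetSplitSketch {

    // Hypothetical helper: splits on ANY single character in separatorChars.
    // Runs of adjacent separators produce no empty tokens.
    static List<String> splitOnAnyChar(String str, String separatorChars) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(str, separatorChars);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "and" is read as the three separators 'a', 'n' and 'd'
        System.out.println(splitOnAnyChar("xxxxxandyyyyyandzzzzz", "and")); // [xxxxx, yyyyy, zzzzz]
        System.out.println(splitOnAnyChar("xxaxxandyynyyandzzdzz", "and")); // [xx, xx, yy, yy, zz, zz]
    }
}
```

The first input only looks like it was split on the word because a, n and d appear nowhere else in it.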
1-4. split(String str, String separatorChars, int max)
In addition to 1-3, you can specify the maximum number of tokens with max. Even if the string contains more separators, it is never split into more than max pieces. For example, if you specify 2 for max, the returned array has at most 2 elements no matter how many separators there are. If you specify 0 or a negative value for max, no limit is applied.
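To make the effect of max concrete, here is a small JDK-only sketch of the rule as described above. It is an illustration under my own assumptions, not the library's code: splitWithMax is a hypothetical helper, and it assumes the last token simply takes the rest of the string unsplit once the limit is reached.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxSplitSketch {

    // Hypothetical helper: splits on any char in separatorChars and skips
    // runs of separators between tokens. Once max - 1 tokens exist, the
    // remainder becomes the final token. max <= 0 means "no limit".
    static List<String> splitWithMax(String str, String separatorChars, int max) {
        List<String> tokens = new ArrayList<>();
        int len = str.length();
        int i = 0;
        while (i < len) {
            // Skip separators in front of the next token
            while (i < len && separatorChars.indexOf(str.charAt(i)) >= 0) {
                i++;
            }
            if (i >= len) {
                break;
            }
            if (max > 0 && tokens.size() == max - 1) {
                tokens.add(str.substring(i)); // last slot: rest of the string
                break;
            }
            int start = i;
            while (i < len && separatorChars.indexOf(str.charAt(i)) < 0) {
                i++;
            }
            tokens.add(str.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitWithMax("a:b:c:d", ":", 2)); // [a, b:c:d]
        System.out.println(splitWithMax("a:b:c:d", ":", 0)); // [a, b, c, d]
    }
}
```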
2-1. splitByWholeSeparator(String str, String separator)
Splits the string str on the separator string separator. This is the method to use when you want to split on a whole string.
With this method,
StringUtils.splitByWholeSeparator("xxaxxandyynyyandzzdzz", "and");
returns
{"xxaxx", "yynyy", "zzdzz"}
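For comparison, splitting on a whole string can be sketched with the JDK's own String.split by quoting the separator so it is taken literally. This is only a rough analogue. The helper name is hypothetical, and String.split returns empty strings between adjacent separators, whereas splitByWholeSeparator is documented to treat adjacent separators as one.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class WholeSeparatorSketch {

    // Hypothetical helper: treats the separator as one literal string.
    static String[] splitOnWholeString(String str, String separator) {
        // Pattern.quote stops regex metacharacters from being interpreted
        return str.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String[] tokens = splitOnWholeString("xxaxxandyynyyandzzdzz", "and");
        System.out.println(Arrays.toString(tokens)); // [xxaxx, yynyy, zzdzz]
    }
}
```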
2-2. splitByWholeSeparator(String str, String separator, int max)
The version of 2-1 that also lets you specify the maximum number of tokens, max.
3-1. splitPreserveAllTokens(String str)
With the methods so far, adjacent separators are treated as one separator, but in this family an empty token is returned when separators are adjacent. In other words, with two spaces between a and b,
StringUtils.split("a  b");
returns
{"a", "b"}
whereas
StringUtils.splitPreserveAllTokens("a  b");
returns
{"a", "", "b"}
This is useful when processing comma-separated or tab-separated data.
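Why empty tokens matter for delimited data can be shown with the JDK alone. The sketch below is an analogue, not the Commons Lang code. It uses String.split with a negative limit, which keeps empty fields, so column positions stay intact. The helper name is hypothetical, and edge cases such as an empty input string may behave differently from splitPreserveAllTokens.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PreserveTokensSketch {

    // Hypothetical helper: keeps empty fields between and after separators.
    static String[] splitKeepingEmpty(String line, char separator) {
        // A negative limit tells String.split not to drop trailing empty strings
        return line.split(Pattern.quote(String.valueOf(separator)), -1);
    }

    public static void main(String[] args) {
        // The empty second column survives, so "b" stays in column 3
        System.out.println(Arrays.toString(splitKeepingEmpty("a,,b", ',')));    // [a, , b]
        System.out.println(splitKeepingEmpty("a\t\tb\t", '\t').length);         // 4
    }
}
```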
3-2. splitPreserveAllTokens(String str, char separatorChar)
The PreserveAllTokens version of 1-2.
3-3. splitPreserveAllTokens(String str, String separatorChars)
The PreserveAllTokens version of 1-3.
3-4. splitPreserveAllTokens(String str, String separatorChars, int max)
The PreserveAllTokens version of 1-4.
4-1. splitByWholeSeparatorPreserveAllTokens(String str, String separator)
The method name is long! In short, it combines families 2 and 3: a whole string can be used as the separator, and an empty token is returned when separators are adjacent.
4-2. splitByWholeSeparatorPreserveAllTokens(String str, String separator, int max)
The version of 4-1 that takes max. The headings keep getting longer and longer. Jugemu Jugemu ...
5-1. splitByCharacterType(String str)
A split method of a somewhat different kind: it determines the type of each character and groups consecutive characters of the same type into one token. The character type is determined by [getType()](https://docs.oracle.com/javase/jp/10/docs/api/java/lang/Character.html#getType(int)) of java.lang.Character. The categories are listed in the related items of the API documentation for getType(), and there are a lot of them!
Example
StringUtils.splitByCharacterType("ABCabc 123");
Execution result
{"ABC", "abc", " ", "123"}
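The grouping rule can be sketched in a few lines of plain Java. This is a simplified illustration written for this article, not the library source. The class and method names are hypothetical, and it works on char values, so supplementary characters are not handled specially.

```java
import java.util.ArrayList;
import java.util.List;

public class CharTypeSplitSketch {

    // Hypothetical helper: starts a new token whenever
    // Character.getType() changes from one character to the next.
    static List<String> splitByType(String str) {
        List<String> tokens = new ArrayList<>();
        if (str.isEmpty()) {
            return tokens;
        }
        int tokenStart = 0;
        int currentType = Character.getType(str.charAt(0));
        for (int pos = 1; pos < str.length(); pos++) {
            int type = Character.getType(str.charAt(pos));
            if (type != currentType) {
                tokens.add(str.substring(tokenStart, pos));
                tokenStart = pos;
                currentType = type;
            }
        }
        tokens.add(str.substring(tokenStart)); // final run
        return tokens;
    }

    public static void main(String[] args) {
        // UPPERCASE_LETTER, LOWERCASE_LETTER, SPACE_SEPARATOR, DECIMAL_DIGIT_NUMBER
        System.out.println(splitByType("ABCabc 123")); // [ABC, abc,  , 123]
    }
}
```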
5-2. splitByCharacterTypeCamelCase(String str)
The splitting rule is almost the same as 5-1. However, when a lowercase letter follows an uppercase letter, that uppercase letter belongs to the token on the lowercase side. Presumably it is meant for splitting camel-case strings.
Example
StringUtils.splitByCharacterTypeCamelCase("ABCabc 123");
Execution result
{"AB", "Cabc", " ", "123"}
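The camel-case exception can be added to the same kind of loop. Again this is a sketch under my own assumptions (hypothetical names, char-based, not the Commons Lang source). When a lowercase letter follows a run of uppercase letters, the last uppercase letter is moved to the front of the new token.

```java
import java.util.ArrayList;
import java.util.List;

public class CamelCaseSplitSketch {

    // Hypothetical helper: like splitting by character type, except that an
    // uppercase letter directly before a lowercase run joins that run.
    static List<String> splitCamelCase(String str) {
        List<String> tokens = new ArrayList<>();
        if (str.isEmpty()) {
            return tokens;
        }
        int tokenStart = 0;
        int currentType = Character.getType(str.charAt(0));
        for (int pos = 1; pos < str.length(); pos++) {
            int type = Character.getType(str.charAt(pos));
            if (type == currentType) {
                continue;
            }
            if (type == Character.LOWERCASE_LETTER
                    && currentType == Character.UPPERCASE_LETTER) {
                // Keep the last uppercase letter with the lowercase run
                int newTokenStart = pos - 1;
                if (newTokenStart != tokenStart) {
                    tokens.add(str.substring(tokenStart, newTokenStart));
                    tokenStart = newTokenStart;
                }
            } else {
                tokens.add(str.substring(tokenStart, pos));
                tokenStart = pos;
            }
            currentType = type;
        }
        tokens.add(str.substring(tokenStart)); // final run
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("ABCabc 123")); // [AB, Cabc,  , 123]
        System.out.println(splitCamelCase("fooBar"));     // [foo, Bar]
    }
}
```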