[JAVA] Differences in split methods of StringUtils

Overview

StringUtils in Apache Commons Lang offers many variations of the split method. Some of them are hard to tell apart, and neither the explanations and examples in the API documentation nor the information I found online made the differences clear to me, so I checked them myself.

Version used

3.7 (the latest version at the time of writing)

Variations

1. split() family

1-1. split(String str) Splits the string str on whitespace (tabs, newlines, spaces). Full-width spaces are also treated as whitespace.
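As far as I can tell from the Commons Lang Javadoc, whitespace for this overload is defined by Character.isWhitespace(char). The following is a minimal plain-JDK check of which characters that method accepts; it does not call StringUtils itself, and the class and method names are made up for illustration.

```java
// Hypothetical helper class; it only uses the JDK, not Commons Lang.
class WhitespaceCheck {
    // Returns true if the JDK classifies the character as whitespace.
    static boolean isSeparator(char c) {
        return Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        System.out.println(isSeparator('\t'));     // tab
        System.out.println(isSeparator('\n'));     // newline
        System.out.println(isSeparator(' '));      // half-width space
        System.out.println(isSeparator('\u3000')); // full-width (ideographic) space
        System.out.println(isSeparator('\u00A0')); // no-break space
    }
}
```

The first four checks print true and the last prints false, so a no-break space would not act as a separator under this definition.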

1-2. split(String str, char separatorChar) Splits the string str on the separator character separatorChar. Unlike 1-1, the separator is specified explicitly.

1-3. split(String str, String separatorChars) Splits the string str on the set of separator characters separatorChars. Unlike 1-2, you can specify multiple separator characters. At first glance it looks as if a whole string can be used as the separator, but each separator is a single character; you can merely specify several kinds of them. Because the parameter type is String, this is easy to misunderstand, and because adjacent separators are treated as one separator, the call often happens to behave as if it had split on the whole string, and then at other times it does not. This is an easy trap to fall into. For example,

StringUtils.split("xxxxxandyyyyyandzzzzz", "and");

In this case,

{"xxxxx", "yyyyy", "zzzzz"}

is returned, so it is tempting to think, "Oh, you can split on `and`. Great!"

StringUtils.split("xxaxxandyynyyandzzdzz", "and");

In this case,

{"xxaxx", "yynyy", "zzdzz"}

is what you would expect to be returned, but the actual result is

{"xx", "xx", "yy", "yy", "zz", "zz"}

which leaves you thinking, "Huh!?" If you want to use a whole string as the separator, use splitByWholeSeparator(), described later.
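The JDK class java.util.StringTokenizer treats its delimiter argument in a similar way: as a set of single characters, with runs of adjacent delimiters collapsed. That makes it a convenient way to reproduce the surprise without Commons Lang. This is only an analogy, not the StringUtils implementation, and the class and method names below are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch: splitting on a SET of characters, not on a whole string.
class CharSetSplitSketch {
    static List<String> splitOnAnyOf(String str, String separatorChars) {
        List<String> tokens = new ArrayList<>();
        // Every character of separatorChars is an independent delimiter.
        StringTokenizer tokenizer = new StringTokenizer(str, separatorChars);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Looks like splitting on "and" only because 'a', 'n', 'd' never occur elsewhere.
        System.out.println(splitOnAnyOf("xxxxxandyyyyyandzzzzz", "and"));
        // Here 'a', 'n' and 'd' also occur inside the words, so they are split as well.
        System.out.println(splitOnAnyOf("xxaxxandyynyyandzzdzz", "and"));
    }
}
```

The first call prints [xxxxx, yyyyy, zzzzz] and the second prints [xx, xx, yy, yy, zz, zz], matching the two results described above.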

1-4. split(String str, String separatorChars, int max) In addition to 1-3, you can specify the maximum number of resulting elements with max. Even if the string contains more separators, it is never split into more than max elements; the remainder stays unsplit in the last element. For example, if you specify 2 for max, the returned array has at most 2 elements no matter how many separators there are, so splitting "ab:cd:ef" on ":" with max 2 gives {"ab", "cd:ef"}. If you specify 0 or a negative value for max, there is no limit.
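To make the role of max concrete, here is a rough plain-JDK sketch of the behavior as I understand it. It is a hand-written approximation, not the Commons Lang source, and splitWithMax is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "split on a character set, with at most max elements".
class MaxSplitSketch {
    static String[] splitWithMax(String str, String separatorChars, int max) {
        List<String> tokens = new ArrayList<>();
        int len = str.length();
        int i = 0;
        while (i < len) {
            // Skip a run of separator characters (adjacent separators count as one).
            while (i < len && separatorChars.indexOf(str.charAt(i)) >= 0) {
                i++;
            }
            if (i >= len) {
                break;
            }
            // Once max - 1 tokens exist, the rest of the string becomes the last token.
            if (max > 0 && tokens.size() == max - 1) {
                tokens.add(str.substring(i));
                break;
            }
            int start = i;
            while (i < len && separatorChars.indexOf(str.charAt(i)) < 0) {
                i++;
            }
            tokens.add(str.substring(start, i));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithMax("ab:cd:ef", ":", 2)));  // limited to 2 elements
        System.out.println(Arrays.toString(splitWithMax("ab:cd:ef", ":", 0)));  // 0 means no limit
        System.out.println(Arrays.toString(splitWithMax("ab:cd:ef", ":", -1))); // negative means no limit
    }
}
```

With max set to 2 the sketch prints [ab, cd:ef]; with 0 or -1 it prints [ab, cd, ef].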

2. splitByWholeSeparator() family

2-1. splitByWholeSeparator(String str, String separator) Splits the string str on the separator string separator. This is the method to use when you want to split on a whole string. With this method,

StringUtils.splitByWholeSeparator("xxaxxandyynyyandzzdzz", "and");

In this case,

{"xxaxx", "yynyy", "zzdzz"}

is returned.
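The difference from the character-set behavior can be sketched in plain JDK code by searching for the whole separator with String.indexOf. This is an illustrative approximation under my own assumptions (for instance, it simply rejects an empty separator), not the library's code, and splitByWhole is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: split on the WHOLE separator string, skipping empty tokens.
class WholeSeparatorSketch {
    static String[] splitByWhole(String str, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty in this sketch");
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = str.indexOf(separator, start)) >= 0) {
            if (idx > start) {
                // Only keep non-empty tokens, so adjacent separators act as one.
                tokens.add(str.substring(start, idx));
            }
            start = idx + separator.length();
        }
        if (start < str.length()) {
            tokens.add(str.substring(start));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitByWhole("xxaxxandyynyyandzzdzz", "and")));
    }
}
```

Running it prints [xxaxx, yynyy, zzdzz]: the lone 'a', 'n' and 'd' inside the words are left alone because only the full "and" counts as a separator.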

2-2. splitByWholeSeparator(String str, String separator, int max) The same as 2-1, with the addition of max to specify the maximum number of elements.

3. splitPreserveAllTokens() family

3-1. splitPreserveAllTokens(String str) The methods covered so far treat adjacent separators as a single separator, but the methods in this family return an empty token when separators are adjacent. In other words,

StringUtils.split("a  b");

In this case,

{"a", "b"}

is returned. On the other hand,

StringUtils.splitPreserveAllTokens("a  b");

In this case,

{"a", "", "b"}

is returned. This is useful when processing comma-separated or tab-separated data, where empty fields must be kept.
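Why this matters for delimited data can be shown with a small plain-JDK sketch: if empty fields are dropped, the columns shift. The helper splitKeepingEmpty and the sample record are hypothetical, and the code approximates the idea rather than reproducing the Commons Lang implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch: every separator ends a field, so empty fields survive.
class PreserveTokensSketch {
    static List<String> splitKeepingEmpty(String str, char separator) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) == separator) {
                fields.add(str.substring(start, i)); // may be an empty string
                start = i + 1;
            }
        }
        fields.add(str.substring(start)); // last field
        return fields;
    }

    public static void main(String[] args) {
        String line = "Alice,,Tokyo"; // made-up record: name, (empty) age, city

        // Keeping empty tokens: three fields, so "Tokyo" stays in the third column.
        System.out.println(splitKeepingEmpty(line, ','));

        // Collapsing adjacent separators (as StringTokenizer does): only two tokens.
        System.out.println(new StringTokenizer(line, ",").countTokens());
    }
}
```

The first line of output is a three-element list whose middle element is the empty string; the second line is 2, showing how the empty column disappears when adjacent separators are merged.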

3-2. splitPreserveAllTokens(String str, char separatorChar) The preserve-all-tokens version of 1-2.

3-3. splitPreserveAllTokens(String str, String separatorChars) The preserve-all-tokens version of 1-3.

3-4. splitPreserveAllTokens(String str, String separatorChars, int max) The preserve-all-tokens version of 1-4.

4. splitByWholeSeparatorPreserveAllTokens() family

4-1. splitByWholeSeparatorPreserveAllTokens(String str, String separator) What a long method name! In short, it combines families 2 and 3: a whole string can be used as the separator, and an empty token is returned when separators are adjacent.

4-2. splitByWholeSeparatorPreserveAllTokens(String str, String separator, int max) The version of 4-1 that takes max. The headings keep getting longer and longer. Jugemu, Jugemu ...

5. splitByCharacterType() family

5-1. splitByCharacterType(String str) A rather different kind of split method: it determines the type of each character and groups consecutive characters of the same type into one token. The character type is determined by [getType()](https://docs.oracle.com/javase/jp/10/docs/api/java/lang/Character.html#getType(int)) of java.lang.Character. The categories are listed in the "See Also" section of the API documentation for getType(), and there are a great many of them!

Example


StringUtils.splitByCharacterType("ABCabc 123");

Execution result


{"ABC", "abc", " ", "123"}
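The grouping rule can be sketched with Character.getType from the JDK alone: start a new token whenever the type of the current character differs from the previous one. This is a simplified illustration, not the Commons Lang source, and splitByType is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group consecutive characters that share a Character.getType value.
class CharacterTypeSketch {
    static List<String> splitByType(String str) {
        List<String> tokens = new ArrayList<>();
        if (str.isEmpty()) {
            return tokens;
        }
        int start = 0;
        int currentType = Character.getType(str.charAt(0));
        for (int i = 1; i < str.length(); i++) {
            int type = Character.getType(str.charAt(i));
            if (type != currentType) {
                // The type changed, so the previous run ends here.
                tokens.add(str.substring(start, i));
                start = i;
                currentType = type;
            }
        }
        tokens.add(str.substring(start)); // final run
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitByType("ABCabc 123"));
    }
}
```

For "ABCabc 123" the runs are uppercase letters, lowercase letters, a space separator, and decimal digits, giving the four tokens ABC, abc, a single space, and 123.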

5-2. splitByCharacterTypeCamelCase(String str) The splitting rule is almost the same as 5-1. However, when a lowercase letter follows an uppercase letter, that uppercase letter belongs to the token on the lowercase side. Presumably this is meant for splitting camel-case strings.

Example


StringUtils.splitByCharacterTypeCamelCase("ABCabc 123");

Execution result


{"AB", "Cabc", " ", "123"}
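The camel-case exception can be added to a type-based splitter with one extra branch: when a lowercase run begins right after an uppercase run, the last uppercase letter is moved into the new token. The following plain-JDK code is my own approximation of that rule, with the hypothetical name splitCamelCase, and is not taken from Commons Lang.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: type-based splitting with the camel-case exception.
class CamelCaseSplitSketch {
    static List<String> splitCamelCase(String str) {
        List<String> tokens = new ArrayList<>();
        if (str.isEmpty()) {
            return tokens;
        }
        int start = 0;
        int currentType = Character.getType(str.charAt(0));
        for (int i = 1; i < str.length(); i++) {
            int type = Character.getType(str.charAt(i));
            if (type == currentType) {
                continue;
            }
            if (type == Character.LOWERCASE_LETTER && currentType == Character.UPPERCASE_LETTER) {
                // The uppercase letter just before the lowercase run joins the new token.
                int newStart = i - 1;
                if (newStart != start) {
                    tokens.add(str.substring(start, newStart));
                    start = newStart;
                }
            } else {
                tokens.add(str.substring(start, i));
                start = i;
            }
            currentType = type;
        }
        tokens.add(str.substring(start)); // final run
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("ABCabc 123")); // tokens: AB, Cabc, one space, 123
        System.out.println(splitCamelCase("fooBar"));     // tokens: foo, Bar
    }
}
```

In "ABCabc 123" the C is pulled out of the uppercase run and attached to abc, while in "fooBar" the single B simply starts the token Bar.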
