[Java] A little mysterious split behavior

Writer's environment

Java 8 (probably)

split quiz

Out of the blue, here is a quiz.

Below is some Java code. What will it print?

First question! Ta-da ♪

            String test = "a-i-u-e-o";
            String[] tests = test.split("-");
            System.out.println(tests.length);
            System.out.println(java.util.Arrays.toString(tests));
Answer
5
[a, i, u, e, o]

Did you answer correctly?

Next. What are the results here?

            String test = "--o";
            String[] tests = test.split("-");
            System.out.println(tests.length);
            System.out.println(java.util.Arrays.toString(tests));
Answer
3
[, , o]

Did you answer correctly? When the string starts with delimiters, each leading delimiter produces an empty string, as shown here.

Next one. What is the result here? You can probably guess.

            String test = "a----o";
            String[] tests = test.split("-");
            System.out.println(tests.length);
            System.out.println(java.util.Arrays.toString(tests));
Answer
5
[a, , , , o]

Did you answer correctly?

Now the last one. What is the result here? If you have answered correctly so far, I'm sure this one is easy. Let's go through it quickly.

            String test = "a--";
            String[] tests = test.split("-");
            System.out.println(tests.length);
            System.out.println(java.util.Arrays.toString(tests));
Answer
1
[a]

Did you answer correctly? Congratulations to those who did :tada: And too bad for those who missed it ... (´・ω・`)

By the way, my reaction when I saw these behaviors was like this.

₍₍(ง˘ω˘)ว⁾⁾

?????????????????????????? Eh ...? Ah, yeah ... Hmm (´_ゝ`) Why, though?

Java split

Here is an overview of the cases covered so far, side by side.

/* 
 * Java Playground
 * https://code.sololearn.com
 */
class Main {
    public static void main(String[] args) {
        {
            String test = "a-i-u-e-o";
            String[] tests = test.split("-");
            System.out.println(tests.length); // 5
            System.out.println(java.util.Arrays.toString(tests)); // [a, i, u, e, o]
        }
        {
            String test = "a-";
            String[] tests = test.split("-");
            System.out.println(tests.length); // 1
            System.out.println(java.util.Arrays.toString(tests)); // [a]
        }
        {
            String test = "-o";
            String[] tests = test.split("-");
            System.out.println(tests.length); // 2
            System.out.println(java.util.Arrays.toString(tests)); // [, o]
        }
        {
            String test = "a--";
            String[] tests = test.split("-");
            System.out.println(tests.length); // 1
            System.out.println(java.util.Arrays.toString(tests)); // [a]
        }
        {
            String test = "--o";
            String[] tests = test.split("-");
            System.out.println(tests.length); // 3
            System.out.println(java.util.Arrays.toString(tests)); // [, , o]
        }
        {
            String test = "a----o";
            String[] tests = test.split("-");
            System.out.println(tests.length); // 5
            System.out.println(java.util.Arrays.toString(tests)); // [a, , , , o]
        }
    }
}

What do you think? Honestly, it feels unpleasant to me (ahem, ahem). I wish it were consistent about whether empty strings are ignored or not. However, there may be some reason for this behavior. [^ 1]
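If the behavior you actually want is "ignore every empty string, wherever it appears", one option is to filter the pieces after splitting. The following is only a minimal sketch: `splitNonEmpty` and `SplitNonEmptyDemo` are hypothetical names introduced here for illustration, and the delimiter is assumed to be a literal string rather than a regular expression (hence `Pattern.quote`).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitNonEmptyDemo {
    // Hypothetical helper (not part of the JDK): splits on a literal
    // delimiter and drops every empty piece, leading, middle, or trailing.
    static List<String> splitNonEmpty(String input, String delimiter) {
        return Arrays.stream(input.split(Pattern.quote(delimiter)))
                .filter(piece -> !piece.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitNonEmpty("--o", "-"));    // [o]
        System.out.println(splitNonEmpty("a----o", "-")); // [a, o]
        System.out.println(splitNonEmpty("a--", "-"));    // [a]
    }
}
```

With this, the leading and middle empty strings disappear too, so the result no longer depends on where the delimiters sit.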

Digression

Golang split

By the way, split behaves differently from language to language, so be careful when you work in more than one. As an example, here is a Golang sample, whose behavior is comparatively easy to understand.

Golang split sample
/*
 * Golang Playground
 * https://play.golang.org/
 */
package main

import (
	"fmt"
	"strings"
)

func main() {
	{
		test := "a-i-u-e-o"
		tests := strings.Split(test, "-")
		fmt.Println(len(tests)) // 5
		fmt.Println(tests) // [a i u e o]
	}
	{
		test := "a-"
		tests := strings.Split(test, "-")
		fmt.Println(len(tests)) // 2
		fmt.Println(tests) // [a ]
	}
	{
		test := "-o"
		tests := strings.Split(test, "-")
		fmt.Println(len(tests)) // 2
		fmt.Println(tests) // [ o]
	}
	{
		test := "a--"
		tests := strings.Split(test, "-")
		fmt.Println(len(tests)) // 3
		fmt.Println(tests) // [a  ]
	}
	{
		test := "--o"
		tests := strings.Split(test, "-")
		fmt.Println(len(tests)) // 3
		fmt.Println(tests) // [  o]
	}
	{
		test := "a----o"
		tests := strings.Split(test, "-")
		fmt.Println(len(tests)) // 5
		fmt.Println(tests) // [a    o]
	}
}

After seeing Java's behavior, this feels straightforward ...

Appendix

Java's split appears to simply discard the trailing empty strings (in effect, it ignores delimiters at the right end), so if you want to mimic that behavior in Golang, you can strip the trailing delimiters and then split. I don't know if there is any demand for it, but here is a sample.

Golang that behaves like Java's split
package main

import (
	"fmt"
	"strings"
)

func main() {
	{
		test := "a-i-u-e-o"
		tests := javaSplit(test, "-")
		fmt.Println(len(tests)) // 5
		fmt.Println(tests) // [a i u e o]
	}
	{
		test := "a-"
		tests := javaSplit(test, "-")
		fmt.Println(len(tests)) // 1
		fmt.Println(tests) // [a]
	}
	{
		test := "-o"
		tests := javaSplit(test, "-")
		fmt.Println(len(tests)) // 2
		fmt.Println(tests) // [ o]
	}
	{
		test := "a--"
		tests := javaSplit(test, "-")
		fmt.Println(len(tests)) // 1
		fmt.Println(tests) // [a]
	}
	{
		test := "--o"
		tests := javaSplit(test, "-")
		fmt.Println(len(tests)) // 3
		fmt.Println(tests) // [  o]
	}
	{
		test := "a----o"
		tests := javaSplit(test, "-")
		fmt.Println(len(tests)) // 5
		fmt.Println(tests) // [a    o]
	}
}

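// javaSplit imitates Java's String#split(delimiter) for a literal (non-regex) delimiter.
// Note: an input made only of delimiters (e.g. "--") yields [""] here,
// whereas Java returns an empty array.
func javaSplit(str string, delimiter string) []string {
	// Strip every trailing delimiter. strings.TrimRight is not used because it
	// takes a set of characters rather than a suffix, which would misbehave
	// for delimiters longer than one character.
	for delimiter != "" && strings.HasSuffix(str, delimiter) {
		str = strings.TrimSuffix(str, delimiter)
	}
	return strings.Split(str, delimiter)
}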

Well, what if you want Java to behave like Golang ...? ~~Ah ... please do your best!~~

[Addition]

Well, what if you want Java to behave like Golang ...?

@saka1029 taught me!

You just specify a negative number as the second argument! ([String.split(String, int)](https://docs.oracle.com/javase/jp/13/docs/api/java.base/java/lang/String.html#split(java.lang.String,int)))

String[] test = "a--".split("-", -1);
System.out.println(test.length);                         // -> 3
System.out.println(java.util.Arrays.toString(test));     // -> [a, , ]
Excerpt from the Javadoc

public String[] split(String regex, int limit)

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.

  • If the limit is positive, the pattern is applied at most limit - 1 times, the array's length is no greater than limit, and the array's last entry contains all input beyond the last matched delimiter.
  • If the limit is zero, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded.
  • If the limit is negative, the pattern is applied as many times as possible and the array can have any length.

Exactly this! Yay! ~~Or rather, read the Javadoc before writing~~ The link points to the Java 13 documentation, but Java 8 behaves the same.
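To see the three cases from the Javadoc excerpt side by side, here is a small sketch. The class name `SplitLimitDemo` and the helper `splitToString` are made up for this illustration; only `String.split(String, int)` and `Arrays.toString` come from the JDK.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: splits with the given limit and formats the result.
    static String splitToString(String input, int limit) {
        return Arrays.toString(input.split("-", limit));
    }

    public static void main(String[] args) {
        String input = "a-b--";

        // limit > 0: at most limit - 1 splits; the remainder stays in the last element.
        System.out.println(splitToString(input, 2));  // [a, b--]

        // limit == 0: same as split("-"); trailing empty strings are discarded.
        System.out.println(splitToString(input, 0));  // [a, b]

        // limit < 0: trailing empty strings are kept.
        System.out.println(splitToString(input, -1)); // [a, b, , ]
    }
}
```

In other words, the one-argument `split("-")` used throughout the quiz corresponds to the `limit == 0` case, which is where the trailing empty strings go missing.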

In closing

I would appreciate a comment if you notice anything, such as "this part is wrong" or "what about that?" ₍₍(ง˘ω˘)ว⁾⁾ Zero doesn't tell me anything ...

[^ 1]: At any rate, there is no doubt that this is the specified behavior. https://docs.oracle.com/javase/jp/8/docs/api/java/lang/String.html#split-java.lang.String-
