The story of low-level string comparison in Java

This article is the day 3 entry of the Shizuoka University Information LT Tournament Advent Calendar 2019.

Introduction

This is a low-level story. It should be easy to follow if you can write ordinary Java. If you have ever compared strings, you know enough to read it.

Review (?)

When comparing strings in Java, do not compare them with "==". Use the String#equals method.

public class Test {
	public static void main(String[] args) {
		String a = "HelloWorld";
		String b = "Hello";
		b += "World";

		System.out.println(a == b);        // false
		System.out.println(a.equals(b));   // true
	}
}

Wait a minute, "==" gives true here

public class Test {
	public static void main(String[] args) {
		String a = "HelloWorld";
		String b = "HelloWorld";

		System.out.println(a == b);       // true
		System.out.println(a.equals(b));  // true
	}
}

Why?

Mechanism of "=="

In Java, "==" is often said to compare addresses, but let's take a closer look.

When "==" compares two references, the compiler usually turns it into one of the JVM instructions "if_acmpne" or "if_acmpeq".

Compiling Java source files produces class files, often quite a few of them. The Java Virtual Machine (JVM) interprets and executes these class files. Class files are in a format that is hard for people to read (and people rarely read them), but easy for the JVM (the computer) to read. Because the JVM builds complex programs out of many simple instructions, what looks like a single operation in Java is often converted into several JVM instructions.

JVM instructions and the operand stack

So what kind of instructions are "if_acmpne" and "if_acmpeq"? They pop two values off the operand stack, check whether they match, and, depending on the result, jump to a specified position in the program.

Operand stack...? The JVM is what is called a stack machine: instead of registers, it uses a structure called a "stack" to perform its operations.

The operand stack is like a workspace. Stacks are widely used because they suit the four basic arithmetic operations well (reverse Polish notation, for example).

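For example, 5 + 12 is calculated as follows: push 5 onto the stack, push 12 onto the stack, then execute the add instruction, which pops both values and pushes their sum, 17.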
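The 5 + 12 calculation can be sketched in plain Java. This is only an illustration of the stack idea, not how the JVM is implemented; the class name OperandStackSketch and the method add are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of a stack machine; not JVM or JDK code.
class OperandStackSketch {
    // Computes left + right using only push and pop on a stack.
    static int add(int left, int right) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(left);             // e.g. push 5
        stack.push(right);            // e.g. push 12 (now on top)
        int second = stack.pop();     // pop the top value (12)
        int first = stack.pop();      // pop the next value (5)
        stack.push(first + second);   // push the sum back onto the stack
        return stack.pop();           // the result is the only value left
    }

    public static void main(String[] args) {
        System.out.println(add(5, 12));   // 17
    }
}
```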

However, each element of the JVM operand stack holds at most 32 bits (a long or double takes up two elements).

When Java handles a single character, it uses up one element (= 32 bits) of this operand stack. A string, however, has variable size: sometimes it holds one character, and sometimes hundreds, as in this article. In other words, a single element of the operand stack (= 32 bits) cannot possibly hold a whole string.

Therefore, the string itself is stored in a separate area of memory, and only its address in that memory (an address on your computer) is placed on the operand stack. As a result, a single element of the operand stack is enough, no matter how long the string is.

As mentioned earlier, "if_acmpeq" and "if_acmpne" pop two values off the operand stack. "if_acmpeq" jumps to the specified position when they are equal, and "if_acmpne" jumps when they are not. (The jump works like the familiar GOTO statement.)

In other words, with "if_acmpeq", the jump is taken when the two addresses on the operand stack are equal. In the second code sample on this page ("Wait a minute"), both variables pointed at the "HelloWorld" stored at the same address, so the result was true.
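To make the "comparing addresses" idea concrete, here is a minimal sketch. The class name ReferenceCompareSketch and the helper sameReference are hypothetical names chosen for this example. It assumes only standard behavior: assigning one variable to another copies the reference, and new String(...) creates a separate object.

```java
// Hypothetical illustration; the names are not from the original article.
class ReferenceCompareSketch {
    // "==" on two references compares where they point, not what they contain.
    static boolean sameReference(String x, String y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "HelloWorld";
        String b = a;                          // copies the reference, not the characters
        String c = new String("HelloWorld");   // a separate object with the same contents

        System.out.println(sameReference(a, b));  // true
        System.out.println(sameReference(a, c));  // false
        System.out.println(a.equals(c));          // true
    }
}
```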

What it takes to be true

Do both strings have to be stored in the same place?

That's right.

So what do you think the following prints? Please forgive the sloppy naming.

public class Test {
	public static void main(String[] args) {
		String a = "HelloWorld";
		method1(a);
	}
	
	public static void method1(String c) {
		String k = "HelloWorld";
		System.out.println(k == c);
	}
}

The answer is "true".

Class constant area

Java class files have a constant area (the constant pool). At compile time, this area mainly stores things such as string literals. Things like magic numbers and magic strings (?) are stored in this constant area, then loaded into memory and used when the JVM reads the class file. For example:

System.out.println("Hello Ja! Ja!");

Here, "Hello Ja! Ja!" is stored in the constant area, and the string is read from there.

The Java compiler is also smart enough to store a string only once even when it appears several times, so every occurrence reads from the same entry in the constant area. In other words, in the previous code the addresses were the same even though the methods differed, because both referred to the string "HelloWorld" in the constant area of the same class.
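The sharing described above can be checked with a small sketch. The class and method names (ConstantAreaSketch, literal, builtAtRuntime) are made up for illustration. It also uses String#intern, a standard method not covered in this article, which returns the pooled copy of a string. Treat it as a demonstration under those assumptions rather than a full account of how the JVM manages strings.

```java
// Hypothetical illustration; the names are not from the original article.
class ConstantAreaSketch {
    static String literal() {
        return "HelloWorld";   // a literal, read from the constant area
    }

    static String builtAtRuntime() {
        String b = "Hello";
        b += "World";          // concatenated at run time, so a new object
        return b;
    }

    public static void main(String[] args) {
        String a = "HelloWorld";
        System.out.println(a == literal());                  // true
        System.out.println(a == builtAtRuntime());           // false
        System.out.println(a == builtAtRuntime().intern());  // true
    }
}
```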

If you look at the constant area in the binary of the class file, you can see that "HelloWorld" is indeed stored there.

As an aside, in the first code sample on this page I deliberately built the string from two parts, "Hello" and "World", so that it would be stored in a different place. The Java compiler does not seem to optimize that far and merge the two parts into one constant. C compilers often do.
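One nuance is worth a sketch: javac does merge concatenations that are compile-time constant expressions, such as two literals joined with + or a final variable initialized from a literal. It is the run-time "+=" on an ordinary variable that is left alone. The class and method names below are hypothetical.

```java
// Hypothetical illustration; the names are not from the original article.
class ConstantFoldingSketch {
    static boolean literalConcat() {
        String a = "HelloWorld";
        String b = "Hello" + "World";   // constant expression, merged at compile time
        return a == b;
    }

    static boolean finalVariableConcat() {
        final String h = "Hello";       // final and initialized with a literal
        String a = "HelloWorld";
        String b = h + "World";         // still a constant expression
        return a == b;
    }

    static boolean runtimeConcat() {
        String a = "HelloWorld";
        String b = "Hello";             // ordinary variable, as in the first sample
        b += "World";                   // concatenated at run time
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(literalConcat());        // true
        System.out.println(finalVariableConcat());  // true
        System.out.println(runtimeConcat());        // false
    }
}
```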

What about equals?

The String class overrides equals with code that properly compares the contents. Conceptually, it looks like the code below. This is not the actual source.

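@Override
public boolean equals(Object obj) {    // must take Object to override Object#equals
	if (this == obj) {
		return true;                   // same reference, so the contents match too
	}
	if (!(obj instanceof String)) {
		return false;                  // null or a non-String is never equal
	}
	String str = (String) obj;
	if (this.length() != str.length()) {
		return false;
	}
	for (int i = 0; i < str.length(); i++) {
		if (this.charAt(i) != str.charAt(i)) {    // charAt(n) returns the nth character
			return false;
		}
	}
	return true;
}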

The characters are compared one at a time, linearly, from the front. This works without trouble because a single character (a 16-bit char) fits in one 32-bit element of the operand stack.
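The conceptual code above is a method of String itself, so it cannot be run on its own. As a rough stand-alone sketch, the same linear comparison can be written as a static helper. The names ContentCompareSketch and sameContents are made up for this example, and the helper assumes both arguments are non-null.

```java
// Hypothetical stand-alone version of the comparison; not the real String#equals.
class ContentCompareSketch {
    // Compares two non-null strings character by character from the front.
    static boolean sameContents(String x, String y) {
        if (x.length() != y.length()) {
            return false;                    // different lengths can never match
        }
        for (int i = 0; i < x.length(); i++) {
            if (x.charAt(i) != y.charAt(i)) {
                return false;                // first mismatch ends the comparison
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String a = "HelloWorld";
        String b = "Hello";
        b += "World";
        System.out.println(a == b);              // false
        System.out.println(sameContents(a, b));  // true
    }
}
```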

Summary

Use the equals method when comparing strings. When "==" is true, both references point to the same place in memory. Hmm.

Tomorrow's Advent Calendar article should be a good one too! Shizuoka University Information LT Tournament Advent Calendar 2019
