[JAVA] The application absorbs the difference in character code

Overview

The other day, while thinking about character codes (character encodings), I suddenly had the following question.

"The byte string differs depending on the character code, so how does the computer recognize it?"

The question came from gaps in my own understanding, so here I summarize what I learned.

Conclusion

The conclusion is that differences in the character code of the program we write are absorbed by an application such as the compiler, which converts the source into one and the same byte string. So **the character code of the program does not affect hardware such as the CPU.**
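As a small illustration of "same text, different bytes", here is a minimal sketch using only the standard library. The class and method names (`EncodingBytes`, `utf8`, `utf16`) are made up for this example. It encodes one string with two character codes, shows that the byte strings differ, and shows that decoding each with the matching character code gives back the same text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: one text, two character codes.
class EncodingBytes {

    // Encodes the text as UTF-8 (Java writes no byte order mark here).
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Encodes the text as UTF-16 (Java writes a big-endian byte order mark first).
    static byte[] utf16(String text) {
        return text.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String text = "Hello, World";
        byte[] a = utf8(text);
        byte[] b = utf16(text);

        // The raw byte strings are different...
        System.out.println("UTF-8 bytes:  " + a.length);
        System.out.println("UTF-16 bytes: " + b.length);
        System.out.println("Same bytes? " + Arrays.equals(a, b));

        // ...but decoding each with the matching character code yields the same text.
        String fromUtf8 = new String(a, StandardCharsets.UTF_8);
        String fromUtf16 = new String(b, StandardCharsets.UTF_16);
        System.out.println("Same text?  " + fromUtf8.equals(fromUtf16));
    }
}
```

This decoding step is, roughly speaking, what a compiler has to do before it can look at the program at all.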

My original assumption

図.png

Actual movement

図2.png

But if you think about it, **if applications behaved as in "my original assumption", what would the compiler be doing at all?** It made me realize my lack of study again.

I tried experimenting

⓪ Hypothesis

① Save the same Java file twice, in different locations, each time with a different character code. (The same source can be saved with different character codes.)
② Compare the files created in ① as binary data. **A difference should be found, because they are saved with different character codes.**
③ Compile each into a class file.
④ Compare the class files. **No difference should be found.**

① Prepare the same source saved with different character codes

Prepare the following source, using HelloWorld.java as a reference.

HelloWorld.java


public class HelloWorld {

    public static void main(String[] args) {
        // Prints "Hello, World" to the terminal window.
        System.out.println("Hello, World");
    }

}

After saving, check the character code of each file.

WS000156.JPG

WS000157.JPG

You can see that the source is the same, but the character code is different.
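If you do not have an editor that shows the character code, one rough way to check is to look at the byte order mark (BOM) at the start of the file. The sketch below is only an illustration: `BomSniffer` and `describeBom` are hypothetical names, and the check is a heuristic, since a file without a BOM can still be in any character code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses the character code from a leading byte order mark.
class BomSniffer {

    // Reports which byte order mark, if any, the data starts with.
    // Note: FF FE is also the start of a UTF-32 little-endian BOM; this sketch ignores that case.
    static String describeBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8 BOM";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16 big-endian BOM";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16 little-endian BOM";
        return "no BOM";
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String source = "public class HelloWorld {}";
        // Java's UTF-8 encoder writes no BOM; its UTF-16 encoder writes a big-endian one.
        System.out.println(describeBom(source.getBytes(StandardCharsets.UTF_8)));
        System.out.println(describeBom(source.getBytes(StandardCharsets.UTF_16)));
    }
}
```

To check a real file, pass `Files.readAllBytes(path)` to `describeBom`.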

② Take the difference of ①

WS000159.JPG

You can see that there are differences in the binary data.

③ Compile each into a class file

Compile the UTF-16 file


javac -encoding UTF-16 HelloWorld.java

Compile the UTF-8 file


javac HelloWorld.java
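The same compile step can also be driven from Java through the standard `javax.tools` API, which makes the whole experiment repeatable. The following is only a sketch under some assumptions: it needs a JDK (not just a JRE), the class name `CompileWithEncoding` and the temporary directory layout are invented for this example, and unlike the command above it passes `-encoding` explicitly for UTF-8 as well instead of relying on the default.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical driver: saves one source in a given character code and compiles it.
class CompileWithEncoding {

    static final String SOURCE =
        "public class HelloWorld {\n"
        + "    public static void main(String[] args) {\n"
        + "        System.out.println(\"Hello, World\");\n"
        + "    }\n"
        + "}\n";

    // Saves SOURCE in the given character code, compiles it with a matching
    // -encoding option, and returns the bytes of the resulting class file.
    static byte[] compile(Path dir, Charset charset) throws IOException {
        Files.createDirectories(dir);
        Path source = dir.resolve("HelloWorld.java");
        Files.write(source, SOURCE.getBytes(charset));

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler found; a JDK is required");
        }
        // Equivalent to: javac -encoding <charset> -d <dir> HelloWorld.java
        int exitCode = javac.run(null, null, null,
            "-encoding", charset.name(), "-d", dir.toString(), source.toString());
        if (exitCode != 0) {
            throw new IllegalStateException("javac failed with exit code " + exitCode);
        }
        return Files.readAllBytes(dir.resolve("HelloWorld.class"));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("charset-experiment");
        byte[] fromUtf8 = compile(root.resolve("utf8"), StandardCharsets.UTF_8);
        byte[] fromUtf16 = compile(root.resolve("utf16"), StandardCharsets.UTF_16);
        System.out.println("Class files identical? " + Arrays.equals(fromUtf8, fromUtf16));
    }
}
```

The `-encoding` option tells the compiler how to decode the source bytes into characters. Once that is done, both sources are the same sequence of characters as far as the compiler is concerned.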

④ Take the difference of the class file

WS000161.JPG

There is no difference!
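If no binary diff tool is at hand, the comparison can be approximated in a few lines of Java. This is a minimal sketch: `ByteDiff` and `firstDifference` are names made up here, and it assumes Java 9 or later for `Arrays.mismatch`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: byte-by-byte comparison of two files.
class ByteDiff {

    // Returns the index of the first differing byte, or -1 if both files
    // contain exactly the same bytes.
    static int firstDifference(Path a, Path b) throws IOException {
        return Arrays.mismatch(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ByteDiff <file1> <file2>");
            return;
        }
        int index = firstDifference(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(index < 0
            ? "No difference"
            : "First difference at byte offset " + index);
    }
}
```

Run it once on the two `.java` files and once on the two `.class` files to repeat steps ② and ④.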

Result

It turns out that the difference in the character code of the source is absorbed by the compiler, which produces the same byte string either way.

The above experiment was done in Java. After this step, the class file is apparently converted to machine language by the JVM, and the CPU executes it as a program. So, **in this experiment we found that the difference in the character code of our input is absorbed when it is compiled into a class file.**

reference: Now ... I didn't know Java was "compiled" twice! (> <)

Summary

Originally I wondered, "The byte string differs depending on the character code, so how does the hardware recognize it?", and that is why I investigated the above. However, **this question probably arose because I did not properly understand the role of the compiler.** I realized again the importance of studying the basics of applications.
