When learning a new programming language, it is customary to start by printing "Hello World". Especially when I have to pick up a language for work, I write Hello World at the speed of a scurrying cockroach and move straight on to the next step.
But did I ever understand how this Hello World actually works? This time I am going back to the beginning to review Hello World thoroughly enough that Dr. Retsu could not scold me with "You underestimated Hello World!"
Windows 10 java version "1.8.0_202" Java(TM) SE Runtime Environment (build 1.8.0_202-b08) Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
Create and run Hello World as usual. First, write the java code with a text editor.
HelloWorld.java
public class HelloWorld {
public static void main(String[] args) {
System.out.println("Hello World");
}
}
Create a class file by running the javac command with the java file as input. This time, add the "-g:none" option so that no debug information is generated.
javac -g:none HelloWorld.java
Execute the created "HelloWorld.class" with the java command.
>java HelloWorld
Hello World
"Hello World" was printed. Unlike executable files created with C or C++, the generated HelloWorld.class works on Windows, Mac, and Linux as long as Java is installed. Java really is convenient.
**The end! Closed! That's all! Everyone, dismissed!**
That is where it always used to end for me. I never took the time to think about what this HelloWorld.class file actually is. Since this is a good opportunity, let's look at the contents of the generated binary file.
The contents of HelloWorld.class are the following binaries.
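If you want to produce such a dump yourself without a binary editor, a few lines of Java are enough. The following is only a minimal sketch: the class name HexDump and its method are my own illustrative names, and the fallback bytes are just the first 10 bytes of the file as described later in this article (magic, minor_version, major_version, constant_pool_count).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class HexDump {
    // Formats bytes as rows of up to 16 two-digit hex values, each row prefixed by its offset.
    static String hexDump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i % 16 == 0) {
                if (i > 0) {
                    sb.append('\n');
                }
                sb.append(String.format("%08X:", i));
            }
            // Mask with 0xFF because Java bytes are signed.
            sb.append(String.format(" %02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // With an argument, dump that file (e.g. HelloWorld.class); otherwise dump a sample header.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34, 0, 0x1A};
        System.out.println(hexDump(data));
    }
}
```

Running it as `java HexDump HelloWorld.class` should print the whole file in the same offset-plus-hex layout that a binary editor shows.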
To read this binary file, you need to read The Java® Virtual Machine Specification. https://docs.oracle.com/javase/specs/jvms/se8/html/index.html
I will briefly describe the parts of the JVM specification used this time, but if you want to get straight to analyzing the class file binary, please skip ahead to the binary analysis of HelloWorld.class.
The structure of a class file is as follows.
ClassFile {
u4 magic;
u2 minor_version;
u2 major_version;
u2 constant_pool_count;
cp_info constant_pool[constant_pool_count-1];
u2 access_flags;
u2 this_class;
u2 super_class;
u2 interfaces_count;
u2 interfaces[interfaces_count];
u2 fields_count;
field_info fields[fields_count];
u2 methods_count;
method_info methods[methods_count];
u2 attributes_count;
attribute_info attributes[attributes_count];
}
u1 represents 1 byte, u2 represents 2 bytes, and u4 represents 4 bytes of data. The other types are structures, which I will explain as they come up.
magic is the magic number that identifies the class file format; its value is 0xCAFEBABE. The origin of this magic number is described in "CAFEBABE: Java's Magic Word", so please read it. Bring it up on a date and you will get a strange look.
minor_version and major_version are the minor and major version numbers of this class file. Together, the major and minor version numbers determine the version of the class file format.
constant_pool[] is a table of structures that represent string constants, class and interface names, field names, and other constants, and constant_pool_count is the number of entries in that table plus one. Valid indexes into constant_pool run from 1 to constant_pool_count-1.
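The first four items of the structure always sit at fixed offsets, so they are easy to read programmatically. Here is a minimal sketch, assuming the class file bytes are already in memory; ClassFileHeader is a hypothetical helper class I made up for illustration, not part of any JDK API.

```java
import java.nio.ByteBuffer;

class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    // Parses u4 magic, u2 minor_version, u2 major_version, u2 constant_pool_count.
    static ClassFileHeader parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // a new ByteBuffer is big-endian by default
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF; // mask to treat the value as unsigned u2
        int major = buf.getShort() & 0xFFFF;
        int cpCount = buf.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major, cpCount);
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34, 0, 0x1A};
        ClassFileHeader h = parse(header);
        System.out.println("major=" + h.majorVersion + " minor=" + h.minorVersion
                + " constant_pool_count=" + h.constantPoolCount);
    }
}
```

The sample bytes are the header values of the HelloWorld.class analyzed in this article (major version 52, constant_pool_count 26).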
access_flags is a combination of the following flags.
Flag name | value | Interpretation |
---|---|---|
ACC_PUBLIC | 0x0001 | public is declared. It can be accessed from outside the package. |
ACC_FINAL | 0x0010 | final is declared. Subclasses are not allowed. |
ACC_SUPER | 0x0020 | Treats superclass methods specially when called by the invokespecial instruction. |
ACC_INTERFACE | 0x0200 | It's an interface, not a class. |
ACC_ABSTRACT | 0x0400 | It has been declared abstract. Do not instantiate. |
ACC_SYNTHETIC | 0x1000 | Generated by the compiler; not present in the source code. * For example, it was set on a class file generated for a class defined inside another class (e.g. Hello$Test.class). |
ACC_ANNOTATION | 0x2000 | Declared as an annotation. |
ACC_ENUM | 0x4000 | Declared as an enum type. |
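Because access_flags is a bit mask, decoding it is a matter of testing each bit against the table above. A minimal sketch follows; the class name ClassAccessFlags is my own, and the masks are copied from the table.

```java
import java.util.ArrayList;
import java.util.List;

class ClassAccessFlags {
    // Masks and names from the class access_flags table, in ascending mask order.
    private static final int[] MASKS = {0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000};
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    // Returns the names of all flags whose bit is set in the given value.
    static List<String> decode(int accessFlags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // the value found in HelloWorld.class
    }
}
```

For 0x0021, the value that appears in HelloWorld.class, this yields ACC_PUBLIC and ACC_SUPER.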
this_class must be a valid index value in constant_pool []. The data referenced in the index must be a CONSTANT_Class_info structure that holds the information of the class specified in the file.
super_class must be 0 or a valid index value in constant_pool []. The data referenced in the index must be a CONSTANT_Class_info structure that represents the direct superclass of the class defined in this class file.
interfaces_count and interfaces [] are an array of indexes into the CONSTANT_Class_info structure that represents the interfaces of the class defined in this class file. If there are no interfaces, interfaces_count will be 0 and interfaces [] will not exist.
fields_count and fields[] are the number and the array of [field structures](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.5) that represent the fields of the class defined in this class file. If there are no fields, fields_count will be 0 and fields[] will not exist.
methods_count and methods[] are the number and the array of [method structures](#method structures) that represent the methods of the class defined in this class file. If there are no methods, methods_count will be 0 and methods[] will not exist.
attributes_count and attributes [] are an array of [attribute structure](#attribute structure) that represents the attribute information for the class defined in this class file.
Each entry of constant_pool is a structure that begins with a 1-byte tag, and the tag determines the layout of the rest of the structure.
Constant Type | Value |
---|---|
CONSTANT_Class | 7 |
CONSTANT_Fieldref | 9 |
CONSTANT_Methodref | 10 |
CONSTANT_InterfaceMethodref | 11 |
CONSTANT_String | 8 |
CONSTANT_Integer | 3 |
CONSTANT_Float | 4 |
CONSTANT_Long | 5 |
CONSTANT_Double | 6 |
CONSTANT_NameAndType | 12 |
CONSTANT_Utf8 | 1 |
CONSTANT_MethodHandle | 15 |
CONSTANT_MethodType | 16 |
CONSTANT_InvokeDynamic | 18 |
The parts used this time are explained below, so refer to the following for other structures. https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.4-140
CONSTANT_Class
Used to represent a class or interface.
CONSTANT_Class_info {
u1 tag;
u2 name_index;
}
The tag holds 7, which represents CONSTANT_Class. The value of the name_index item is an index to a [CONSTANT_Utf8_info structure](#constant_utf8) in the constant_pool table.
CONSTANT_Fieldref
CONSTANT_Fieldref_info {
u1 tag;
u2 class_index;
u2 name_and_type_index;
}
The tag holds 9, which represents CONSTANT_Fieldref. The value of class_index is an index to a CONSTANT_Class_info structure in the constant_pool table. The value of name_and_type_index is an index to a CONSTANT_NameAndType_info structure in the constant_pool table.
CONSTANT_Methodref
CONSTANT_Methodref_info {
u1 tag;
u2 class_index;
u2 name_and_type_index;
}
The tag holds 10, which represents CONSTANT_Methodref. The value of class_index is an index to a CONSTANT_Class_info structure in the constant_pool table. The value of name_and_type_index is an index to a CONSTANT_NameAndType_info structure in the constant_pool table.
CONSTANT_String
CONSTANT_String_info {
u1 tag;
u2 string_index;
}
The tag holds 8, which represents CONSTANT_String. The value of string_index is an index to a [CONSTANT_Utf8_info structure](#constant_utf8) in the constant_pool table.
CONSTANT_NameAndType
Used to represent a field or method, without indicating which class or interface it belongs to.
CONSTANT_NameAndType_info {
u1 tag;
u2 name_index;
u2 descriptor_index;
}
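The tag holds 12, which represents CONSTANT_NameAndType. The value of name_index must be a valid index to a [CONSTANT_Utf8_info structure](#constant_utf8) in constant_pool that holds the field or method name. The value of descriptor_index must likewise be a valid index to a CONSTANT_Utf8_info structure, which holds the field or method descriptor.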
CONSTANT_Utf8
CONSTANT_Utf8_info {
u1 tag;
u2 length;
u1 bytes[length];
}
The tag holds 1, which represents CONSTANT_Utf8. length is the number of bytes in the bytes array (not the number of characters in the string). The bytes array contains the bytes of the string, with no terminator. See below for more information on this string encoding. https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.4.7
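As far as I understand, the u2 length followed by the encoded bytes is the same layout that DataInputStream.readUTF consumes (both use the "modified UTF-8" encoding), so a single entry can be decoded with the standard library. This is only a sketch: Utf8Entry is a made-up class name, and the sample bytes are the "Code" entry that appears later in HelloWorld.class.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class Utf8Entry {
    // Decodes one CONSTANT_Utf8_info entry: u1 tag (must be 1), u2 length, u1 bytes[length].
    static String read(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();
            if (tag != 1) {
                throw new IllegalArgumentException("not a CONSTANT_Utf8 entry, tag=" + tag);
            }
            // readUTF reads a 2-byte length and then that many bytes of modified UTF-8.
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] entry = {0x01, 0x00, 0x04, 'C', 'o', 'd', 'e'};
        System.out.println(read(entry));
    }
}
```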
method_info {
u2 access_flags;
u2 name_index;
u2 descriptor_index;
u2 attributes_count;
attribute_info attributes [attributes_count];
}
The value of the access_flags item is a combination of the permissions to this method and the flags used to indicate the properties of this method.
Flag name | value | Description |
---|---|---|
ACC_PUBLIC | 0x0001 | Declared public. It can be accessed from outside the package. |
ACC_PRIVATE | 0x0002 | Declared private. Only accessible within the definition class. |
ACC_PROTECTED | 0x0004 | Declared protected. You can access it within a subclass. |
ACC_STATIC | 0x0008 | Declared static. |
ACC_FINAL | 0x0010 | Declared final. It must not be overridden. |
ACC_SYNCHRONIZED | 0x0020 | Declared synchronized. |
ACC_BRIDGE | 0x0040 | A bridge method generated by the compiler. See "Java Generics - Bridge method?" for details. |
ACC_VARARGS | 0x0080 | Declared with a variable number of arguments. |
ACC_NATIVE | 0x0100 | Declared native. It is implemented in a language other than Java. |
ACC_ABSTRACT | 0x0400 | Declared abstract. No implementation is provided. |
ACC_STRICT | 0x0800 | Declared strictfp. |
ACC_SYNTHETIC | 0x1000 | Generated by the compiler; does not appear in the source code. |
The value of name_index is an index to a [CONSTANT_Utf8_info structure](#constant_utf8) in the constant_pool table. It holds the method name, or one of the special names <init> or <clinit>.
The value of descriptor_index is an index to a [CONSTANT_Utf8_info structure](#constant_utf8) in the constant_pool table. It holds the [method descriptor](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.3.3).
attributes_count and attributes[] are the number and the array of [attribute structures](#attribute structure) that represent the attribute information for this method.
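Method descriptors such as "([Ljava/lang/String;)V" are compact but hard on the eyes, so here is a small sketch that turns one into a Java-like signature. DescriptorReader is a hypothetical class written only for this article; it handles the base type letters, L...; object types, and [ array prefixes, and does no validation beyond that.

```java
import java.util.ArrayList;
import java.util.List;

class DescriptorReader {
    private final String text;
    private int pos;

    private DescriptorReader(String text) {
        this.text = text;
    }

    // Converts e.g. "(Ljava/lang/String;)V" into "void (java.lang.String)".
    static String describeMethod(String descriptor) {
        DescriptorReader r = new DescriptorReader(descriptor);
        if (r.text.charAt(r.pos++) != '(') {
            throw new IllegalArgumentException("bad descriptor: " + descriptor);
        }
        List<String> params = new ArrayList<>();
        while (r.text.charAt(r.pos) != ')') {
            params.add(r.nextType());
        }
        r.pos++; // skip ')'
        String returnType = r.nextType();
        return returnType + " (" + String.join(", ", params) + ")";
    }

    // Reads one field type (or V for void) starting at the current position.
    private String nextType() {
        char c = text.charAt(pos++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return nextType() + "[]";
            case 'L': {
                int end = text.indexOf(';', pos);
                String name = text.substring(pos, end).replace('/', '.');
                pos = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("bad descriptor: " + text);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeMethod("([Ljava/lang/String;)V"));
        System.out.println(describeMethod("()V"));
    }
}
```

The parameter types come first inside the parentheses and the return type comes last, which is why main's descriptor reads as "takes a String array, returns void".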
The shape of this structure changes depending on the attribute. The common format is as follows.
attribute_info {
u2 attribute_name_index;
u4 attribute_length;
u1 info [attribute_length];
}
The value of attribute_name_index is an index to a [CONSTANT_Utf8_info structure](#constant_utf8) in the constant_pool table. attribute_length is the length of the subsequent information in bytes. The contents of info differ for each attribute.
attribute | Location |
---|---|
SourceFile | ClassFile |
InnerClasses | ClassFile |
EnclosingMethod | ClassFile |
SourceDebugExtension | ClassFile |
BootstrapMethods | ClassFile |
ConstantValue | field_info |
Code | method_info |
Exceptions | method_info |
RuntimeVisibleParameterAnnotations, RuntimeInvisibleParameterAnnotations | method_info |
AnnotationDefault | method_info |
MethodParameters | method_info |
Synthetic | ClassFile, field_info, method_info |
Deprecated | ClassFile, field_info, method_info |
Signature | ClassFile, field_info, method_info |
RuntimeVisibleAnnotations, RuntimeInvisibleAnnotations | ClassFile, field_info, method_info |
LineNumberTable | Code |
LocalVariableTable | Code |
LocalVariableTypeTable | Code |
StackMapTable | Code |
RuntimeVisibleTypeAnnotations, RuntimeInvisibleTypeAnnotations | ClassFile, field_info, method_info, Code |
See below for items not covered here. https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7
Code_attribute {
u2 attribute_name_index;
u4 attribute_length;
u2 max_stack;
u2 max_locals;
u4 code_length;
u1 code[code_length];
u2 exception_table_length;
{ u2 start_pc;
u2 end_pc;
u2 handler_pc;
u2 catch_type;
} exception_table[exception_table_length];
u2 attributes_count;
attribute_info attributes[attributes_count];
}
attribute_name_index and attribute_length are as described in the common format. The string referenced by attribute_name_index must be "Code".
The value of the max_stack item is the maximum depth of the operand stack for this method.
The value of the max_locals item is the number of [local variables](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-2.html#jvms-2.6.1) allocated when this method is invoked.
The value of the code_length item is the number of bytes in the code[] array.
The code array shows the actual bytes of Java virtual machine code that implements the method. This code is described below. https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html#jvms-6.5
exception_table_length stores the number of entries for exception_table.
exception_table represents exception information. The contents of each item of exception_table are as follows.
- start_pc, end_pc: the range of indexes into the code array in which the exception handler is active (start_pc inclusive, end_pc exclusive). In Java code, this is the range enclosed by the try block.
- handler_pc: the index into the code array at which the exception handler starts. In Java code, this is the start of the catch block.
- catch_type: 0 or a valid index into the constant_pool table; the entry at that index is a CONSTANT_Class_info structure that represents the exception class.
attributes_count and attributes[] are the number and the array of [attribute structures](#attribute structure) that represent the attribute information for this Code attribute.
A machine-language veteran could read the hexadecimal with any binary editor, but honestly, reading raw hex is hard in the Reiwa era, so I looked for a binary editor that would make it as readable as possible.
This time, we will use BZ Editor. The reasons for adopting BZ Editor are as follows.
- It runs on Windows (recently it seems it can also be used on macOS if you build it yourself).
- It can display structures.
- It supports Japanese.
- The source is public, so it can be extended if you feel like it.
https://github.com/devil-tamachan/binaryeditorbz
If you want to consider other binary editors, check Wikipedia for a comparison table of binary editors. https://en.wikipedia.org/wiki/Comparison_of_hex_editors Among them, HxD seemed to be easy to use.
You can define the structure in Bz.def, which is in the same folder as the BZ Editor executable file. Note that only fixed size structures can be specified, so analysis cannot be performed perfectly.
Bz.def
struct ClassFile_1 {
BYTE magic[4];
short minor_version;
short major_version;
short constant_pool_count;
} class;
struct ClassFile_2 {
BYTE access_flags[2];
short this_class;
short super_class;
short interfaces_count;
} class;
struct CONSTANT_Class {
BYTE tag;
short index;
} class;
struct CONSTANT_Methodref_info {
BYTE tag;
short class_index;
short name_and_type_index;
} class;
struct CONSTANT_Fieldref {
BYTE tag;
short class_index;
short name_and_type_index;
} class;
struct CONSTANT_NameAndType_info {
BYTE tag;
short name_index;
short descriptor_index;
} class;
struct CONSTANT_String_info {
BYTE tag;
short string_index;
} class;
struct CONSTANT_Utf8 {
BYTE tag;
short length;
} class;
struct Code_attribute {
short attribute_name_index;
int attribute_length;
short max_stack;
short max_locals;
int code_length;
} class;
Bz.def is written in a C-like syntax. See TYPESTR[NUM_MEMBERS] in the code below for the types you can use. https://github.com/devil-tamachan/binaryeditorbz/blob/master/Bz/BZFormVw.cpp
After starting BZEditor, check "View"> "Structure display" to display the child window for structure display.
By double-clicking an address, the structure information starting from that address is displayed.
Select Motorola (big-endian) as the byte order when parsing class files.
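The byte order matters because every multi-byte item in a class file is stored with the high byte first. A quick sketch of what goes wrong with the other setting, using the two major_version bytes 0x00 0x34 as sample data (EndianDemo is just an illustrative name):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    // Interprets the first two bytes as an unsigned 16-bit value in the given byte order.
    static int readU2(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] majorVersion = {0x00, 0x34};
        System.out.println("big-endian (Motorola): " + readU2(majorVersion, ByteOrder.BIG_ENDIAN));
        System.out.println("little-endian (Intel): " + readU2(majorVersion, ByteOrder.LITTLE_ENDIAN));
    }
}
```

Read as big-endian the bytes give 52, the expected major version; read as little-endian they give 13312, which is obviously not a class file version.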
Let's analyze the class file from the beginning.
"0xCAFEBABE" is displayed for magic. minor_version is 0 and major_version is 52, which corresponds to Java SE 8. constant_pool_count is 26, and the bytes that follow are the constant_pool entries.
constant_pool[1]
Since the first byte is 0x0A = 10, this constant_pool entry is a CONSTANT_Methodref. class_index is 6 and name_and_type_index is 12. We will check what these indexes actually point to after looking at all of the constant_pool entries.
constant_pool[2]
Since the first byte is 0x09, this constant_pool entry is a CONSTANT_Fieldref. class_index is 13 and name_and_type_index is 14.
constant_pool[3]
Since the first byte is 0x08, this constant_pool entry is a CONSTANT_String. The index is 15.
constant_pool[4]
Since the first byte is 0x0A = 10, this constant_pool entry is a CONSTANT_Methodref. class_index is 16 and name_and_type_index is 17.
constant_pool[5]
Since the first byte is 0x07, this constant_pool entry is a CONSTANT_Class. The index is 18.
constant_pool[6]
Since the first byte is 0x07, this constant_pool entry is a CONSTANT_Class. The index is 19.
constant_pool[7]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 6, and the following 6 bytes store the string "<init>".
constant_pool[8]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 3, and the following 3 bytes store the string "()V".
constant_pool[9]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 4, and the following 4 bytes store the string "Code".
constant_pool[10]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 4, and the following 4 bytes store the string "main".
constant_pool[11]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 22, and the following 22 bytes store the string "([Ljava/lang/String;)V".
constant_pool[12]
Since the first byte is 0x0C = 12, this constant_pool entry is a CONSTANT_NameAndType_info structure. name_index is 7 and descriptor_index is 8.
constant_pool[13]
Since the first byte is 0x07, this constant_pool entry is a CONSTANT_Class. The index is 20.
constant_pool[14]
Since the first byte is 0x0C = 12, this constant_pool entry is a CONSTANT_NameAndType_info structure. name_index is 21 and descriptor_index is 22.
constant_pool[15]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 11, and the following 11 bytes store the string "Hello World".
constant_pool[16]
Since the first byte is 0x07, this constant_pool entry is a CONSTANT_Class. The index is 23.
constant_pool[17]
Since the first byte is 0x0C = 12, this constant_pool entry is a CONSTANT_NameAndType_info structure. name_index is 24 and descriptor_index is 25.
constant_pool[18]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 10, and the following 10 bytes store the string "HelloWorld".
constant_pool[19]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 16, and the following 16 bytes store the string "java/lang/Object".
constant_pool[20]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 16, and the following 16 bytes store the string "java/lang/System".
constant_pool[21]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 3, and the following 3 bytes store the string "out".
constant_pool[22]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 21, and the following 21 bytes store the string "Ljava/io/PrintStream;".
constant_pool[23]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 19, and the following 19 bytes store the string "java/io/PrintStream".
constant_pool[24]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 7, and the following 7 bytes store the string "println".
constant_pool[25]
Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 21, and the following 21 bytes store the string "(Ljava/lang/String;)V".
The constant_pool can be summarized as follows.
No | Structure | Contents |
---|---|---|
1 | CONSTANT_Methodref | class_index is 6: java/lang/Object, name_and_type_index is 12: <init>:()V |
2 | CONSTANT_Fieldref | class_index is 13: java/lang/System, name_and_type_index is 14: out:Ljava/io/PrintStream; |
3 | CONSTANT_String | index is 15: "Hello World" |
4 | CONSTANT_Methodref | class_index is 16: java/io/PrintStream, name_and_type_index is 17: println:(Ljava/lang/String;)V |
5 | CONSTANT_Class | index is 18: "HelloWorld" |
6 | CONSTANT_Class | index is 19: "java/lang/Object" |
7 | CONSTANT_Utf8 | The string "<init>" |
8 | CONSTANT_Utf8 | The string "()V" |
9 | CONSTANT_Utf8 | The string "Code" |
10 | CONSTANT_Utf8 | The string "main" |
11 | CONSTANT_Utf8 | The string "([Ljava/lang/String;)V" |
12 | CONSTANT_NameAndType_info | name_index is 7: "<init>", descriptor_index is 8: "()V". See Method Descriptors. |
13 | CONSTANT_Class | index is 20: "java/lang/System" |
14 | CONSTANT_NameAndType_info | name_index is 21: "out", descriptor_index is 22: "Ljava/io/PrintStream;". See Field Descriptors. |
15 | CONSTANT_Utf8 | The string "Hello World" |
16 | CONSTANT_Class | index is 23: "java/io/PrintStream" |
17 | CONSTANT_NameAndType_info | name_index is 24: "println", descriptor_index is 25: "(Ljava/lang/String;)V". See Method Descriptors. |
18 | CONSTANT_Utf8 | The string "HelloWorld" |
19 | CONSTANT_Utf8 | The string "java/lang/Object" |
20 | CONSTANT_Utf8 | The string "java/lang/System" |
21 | CONSTANT_Utf8 | The string "out" |
22 | CONSTANT_Utf8 | The string "Ljava/io/PrintStream;" |
23 | CONSTANT_Utf8 | The string "java/io/PrintStream" |
24 | CONSTANT_Utf8 | The string "println" |
25 | CONSTANT_Utf8 | The string "(Ljava/lang/String;)V" |
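Following the indexes by hand gets tedious, so here is a small sketch that does the same resolution in code. The 25 entries are typed in from the analysis above rather than parsed from the file, and the class and helper names (ConstantPoolDemo, ref, utf8, resolve) are my own inventions for illustration.

```java
class ConstantPoolDemo {
    // Parallel arrays indexed by constant_pool index; index 0 is unused, as in the class file.
    static final String[] KIND = new String[26];
    static final int[] A = new int[26];   // first index item (class_index, name_index, ...)
    static final int[] B = new int[26];   // second index item, if the entry has one
    static final String[] TEXT = new String[26];

    private static void ref(int i, String kind, int a, int b) {
        KIND[i] = kind;
        A[i] = a;
        B[i] = b;
    }

    private static void utf8(int i, String text) {
        KIND[i] = "Utf8";
        TEXT[i] = text;
    }

    static {
        ref(1, "Methodref", 6, 12);
        ref(2, "Fieldref", 13, 14);
        ref(3, "String", 15, 0);
        ref(4, "Methodref", 16, 17);
        ref(5, "Class", 18, 0);
        ref(6, "Class", 19, 0);
        utf8(7, "<init>");
        utf8(8, "()V");
        utf8(9, "Code");
        utf8(10, "main");
        utf8(11, "([Ljava/lang/String;)V");
        ref(12, "NameAndType", 7, 8);
        ref(13, "Class", 20, 0);
        ref(14, "NameAndType", 21, 22);
        utf8(15, "Hello World");
        ref(16, "Class", 23, 0);
        ref(17, "NameAndType", 24, 25);
        utf8(18, "HelloWorld");
        utf8(19, "java/lang/Object");
        utf8(20, "java/lang/System");
        utf8(21, "out");
        utf8(22, "Ljava/io/PrintStream;");
        utf8(23, "java/io/PrintStream");
        utf8(24, "println");
        utf8(25, "(Ljava/lang/String;)V");
    }

    // Follows index references until only Utf8 strings remain.
    static String resolve(int i) {
        switch (KIND[i]) {
            case "Utf8":
                return TEXT[i];
            case "Class":
            case "String":
                return resolve(A[i]);
            case "NameAndType":
                return resolve(A[i]) + ":" + resolve(B[i]);
            default: // Fieldref or Methodref: class, then name and type
                return resolve(A[i]) + "." + resolve(B[i]);
        }
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 25; i++) {
            System.out.println("#" + i + " = " + KIND[i] + " " + resolve(i));
        }
    }
}
```

Entry 1, for example, resolves through entries 6, 19, 12, 7, and 8 to the constructor of java/lang/Object.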
- access_flags is 0x0021. That is, ACC_SUPER (0x20) and ACC_PUBLIC (0x01).
- Since this_class is constant_pool[5], it is the HelloWorld class.
- Since super_class is constant_pool[6], it is the java/lang/Object class.
- interfaces_count is 0, so the interfaces[] that would follow does not exist.
Since fields_count is 0, no fields[] entries follow.
methods_count~methods[]
Since methods_count is 0x0002, there are 2 methods, followed by the method_info structure.
method_info[0]
- access_flags is 0x0001. That is, ACC_PUBLIC (0x01).
- name_index points to "<init>" in constant_pool[7]. This is the implicit constructor created by the Java compiler.
- descriptor_index points to "()V" in constant_pool[8].
- attributes_count is 1, so there is one attributes structure.
method_info[0].attributes[0]
Since attribute_name_index is the "Code" of constant_pool [9], this structure will be the [Code_attribute structure](#code_attribute structure).
The attribute_length will be 17 bytes and will determine the size of this structure.
max_stack is 1 and max_locals is 1.
The code_length is 5, which means that the next 5 bytes, "0x2A B7 00 01 B1", are the bytecode. 0x2a is aload_0. This instruction pushes local variable 0, which holds this in an instance method, onto the operand stack.
0xb7 is invokespecial.
This instruction calls a method, using the following 2 bytes as an index into constant_pool.
In this case, the bytes are "0x00 01", so it calls constant_pool[1], the <init> method (constructor) of java/lang/Object.
0xb1 is return.
Both exception_table_length and attributes_count are 0.
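As a sanity check, the attribute_length of 17 can be reproduced by adding up the sizes of the items that follow it in the Code_attribute structure. The helper below is just a sketch of that arithmetic under an illustrative name; nothing in it comes from an actual API.

```java
class CodeAttributeLength {
    // Sums the sizes (in bytes) of everything after attribute_length in Code_attribute.
    static int attributeLength(int codeLength, int exceptionTableLength, int nestedAttributeBytes) {
        return 2                          // max_stack
             + 2                          // max_locals
             + 4                          // code_length
             + codeLength                 // code[]
             + 2                          // exception_table_length
             + 8 * exceptionTableLength   // four u2 items per exception_table entry
             + 2                          // attributes_count
             + nestedAttributeBytes;      // attributes[] (empty here because of -g:none)
    }

    public static void main(String[] args) {
        System.out.println(attributeLength(5, 0, 0)); // the <init> method
        System.out.println(attributeLength(9, 0, 0)); // the main method
    }
}
```

With a 5-byte code array and no exception table or nested attributes, the total is 12 + 5 = 17; the same sum for main's 9-byte code array gives 21.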
method_info[1]
- access_flags is 0x0009. That is, ACC_PUBLIC (0x01) and ACC_STATIC (0x08).
- name_index points to "main" in constant_pool[10].
- descriptor_index points to "([Ljava/lang/String;)V" in constant_pool[11].
- attributes_count is 1, so there is one attributes structure.
method_info[1].attributes[0]
Since attribute_name_index is the "Code" of constant_pool [9], this structure will be the [Code_attribute structure](#code_attribute structure).
The attribute_length will be 21 bytes and will determine the size of this structure.
max_stack is 2 and max_locals is 1.
The code_length is 9, which means that the next 9 bytes, "0xB2 00 02 12 03 B6 00 04 B1", are the bytecode.
0xb2 is getstatic. This instruction gets a static field of a class, using the following 2 bytes as an index into constant_pool. In this case, the bytes are "0x00 02", so it gets "out:Ljava/io/PrintStream;" of the "java/lang/System" class, which is constant_pool[2]. The result is pushed onto the operand stack.
0x12 is ldc. This instruction uses the following 1 byte as an index into constant_pool and pushes the referenced constant onto the operand stack. In this case, the byte is "0x03", so the string "Hello World" of constant_pool[3] is pushed onto the operand stack.
0xb6 is invokevirtual. This instruction calls a method, using the following 2 bytes as an index into constant_pool. In this case, the bytes are "0x00 04", so it calls println:(Ljava/lang/String;)V of the java/io/PrintStream class, which is constant_pool[4].
0xb1 is return.
Both exception_table_length and attributes_count are 0.
Finally, since the attributes_count of the class itself is 0, no attributes data follows.
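The two code arrays decoded by hand above use only six different opcodes, so a toy disassembler for them fits in a few lines. This is a sketch that knows only those six opcodes (MiniDisassembler is a made-up name); a real tool needs the full instruction table from chapter 6 of the JVM specification.

```java
import java.util.ArrayList;
import java.util.List;

class MiniDisassembler {
    // Decodes a code[] array given as unsigned byte values; returns "pc: mnemonic operand" lines.
    static List<String> disassemble(int... code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc];
            switch (opcode) {
                case 0x2A: lines.add(pc + ": aload_0"); pc += 1; break;
                case 0xB1: lines.add(pc + ": return"); pc += 1; break;
                case 0x12: lines.add(pc + ": ldc #" + code[pc + 1]); pc += 2; break;
                case 0xB2: lines.add(pc + ": getstatic #" + u2(code, pc + 1)); pc += 3; break;
                case 0xB6: lines.add(pc + ": invokevirtual #" + u2(code, pc + 1)); pc += 3; break;
                case 0xB7: lines.add(pc + ": invokespecial #" + u2(code, pc + 1)); pc += 3; break;
                default:
                    throw new IllegalArgumentException(
                            String.format("unsupported opcode 0x%02X at %d", opcode, pc));
            }
        }
        return lines;
    }

    // Reads a big-endian 2-byte constant_pool index.
    private static int u2(int[] code, int at) {
        return (code[at] << 8) | code[at + 1];
    }

    public static void main(String[] args) {
        System.out.println(disassemble(0x2A, 0xB7, 0x00, 0x01, 0xB1));
        System.out.println(disassemble(0xB2, 0x00, 0x02, 0x12, 0x03, 0xB6, 0x00, 0x04, 0xB1));
    }
}
```

The number before each mnemonic is the offset of the instruction within the code array.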
In this way, you can analyze a class file with the JVM specification and a binary editor. **However, you can do the same analysis with the javap command, without going to the trouble of using a binary editor.**
>javap -v HelloWorld
Classfile /C:/XXXXXXX/HelloWorld.class
Last modified 2019/06/09; size 340 bytes
MD5 checksum 3ee6d0a4b44197baaeb0cec79a0b73d3
public class HelloWorld
minor version: 0
major version: 52
flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
#1 = Methodref #6.#12 // java/lang/Object."<init>":()V
#2 = Fieldref #13.#14 // java/lang/System.out:Ljava/io/PrintStream;
#3 = String #15 // Hello World
#4 = Methodref #16.#17 // java/io/PrintStream.println:(Ljava/lang/String;)V
#5 = Class #18 // HelloWorld
#6 = Class #19 // java/lang/Object
#7 = Utf8 <init>
#8 = Utf8 ()V
#9 = Utf8 Code
#10 = Utf8 main
#11 = Utf8 ([Ljava/lang/String;)V
#12 = NameAndType #7:#8 // "<init>":()V
#13 = Class #20 // java/lang/System
#14 = NameAndType #21:#22 // out:Ljava/io/PrintStream;
#15 = Utf8 Hello World
#16 = Class #23 // java/io/PrintStream
#17 = NameAndType #24:#25 // println:(Ljava/lang/String;)V
#18 = Utf8 HelloWorld
#19 = Utf8 java/lang/Object
#20 = Utf8 java/lang/System
#21 = Utf8 out
#22 = Utf8 Ljava/io/PrintStream;
#23 = Utf8 java/io/PrintStream
#24 = Utf8 println
#25 = Utf8 (Ljava/lang/String;)V
{
public HelloWorld();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #3 // String Hello World
5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
}
It's easier than reading the binary. Well, this has deepened my understanding of Hello World. I'm happy.
Let's dig a little deeper into how this HelloWorld.class works.
It is often said that Java may be compiled multiple times, but what does that mean? This is described in Tobias Hartmann's [The Java HotSpot VM](https://www.ethz.ch/content/dam/ethz/special-interest/infk/inst-cs/lst-dam/documents/Education/Classes/Spring2018/210_Compiler_Design/Slides/2018-Compiler-Design-Guest-Talk.pdf).
Bytecode created as shown above is either executed by the interpreter or executed as machine code compiled by C1 or C2. Also, in Java 8, code that was running in the interpreter may switch partway through to code compiled by C1, or it may be compiled in stages: interpreter → C1 → C2.
**In other words, I had no idea how Hello World actually ran...**
Is there a way to tell whether the bytecode was executed by the interpreter or as compiled machine code? You can find out by specifying "-XX:+PrintCompilation" when running java.
>java -XX:+PrintCompilation HelloWorld
73 1 3 java.lang.String::hashCode (55 bytes)
74 2 3 java.lang.String::equals (81 bytes)
75 4 n 0 java.lang.System::arraycopy (native) (static)
76 3 4 java.lang.String::charAt (29 bytes)
76 5 3 java.lang.Object::<init> (1 bytes)
78 6 4 sun.misc.ASCIICaseInsensitiveComparator::toLower (16 bytes)
78 7 4 sun.misc.ASCIICaseInsensitiveComparator::isUpper (18 bytes)
79 8 4 java.lang.String::length (6 bytes)
79 9 3 java.lang.AbstractStringBuilder::ensureCapacityInternal (27 bytes)
80 10 3 java.lang.Character::toLowerCase (9 bytes)
80 11 3 java.lang.CharacterData::of (120 bytes)
81 15 1 java.lang.Object::<init> (1 bytes)
81 5 3 java.lang.Object::<init> (1 bytes) made not entrant
81 12 3 java.lang.CharacterDataLatin1::toLowerCase (39 bytes)
82 13 3 java.lang.CharacterDataLatin1::getProperties (11 bytes)
82 17 3 java.io.WinNTFileSystem::isSlash (18 bytes)
84 16 3 java.lang.AbstractStringBuilder::append (29 bytes)
84 18 s 3 java.lang.StringBuffer::append (13 bytes)
85 14 3 java.lang.Math::min (11 bytes)
86 19 3 java.lang.StringBuilder::append (8 bytes)
88 20 3 java.lang.String::getChars (62 bytes)
90 22 3 java.lang.String::indexOf (70 bytes)
91 21 3 java.util.Arrays::copyOfRange (63 bytes)
92 23 3 java.lang.System::getSecurityManager (4 bytes)
Hello World
Using PrintCompilation, you can see which methods were compiled. Details of this output can be found on StackOverflow, but since this is a good opportunity, I will take a look at the OpenJDK source code.
The Java 8 source code can be obtained from the following. https://download.java.net/openjdk/jdk8u40/ri/openjdk-8u40-src-b25-10_feb_2015.zip
Let's look at the code below, which appears to produce the output printed when PrintCompilation is specified.
openjdk\hotspot\src\share\vm\compiler\compileBroker.cpp
// ------------------------------------------------------------------
// CompileTask::print_compilation_impl
void CompileTask::print_compilation_impl(outputStream* st, Method* method, int compile_id, int comp_level,
bool is_osr_method, int osr_bci, bool is_blocking,
const char* msg, bool short_form) {
if (!short_form) {
st->print("%7d ", (int) st->time_stamp().milliseconds()); // print timestamp
}
st->print("%4d ", compile_id); // print compilation number
// For unloaded methods the transition to zombie occurs after the
// method is cleared so it's impossible to report accurate
// information for that case.
bool is_synchronized = false;
bool has_exception_handler = false;
bool is_native = false;
if (method != NULL) {
is_synchronized = method->is_synchronized();
has_exception_handler = method->has_exception_handler();
is_native = method->is_native();
}
// method attributes
const char compile_type = is_osr_method ? '%' : ' ';
const char sync_char = is_synchronized ? 's' : ' ';
const char exception_char = has_exception_handler ? '!' : ' ';
const char blocking_char = is_blocking ? 'b' : ' ';
const char native_char = is_native ? 'n' : ' ';
// print method attributes
st->print("%c%c%c%c%c ", compile_type, sync_char, exception_char, blocking_char, native_char);
if (TieredCompilation) {
if (comp_level != -1) st->print("%d ", comp_level);
else st->print("- ");
}
st->print(" "); // more indent
if (method == NULL) {
st->print("(method)");
} else {
method->print_short_name(st);
if (is_osr_method) {
st->print(" @ %d", osr_bci);
}
if (method->is_native())
st->print(" (native)");
else
st->print(" (%d bytes)", method->code_size());
}
if (msg != NULL) {
st->print(" %s", msg);
}
if (!short_form) {
st->cr();
}
}
The timestamp (in milliseconds) is output in the first column.
The second column is compile_id followed by the method attributes. compile_id is printed as a number padded to 4 characters. The method attributes are a combination of flags, displayed as follows.
letter | conditions |
---|---|
% | OSR (on-stack replacement) method. The MethodCompilation enum defines InvocationEntryBci and InvalidOSREntryBci; as far as I can tell, a compilation whose osr_bci is not InvocationEntryBci is treated as OSR. |
s | The method is synchronized |
! | The method has an exception handler |
b | In case of blocking |
n | For native code |
The third column shows the compile level when Tiered Compilation is ON. Tiered Compilation can be switched off or on with the -XX:-TieredCompilation or -XX:+TieredCompilation option; in Java 8 the default is ON. The compile levels are as follows.
level | Contents |
---|---|
0 | interpreter |
1 | C1 with full optimization (no profiling) |
2 | C1 with limited profiling |
3 | C1 with full profiling |
4 | C2 |
In other words, C1 alone is split into three levels.
The method name is output in the 4th column.
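To make the column layout easier to picture, here is a minimal Java sketch that rebuilds one output line in the same order as the print_compilation_impl listing above (with short_form false and Tiered Compilation on). The class name CompilationLineSketch, the parameter names, and the sample values in main are my own assumptions for illustration, and the padding simply follows the listing as shown here, so real JVM output may be spaced differently.

```java
class CompilationLineSketch {
    /**
     * Builds one PrintCompilation-style line. Pass level = -1 to print "-"
     * instead of a compile level. osrBci is only used when osr is true.
     */
    static String format(long millis, int compileId,
                         boolean osr, boolean sync, boolean hasHandler,
                         boolean blocking, boolean isNative,
                         int level, String methodName, int codeSize, int osrBci) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%7d ", millis));     // column 1: timestamp
        sb.append(String.format("%4d ", compileId));  // column 2: compilation number
        sb.append(osr ? '%' : ' ')                    // column 2: attribute flags
          .append(sync ? 's' : ' ')
          .append(hasHandler ? '!' : ' ')
          .append(blocking ? 'b' : ' ')
          .append(isNative ? 'n' : ' ')
          .append(' ');
        sb.append(level != -1 ? level + " " : "- ");  // column 3: compile level
        sb.append(' ');                               // extra indent, as in the listing
        sb.append(methodName);                        // column 4: method name
        if (osr) {
            sb.append(" @ ").append(osrBci);
        }
        sb.append(isNative ? " (native)" : " (" + codeSize + " bytes)");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative values only, not captured from a real run.
        System.out.println(format(57, 1, false, false, false, false, false,
                3, "java.lang.String::hashCode", 55, -1));
        System.out.println(format(120, 42, true, true, true, false, false,
                4, "Example::loop", 30, 7));
    }
}
```

Reading a real line then becomes a matter of going left to right: timestamp, id, five flag characters, level, method name, and finally the bytecode size.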
Now let's look again at the earlier -XX:+PrintCompilation output. It does not include main of the HelloWorld class, so we can see that this code is running in the interpreter.
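For contrast, here is a minimal sketch of a program whose methods I would expect to show up in that output, because they run often enough to become JIT candidates. HotLoop, addOne, and run are hypothetical names of mine and not part of the original experiment; which methods are actually compiled, and at which level, depends on the JVM version, its thresholds, and inlining decisions.

```java
class HotLoop {
    // Small method that is called often enough to become a JIT candidate.
    static int addOne(int x) {
        return x + 1;
    }

    // Calls addOne() in a long-running loop and sums the results.
    static long run(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += addOne(i);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

Running it as `java -XX:+PrintCompilation HotLoop` and searching the output for HotLoop:: is a quick way to compare against HelloWorld, whose main runs only once.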
Machine code produced by C1 and C2 is not written to a file; it exists only in memory. Several steps are required to inspect it.
First, get hsdis-amd64.dll, which performs the disassembly. For Windows, you can download it from: https://sourceforge.net/projects/fcml/files/
After downloading the DLL, place it in a directory that is on the PATH.
Then execute the following command.
java -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+PrintAssembly -XX:+LogCompilation HelloWorld
A log file is created in the current directory, and you can see what machine code was generated.
Output example
Decoding compiled method 0x0000000002d00750:
Code:
RIP: 0x2d008a0 Code size: 0x000001f0
[Entry Point]
[Constants]
# {method} {0x00000000192f4fc0} 'hashCode' '()I' in 'java/lang/String'
# [sp+0x40] (sp of caller)
0x0000000002d008a0: mov r10d,dword ptr [rdx+8h]
0x0000000002d008a4: shl r10,3h
0x0000000002d008a8: cmp r10,rax
0x0000000002d008ab: jne 2c35f60h ; {runtime_call}
0x0000000002d008b1: nop word ptr [rax+rax+0h]
0x0000000002d008bc: nop
[Verified Entry Point]
0x0000000002d008c0: mov dword ptr [rsp+0ffffffffffffa000h],eax
0x0000000002d008c7: push rbp
0x0000000002d008c8: sub rsp,30h
0x0000000002d008cc: mov rax,193e7ac8h
0x0000000002d008d6: mov esi,dword ptr [rax+8h]
0x0000000002d008d9: add esi,8h
0x0000000002d008dc: mov dword ptr [rax+8h],esi
//Abbreviation
The log file containing the machine code holds a large amount of information, so it is hard to find the part you want. In that case, you may want to browse it with JITWatch. https://github.com/AdoptOpenJDK/jitwatch/
Please refer to the following for detailed usage.
Watch JIT compilation on JIT Watch! https://www.sakatakoichi.com/entry/2014/12/04/202747
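If you only want a quick list of which methods appear in the log, a small filter may be enough. The following is a minimal sketch that assumes the "# {method} ... 'name' 'descriptor' in 'class'" line format seen in the output example above; the class name AssemblyLogFilter and the regular expression are my own, and other JVM versions may format this line differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AssemblyLogFilter {
    // Matches lines such as:
    //   # {method} {0x00000000192f4fc0} 'hashCode' '()I' in 'java/lang/String'
    private static final Pattern METHOD_LINE = Pattern.compile(
            "#\\s*\\{method\\}\\s*\\{0x[0-9a-fA-F]+\\}\\s*'([^']+)'\\s*'([^']+)'\\s*in\\s*'([^']+)'");

    // Returns "class.nameDescriptor" for every method header line found.
    static List<String> compiledMethods(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = METHOD_LINE.matcher(line);
            if (m.find()) {
                result.add(m.group(3) + "." + m.group(1) + m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AssemblyLogFilter <log file>");
            return;
        }
        // The log file path is passed in by the caller; no file name is assumed here.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        compiledMethods(lines).forEach(System.out::println);
    }
}
```

For anything beyond a list of names, JITWatch remains the more comfortable tool.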
By now, we know that HelloWorld::main is running in the interpreter. Then where and how, specifically, are the following instructions, from getstatic through return, processed?
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #3 // String Hello World
5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
}
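The research team, combing through the OpenJDK source code to unravel this mystery, finally found the relevant part. Instructions such as getstatic and ldc are interpreted and executed by runWithChecks() / run() in bytecodeInterpreter.cpp. Note that this file is the C++ interpreter; standard HotSpot builds normally use the template interpreter instead, but the C++ version is much easier to read and shows the same processing.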
hotspot\src\share\vm\interpreter\bytecodeInterpreter.cpp
/*
* BytecodeInterpreter::run(interpreterState istate)
* BytecodeInterpreter::runWithChecks(interpreterState istate)
*
* The real deal. This is where byte codes actually get interpreted.
* Basically it's a big while loop that iterates until we return from
* the method passed in.
*
* The runWithChecks is used if JVMTI is enabled.
*
*/
#if defined(VM_JVMTI)
void
BytecodeInterpreter::runWithChecks(interpreterState istate) {
#else
void
BytecodeInterpreter::run(interpreterState istate) {
#endif
//Abbreviation
#ifndef USELABELS
while (1)
#endif
{
#ifndef PREFETCH_OPCCODE
opcode = *pc;
#endif
// Seems like this happens twice per opcode. At worst this is only
// need at entry to the loop.
// DEBUGGER_SINGLE_STEP_NOTIFY();
/* Using this labels avoids double breakpoints when quickening and
* when returing from transition frames.
*/
opcode_switch:
assert(istate == orig, "Corrupted istate");
/* QQQ Hmm this has knowledge of direction, ought to be a stack method */
assert(topOfStack >= istate->stack_limit(), "Stack overrun");
assert(topOfStack < istate->stack_base(), "Stack underrun");
#ifdef USELABELS
DISPATCH(opcode);
#else
switch (opcode)
#endif
{
CASE(_nop):
UPDATE_PC_AND_CONTINUE(1);
//Abbreviation
}
}
}
}
This loops with while until all the bytecode in the method has been executed, branching to the CASE label that matches each opcode. For example, getstatic is implemented as follows.
getstatic
CASE(_getfield):
CASE(_getstatic):
{
u2 index;
ConstantPoolCacheEntry* cache;
// Note: read the 2 bytes at pc+1 (right after the current opcode) and store them in index
index = Bytes::get_native_u2(pc+1);
// QQQ Need to make this as inlined as possible. Probably need to
// split all the bytecode cases out so c++ compiler has a chance
// for constant prop to fold everything possible away.
// Note: look up the entry at index in the constant pool cache.
cache = cp->entry_at(index);
if (!cache->is_resolved((Bytecodes::Code)opcode)) {
CALL_VM(InterpreterRuntime::resolve_get_put(THREAD, (Bytecodes::Code)opcode),
handle_exception);
cache = cp->entry_at(index);
}
#ifdef VM_JVMTI
if (_jvmti_interp_events) {
int *count_addr;
oop obj;
// Check to see if a field modification watch has been set
// before we take the time to call into the VM.
count_addr = (int *)JvmtiExport::get_field_access_count_addr();
if ( *count_addr > 0 ) {
if ((Bytecodes::Code)opcode == Bytecodes::_getstatic) {
obj = (oop)NULL;
} else {
obj = (oop) STACK_OBJECT(-1);
VERIFY_OOP(obj);
}
CALL_VM(InterpreterRuntime::post_field_access(THREAD,
obj,
cache),
handle_exception);
}
}
#endif /* VM_JVMTI */
oop obj;
if ((Bytecodes::Code)opcode == Bytecodes::_getstatic) {
// Note: get the class information for constant pool entry #2 and put its mirror in obj
Klass* k = cache->f1_as_klass();
obj = k->java_mirror();
MORE_STACK(1); // Assume single slot push
} else {
obj = (oop) STACK_OBJECT(-1);
CHECK_NULL(obj);
}
//
// Now store the result on the stack
//
TosState tos_type = cache->flag_state();
// Note: get the field information for constant pool entry #2 and put it in field_offset
int field_offset = cache->f2_as_index();
if (cache->is_volatile()) {
if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
OrderAccess::fence();
}
if (tos_type == atos) {
VERIFY_OOP(obj->obj_field_acquire(field_offset));
SET_STACK_OBJECT(obj->obj_field_acquire(field_offset), -1);
} else if (tos_type == itos) {
SET_STACK_INT(obj->int_field_acquire(field_offset), -1);
} else if (tos_type == ltos) {
SET_STACK_LONG(obj->long_field_acquire(field_offset), 0);
MORE_STACK(1);
} else if (tos_type == btos) {
SET_STACK_INT(obj->byte_field_acquire(field_offset), -1);
} else if (tos_type == ctos) {
SET_STACK_INT(obj->char_field_acquire(field_offset), -1);
} else if (tos_type == stos) {
SET_STACK_INT(obj->short_field_acquire(field_offset), -1);
} else if (tos_type == ftos) {
SET_STACK_FLOAT(obj->float_field_acquire(field_offset), -1);
} else {
SET_STACK_DOUBLE(obj->double_field_acquire(field_offset), 0);
MORE_STACK(1);
}
} else {
if (tos_type == atos) {
// Note: read the field of the class for constant pool entry #2 and store the result on the stack as an object.
VERIFY_OOP(obj->obj_field(field_offset));
SET_STACK_OBJECT(obj->obj_field(field_offset), -1);
} else if (tos_type == itos) {
SET_STACK_INT(obj->int_field(field_offset), -1);
} else if (tos_type == ltos) {
SET_STACK_LONG(obj->long_field(field_offset), 0);
MORE_STACK(1);
} else if (tos_type == btos) {
SET_STACK_INT(obj->byte_field(field_offset), -1);
} else if (tos_type == ctos) {
SET_STACK_INT(obj->char_field(field_offset), -1);
} else if (tos_type == stos) {
SET_STACK_INT(obj->short_field(field_offset), -1);
} else if (tos_type == ftos) {
SET_STACK_FLOAT(obj->float_field(field_offset), -1);
} else {
SET_STACK_DOUBLE(obj->double_field(field_offset), 0);
MORE_STACK(1);
}
}
// Note: move on to the instruction 3 bytes after getstatic.
UPDATE_PC_AND_CONTINUE(3);
}
// Have to do this dispatch this way in C++ because otherwise gcc complains about crossing an
// initialization (which is is the initialization of the table pointer...)
#define DISPATCH(opcode) goto *(void*)dispatch_table[opcode]
//Abbreviation
#define UPDATE_PC_AND_CONTINUE(opsize) { \
pc += opsize; opcode = *pc; \
DO_UPDATE_INSTRUCTION_COUNT(opcode); \
DEBUGGER_SINGLE_STEP_NOTIFY(); \
DISPATCH(opcode); \
}
DISPATCH is invoked to execute the next instruction. It is a goto statement that jumps to the label for the next instruction's opcode.
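The overall shape (fetch an opcode, branch on it, advance pc, repeat) can be sketched in Java as a toy. This is only an illustration under heavy assumptions, not how HotSpot does it: ToyInterpreter, its Object[] "constant pool", and the hard-wired println call are all hypothetical. The opcode values (0x12 ldc, 0xb1 return, 0xb2 getstatic, 0xb6 invokevirtual) come from the JVM specification, and the nine code bytes correspond to the main method listed earlier.

```java
import java.io.PrintStream;
import java.util.ArrayDeque;
import java.util.Deque;

class ToyInterpreter {
    static final int NOP = 0x00;
    static final int LDC = 0x12;
    static final int RETURN = 0xb1;
    static final int GETSTATIC = 0xb2;
    static final int INVOKEVIRTUAL = 0xb6;

    // Bytecode of HelloWorld.main: getstatic #2, ldc #3, invokevirtual #4, return
    static byte[] helloWorldCode() {
        return new byte[] {(byte) 0xb2, 0x00, 0x02, 0x12, 0x03,
                           (byte) 0xb6, 0x00, 0x04, (byte) 0xb1};
    }

    // constants is a stand-in for the resolved constant pool:
    // index 2 holds a PrintStream, index 3 holds the String to print.
    static void run(byte[] code, Object[] constants) {
        Deque<Object> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int opcode = code[pc] & 0xff;            // fetch
            switch (opcode) {                        // dispatch
                case NOP:
                    pc += 1;
                    break;
                case GETSTATIC: {
                    // 2-byte operand; the class file stores it big-endian
                    int index = ((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff);
                    stack.push(constants[index]);
                    pc += 3;
                    break;
                }
                case LDC: {
                    int index = code[pc + 1] & 0xff; // 1-byte operand
                    stack.push(constants[index]);
                    pc += 2;
                    break;
                }
                case INVOKEVIRTUAL: {
                    // Toy shortcut: the method reference is ignored and the
                    // call is assumed to be PrintStream.println(String).
                    String arg = (String) stack.pop();
                    PrintStream receiver = (PrintStream) stack.pop();
                    receiver.println(arg);
                    pc += 3;
                    break;
                }
                case RETURN:
                    return;
                default:
                    throw new IllegalStateException("unsupported opcode 0x"
                            + Integer.toHexString(opcode) + " at pc=" + pc);
            }
        }
    }

    public static void main(String[] args) {
        Object[] constants = new Object[5];
        constants[2] = System.out;
        constants[3] = "Hello World";
        run(helloWorldCode(), constants);
    }
}
```

The pc increments of 3, 2, and 3 match the offsets 0, 3, 5, and 8 in the javap listing, and the `pc += 3` in the GETSTATIC case plays the role of UPDATE_PC_AND_CONTINUE(3).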
As you can see, for the interpreter, using this code as a starting point gives you an idea of what kind of processing is going on. ~~At this point the article is over 30,000 characters, so it's hard to put all the analyzed information together~~
This time, I went back to the beginning and looked at how Hello World works.
As the saying goes, seeing is believing. I'm sorry: I had underestimated Hello World. It turned out to be a lot of trouble.
Demystifying the JVM: Interpretation, JIT and AOT Compilation https://metebalci.com/blog/demystifying-the-jvm-interpretation-jit-and-aot-compilation/#disqus_thread
Demystifying the JVM: JVM Variants, CppInterpreter and TemplateInterpreter https://metebalci.com/blog/demystifying-the-jvm-jvm-variants-cppinterpreter-and-templateinterpreter/#disqus_thread
Survey of Java JIT compilation with JIT Watch https://www.oracle.com/webfolder/technetwork/jp/javamagazine/Java-MA15-Architect-newland.pdf
Watch JIT compilation on JIT Watch! https://www.sakatakoichi.com/entry/2014/12/04/202747
** I tried to output [Java]