Returning to the Beginning in Java: "Kisama mocked Hello World"

Kisama mocked Hello World

When you start learning a programming language, it is customary to print "Hello World". Especially when I need the language for work, I write Hello World at cockroach-dash speed and move on to the next stage.

But did I ever understand how this Hello World works? This time I am going back to the beginning and reviewing Hello World thoroughly enough that Dr. Retsu could not scold me with "Kisama mocked Hello World."

Cockroach dash-like Hello World

environment

Windows 10
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)

Create and run HelloWorld

Create and run Hello World as usual. First, write the java code with a text editor.

HelloWorld.java


public class HelloWorld {
	public static void main(String[] args) {
		System.out.println("Hello World");
	}
}

Create a class file by running the javac command with the java file as input. This time, add the -g:none option so that no debug information is generated.

javac -g:none HelloWorld.java

Run the generated HelloWorld.class with the java command.

>java HelloWorld
Hello World

"Hello World" was printed. Unlike executables built from C or C++, the generated HelloWorld.class runs on Windows, Mac, and Linux alike, as long as Java is installed. Java really is convenient.

**The end! Closed! That's all! Everyone, dismissed!**

There was a time when I was thinking like that

That used to be the end of it. I never stopped to think about what this HelloWorld.class file actually is. Since this is a good opportunity, let's check the contents of the generated binary file.

The contents of HelloWorld.class are the following bytes.

image.png
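If you do not have a binary editor at hand, a comparable dump can be produced with a few lines of Java. This is only a minimal sketch: the class name HexDump and the row layout are my own choices for illustration, and it assumes HelloWorld.class sits in the current directory unless a path is passed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Minimal hex dump sketch: 16 bytes per row, offset on the left.
final class HexDump {

    // Formats the bytes as rows of "OFFSET: XX XX ..." in uppercase hex.
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i % 16 == 0) {
                if (i > 0) {
                    sb.append('\n');
                }
                sb.append(String.format("%08X:", i));
            }
            // Mask with 0xFF because Java bytes are signed.
            sb.append(String.format(" %02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the class file is in the current directory by default.
        String path = args.length > 0 ? args[0] : "HelloWorld.class";
        System.out.println(dump(Files.readAllBytes(Paths.get(path))));
    }
}
```

The first row of any valid class file should start with CA FE BA BE.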

To read this binary file, you need to read The Java® Virtual Machine Specification. https://docs.oracle.com/javase/specs/jvms/se8/html/index.html

JVM specifications used this time

I will briefly describe the parts of the JVM specification used this time. If you want to get straight to analyzing the class file binary, skip ahead to "Binary analysis of HelloWorld.class".

The structure of a class file is as follows.

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

u1, u2, and u4 represent unsigned data of 1, 2, and 4 bytes, respectively. The other types are structures, which I explain as they come up.

magic is the magic number that identifies the class file format; its value is 0xCAFEBABE. The origin of this magic number is described in "CAFEBABE: Java's Magic Word", so please read it. Bring it up on a date and watch your date recoil.

minor_version and major_version are the minor and major version numbers of this class file. The major and minor version numbers determine the version of the class file format.

constant_pool[] is a table of structures representing string constants, class and interface names, field names, and other constants. constant_pool_count is the number of entries in that table plus one, so valid indexes into constant_pool run from 1 to constant_pool_count-1.
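The fixed-size items at the start of the file can be read with java.nio.ByteBuffer, which is big-endian by default. The following is a minimal sketch rather than a full parser; the class name ClassFileHeader and the sample bytes in main are assumptions made up for illustration.

```java
import java.nio.ByteBuffer;

// Sketch: reads magic, minor_version, major_version and constant_pool_count.
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes); // big-endian by default
        int magic = in.getInt();                     // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic: 0x" + Integer.toHexString(magic));
        }
        int minor = in.getShort() & 0xFFFF;          // u2, masked to stay unsigned
        int major = in.getShort() & 0xFFFF;
        int cpCount = in.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major, cpCount);
    }

    public static void main(String[] args) {
        // Hand-written sample header: version 52.0 with constant_pool_count 26.
        byte[] sample = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x00, 0x00, 0x34, 0x00, 0x1A
        };
        ClassFileHeader h = parse(sample);
        System.out.println(h.majorVersion + "." + h.minorVersion
                + ", constant_pool_count=" + h.constantPoolCount);
    }
}
```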

access_flags is a combination of the following flags.

| Flag name | Value | Interpretation |
|---|---|---|
| ACC_PUBLIC | 0x0001 | Declared public; may be accessed from outside its package. |
| ACC_FINAL | 0x0010 | Declared final; no subclasses allowed. |
| ACC_SUPER | 0x0020 | Treat superclass methods specially when invoked by the invokespecial instruction. |
| ACC_INTERFACE | 0x0200 | Is an interface, not a class. |
| ACC_ABSTRACT | 0x0400 | Declared abstract; must not be instantiated. |
| ACC_SYNTHETIC | 0x1000 | Indicates that it was generated by the compiler, e.g. on a compiler-created class file for a class within a class (ex. Hello$Test.class). |
| ACC_ANNOTATION | 0x2000 | Declared as an annotation type. |
| ACC_ENUM | 0x4000 | Declared as an enum type. |
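Because access_flags is a bit mask, decoding it is just a matter of testing each bit. Here is a small sketch using the class-level flag values from the table; the class name ClassAccessFlags and the method decode are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: turns a class-level access_flags value into flag names.
final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    // Returns the names of all flags whose bit is set, in table order.
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
    }
}
```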

this_class must be a valid index into constant_pool[]. The entry at that index must be a CONSTANT_Class_info structure representing the class defined by this class file.

super_class must be 0 or a valid index into constant_pool[]. If it is nonzero, the entry at that index must be a CONSTANT_Class_info structure representing the direct superclass of the class defined by this class file.

interfaces_count and interfaces[] are the count and array of indexes to CONSTANT_Class_info structures representing the interfaces of the class defined in this class file. If there are no interfaces, interfaces_count is 0 and interfaces[] is empty.

fields_count and fields[] are the count and array of [field structures](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.5) representing the fields of the class defined in this class file. If there are no fields, fields_count is 0 and fields[] is empty.

methods_count and methods[] are the count and array of [method structures](#method structures) representing the methods of the class defined in this class file. If there are no methods, methods_count is 0 and methods[] is empty.

attributes_count and attributes[] are the count and array of [attribute structures](#attribute structure) holding the attribute information for the class defined in this class file.

constant_pool

Every entry in this table starts with a 1-byte tag that determines the layout of the rest of the entry.

| Constant Type | Value |
|---|---|
| CONSTANT_Class | 7 |
| CONSTANT_Fieldref | 9 |
| CONSTANT_Methodref | 10 |
| CONSTANT_InterfaceMethodref | 11 |
| CONSTANT_String | 8 |
| CONSTANT_Integer | 3 |
| CONSTANT_Float | 4 |
| CONSTANT_Long | 5 |
| CONSTANT_Double | 6 |
| CONSTANT_NameAndType | 12 |
| CONSTANT_Utf8 | 1 |
| CONSTANT_MethodHandle | 15 |
| CONSTANT_MethodType | 16 |
| CONSTANT_InvokeDynamic | 18 |

The parts used this time are explained below, so refer to the following for other structures. https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.4-140

CONSTANT_Class

Used to represent a class or interface.

CONSTANT_Class_info {
    u1 tag;
    u2 name_index;
}

tag holds 7, the value for CONSTANT_Class. name_index is an index to a [CONSTANT_Utf8_info structure](#constant_utf8) in the constant_pool table.

CONSTANT_Fieldref

CONSTANT_Fieldref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}

tag holds 9, the value for CONSTANT_Fieldref. class_index is an index to a CONSTANT_Class_info structure in the constant_pool table. name_and_type_index is an index to a CONSTANT_NameAndType_info structure in the constant_pool table.

CONSTANT_Methodref

CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}

tag holds 10, the value for CONSTANT_Methodref. class_index is an index to a CONSTANT_Class_info structure in the constant_pool table. name_and_type_index is an index to a CONSTANT_NameAndType_info structure in the constant_pool table.

CONSTANT_String

CONSTANT_String_info {
    u1 tag;
    u2 string_index;
}

tag holds 8, the value for CONSTANT_String. string_index is an index to a [CONSTANT_Utf8_info structure](#constant_utf8) in the constant_pool table.

CONSTANT_NameAndType

Used to represent a field or method. However, it does not indicate which class or interface the field or method belongs to.

CONSTANT_NameAndType_info {
    u1 tag;
    u2 name_index;
    u2 descriptor_index;
}

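tag holds 12, the value for CONSTANT_NameAndType. name_index must be a valid index to a [CONSTANT_Utf8_info structure](#constant_utf8) in constant_pool that holds the field or method name. descriptor_index must likewise be a valid index to a CONSTANT_Utf8_info structure, one that holds the field or method descriptor.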

CONSTANT_Utf8

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

tag holds 1, the value for CONSTANT_Utf8. length is the number of bytes in the bytes array (not the number of characters in the string). The bytes array holds the bytes of the string, with no terminator. See the following for more on this string format. https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.4.7
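Putting the structures above together, walking the constant pool means reading a tag and then the fields that belong to that tag. The sketch below handles only the six tags described here and throws for anything else; ConstantPoolReader and its output format are my own invention, and decoding the bytes as plain UTF-8 is a simplification that only holds for ASCII names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of a constant_pool reader covering only the tags described above.
final class ConstantPoolReader {

    // Reads an unsigned 2-byte value (ByteBuffer is big-endian by default).
    private static int u2(ByteBuffer in) {
        return in.getShort() & 0xFFFF;
    }

    // Reads entries 1..constantPoolCount-1 from the buffer's current position
    // and returns one description per index (index 0 stays null).
    static String[] read(ByteBuffer in, int constantPoolCount) {
        String[] pool = new String[constantPoolCount];
        for (int i = 1; i < constantPoolCount; i++) {
            int tag = in.get() & 0xFF;
            switch (tag) {
                case 1: { // CONSTANT_Utf8: u2 length, then that many bytes
                    byte[] bytes = new byte[u2(in)];
                    in.get(bytes);
                    // Simplification: class files use "modified UTF-8".
                    pool[i] = "Utf8 " + new String(bytes, StandardCharsets.UTF_8);
                    break;
                }
                case 7:  // CONSTANT_Class: u2 name_index
                    pool[i] = "Class #" + u2(in);
                    break;
                case 8:  // CONSTANT_String: u2 string_index
                    pool[i] = "String #" + u2(in);
                    break;
                case 9:  // CONSTANT_Fieldref: u2 class_index, u2 name_and_type_index
                    pool[i] = "Fieldref #" + u2(in) + ".#" + u2(in);
                    break;
                case 10: // CONSTANT_Methodref: same layout as Fieldref
                    pool[i] = "Methodref #" + u2(in) + ".#" + u2(in);
                    break;
                case 12: // CONSTANT_NameAndType: u2 name_index, u2 descriptor_index
                    pool[i] = "NameAndType #" + u2(in) + ":#" + u2(in);
                    break;
                default:
                    throw new IllegalArgumentException("tag " + tag + " is not handled in this sketch");
            }
        }
        return pool;
    }

    public static void main(String[] args) {
        // Hand-built two-entry pool: #1 = Class #2, #2 = Utf8 "HelloWorld".
        byte[] name = "HelloWorld".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(6 + name.length);
        buf.put((byte) 7).putShort((short) 2);
        buf.put((byte) 1).putShort((short) name.length).put(name);
        buf.flip();
        String[] pool = read(buf, 3);
        for (int i = 1; i < pool.length; i++) {
            System.out.println("#" + i + " = " + pool[i]);
        }
    }
}
```

A complete reader would also have to handle the remaining tags, including the fact that CONSTANT_Long and CONSTANT_Double each occupy two pool indexes.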

method structure

method_info { 
    u2 access_flags; 
    u2 name_index; 
    u2 descriptor_index; 
    u2 attributes_count; 
    attribute_info attributes [attributes_count]; 
}

The value of the access_flags item is a combination of the permissions to this method and the flags used to indicate the properties of this method.

| Flag name | Value | Description |
|---|---|---|
| ACC_PUBLIC | 0x0001 | Declared public; may be accessed from outside its package. |
| ACC_PRIVATE | 0x0002 | Declared private; accessible only within the defining class. |
| ACC_PROTECTED | 0x0004 | Declared protected; may be accessed within subclasses. |
| ACC_STATIC | 0x0008 | Declared static. |
| ACC_FINAL | 0x0010 | Declared final; must not be overridden. |
| ACC_SYNCHRONIZED | 0x0020 | Declared synchronized. |
| ACC_BRIDGE | 0x0040 | A bridge method generated by the compiler for the Java programming language. See "Java Generics - Bridge method?". |
| ACC_VARARGS | 0x0080 | Declared with a variable number of arguments. |
| ACC_NATIVE | 0x0100 | Declared native; implemented in a language other than Java. |
| ACC_ABSTRACT | 0x0400 | Declared abstract; no implementation is provided. |
| ACC_STRICT | 0x0800 | Declared strictfp. |
| ACC_SYNTHETIC | 0x1000 | Generated by the compiler; does not appear in the source code. |

name_index is an index to a [CONSTANT_Utf8_info structure](#constant_utf8) in the constant_pool table. That entry holds the method name, or one of the special names <init> or <clinit>.

descriptor_index is an index to a [CONSTANT_Utf8_info structure](#constant_utf8) in the constant_pool table. That entry holds the [method descriptor](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.3.3).
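Method descriptors look cryptic at first, so it may help to see one translated back into Java-like notation. The following is a rough sketch of such a translation; the class Descriptors and its methods are hypothetical helpers written for this article, and malformed input is only minimally checked.

```java
// Sketch: renders a JVM method descriptor in a Java-like form.
final class Descriptors {

    // "([Ljava/lang/String;)V" becomes "void (java.lang.String[])".
    static String describeMethod(String descriptor) {
        int close = descriptor.indexOf(')');
        if (descriptor.isEmpty() || descriptor.charAt(0) != '(' || close < 0) {
            throw new IllegalArgumentException("not a method descriptor: " + descriptor);
        }
        StringBuilder params = new StringBuilder();
        int pos = 1;
        while (pos < close) {
            int end = endOfType(descriptor, pos);
            if (params.length() > 0) {
                params.append(", ");
            }
            params.append(typeName(descriptor.substring(pos, end)));
            pos = end;
        }
        return typeName(descriptor.substring(close + 1)) + " (" + params + ")";
    }

    // Returns the index just past the single type that starts at pos.
    private static int endOfType(String d, int pos) {
        while (d.charAt(pos) == '[') {
            pos++; // skip array dimensions
        }
        return d.charAt(pos) == 'L' ? d.indexOf(';', pos) + 1 : pos + 1;
    }

    // Converts one field type or "V" into a Java type name.
    static String typeName(String t) {
        int dims = 0;
        while (t.charAt(dims) == '[') {
            dims++;
        }
        String base;
        switch (t.charAt(dims)) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'V': base = "void"; break;
            case 'L': base = t.substring(dims + 1, t.length() - 1).replace('/', '.'); break;
            default: throw new IllegalArgumentException("unknown type: " + t);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeMethod("([Ljava/lang/String;)V")); // void (java.lang.String[])
        System.out.println(describeMethod("()V"));                    // void ()
    }
}
```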

attributes_count and attributes[] are the count and array of [attribute structures](#attribute structure) holding the attribute information for this method.

attribute structure

The shape of this structure changes depending on the attribute. The common format is as follows.

attribute_info { 
    u2 attribute_name_index; 
    u4 attribute_length; 
    u1 info [attribute_length]; 
}

attribute_name_index is an index to a [CONSTANT_Utf8_info structure](#constant_utf8) in the constant_pool table. attribute_length is the length in bytes of the information that follows. The contents of info differ for each attribute.

| Attribute | Location |
|---|---|
| SourceFile | ClassFile |
| InnerClasses | ClassFile |
| EnclosingMethod | ClassFile |
| SourceDebugExtension | ClassFile |
| BootstrapMethods | ClassFile |
| ConstantValue | field_info |
| Code | method_info |
| Exceptions | method_info |
| RuntimeVisibleParameterAnnotations, RuntimeInvisibleParameterAnnotations | method_info |
| AnnotationDefault | method_info |
| MethodParameters | method_info |
| Synthetic | ClassFile, field_info, method_info |
| Deprecated | ClassFile, field_info, method_info |
| Signature | ClassFile, field_info, method_info |
| RuntimeVisibleAnnotations, RuntimeInvisibleAnnotations | ClassFile, field_info, method_info |
| LineNumberTable | Code |
| LocalVariableTable | Code |
| LocalVariableTypeTable | Code |
| StackMapTable | Code |
| RuntimeVisibleTypeAnnotations, RuntimeInvisibleTypeAnnotations | ClassFile, field_info, method_info, Code |

See below for items not covered here. https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7

Code attribute

Code_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 max_stack;
    u2 max_locals;
    u4 code_length;
    u1 code[code_length];
    u2 exception_table_length;
    {   u2 start_pc;
        u2 end_pc;
        u2 handler_pc;
        u2 catch_type;
    } exception_table[exception_table_length];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}

attribute_name_index and attribute_length are as described in the common format. The string referenced by attribute_name_index must be "Code".

max_stack is the maximum depth of the operand stack of this method at any point during its execution.

max_locals is the number of [local variables](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-2.html#jvms-2.6.1) allocated when this method is invoked, including those used to pass parameters.

code_length is the number of bytes in code[].

The code array holds the actual bytes of Java virtual machine code that implement the method. The instructions are described below. https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html#jvms-6.5

exception_table_length is the number of entries in exception_table.

exception_table holds the exception handler information. Each entry contains the following items.

- start_pc, end_pc: the range of indexes into the code array in which the exception handler is active. In Java source, this corresponds to the try block.
- handler_pc: the index into the code array at which the exception handler starts. In Java source, this corresponds to the start of the catch block.
- catch_type: 0 or a valid index into the constant_pool table. When nonzero, the entry at that index is a CONSTANT_Class_info structure representing the exception class to catch.
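To make the role of start_pc, end_pc, and handler_pc concrete, here is a simplified sketch of how a handler could be looked up for a given code index. Per the JVM specification, start_pc is inclusive and end_pc is exclusive. The class ExceptionTableLookup and the table contents are made up for illustration, and the exception type check against catch_type is deliberately left out.

```java
// Sketch: finds the handler whose range covers a given code index.
final class ExceptionTableLookup {

    // One exception_table entry; field names mirror the spec items.
    static final class Entry {
        final int startPc;
        final int endPc;
        final int handlerPc;
        final int catchType;

        Entry(int startPc, int endPc, int handlerPc, int catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns handler_pc of the first entry whose [start_pc, end_pc) range
    // contains pc, or -1 if none does.
    // Simplification: a real JVM also checks that the thrown exception is an
    // instance of the class named by catch_type (0 matches any exception).
    static int findHandler(Entry[] table, int pc) {
        for (Entry e : table) {
            if (pc >= e.startPc && pc < e.endPc) {
                return e.handlerPc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical table: the try block covers code[0..7], handler starts at 11.
        Entry[] table = { new Entry(0, 8, 11, 5) };
        System.out.println(findHandler(table, 4)); // 11
        System.out.println(findHandler(table, 8)); // -1, because end_pc is exclusive
    }
}
```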

attributes_count and attributes[] are the count and array of [attribute structures](#attribute structure) holding the attribute information for this Code attribute.

Binary analysis of HelloWorld.class

Binary editor selection and settings

Machine-language veterans can read hexadecimal in any binary editor, but honestly, reading raw hex in the Reiwa era is painful, so I looked for a binary editor that makes it as readable as possible.

This time, we will use BZ Editor, for the following reasons.

- It runs on Windows (recently it apparently also runs on macOS if you build it yourself).
- It can display structures.
- It supports Japanese.
- The source is public, so you can extend it if you feel like it.

https://github.com/devil-tamachan/binaryeditorbz

If you want to consider other binary editors, Wikipedia has a comparison table of hex editors. https://en.wikipedia.org/wiki/Comparison_of_hex_editors Among them, HxD looked easy to use.

BZ Editor settings

Structure definition

You can define the structure in Bz.def, which is in the same folder as the BZ Editor executable file. Note that only fixed size structures can be specified, so analysis cannot be performed perfectly.

Bz.def


struct ClassFile_1 {
       BYTE magic[4];
       short minor_version;
       short major_version;
       short constant_pool_count;
} class;
struct ClassFile_2 {
       BYTE access_flags[2];
       short this_class;
       short super_class;
       short interfaces_count;
} class;

struct CONSTANT_Class {
       BYTE tag;
       short index;
} class;

struct CONSTANT_Methodref_info {
       BYTE tag;
       short class_index;
       short name_and_type_index;
} class;

struct CONSTANT_Fieldref {
       BYTE tag;
       short class_index;
       short name_and_type_index;
} class;

struct CONSTANT_NameAndType_info {
       BYTE tag;
       short name_index;
       short descriptor_index;
} class;

struct CONSTANT_String_info {
       BYTE tag;
       short string_index;
} class;


struct CONSTANT_Utf8 {
       BYTE tag;
       short length;
} class;

struct Code_attribute {
       short attribute_name_index;
       int attribute_length;
       short max_stack;
       short max_locals;
       int code_length;
} class;

Bz.def is written in a C-like syntax. For the types you can use, see TYPESTR[NUM_MEMBERS] in the code below. https://github.com/devil-tamachan/binaryeditorbz/blob/master/Bz/BZFormVw.cpp

After starting BZEditor, check "View"> "Structure display" to display the child window for structure display. image.png

By double-clicking an address, the structure information starting from that address is displayed. image.png

Change the order of bytes

Select Motorola (big-endian) when parsing class files, since multi-byte values in a class file are stored in big-endian order. image.png
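The effect of the byte order setting is easy to reproduce in Java. As a small sketch (ByteOrderDemo and readU2 are names I made up), the same two bytes are read once as big-endian, which is what BZ Editor calls Motorola, and once as little-endian, which it calls Intel.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: the same two bytes interpreted with each byte order.
final class ByteOrderDemo {

    // Reads an unsigned 2-byte value using the given byte order.
    static int readU2(byte[] twoBytes, ByteOrder order) {
        return ByteBuffer.wrap(twoBytes).order(order).getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x34};
        System.out.println(readU2(raw, ByteOrder.BIG_ENDIAN));    // 52
        System.out.println(readU2(raw, ByteOrder.LITTLE_ENDIAN)); // 13312
    }
}
```

With the wrong setting, a version number of 52 would show up as 13312, so the structure view is only meaningful with Motorola selected.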

Parsing class files with BZ Editor

Let's analyze the Class File from the beginning.

From the beginning to constant_pool_count

image.png

magic shows 0xCAFEBABE. minor_version is 0 and major_version is 52, which corresponds to Java 8. constant_pool_count is 26, and the bytes that follow are the constant_pool entries.

Analysis of constant_pool []

constant_pool[1] image.png

Since the first byte is 0x0A = 10, this constant_pool entry is a CONSTANT_Methodref. class_index is 6 and name_and_type_index is 12. What these indexes actually point to is checked after going through the whole constant_pool.

constant_pool[2] image.png

Since the first byte is 0x09, this constant_pool entry is a CONSTANT_Fieldref. class_index is 13 and name_and_type_index is 14.

constant_pool[3] image.png

Since the first byte is 0x08, the constant_pool entry will be CONSTANT_String. The index will be 15.

constant_pool[4] image.png

Since the first byte is 0x0A = 10, the entry for this constant_pool will be CONSTANT_Methodref. class_index is 16 and name_and_type_index is 17.

constant_pool[5] image.png

Since the first byte is 0x07, the entry for this constant_pool will be CONSTANT_Class. The index will be 18.

constant_pool[6] image.png

Since the first byte is 0x07, the entry for this constant_pool will be CONSTANT_Class. The index will be 19.

constant_pool[7] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 6, and the following 6 bytes hold the string "<init>".

constant_pool[8] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 3, and the following 3 bytes hold the string "()V".

constant_pool[9] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 4, and the following 4 bytes hold the string "Code".

constant_pool[10] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 4, and the following 4 bytes hold the string "main".

constant_pool[11] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 22, and the following 22 bytes hold the string "([Ljava/lang/String;)V".

constant_pool[12] image.png

Since the first byte is 0x0C = 12, the entry for this constant_pool will be the CONSTANT_NameAndType_info structure. name_index is 7 and descriptor_index is 8.

constant_pool[13] image.png

Since the first byte is 0x07, the entry for this constant_pool will be CONSTANT_Class. The index will be 20.

constant_pool[14] image.png

Since the first byte is 0x0C = 12, the entry for this constant_pool will be the CONSTANT_NameAndType_info structure. name_index will be 21 and descriptor_index will be 22.

constant_pool[15] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 11, and the following 11 bytes hold the string "Hello World".

constant_pool[16] image.png

Since the first byte is 0x07, the entry for this constant_pool will be CONSTANT_Class. The index will be 23.

constant_pool[17] image.png

Since the first byte is 0x0C = 12, the entry for this constant_pool will be the CONSTANT_NameAndType_info structure. name_index is 24 and descriptor_index is 25.

constant_pool[18] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 10, and the following 10 bytes hold the string "HelloWorld".

constant_pool[19] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 16, and the following 16 bytes hold the string "java/lang/Object".

constant_pool[20] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 16, and the following 16 bytes hold the string "java/lang/System".

constant_pool[21] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 3, and the following 3 bytes hold the string "out".

constant_pool[22] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 21, and the following 21 bytes hold the string "Ljava/io/PrintStream;".

constant_pool[23] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 19, and the following 19 bytes hold the string "java/io/PrintStream".

constant_pool[24] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 7, and the following 7 bytes hold the string "println".

constant_pool[25] image.png

Since the first byte is 0x01, this constant_pool entry is a [CONSTANT_Utf8](#constant_utf8). The length is 21, and the following 21 bytes hold the string "(Ljava/lang/String;)V".

Summary of constant_pool of HelloWorld.class

The constant_pool can be summarized as follows.

| No | Structure | Contents |
|---|---|---|
| 1 | CONSTANT_Methodref | class_index is 6: java/lang/Object, name_and_type_index is 12: <init>:()V |
| 2 | CONSTANT_Fieldref | class_index is 13: java/lang/System, name_and_type_index is 14: out:Ljava/io/PrintStream; |
| 3 | CONSTANT_String | index is 15: "Hello World" |
| 4 | CONSTANT_Methodref | class_index is 16: java/io/PrintStream, name_and_type_index is 17: println:(Ljava/lang/String;)V |
| 5 | CONSTANT_Class | index is 18: "HelloWorld" |
| 6 | CONSTANT_Class | index is 19: "java/lang/Object" |
| 7 | CONSTANT_Utf8 | The string "<init>" |
| 8 | CONSTANT_Utf8 | The string "()V" |
| 9 | CONSTANT_Utf8 | The string "Code" |
| 10 | CONSTANT_Utf8 | The string "main" |
| 11 | CONSTANT_Utf8 | The string "([Ljava/lang/String;)V" |
| 12 | CONSTANT_NameAndType_info | name_index is 7: "<init>", descriptor_index is 8: "()V". See Method Descriptors. |
| 13 | CONSTANT_Class | index is 20: "java/lang/System" |
| 14 | CONSTANT_NameAndType_info | name_index is 21: "out", descriptor_index is 22: "Ljava/io/PrintStream;". See Field Descriptors. |
| 15 | CONSTANT_Utf8 | The string "Hello World" |
| 16 | CONSTANT_Class | index is 23: "java/io/PrintStream" |
| 17 | CONSTANT_NameAndType_info | name_index is 24: "println", descriptor_index is 25: "(Ljava/lang/String;)V". See Method Descriptors. |
| 18 | CONSTANT_Utf8 | The string "HelloWorld" |
| 19 | CONSTANT_Utf8 | The string "java/lang/Object" |
| 20 | CONSTANT_Utf8 | The string "java/lang/System" |
| 21 | CONSTANT_Utf8 | The string "out" |
| 22 | CONSTANT_Utf8 | The string "Ljava/io/PrintStream;" |
| 23 | CONSTANT_Utf8 | The string "java/io/PrintStream" |
| 24 | CONSTANT_Utf8 | The string "println" |
| 25 | CONSTANT_Utf8 | The string "(Ljava/lang/String;)V" |

From access_flags to interfaces[]

image.png

- access_flags is 0x0021, that is, ACC_SUPER (0x0020) combined with ACC_PUBLIC (0x0001).
- this_class is constant_pool[5], so this is the HelloWorld class.
- super_class is constant_pool[6], so the superclass is java/lang/Object.
- interfaces_count is 0, so no interfaces[] entries follow.

From fields_count to fields[]

image.png

Since fields_count is 0, no fields follow.

From methods_count to methods[]

image.png

Since methods_count is 0x0002, there are two methods, and two method_info structures follow.

method_info[0] image.png

- access_flags is 0x0001, that is, ACC_PUBLIC (0x0001).
- name_index points to "<init>" at constant_pool[7]. This is the implicit default constructor generated by the Java compiler.
- descriptor_index points to "()V" at constant_pool[8].
- attributes_count is 1, so one attribute structure follows.

method_info[0].attributes[0] image.png

Since attribute_name_index points to "Code" at constant_pool[9], this structure is a [Code_attribute structure](#code_attribute structure).

attribute_length is 17, which is the size in bytes of the rest of this structure.
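The value 17 can be checked against the Code_attribute layout: everything after attribute_length is max_stack (2 bytes), max_locals (2), code_length (4), the code itself, exception_table_length (2), 8 bytes per exception table entry, attributes_count (2), and any nested attributes. A small sketch of that arithmetic follows; the class and parameter names are my own.

```java
// Sketch: computes attribute_length for a Code attribute from its parts.
final class CodeAttributeLength {

    static int attributeLength(int codeLength, int exceptionTableLength, int nestedAttributeBytes) {
        return 2                          // max_stack
             + 2                          // max_locals
             + 4                          // code_length
             + codeLength                 // code[]
             + 2                          // exception_table_length
             + 8 * exceptionTableLength   // four u2 items per entry
             + 2                          // attributes_count
             + nestedAttributeBytes;      // nested attributes, none with -g:none here
    }

    public static void main(String[] args) {
        System.out.println(attributeLength(5, 0, 0)); // 17, the constructor
        System.out.println(attributeLength(9, 0, 0)); // 21, a method with 9 bytes of code
    }
}
```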

max_stack is 1 and max_locals is 1.

code_length is 5, which means the next 5 bytes, 0x2A B7 00 01 B1, are the bytecode. 0x2A is aload_0. This instruction pushes local variable 0, which holds this, onto the operand stack.

0xB7 is invokespecial. This instruction invokes the method identified by the following 2 bytes, read as an index into constant_pool. Here they are 0x00 01, so constant_pool[1], java/lang/Object."<init>":()V, is called.

0xB1 is return.

Both exception_table_length and attributes_count are 0.

method_info[1] image.png

- access_flags is 0x0009, that is, ACC_PUBLIC (0x0001) combined with ACC_STATIC (0x0008).
- name_index points to "main" at constant_pool[10].
- descriptor_index points to "([Ljava/lang/String;)V" at constant_pool[11].
- attributes_count is 1, so one attribute structure follows.

method_info[1].attributes[0] image.png

Since attribute_name_index points to "Code" at constant_pool[9], this structure is a [Code_attribute structure](#code_attribute structure).

attribute_length is 21, which is the size in bytes of the rest of this structure.

max_stack is 2 and max_locals is 1.

code_length is 9, which means the next 9 bytes, 0xB2 00 02 12 03 B6 00 04 B1, are the bytecode.

0xB2 is getstatic. This instruction gets a static field of a class, using the following 2 bytes as an index into constant_pool. Here they are 0x00 02, so it gets out:Ljava/io/PrintStream; of the java/lang/System class, which is constant_pool[2]. The result is pushed onto the operand stack.

0x12 is ldc. This instruction uses the following 1 byte as an index into constant_pool and pushes the referenced constant onto the operand stack. Here it is 0x03, so the string "Hello World" referenced by constant_pool[3] is pushed onto the operand stack.

0xB6 is invokevirtual. This instruction invokes the method identified by the following 2 bytes, read as an index into constant_pool. Here they are 0x00 04, so println:(Ljava/lang/String;)V of the java/io/PrintStream class, which is constant_pool[4], is invoked.

0xB1 is return.

Both exception_table_length and attributes_count are 0.
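The manual decoding of the two code arrays can be mirrored by a tiny disassembler. The sketch below knows only the six opcodes that appear in HelloWorld.class and throws for anything else; MiniDisassembler is a hypothetical name, and the output format is loosely modeled on what javap prints.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: decodes only the six opcodes used in HelloWorld.class.
final class MiniDisassembler {

    // Reads a big-endian unsigned 2-byte operand starting at index at.
    private static int u2(byte[] b, int at) {
        return ((b[at] & 0xFF) << 8) | (b[at + 1] & 0xFF);
    }

    // Returns one "offset: mnemonic operand" line per instruction.
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x2A: out.add(pc + ": aload_0"); pc += 1; break;
                case 0xB1: out.add(pc + ": return"); pc += 1; break;
                case 0x12: out.add(pc + ": ldc #" + (code[pc + 1] & 0xFF)); pc += 2; break;
                case 0xB2: out.add(pc + ": getstatic #" + u2(code, pc + 1)); pc += 3; break;
                case 0xB6: out.add(pc + ": invokevirtual #" + u2(code, pc + 1)); pc += 3; break;
                case 0xB7: out.add(pc + ": invokespecial #" + u2(code, pc + 1)); pc += 3; break;
                default:
                    throw new IllegalArgumentException(
                            "opcode 0x" + Integer.toHexString(opcode) + " is not handled in this sketch");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] mainCode = {
            (byte) 0xB2, 0x00, 0x02, 0x12, 0x03, (byte) 0xB6, 0x00, 0x04, (byte) 0xB1
        };
        for (String line : disassemble(mainCode)) {
            System.out.println(line);
        }
    }
}
```

For the bytes of main this prints getstatic #2 at offset 0, ldc #3 at offset 3, invokevirtual #4 at offset 5, and return at offset 8.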

ClassFile attributes_count, attributes []

image.png

Since attributes_count is 0, there is no attributes data.

Summary of class file analysis

In this way, you can analyze a class file with the JVM specification and a binary editor. **However, you can do the same analysis with the javap command, without bothering with a binary editor.**

>javap -v HelloWorld
Classfile /C:/XXXXXXX/HelloWorld.class
  Last modified 2019/06/09; size 340 bytes
  MD5 checksum 3ee6d0a4b44197baaeb0cec79a0b73d3
public class HelloWorld
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #6.#12         // java/lang/Object."<init>":()V
   #2 = Fieldref           #13.#14        // java/lang/System.out:Ljava/io/PrintStream;
   #3 = String             #15            // Hello World
   #4 = Methodref          #16.#17        // java/io/PrintStream.println:(Ljava/lang/String;)V
   #5 = Class              #18            // HelloWorld
   #6 = Class              #19            // java/lang/Object
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               main
  #11 = Utf8               ([Ljava/lang/String;)V
  #12 = NameAndType        #7:#8          // "<init>":()V
  #13 = Class              #20            // java/lang/System
  #14 = NameAndType        #21:#22        // out:Ljava/io/PrintStream;
  #15 = Utf8               Hello World
  #16 = Class              #23            // java/io/PrintStream
  #17 = NameAndType        #24:#25        // println:(Ljava/lang/String;)V
  #18 = Utf8               HelloWorld
  #19 = Utf8               java/lang/Object
  #20 = Utf8               java/lang/System
  #21 = Utf8               out
  #22 = Utf8               Ljava/io/PrintStream;
  #23 = Utf8               java/io/PrintStream
  #24 = Utf8               println
  #25 = Utf8               (Ljava/lang/String;)V
{
  public HelloWorld();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #3                  // String Hello World
         5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
}

It's easier than reading the binary. Well, this has deepened my understanding of Hello World. I'm happy.

(Investigation) Do you still do it?

Let's dig a little deeper into how this HelloWorld.class works.

Interpreter and compiler

It is often said that Java code may be compiled more than once, but what does that mean? This is described in Tobias Hartmann's [The Java HotSpot VM](https://www.ethz.ch/content/dam/ethz/special-interest/infk/inst-cs/lst-dam/documents/Education/Classes/Spring2018/210_Compiler_Design/Slides/2018-Compiler-Design-Guest-Talk.pdf).

image.png

As shown above, the generated bytecode is either run by the interpreter or run as machine code compiled by C1 or C2. Also, in Java 8, code that started out in the interpreter may switch partway through to a version compiled by C1, or it may be compiled step by step as interpreter → C1 → C2.

**In other words, I had no idea how Hello World actually ran...**

Checking whether it was compiled

Is there a way to tell whether bytecode ran in the interpreter or as compiled machine code? You can find out by passing -XX:+PrintCompilation when running java.

>java  -XX:+PrintCompilation HelloWorld
     73    1       3       java.lang.String::hashCode (55 bytes)
     74    2       3       java.lang.String::equals (81 bytes)
     75    4     n 0       java.lang.System::arraycopy (native)   (static)
     76    3       4       java.lang.String::charAt (29 bytes)
     76    5       3       java.lang.Object::<init> (1 bytes)
     78    6       4       sun.misc.ASCIICaseInsensitiveComparator::toLower (16 bytes)
     78    7       4       sun.misc.ASCIICaseInsensitiveComparator::isUpper (18 bytes)
     79    8       4       java.lang.String::length (6 bytes)
     79    9       3       java.lang.AbstractStringBuilder::ensureCapacityInternal (27 bytes)
     80   10       3       java.lang.Character::toLowerCase (9 bytes)
     80   11       3       java.lang.CharacterData::of (120 bytes)
     81   15       1       java.lang.Object::<init> (1 bytes)
     81    5       3       java.lang.Object::<init> (1 bytes)   made not entrant
     81   12       3       java.lang.CharacterDataLatin1::toLowerCase (39 bytes)
     82   13       3       java.lang.CharacterDataLatin1::getProperties (11 bytes)
     82   17       3       java.io.WinNTFileSystem::isSlash (18 bytes)
     84   16       3       java.lang.AbstractStringBuilder::append (29 bytes)
     84   18  s    3       java.lang.StringBuffer::append (13 bytes)
     85   14       3       java.lang.Math::min (11 bytes)
     86   19       3       java.lang.StringBuilder::append (8 bytes)
     88   20       3       java.lang.String::getChars (62 bytes)
     90   22       3       java.lang.String::indexOf (70 bytes)
     91   21       3       java.util.Arrays::copyOfRange (63 bytes)
     92   23       3       java.lang.System::getSecurityManager (4 bytes)
Hello World

PrintCompilation shows which methods were compiled before being executed. Details of this output format can be found on Stack Overflow, but since we have come this far, let's look at the OpenJDK source code.

The Java 8 source code can be obtained from the following. https://download.java.net/openjdk/jdk8u40/ri/openjdk-8u40-src-b25-10_feb_2015.zip

Let's look at the code below, which appears to produce the output printed when -XX:+PrintCompilation is specified.

openjdk\hotspot\src\share\vm\compiler\compileBroker.cpp


// ------------------------------------------------------------------
// CompileTask::print_compilation_impl
void CompileTask::print_compilation_impl(outputStream* st, Method* method, int compile_id, int comp_level,
                                         bool is_osr_method, int osr_bci, bool is_blocking,
                                         const char* msg, bool short_form) {
  if (!short_form) {
    st->print("%7d ", (int) st->time_stamp().milliseconds());  // print timestamp
  }
  st->print("%4d ", compile_id);    // print compilation number

  // For unloaded methods the transition to zombie occurs after the
  // method is cleared so it's impossible to report accurate
  // information for that case.
  bool is_synchronized = false;
  bool has_exception_handler = false;
  bool is_native = false;
  if (method != NULL) {
    is_synchronized       = method->is_synchronized();
    has_exception_handler = method->has_exception_handler();
    is_native             = method->is_native();
  }
  // method attributes
  const char compile_type   = is_osr_method                   ? '%' : ' ';
  const char sync_char      = is_synchronized                 ? 's' : ' ';
  const char exception_char = has_exception_handler           ? '!' : ' ';
  const char blocking_char  = is_blocking                     ? 'b' : ' ';
  const char native_char    = is_native                       ? 'n' : ' ';

  // print method attributes
  st->print("%c%c%c%c%c ", compile_type, sync_char, exception_char, blocking_char, native_char);

  if (TieredCompilation) {
    if (comp_level != -1)  st->print("%d ", comp_level);
    else                   st->print("- ");
  }
  st->print("     ");  // more indent

  if (method == NULL) {
    st->print("(method)");
  } else {
    method->print_short_name(st);
    if (is_osr_method) {
      st->print(" @ %d", osr_bci);
    }
    if (method->is_native())
      st->print(" (native)");
    else
      st->print(" (%d bytes)", method->code_size());
  }

  if (msg != NULL) {
    st->print("   %s", msg);
  }
  if (!short_form) {
    st->cr();
  }
}

The time stamp (in milliseconds) is output in the first column.

The second column is compile_id followed by the method attributes. compile_id is printed right-aligned in a field four characters wide. The method attributes are a combination of flags and are displayed as follows.

flag condition
%    OSR (on-stack replacement) compilation, i.e. the entry bci is not InvocationEntryBci (the MethodCompilation enum defines InvocationEntryBci and InvalidOSREntryBci)
s    The method is synchronized
!    The method has an exception handler
b    The compilation is blocking
n    The method is native

The third column shows the compile level when Tiered Compilation is ON. Tiered Compilation can be switched with the -XX:-TieredCompilation or -XX:+TieredCompilation option; in Java 8 it is ON by default. The compile levels are as follows.

level Contents
0 interpreter
1 C1 with full optimization (no profiling)
2 C1 with limited profiling
3 C1 with full profiling
4 C2

In other words, the same C1 is divided into three stages.

The method name is output in the 4th column, followed by the bytecode size of the method, or "(native)" for a native method.
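The column layout described above can be sketched as a small parser. This is only an illustration: the class PrintCompilationLine and its members are hypothetical names, and the regular expression assumes the Tiered Compilation layout shown in this article (timestamp, compile id, five attribute characters, level, method name). Other JVM versions or options may print a different layout.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrintCompilationLine {
    // Assumed layout: timestamp, compile id, 5 attribute chars, level, method, rest.
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\d+)\\s+(\\d+) (.{5}) ([0-9-])\\s+(\\S+)(.*)$");

    final int timestampMs;
    final int compileId;
    final String attributes; // always 5 characters, e.g. " s   "
    final String level;      // "0".."4", or "-" when the level is unknown
    final String method;

    PrintCompilationLine(int timestampMs, int compileId, String attributes,
                         String level, String method) {
        this.timestampMs = timestampMs;
        this.compileId = compileId;
        this.attributes = attributes;
        this.level = level;
        this.method = method;
    }

    static PrintCompilationLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected format: " + line);
        }
        return new PrintCompilationLine(Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)), m.group(3), m.group(4), m.group(5));
    }

    // The attribute characters appear in the same order as in the st->print call.
    boolean isOsr()               { return attributes.charAt(0) == '%'; }
    boolean isSynchronized()      { return attributes.charAt(1) == 's'; }
    boolean hasExceptionHandler() { return attributes.charAt(2) == '!'; }
    boolean isBlocking()          { return attributes.charAt(3) == 'b'; }
    boolean isNative()            { return attributes.charAt(4) == 'n'; }

    // Maps the level column to the descriptions in the table above.
    static String describeLevel(String level) {
        switch (level) {
            case "0": return "interpreter";
            case "1": return "C1 with full optimization (no profiling)";
            case "2": return "C1 with limited profiling";
            case "3": return "C1 with full profiling";
            case "4": return "C2";
            default:  return "unknown";
        }
    }

    public static void main(String[] args) {
        PrintCompilationLine p = parse(
                "     84   18  s    3       java.lang.StringBuffer::append (13 bytes)");
        System.out.println(p.method + " id=" + p.compileId
                + " synchronized=" + p.isSynchronized()
                + " level=" + describeLevel(p.level));
    }
}
```

Fed the StringBuffer::append line from the output above, the sketch reports compile id 18, the synchronized flag, and level 3.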

Now let's look again at the -XX:+PrintCompilation output from earlier. It does not include the main method of the HelloWorld class, so we can tell that this code runs in the interpreter.
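For contrast, here is a minimal sketch of a program whose own methods are likely to show up in the PrintCompilation output. The class HotLoop and the iteration count are assumptions made for illustration; whether and when a method is compiled depends on the JVM version and its thresholds, so the actual output may differ.

```java
public class HotLoop {
    // Small method invoked many times so that it may cross the JIT thresholds.
    static long square(long x) {
        return x * x;
    }

    // Calls square() n times; the loop itself may be compiled via OSR.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // The iteration count is an arbitrary assumption, not a documented threshold.
        System.out.println(sumOfSquares(1_000_000));
    }
}
```

Running it with "java -XX:+PrintCompilation HotLoop" and searching the output for "HotLoop::" should show whether square or sumOfSquares was compiled, and at which level.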

Can we see the machine code after it has been compiled?

Machine code produced by the C1 and C2 compilers is not written to a file; it exists only in memory. A few steps are needed to look at it.

First, get hsdis-amd64.dll, the disassembler plugin that HotSpot loads to print assembly. For Windows, you can download it from: https://sourceforge.net/projects/fcml/files/

After downloading the DLL, put it in a directory that is on the PATH.

Then execute the following command.

java -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+PrintAssembly -XX:+LogCompilation  HelloWorld

A log file is created in the current directory, and in it you can see what machine code was generated.

Output example


Decoding compiled method 0x0000000002d00750:
Code:
RIP: 0x2d008a0 Code size: 0x000001f0
[Entry Point]
[Constants]
  # {method} {0x00000000192f4fc0} &apos;hashCode&apos; &apos;()I&apos; in &apos;java/lang/String&apos;
  #           [sp+0x40]  (sp of caller)
  0x0000000002d008a0: mov     r10d,dword ptr [rdx+8h]
  0x0000000002d008a4: shl     r10,3h
  0x0000000002d008a8: cmp     r10,rax
  0x0000000002d008ab: jne     2c35f60h          ;   {runtime_call}
  0x0000000002d008b1: nop     word ptr [rax+rax+0h]
  0x0000000002d008bc: nop
[Verified Entry Point]
  0x0000000002d008c0: mov     dword ptr [rsp+0ffffffffffffa000h],eax
  0x0000000002d008c7: push    rbp
  0x0000000002d008c8: sub     rsp,30h
  0x0000000002d008cc: mov     rax,193e7ac8h
  0x0000000002d008d6: mov     esi,dword ptr [rax+8h]
  0x0000000002d008d9: add     esi,8h
  0x0000000002d008dc: mov     dword ptr [rax+8h],esi
//Omitted

The log file is huge and hard to read ...

The log file containing the machine code is very large, which makes it hard to find the information you want. In that case, you may want to browse it with JITWatch. https://github.com/AdoptOpenJDK/jitwatch/

image.png

image.png

Please refer to the following for detailed usage.

Watch JIT compilation on JIT Watch! https://www.sakatakoichi.com/entry/2014/12/04/202747

How does the interpreter interpret bytecode?

By now, we know that HelloWorld::main runs in the interpreter. So where and how, specifically, are the following instructions, from getstatic to return, processed?

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #3                  // String Hello World
         5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
}
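The offsets 0, 3, 5 and 8 in this listing follow from the length of each instruction. As a minimal sketch, the Code bytes can be walked as below. The class MiniDisassembler is hypothetical, the byte array is reconstructed from the listing using the opcode values in the JVM specification, and only these four opcodes are handled.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniDisassembler {
    // Reads a big-endian unsigned 2-byte operand, as stored in a class file.
    static int u2(byte[] code, int at) {
        return ((code[at] & 0xff) << 8) | (code[at + 1] & 0xff);
    }

    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xff;
            switch (opcode) {
                case 0xb2: // getstatic: opcode + 2-byte constant pool index
                    out.add(pc + ": getstatic #" + u2(code, pc + 1));
                    pc += 3;
                    break;
                case 0x12: // ldc: opcode + 1-byte constant pool index
                    out.add(pc + ": ldc #" + (code[pc + 1] & 0xff));
                    pc += 2;
                    break;
                case 0xb6: // invokevirtual: opcode + 2-byte constant pool index
                    out.add(pc + ": invokevirtual #" + u2(code, pc + 1));
                    pc += 3;
                    break;
                case 0xb1: // return: no operand
                    out.add(pc + ": return");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException(
                            "opcode not handled in this sketch: 0x" + Integer.toHexString(opcode));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // getstatic #2, ldc #3, invokevirtual #4, return
        byte[] code = {(byte) 0xb2, 0x00, 0x02, 0x12, 0x03, (byte) 0xb6, 0x00, 0x04, (byte) 0xb1};
        for (String line : disassemble(code)) {
            System.out.println(line);
        }
        // prints:
        // 0: getstatic #2
        // 3: ldc #3
        // 5: invokevirtual #4
        // 8: return
    }
}
```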

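The research team, hunting through the OpenJDK source code to unravel this mystery, finally found the relevant part. Instructions such as getstatic and ldc are interpreted and executed by runWithChecks() / run() in bytecodeInterpreter.cpp. Note that this file belongs to the C++ interpreter; standard HotSpot builds use the template interpreter by default, which does the equivalent work with generated machine code, but the C++ version is much easier to read.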

hotspot\src\share\vm\interpreter\bytecodeInterpreter.cpp


/*
 * BytecodeInterpreter::run(interpreterState istate)
 * BytecodeInterpreter::runWithChecks(interpreterState istate)
 *
 * The real deal. This is where byte codes actually get interpreted.
 * Basically it's a big while loop that iterates until we return from
 * the method passed in.
 *
 * The runWithChecks is used if JVMTI is enabled.
 *
 */
#if defined(VM_JVMTI)
void
BytecodeInterpreter::runWithChecks(interpreterState istate) {
#else
void
BytecodeInterpreter::run(interpreterState istate) {
#endif
  //Omitted
#ifndef USELABELS
  while (1)
#endif
  {
#ifndef PREFETCH_OPCCODE
      opcode = *pc;
#endif
      // Seems like this happens twice per opcode. At worst this is only
      // need at entry to the loop.
      // DEBUGGER_SINGLE_STEP_NOTIFY();
      /* Using this labels avoids double breakpoints when quickening and
       * when returing from transition frames.
       */
  opcode_switch:
      assert(istate == orig, "Corrupted istate");
      /* QQQ Hmm this has knowledge of direction, ought to be a stack method */
      assert(topOfStack >= istate->stack_limit(), "Stack overrun");
      assert(topOfStack < istate->stack_base(), "Stack underrun");

#ifdef USELABELS
      DISPATCH(opcode);
#else
      switch (opcode)
#endif
      {
      CASE(_nop):
          UPDATE_PC_AND_CONTINUE(1);
      //Omitted
      }
    }
  }
}

This while loop keeps running until we return from the method, and each instruction is dispatched to the matching CASE section according to its opcode. For example, getstatic is implemented as follows.

getstatic


      CASE(_getfield):
      CASE(_getstatic):
        {
          u2 index;
          ConstantPoolCacheEntry* cache;
          //Note: Read the 2 bytes at pc+1 (the operand following the opcode) and store them in index
          index = Bytes::get_native_u2(pc+1);

          // QQQ Need to make this as inlined as possible. Probably need to
          // split all the bytecode cases out so c++ compiler has a chance
          // for constant prop to fold everything possible away.
          //Note: Use index to fetch the entry from the constant pool cache.
          cache = cp->entry_at(index);
          if (!cache->is_resolved((Bytecodes::Code)opcode)) {
            CALL_VM(InterpreterRuntime::resolve_get_put(THREAD, (Bytecodes::Code)opcode),
                    handle_exception);
            cache = cp->entry_at(index);
          }

#ifdef VM_JVMTI
          if (_jvmti_interp_events) {
            int *count_addr;
            oop obj;
            // Check to see if a field modification watch has been set
            // before we take the time to call into the VM.
            count_addr = (int *)JvmtiExport::get_field_access_count_addr();
            if ( *count_addr > 0 ) {
              if ((Bytecodes::Code)opcode == Bytecodes::_getstatic) {
                obj = (oop)NULL;
              } else {
                obj = (oop) STACK_OBJECT(-1);
                VERIFY_OOP(obj);
              }
              CALL_VM(InterpreterRuntime::post_field_access(THREAD,
                                          obj,
                                          cache),
                                          handle_exception);
            }
          }
#endif /* VM_JVMTI */

          oop obj;
          if ((Bytecodes::Code)opcode == Bytecodes::_getstatic) {
            //Note: Get the class information for constant pool entry #2 and put its java mirror in obj
            Klass* k = cache->f1_as_klass();
            obj = k->java_mirror();
            MORE_STACK(1);  // Assume single slot push
          } else {
            obj = (oop) STACK_OBJECT(-1);
            CHECK_NULL(obj);
          }

          //
          // Now store the result on the stack
          //
          TosState tos_type = cache->flag_state();
          //Note: Get the field offset for constant pool entry #2 and put it in field_offset
          int field_offset = cache->f2_as_index();
          if (cache->is_volatile()) {
            if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
              OrderAccess::fence();
            }
            if (tos_type == atos) {
              VERIFY_OOP(obj->obj_field_acquire(field_offset));
              SET_STACK_OBJECT(obj->obj_field_acquire(field_offset), -1);
            } else if (tos_type == itos) {
              SET_STACK_INT(obj->int_field_acquire(field_offset), -1);
            } else if (tos_type == ltos) {
              SET_STACK_LONG(obj->long_field_acquire(field_offset), 0);
              MORE_STACK(1);
            } else if (tos_type == btos) {
              SET_STACK_INT(obj->byte_field_acquire(field_offset), -1);
            } else if (tos_type == ctos) {
              SET_STACK_INT(obj->char_field_acquire(field_offset), -1);
            } else if (tos_type == stos) {
              SET_STACK_INT(obj->short_field_acquire(field_offset), -1);
            } else if (tos_type == ftos) {
              SET_STACK_FLOAT(obj->float_field_acquire(field_offset), -1);
            } else {
              SET_STACK_DOUBLE(obj->double_field_acquire(field_offset), 0);
              MORE_STACK(1);
            }
          } else {
            if (tos_type == atos) {
              //Note: Read the field of the class for constant pool entry #2 and store the result on the stack as an object.
              VERIFY_OOP(obj->obj_field(field_offset));
              SET_STACK_OBJECT(obj->obj_field(field_offset), -1);
            } else if (tos_type == itos) {
              SET_STACK_INT(obj->int_field(field_offset), -1);
            } else if (tos_type == ltos) {
              SET_STACK_LONG(obj->long_field(field_offset), 0);
              MORE_STACK(1);
            } else if (tos_type == btos) {
              SET_STACK_INT(obj->byte_field(field_offset), -1);
            } else if (tos_type == ctos) {
              SET_STACK_INT(obj->char_field(field_offset), -1);
            } else if (tos_type == stos) {
              SET_STACK_INT(obj->short_field(field_offset), -1);
            } else if (tos_type == ftos) {
              SET_STACK_FLOAT(obj->float_field(field_offset), -1);
            } else {
              SET_STACK_DOUBLE(obj->double_field(field_offset), 0);
              MORE_STACK(1);
            }
          }
          //Note: Advance pc by 3 bytes (the length of getstatic) and execute the next instruction.
          UPDATE_PC_AND_CONTINUE(3);
         }
// Have to do this dispatch this way in C++ because otherwise gcc complains about crossing an
// initialization (which is is the initialization of the table pointer...)
#define DISPATCH(opcode) goto *(void*)dispatch_table[opcode]
//Omitted
#define UPDATE_PC_AND_CONTINUE(opsize) {                        \
        pc += opsize; opcode = *pc;                             \
        DO_UPDATE_INSTRUCTION_COUNT(opcode);                    \
        DEBUGGER_SINGLE_STEP_NOTIFY();                          \
        DISPATCH(opcode);                                       \
    }

DISPATCH is executed to move on to the next instruction. When USELABELS is defined, it is a goto statement that jumps to the label for the next instruction's opcode.
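The overall shape of this loop (fetch the opcode at pc, branch on it, advance pc by the instruction length, repeat until a return instruction) can be sketched in Java. This is a toy and not HotSpot's implementation: the class ToyInterpreter is hypothetical, it supports only a handful of integer opcodes (values taken from the JVM specification), and because Java has no computed goto it uses a switch, like the non-USELABELS path above.

```java
public class ToyInterpreter {
    // Opcode values as defined in the JVM specification.
    static final int NOP = 0x00;
    static final int ICONST_1 = 0x04;
    static final int ICONST_2 = 0x05;
    static final int BIPUSH = 0x10;
    static final int IADD = 0x60;
    static final int IRETURN = 0xac;

    static int run(byte[] code) {
        int[] stack = new int[16]; // operand stack (fixed size for this sketch)
        int sp = 0;                // index of the next free stack slot
        int pc = 0;                // index of the current instruction
        while (true) {
            int opcode = code[pc] & 0xff;
            switch (opcode) {
                case NOP:
                    pc += 1;
                    break;
                case ICONST_1:
                    stack[sp++] = 1;
                    pc += 1;
                    break;
                case ICONST_2:
                    stack[sp++] = 2;
                    pc += 1;
                    break;
                case BIPUSH: // 1-byte signed operand follows the opcode
                    stack[sp++] = code[pc + 1];
                    pc += 2;
                    break;
                case IADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    pc += 1;
                    break;
                }
                case IRETURN: // leaving the method ends the loop
                    return stack[--sp];
                default:
                    throw new IllegalStateException(
                            "opcode not handled in this sketch: 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        // iconst_1, bipush 40, iadd, iconst_1, iadd, ireturn
        byte[] code = {ICONST_1, BIPUSH, 40, IADD, ICONST_1, IADD, (byte) IRETURN};
        System.out.println(run(code)); // prints 42
    }
}
```

Each case ends by moving pc forward by the instruction's length, which plays the same role as UPDATE_PC_AND_CONTINUE in the C++ code.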

As you can see, for the interpreter, using this code as a starting point gives you an idea of what kind of processing is going on. ~~At this point, the article is over 30,000 characters, so it's hard to put all the analyzed information together~~

Summary

This time, I went back to the beginning and looked at how Hello World works.

As the saying goes, "seeing is believing." I'm sorry, I had underestimated Hello World. My sincere apologies.

reference:

Demystifying the JVM: Interpretation, JIT and AOT Compilation https://metebalci.com/blog/demystifying-the-jvm-interpretation-jit-and-aot-compilation/#disqus_thread

DEMYSTIFYING THE JVM: JVM VARIANTS, CPPINTERPRETER AND TEMPLATEINTERPRETER https://metebalci.com/blog/demystifying-the-jvm-jvm-variants-cppinterpreter-and-templateinterpreter/#disqus_thread

Survey of Java JIT compilation with JIT Watch https://www.oracle.com/webfolder/technetwork/jp/javamagazine/Java-MA15-Architect-newland.pdf

Watch JIT compilation on JIT Watch! https://www.sakatakoichi.com/entry/2014/12/04/202747

I tried to output [Java] using only a binary editor https://tech.recruit-mp.co.jp/etc/java_class_hello_world/
