Nowadays ... I didn't know Java was "compiled" twice! (> <)

2019/9/29: Corrected the title in the text and the titles of cited sources, and added "5. Digression".

2019/8/18: Partially revised the footnotes following the addition of citations and reference materials.

2018/10/14: Partial revision.

==========================================================================

** 0. Introduction **

Hello, it's been a while. This is zd6ir7. It has been some time since my last post, "I built a Java EE environment on AWS and tried running a web application". This summer (2018) I had a slightly embarrassing experience, so I would like to write about this topic. As usual, please note that some details have been omitted for the sake of clarity.

** 1. How the program works **

Before getting to the main subject, we first need to touch on the mechanism (flow) by which a program runs. Every program in the world, whether written in Visual Basic or C++, is loaded from disk into memory and eventually interpreted and executed by the CPU.

[Figure: the flow until a program is executed]

To be executed by the CPU, a program must be translated into a language that the CPU can interpret. That language is called "machine language" (* 1), and it consists only of 0s and 1s. Why only 0s and 1s? Because each pin of the electronic components (ICs) that make up the CPU can take only two states: conducting electricity or not conducting it. Because of this characteristic of ICs, the CPU handles all information as binary numbers made up of 0s and 1s (* 2).
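To make the "only 0s and 1s" point a little more concrete, here is a minimal Java sketch that prints the bit pattern behind a number and a character. The class name `BinaryDemo` and the helper `toBits` are made up for this illustration and do not come from any of the cited books.

```java
// Minimal sketch: show the 0s and 1s behind ordinary values.
// BinaryDemo and toBits are hypothetical names used only for this illustration.
class BinaryDemo {
    // Returns the lowest 8 bits of value as a string of '0' and '1' characters.
    static String toBits(int value) {
        StringBuilder bits = new StringBuilder(Integer.toBinaryString(value & 0xFF));
        // Left-pad with zeros so that every value is shown as a full 8-bit byte.
        while (bits.length() < 8) {
            bits.insert(0, '0');
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBits(5));   // 00000101
        System.out.println(toBits('A')); // 01000001 (the character 'A' is stored as the number 65)
    }
}
```

Whatever the value means to a person (a number, a letter, an instruction), what the hardware holds is a pattern like the ones printed here.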

** 2. The essence of "compilation" **

Converting the source code of a program into a machine language file consisting only of 0s and 1s, so that the CPU can interpret it, is called "compilation", and the tool that performs the conversion is called a "compiler". The machine language file produced by the compiler is loaded into memory and finally executed by the CPU (* 3).

According to items 3 to 5 of "Citation / Reference Material", programming languages can be divided, from the viewpoint of compilation, into the following two types.

  1. Compiled language: a language in which the CPU executes the program after the source code has been converted into a machine language file. Typical examples include Fortran, C, and COBOL.

  2. Interpreted language: a language in which the CPU executes the program as it is, without the source code being converted into a machine language file. Typical examples include Perl, Ruby, and PHP.

Item 2 needs a little supplement. Since the CPU only understands machine language, even Python code, for example, has to be turned into machine language one way or another before it can run. This is where the "interpreter" comes in (in this example, the python program itself). It goes through the Python code line by line and, for each line, does the following:

- Read the line
- Convert it to machine language
- Have the CPU execute it

By repeating this until the end of the code, the program is executed.

[Figure: compiled language]

[Figure: interpreted language]
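The read / convert / execute loop of an interpreter can be sketched with a toy example. The three-command language below (`set`, `add`, `print`) and the class `TinyInterpreter` are invented purely for illustration; a real interpreter such as Python's is far more elaborate. In this sketch, the step described above as "convert to machine language" corresponds to choosing which branch of the `switch` to run.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy line-by-line interpreter for a made-up three-command language.
// The language (set / add / print) and all names here are hypothetical.
class TinyInterpreter {
    // Executes the program one line at a time and returns what it printed.
    static List<String> run(String source) {
        Map<String, Integer> variables = new HashMap<>();
        List<String> output = new ArrayList<>();
        for (String line : source.split("\n")) {          // 1. read one line
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;                                  // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");        // 2. work out what the line asks for
            switch (parts[0]) {                            // 3. carry it out immediately
                case "set":   // set <name> <number>
                    variables.put(parts[1], Integer.parseInt(parts[2]));
                    break;
                case "add":   // add <name> <number>
                    variables.merge(parts[1], Integer.parseInt(parts[2]), Integer::sum);
                    break;
                case "print": // print <name>
                    output.add(String.valueOf(variables.get(parts[1])));
                    break;
                default:
                    throw new IllegalArgumentException("Unknown command: " + trimmed);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        String program = "set x 5\nadd x 3\nprint x";
        run(program).forEach(System.out::println); // prints 8
    }
}
```

Note that no separate output file is ever produced: each line is handled and executed on the spot, which is the essential difference from a compiled language.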

What is clear here is that a program must always be translated into machine language. The only difference is whether the source code is first converted into a separate file by "compilation" and then executed by the CPU, or is not converted into a file and is instead executed through an interpreter (* 4).

** 3. Java is "compiled" twice!? **

This is where Java comes in. What process does Java go through from source code to execution?

[Figure: rough flow of Java execution]

The figure above shows a simplified flow, but the bottom line is that Java is "compiled" twice before it is executed. This is a process unlike that of the other programming languages mentioned so far. The details are as follows:

  1. Conversion from source code to class file (1st "compile"): the Java compiler (the javac tool) "compiles" the source code into a class file made up of bytecode, a stream of 8-bit bytes consisting only of 0s and 1s (* 5).

     [Figure: 1st compile]

  2. From class file to execution (2nd "compile"): when the Java Virtual Machine (JVM) loads this class file and reaches the execution stage (* 6):

     - First, the interpreter in the JVM interprets the bytecode one instruction at a time and executes it inside the JVM.
     - Once execution by the interpreter has accumulated enough information about the running code, the JIT (Just-In-Time) compiler in the JVM applies optimizations so that frequently called methods and loops finish in less time, then "compiles" the relevant bytecode into machine language as a unit (* 7) and executes it. When displayed with javap, the bytecode has an assembly-like form, specifically "[index] [instruction] [operand]" (* 8).
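As a rough illustration of what the 1st "compile" produces: every class file written by javac begins with the four bytes `CA FE BA BE`, and the bytecode inside it can be listed with `javap -c`. The sketch below checks those four bytes. The names `ClassFileCheck`, `hasClassFileMagic`, and `add` are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: check that a file looks like the output of the 1st "compile" (javac).
// ClassFileCheck, hasClassFileMagic and add are hypothetical names for this illustration.
class ClassFileCheck {
    // A small method whose bytecode is easy to read in the javap output.
    static int add(int a, int b) {
        return a + b;
    }

    // Every class file starts with the four bytes CA FE BA BE.
    static boolean hasClassFileMagic(byte[] bytes) {
        return bytes.length >= 4
                && (bytes[0] & 0xFF) == 0xCA
                && (bytes[1] & 0xFF) == 0xFE
                && (bytes[2] & 0xFF) == 0xBA
                && (bytes[3] & 0xFF) == 0xBE;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ClassFileCheck <path to a .class file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hasClassFileMagic(bytes) ? "looks like a class file" : "not a class file");
    }
}
```

One way to try it is `javac ClassFileCheck.java` followed by `java ClassFileCheck ClassFileCheck.class`. Running `javap -c ClassFileCheck` on the same class then lists the body of `add` as `0: iload_0`, `1: iload_1`, `2: iadd`, `3: ireturn`, which is the "index, instruction, operand" layout (these particular instructions happen to take no operand).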

In other words, an interpreter and a (JIT) compiler coexist in the JVM, and a Java program is executed through a combination of the two: reading and converting to machine language one instruction at a time, and "compiling" a whole unit at once. The JVM offers various options. As described above, you can have the interpreter execute the code first and then bring in the JIT compiler, have the JIT compiler compile from the very start, or combine the two as appropriate. Options are provided for these choices (* 9). The idea is to choose according to your performance requirements. For details of the options, see item 3 of "Citation / Reference Material".

[Figure: 2nd compile (revised)]
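A small experiment can make the 2nd "compile" easier to picture. The sketch below calls one tiny method a very large number of times, which is the kind of "frequently called method" a JIT compiler targets, and then asks the running JVM about its JIT compiler through the standard `java.lang.management` API. The names `HotLoopDemo`, `square`, and `sumOfSquares` are hypothetical, and the timings will differ from machine to machine.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Sketch: call one small method many times so the JVM has a reason to JIT-compile it.
// HotLoopDemo, square and sumOfSquares are hypothetical names for this illustration.
class HotLoopDemo {
    static long square(long n) {
        return n * n;
    }

    // Calls square() for 0 .. count-1 and adds up the results.
    static long sumOfSquares(int count) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            total += square(i);
        }
        return total;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = 0;
        // Repeat the same work so that the loop and square() become "hot".
        for (int round = 0; round < 100; round++) {
            result = sumOfSquares(1_000_000);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result = " + result + ", elapsed = " + elapsedMs + " ms");

        // Ask the running JVM about its JIT compiler, if it reports one.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " spent " + jit.getTotalCompilationTime() + " ms compiling");
        }
    }
}
```

On a HotSpot-based JVM, comparing a plain `java HotLoopDemo` run with `java -Xint HotLoopDemo` (interpreter only) and `java -Xcomp HotLoopDemo` (compile methods on first use) gives a feel for the trade-off described above, and `java -XX:+PrintCompilation HotLoopDemo` prints methods as they are JIT-compiled. Other JVMs may use different options.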

** 4. In conclusion **

What did you think? I hope you now have a picture of Java being "compiled" twice before it runs. To summarize the story so far:

- Any programming language, whether interpreted or compiled, must eventually be converted into machine language consisting only of 0s and 1s so that the CPU can interpret and execute it.
- Java, unlike the typical compiled and interpreted languages listed earlier, is "compiled" twice before being executed. The first "compile" converts the source code into a class file, and the second "compile" finally converts the class file into machine language. The second one also applies optimizations based on information gathered during sequential execution by the interpreter.

When someone says "compile" in the context of Java, you may be unsure which "compile" they mean. If so, I hope this article helps. More important, I think, is to understand the mechanism by which a program runs, which is common to every programming language including Java. With that understanding, when the topic of Java "compilation" comes up, you can keep confusion to a minimum and, by extension, see what it really is.

** 5. Digression **

Thank you for reading this far. I decided to write this article because I read the explanation of "JIT compiler" in the following book. By the way, this book is also quoted in my previous post "Is 100% CPU usage bad?". Please refer to it later.

Written by Scott Oaks, translated by Acroquest Technology Co., Ltd., translated by Kao Terada, translated by Satoshi Makino, "Java Performance", O'Reilly Japan https://www.oreilly.co.jp/books/9784873117188/

As of 2018, I did not really understand what a "JIT compiler" was. I had assumed that the Java "compiler" was simply the thing that converts Java source code into a class file, so I could not make sense of the book's explanation of the "JIT compiler" no matter how many times I read it. After a lot of research, however, I realized that Java is "compiled" twice, and I decided to write a post to share that realization.

That said, no manuscript has been harder for me to write than this one. Because there was little "prior work" to draw on, I kept hitting walls and working past them, so the result may be rough in places, but I will keep improving it. See you next time. Still, the footnotes really have grown long ... f(^^;)

** Footnotes **

(* 1) Also called "machine code" or "native code". This article uses "machine language".

(* 2) Item 1 of "Citation / Reference Material" explains in detail how the CPU works and why it handles data in binary, so please refer to it as appropriate. Item 2 also gives an easy-to-understand description of how a computer works.

(* 3) This is a fine point, but the definition of the term "compile" differs slightly between books and websites: some restrict it to conversion into machine language, while others include conversion into some other form. In this article, in view of the role of converting to another file format that compilers have traditionally played in various programming languages, and for the sake of comparison with interpreted languages, "compile" is defined as conversion into a machine language "file".

(* 4) As items 3 and 5 of "Citation / Reference Material" show, in a compiled language the compiler analyzes and optimizes the source code so that efficient code is executed, whereas an interpreted language is executed line by line and therefore has the disadvantage of being slower than a compiled language. As an aside, the characteristics of both kinds of language are summarized below.

| | Compiled language | Interpreted language |
|---|---|---|
| Performance (execution speed) | Relatively fast | Relatively slow |
| Programming load | Relatively large | Relatively small (lightweight language) |
| Portability | Low | High |
| Typical languages | Fortran, COBOL, C, C++, etc. | Perl, Ruby, PHP, Python, etc. |

Initially I listed C# as a compiled language, but I have removed it. As described in ".NET Concept and Structure", C# is converted to Common Intermediate Language (CIL) and executed by the runtime, so it cannot be placed squarely in the compiled category. Likewise, although Python is an interpreted language, as described in "How to create a Python pyc file [for beginners]" it is "compiled" into a pyc file and executed from there, so it also has elements of a compiled language.

(* 5) For the detailed structure of the class file, see items 6 and 7 of "Citation / Reference Material".

(* 6) The class file is not executed in the JVM right away. It is first read by the "class loader subsystem", its information is placed in the "runtime data area", and only then is it executed by the interpreter and JIT compiler in the "execution engine". For details, see items 8, 9, and 11 of "Citation / Reference Material".

(* 7) Unfortunately, I could not find out what concrete form the second "compile" produces, for example whether the code is first converted to an intermediate file, either in the "Citation / Reference Material" listed in this article or in the various other sites and documents I checked. Perhaps it is not published because it concerns the detailed specifications of the JVM. My guess is that the bytecode is converted, in some specific unit such as one or more classes or several bytes of bytecode, into an intermediate, temporary machine language "file" and loaded into memory.

(* 8) If you run the javap command (with the -c option) on a class file, you can confirm that the bytecode is shown in an assembly-like form. For details on the javap command, see items 6 and 10 of "Citation / Reference Material". For an explanation of assembly language, see items 1 and 2.

(* 9) The JIT compiler can be disabled by setting the option "-Djava.compiler=NONE". See Disabling JIT (https://www.ibm.com/support/knowledgecenter/en/SSYKE2_7.0.0/com.ibm.java.zos.70.doc/user/disjit.html).

** Citation / Reference Material **

  1. [Written by Hisao Yazawa / Supervised by Nikkei Software "Why the Program Works-Basic Knowledge of Programs You Should Know-" Nikkei BP](https://www.amazon.co.jp/%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%A0%E3%81%AF%E3%81%AA%E3%81%9C%E5%8B%95%E3%81%8F%E3%81%AE%E3%81%8B-%E7%AC%AC%EF%BC%92%E7%89%88-%E7%9F%A5%E3%81%A3%E3%81%A6%E3%81%8A%E3%81%8D%E3%81%9F%E3%81%84%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%A0%E3%81%AE%E5%9F%BA%E7%A4%8E%E7%9F%A5%E8%AD%98-%E7%9F%A2%E6%B2%A2%E4%B9%85%E9%9B%84/dp/4822283151)

  2. [Written by Hisao Yazawa / Supervised by Nikkei Software "Why Computers Work-Basic Knowledge of Hardware & Software You Should Know-" Nikkei BP](https://www.amazon.co.jp/%E3%82%B3%E3%83%B3%E3%83%94%E3%83%A5%E3%83%BC%E3%82%BF%E3%81%AF%E3%81%AA%E3%81%9C%E5%8B%95%E3%81%8F%E3%81%AE%E3%81%8B%EF%BD%9E%E7%9F%A5%E3%81%A3%E3%81%A6%E3%81%8A%E3%81%8D%E3%81%9F%E3%81%84%E3%83%8F%E3%83%BC%E3%83%89%E3%82%A6%E3%82%A8%E3%82%A2%EF%BC%86%E3%82%BD%E3%83%95%E3%83%88%E3%82%A6%E3%82%A8%E3%82%A2%E3%81%AE%E5%9F%BA%E7%A4%8E%E7%9F%A5%E8%AD%98%EF%BD%9E-%E7%9F%A2%E6%B2%A2-%E4%B9%85%E9%9B%84/dp/4822281655)

  3. Written by Scott Oaks, translated by Acroquest Technology Co., Ltd., translated by Kao Terada, translated by Satoshi Makino, "Java Performance", O'Reilly Japan

  4. By TAKUYA-110 "Programming Basics" Qiita Blog

  5. [Shingo Tsuji's "Python Start Book" Technical Review Company](https://www.amazon.co.jp/Python%E3%82%B9%E3%82%BF%E3%83%BC%E3%83%88%E3%83%96%E3%83%83%E3%82%AF-%E8%BE%BB-%E7%9C%9F%E5%90%BE/dp/4774142298)

  6. Y torazuka, "How to read Java class files for beginners" SlideShare

  7. Java Language and Virtual Machine Specifications Oracle Docs

  8. Java Program JVM

  9. The JVM Architecture Explained - An overview of the different components of the JVM, along with a very useful diagram by Jackson Joseraj Java Zone

  10. javap - Java Class File Disassembler Oracle Docs

  11. Preface to java virtual machine and architecture by admin

Items 1 and 2 are also introduced in "Personally Useful Technical Books", and item 3 in "Is 100% CPU usage bad?", so please check those posts as well.
