Until now I had only used Python as an extension of shell scripts, but recently I have had more and more opportunities to use it for serious work. As a low-level enthusiast, I started wondering how Python code actually gets executed, so I looked into it and wrote down what I learned.
The language specification and the implementation are different things.
There is only one Python language, but there is more than one way to implement it. CPython and PyPy are names of implementations of the Python language. It is like C, where you can choose GCC or Clang as the compiler. For a list of Python implementations, see the Wikipedia article on Python. There are quite a lot.
Among these implementations, CPython is the so-called reference implementation. It was started by the original author of Python, and most Python environments in the world are CPython, so it really is the original.
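To see which implementation is actually running a script, the standard library can be asked directly. This is only a small sketch; the variable name `impl_name` is my own.

```python
import platform
import sys

# Name of the implementation running this script, e.g. "CPython" or "PyPy".
impl_name = platform.python_implementation()

print(impl_name)
print(sys.implementation.name)  # lowercase identifier of the same implementation
print(sys.version)              # full version string of the running interpreter
```

Running the same file with different interpreters should print different implementation names while the script itself stays unchanged.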
As you can imagine from the name, CPython is implemented in C, and PyPy is implemented in Python (more precisely, in a restricted subset called RPython). And PyPy is often faster than CPython. Hmm? C is faster than Python, so how can PyPy be faster than CPython? I would like to explain that a little.
The Wikipedia article on CPython says:
CPython is a bytecode interpreter.
Hm. A bytecode interpreter. What is that?
Bytecode is an intermediate representation. In other words, under CPython, Python source code is first compiled to bytecode, and the bytecode is then executed by a virtual machine (VM). It is called a bytecode interpreter because it interprets and executes the bytecode instruction by instruction.
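The bytecode can be inspected from Python itself with the standard `dis` module. The following is a minimal sketch; the function `add` and the variable names are made up for illustration, and the exact instructions printed differ between CPython versions.

```python
import dis

def add(a, b):
    return a + b

# The compiled bytecode is stored on the function's code object as raw bytes.
raw_bytecode = add.__code__.co_code
print(type(raw_bytecode), len(raw_bytecode))

# dis.dis prints a human-readable listing of the VM instructions.
dis.dis(add)

# The same compile step can also be invoked by hand on source text.
code_obj = compile("x = 1 + 2", "<example>", "exec")
namespace = {}
exec(code_obj, namespace)  # the VM executes the code object
print(namespace["x"])
```

The listing shows that even a one-line function becomes several small VM instructions, which is what the interpreter loop actually steps through.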
The reason for doing this seems to be speed, but I could not find a convincing explanation of why going through bytecode makes the total execution time decisively faster for an interpreted language. At the very least, the interpreter implementation becomes cleaner, and if the bytecode is kept as a cache, most of the work such as parsing can be skipped from the second run onward, so it clearly makes sense. When Python imports a module, a .pyc file is created in a __pycache__ directory, and the bytecode is stored there. It seems possible to take only these bytecode files to another environment and execute them, as long as the Python version matches.
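The idea of running from the cached bytecode alone can be tried out with the standard `py_compile` module. This is a sketch assuming CPython; the module name `hello_mod` and the message are made up, and the .pyc file is run with the same interpreter that created it.

```python
import pathlib
import py_compile
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "hello_mod.py"
    src.write_text("print('hello from bytecode')\n")

    # Compile the source explicitly; the return value is the path of the
    # .pyc file, which is placed in a __pycache__ directory next to the source.
    pyc_path = py_compile.compile(str(src), doraise=True)
    print(pyc_path)

    # Delete the source and run only the bytecode file.
    src.unlink()
    result = subprocess.run(
        [sys.executable, pyc_path],
        capture_output=True,
        text=True,
    )
    output = result.stdout.strip()
    print(output)
```

The source file no longer exists when the subprocess starts, so whatever is printed comes from the bytecode file.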
Hm? Come to think of it, there is a famous language that works the same way. Yes, Java. Introductions to Java start by explaining that Java code is converted to bytecode and the Java VM executes it. In both Python and Java, the source code is compiled into bytecode and then executed by a VM (interpreter). Java is regarded as a compiled language and Python as an interpreted language, but in reality both compile; the only difference is whether the compilation step is explicit or implicit.
Why is PyPy, which is written in Python, faster than CPython? Because it uses JIT (Just-In-Time) compilation.
What is JIT compilation? Roughly speaking, code is compiled into machine language at run time and then executed, which makes it faster. For example, consider a loop or a function that is called many times. With a simple interpreter, that code is interpreted again every time it is reached, and the interpreter carries out the corresponding processing. Since the interpreter itself is of course a collection of machine instructions, this amounts to performing a code ⇒ machine language conversion every single time before executing. If instead the repeatedly called code is converted into machine language once, and that machine language is executed directly whenever the same code is called again, the conversion time is saved. In addition, an interpreter that processes code one step at a time cannot optimize based on the overall flow, but if a larger chunk of code is read and converted together, some optimization becomes possible.
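The "convert once, then reuse" idea can be felt with a rough analogy in plain Python. This is not a real JIT: the conversion here is source text to bytecode, not to machine language, and the expression `EXPR` and the function names are made up for illustration.

```python
import timeit

EXPR = "(n * n + 3 * n) % 7"

def eval_from_source(n):
    # Parses and compiles the source text again on every call.
    return eval(EXPR, {"n": n})

# Convert the source once, up front.
_compiled = compile(EXPR, "<expr>", "eval")

def eval_precompiled(n):
    # Reuses the code object produced once above; no conversion per call.
    return eval(_compiled, {"n": n})

slow = timeit.timeit(lambda: eval_from_source(10), number=20000)
fast = timeit.timeit(lambda: eval_precompiled(10), number=20000)
print(f"convert every call: {slow:.4f}s")
print(f"convert once:       {fast:.4f}s")
```

Both functions return the same value; only the amount of repeated conversion work differs. A JIT compiler applies the same principle one level lower, caching machine code for hot loops and functions.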
However, there are various techniques for speeding things up with JIT compilation, and honestly I do not understand which of them is the key. Moreover, PyPy seems to take an unusual approach in which the JIT works on the code of the language implementation (the interpreter) itself, and the details look hard to understand.
By the way, the PyPy download page states that the binaries with the JIT compiler only work on x86 CPUs.
These binaries include a Just-in-Time compiler. They only work on x86 CPUs that have the SSE2 instruction set (most of them do, nowadays), or on x86-64 CPUs. They also contain stackless extensions, like greenlets.
This is a guess because I have not checked the source, but JIT compilation means that the language implementation contains code that generates machine code for a specific CPU architecture. Supporting the many CPU architectures in the world is daunting just to think about. It may be that only x86, which has the largest number of users, is implemented.
When you use PyPy to run Python code, PyPy reads and executes it. So who runs PyPy, which is itself written in Python? Apparently, PyPy's own Python code is translated to C and compiled, and the resulting binary is what runs.
So PyPy has a built-in JIT compiler and is fast. But come to think of it, Python also has a library called Numba that performs JIT compilation. According to the Numba documentation, it supports a fair number of CPU architectures.
Architecture: x86, x86_64, ppc64le. Experimental on armv7l, armv8l (aarch64).
Is Numba working hard to implement support for each architecture by itself?
After a little research, it turns out that Numba uses LLVM. With LLVM, once the Python code is converted to LLVM IR (LLVM's intermediate representation), LLVM takes care of each CPU architecture, so Numba does not need to implement that support itself.
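For reference, using Numba looks roughly like this. It is a minimal sketch: Numba is a third-party package that has to be installed separately, the function `sum_of_squares` is made up for illustration, and the fallback decorator is my own addition so that the script still runs (without any JIT) when Numba is missing.

```python
try:
    from numba import njit  # third-party package, installed separately
except ImportError:
    # Fallback (my assumption): plain Python, no JIT, if Numba is absent.
    def njit(func):
        return func

@njit
def sum_of_squares(n):
    # A simple numeric loop, the kind of code Numba is meant to speed up.
    total = 0
    for i in range(n):
        total += i * i
    return total

# With Numba installed, the first call triggers compilation via LLVM;
# later calls with the same argument types reuse the compiled machine code.
print(sum_of_squares(1000))
```

With Numba installed, the compiled function also has an `inspect_llvm()` method that returns the generated LLVM IR per signature, which is a handy way to confirm that LLVM is involved.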
I looked into how Python code is executed, following my curiosity. I came away feeling that there is almost no boundary between interpreted languages and compiled languages. Interpreted languages compile for speed, and some compiled languages behave like an interpreter for convenience. I thought I understood JIT compilation, but I did not know the details at all.
How are Python implementations built, and how do they work? Is Python interpreted line by line, or is it compiled?