[Java] Kinx Library-JIT Compiler Library


Introduction

Kinx is a scripting language delivered with the concept "Looks like JavaScript, the brain (contents) is Ruby, (stability is AC/DC)". I made a library for JIT compilation for it.

JIT is something you want to try, isn't it? This time I took SLJIT, which Kinx native also uses, made it easier to use, and turned it into a library. SLJIT itself has little documentation and I use it by reading the source, so I considered writing up how to use SLJIT itself as a memo, but that is on hold for now. I might do it someday.

That said, this library is easier to use than SLJIT as it is, so I think it is the better choice. The host language is a scripting language too, so you can try it out casually.

What is it like?

First, here is a sample of what a program looks like. The detailed explanations that follow are long, and otherwise you might never get this far...

using Jit;

var c = new Jit.Compiler();
var entry1 = c.enter();
    var jump0 = c.ge(Jit.S0, Jit.IMM(3));
    c.ret(Jit.S0);
    var l1 = c.label();
    c.sub(Jit.R0, Jit.S0, Jit.IMM(2));
    c.call(entry1);
    c.mov(Jit.S1, Jit.R0);
    c.sub(Jit.R0, Jit.S0, Jit.IMM(1));
    c.call(entry1);
    c.add(Jit.R0, Jit.R0, Jit.S1);
    c.ret(Jit.R0);

jump0.setLabel(l1);
var code = c.generate();

for (var i = 1; i <= 42; ++i) {
    var tmr = new SystemTimer();
    var r = code.run(i);
    System.println("[%8.3f] fib(%2d) = %d" %tmr.elapsed() %i %r);
}

Create a Jit.Compiler object, create a function entry with enter(), manipulate the registers as needed, and emit ret(). To execute the code, call generate() and then run() on the result. You can also view the assembly listing by calling dump() on the object returned by generate().

If you want to skip the details, jump to the Sample section. The sample also includes a benchmark against Ruby, Python, and PyPy.

SLJIT

What is SLJIT in the first place?

In a nutshell, it is an abstract assembler. It supports multiple environments with a single notation, which solves the problem that assembly code differs for each CPU and has to be rewritten every time. The following platforms are currently supported.

  • Platforms supported by SLJIT
    • Intel-x86 32
    • AMD-x86 64
    • ARM 32 (ARM-v5, ARM-v7 and Thumb2 instruction sets)
    • ARM 64
    • PowerPC 32
    • PowerPC 64
    • MIPS 32 (III, R1)
    • MIPS 64 (III, R1)
    • SPARC 32

However, please note that the Kinx version of the JIT library introduced here only supports 64-bit x64 Windows and x64 Linux.

Official (?) documentation

As far as I know, the following are the only documents I could find.

  • https://zherczeg.github.io/sljit/
  • http://ftp.jaist.ac.jp/pub/NetBSD/NetBSD-current/src/sys/external/bsd/sljit/dist/doc/tutorial/sljit_tutorial.html

They should be helpful.

The GitHub repository is below.

  • https://github.com/zherczeg/sljit

Jit

Now for the JIT library as a Kinx library. It is more convenient than using the C library directly. Of course, the C library gives you finer control, but this library gets you reasonably far.

using Jit

The Jit library is not built in, so load it explicitly with the using directive.

using Jit;

Jit object

The Jit object defines the parameter methods and the compiler class.

Jit parameter method

There are three kinds of Jit parameters: immediate values, registers, and memory accesses. They are used in the following forms.

Immediate value, memory access

Immediate values and memory accesses are specified with the methods below. Jit.VAR() is a special method for using the local variable area, which is allocated automatically on the stack.

| Method | Remarks |
|---|---|
| Jit.IMM(v) | Written the same way for 64-bit integers and floating-point numbers. The type is matched to the destination register. |
| Jit.VAR(n) | Local variable area. Each variable is fixed at 8 bytes. |
| Jit.MEM0(address) | Takes an immediate value as the address, but since a real address cannot currently be specified from a script, it cannot actually be used from a script. |
| Jit.MEM1(r1, offset) | Treats the register specified in r1 as an address and indicates the memory address at the given offset (in bytes). |
| Jit.MEM2(r1, r2, shift) | Indicates the memory address at r1 + r2 * (bytes for shift), where shift is 0 for 1 byte, 1 for 2 bytes, 2 for 4 bytes, and 3 for 8 bytes. |
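
As a minimal sketch of how these parameters combine, the following stores two values in the local variable area with Jit.VAR() and reads them back through Jit.MEM1(). It only uses methods that appear in this article (localp() is used the same way as in the samples further down); the register choices and the values are illustrative assumptions, and the snippet has not been checked against a specific Kinx release.

```kinx
using Jit;

var c = new Jit.Compiler();
c.enter();
    c.mov(Jit.VAR(0), Jit.S0);                   // local variable 0 = first argument
    c.mov(Jit.VAR(1), Jit.IMM(100));             // local variable 1 = 100
    c.localp(Jit.R1);                            // R1 = start address of the local variable area
    c.mov(Jit.R0, Jit.MEM1(Jit.R1, 0));          // load local variable 0 (byte offset 0)
    c.add(Jit.R0, Jit.R0, Jit.MEM1(Jit.R1, 8));  // add local variable 1 (byte offset 8)
    c.ret(Jit.R0);
var code = c.generate();

System.println(code.run(23));                    // computes 23 + 100
```

Because every local variable occupies 8 bytes, Jit.VAR(n) corresponds to byte offset n * 8 from the address returned by localp().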

Register

The following registers can be used. The number of registers that can be used in a function is automatically calculated, and it changes for each function (range delimited by enter()).

| Register | Purpose |
|---|---|
| Jit.R0 to Jit.R5 | General-purpose registers. Temporary use. May be destroyed after calling another function. |
| Jit.S0 to Jit.S5 | General-purpose registers. Guaranteed not to be destroyed after calling another function. |
| Jit.FR0 to Jit.FR5 | Floating-point registers. Temporary use. May be destroyed after calling another function. |
| Jit.FS0 to Jit.FS5 | Floating-point registers. Guaranteed not to be destroyed after calling another function. |

At most six floating-point registers can be used across FR and FS combined, so if you use up to FR4, you can use only FS0. If you use up to FR5, no FS* register is available. The possible combinations are as follows.

| FR* registers | FS* registers |
|---|---|
| (None used) | FS0, FS1, FS2, FS3, FS4, FS5 |
| FR0 | FS0, FS1, FS2, FS3, FS4 |
| FR0, FR1 | FS0, FS1, FS2, FS3 |
| FR0, FR1, FR2 | FS0, FS1, FS2 |
| FR0, FR1, FR2, FR3 | FS0, FS1 |
| FR0, FR1, FR2, FR3, FR4 | FS0 |
| FR0, FR1, FR2, FR3, FR4, FR5 | (None available) |

Jit compiler

To create Jit instructions, create a Jit compiler object.

var c = new Jit.Compiler();

The Jit compiler has the following methods.

| Jit compiler method | Return value | Overview |
|---|---|---|
| Jit.Compiler#label() | label | Adds a label at the current location. |
| Jit.Compiler#makeConst(reg, init) | ConstTarget | Outputs provisional code for setting an immediate value after code generation. |
| Jit.Compiler#localp(dst, offset) | | Outputs code that gets the real address of a local variable and stores it in the register indicated by dst. offset is the local variable number. |
| Jit.Compiler#enter(argType) | label | Creates a function entry point. The argument types can be specified (optional). |
| Jit.Compiler#fastEnter(reg) | label | Creates a function entry point, but without emitting the extra prologue and epilogue; the return address is saved in reg. |
| Jit.Compiler#ret(val) | | Outputs the return code, returning val. val is returned in the FR0 register for floating-point numbers and in the R0 register otherwise. |
| Jit.Compiler#f2i(dst, op1) | | Outputs code that casts double to int64_t. dst is a general-purpose register. op1 is a floating-point register. |
| Jit.Compiler#i2f(dst, op1) | | Outputs code that casts int64_t to double. dst is a floating-point register. op1 is a general-purpose register. |
| Jit.Compiler#mov(dst, op1) | | Outputs code that assigns op1 to dst. Floating-point and other types are recognized automatically. |
| Jit.Compiler#neg(dst, op1) | | Outputs code that stores the sign-inverted value of op1 in dst. |
| Jit.Compiler#clz(dst, op1) | | Outputs code that counts the leading zero bits of op1 and stores the count in dst. |
| Jit.Compiler#add(dst, op1, op2) | | Outputs code that stores the result of adding op1 and op2 in dst. |
| Jit.Compiler#sub(dst, op1, op2) | | Outputs code that stores the result of subtracting op2 from op1 in dst. |
| Jit.Compiler#mul(dst, op1, op2) | | Outputs code that stores the result of multiplying op1 and op2 in dst. |
| Jit.Compiler#div(dst, op1, op2) | | Floating-point numbers only. Outputs code that stores the result of dividing op1 by op2 in dst. |
| Jit.Compiler#div() | | Outputs code that performs an unsigned division on the general-purpose registers and stores the quotient in the R0 register. |
| Jit.Compiler#sdiv() | | Outputs code that performs a signed division on the general-purpose registers and stores the quotient in the R0 register. |
| Jit.Compiler#divmod() | | Outputs code that performs an unsigned division on the general-purpose registers and stores the quotient in the R0 register and the remainder in the R1 register. |
| Jit.Compiler#sdivmod() | | Outputs code that performs a signed division on the general-purpose registers and stores the quotient in the R0 register and the remainder in the R1 register. |
| Jit.Compiler#not(dst, op1) | | Outputs code that stores the bitwise inversion of op1 in dst. |
| Jit.Compiler#and(dst, op1, op2) | | Outputs code that stores the bitwise AND of op1 and op2 in dst. |
| Jit.Compiler#or(dst, op1, op2) | | Outputs code that stores the bitwise OR of op1 and op2 in dst. |
| Jit.Compiler#xor(dst, op1, op2) | | Outputs code that stores the bitwise XOR of op1 and op2 in dst. |
| Jit.Compiler#shl(dst, op1, op2) | | Outputs code that stores op1 shifted left by op2 bits in dst. |
| Jit.Compiler#lshr(dst, op1, op2) | | Outputs code that stores op1 logically shifted right by op2 bits in dst. |
| Jit.Compiler#ashr(dst, op1, op2) | | Outputs code that stores op1 arithmetically shifted right by op2 bits in dst. |
| Jit.Compiler#call(label) | JumpTarget | Outputs code that calls a function defined with enter(). Returns a JumpTarget for setting the callee later. If label is specified, it does not need to be set later. |
| Jit.Compiler#fastCall(label) | JumpTarget | Outputs code that calls a function defined with fastEnter(). Returns a JumpTarget for setting the callee later. |
| Jit.Compiler#jmp(label) | JumpTarget | Outputs a jmp instruction. If label is specified, it does not need to be set later. |
| Jit.Compiler#ijmp(dst) | JumpTarget | Outputs a jmp instruction. dst is a register holding an address, or an immediate value. |
| Jit.Compiler#eq(op1, op2) | JumpTarget | Outputs code that checks op1 == op2. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
| Jit.Compiler#neq(op1, op2) | JumpTarget | Outputs code that checks op1 != op2. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
| Jit.Compiler#lt(op1, op2) | JumpTarget | Outputs code that checks op1 < op2 as unsigned. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
| Jit.Compiler#le(op1, op2) | JumpTarget | Outputs code that checks op1 <= op2 as unsigned. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
| Jit.Compiler#gt(op1, op2) | JumpTarget | Outputs code that checks op1 > op2 as unsigned. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
| Jit.Compiler#ge(op1, op2) | JumpTarget | Outputs code that checks op1 >= op2 as unsigned. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
| Jit.Compiler#slt(op1, op2) | JumpTarget | Outputs code that checks op1 < op2 as signed. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
| Jit.Compiler#sle(op1, op2) | JumpTarget | Outputs code that checks op1 <= op2 as signed. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
| Jit.Compiler#sgt(op1, op2) | JumpTarget | Outputs code that checks op1 > op2 as signed. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
| Jit.Compiler#sge(op1, op2) | JumpTarget | Outputs code that checks op1 >= op2 as signed. Returns a JumpTarget for setting the jump destination taken when the condition is true. |
| Jit.Compiler#generate() | JitCode | Generates the code. |
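
To show how several of these methods fit together, here is a small sketch of a loop that sums the integers from 1 to n. It relies only on label(), sgt(), add(), jmp(label), and ret() as described in the table; the variable names (l0, l1, jump0) are illustrative, and the snippet is an untested sketch rather than code taken from the library.

```kinx
using Jit;

var c = new Jit.Compiler();
c.enter();
    c.mov(Jit.R0, Jit.IMM(0));             // R0: running sum
    c.mov(Jit.R1, Jit.IMM(1));             // R1: counter i
    var l0 = c.label();                    // top of the loop
    var jump0 = c.sgt(Jit.R1, Jit.S0);     // leave the loop when i > n (signed)
    c.add(Jit.R0, Jit.R0, Jit.R1);         // sum += i
    c.add(Jit.R1, Jit.R1, Jit.IMM(1));     // ++i
    c.jmp(l0);                             // backward jump; the label is already known
    var l1 = c.label();                    // loop exit
    c.ret(Jit.R0);

jump0.setLabel(l1);
var code = c.generate();

System.println(code.run(10));              // sum of 1..10
```

The backward jump can pass its label directly because the label already exists, while the forward exit has to be patched with setLabel() once the destination label has been created.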

Jit.Compiler#enter(argType)

A function entry point is defined with the enter method. If argType is not specified, Jit.ArgType.SW_SW_SW is assumed. Up to three arguments can be specified (a limitation of the specification), each with its own type.

  • SW … Signed Word (64bit)
  • UW … Unsigned Word (64bit)
  • FP … Floating Point (64bit)

In practice, SW and UW behave the same because the bits received in the register are identical, though this may make a difference in the future. Trailing SW entries can be omitted, so the following all have the same meaning.

  • Jit.ArgType.SW_SW_SW
  • Jit.ArgType.SW_SW
  • Jit.ArgType.SW
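
As a small illustration, the sketch below declares a function that takes two signed 64-bit arguments by passing Jit.ArgType.SW_SW explicitly. According to the description above, omitting the argument or writing Jit.ArgType.SW_SW_SW should behave the same. The operation and the values are illustrative assumptions, and the code has not been verified on a specific Kinx release.

```kinx
using Jit;

var c = new Jit.Compiler();
c.enter(Jit.ArgType.SW_SW);            // two signed 64-bit arguments
    c.sub(Jit.R0, Jit.S0, Jit.S1);     // R0 = first argument - second argument
    c.ret(Jit.R0);
var code = c.generate();

System.println(code.run(10, 3));       // computes 10 - 3
```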

The registers used to pass arguments are fixed, as follows.

  • Caller

| Type | First argument | Second argument | Third argument |
|---|---|---|---|
| Integer | Jit.R0 | Jit.R1 | Jit.R2 |
| Double | Jit.FR0 | Jit.FR1 | Jit.FR2 |

  • Callee

| Type | First argument | Second argument | Third argument |
|---|---|---|---|
| Integer | Jit.S0 | Jit.S1 | Jit.S2 |
| Double | Jit.FS0 | Jit.FS1 | Jit.FS2 |

Note that the registers set by the caller and the registers in which the callee receives the arguments are different.
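
The following sketch illustrates this convention with two functions. The outer function places the arguments in R0 and R1, and the inner function receives them in S0 and S1. It follows the same pattern as the samples in this article (call() first, setLabel() later); the names call0 and inner are illustrative, and the snippet is an untested sketch.

```kinx
using Jit;

var c = new Jit.Compiler();
c.enter();                              // outer function: receives a, b in S0, S1
    c.mov(Jit.R0, Jit.S1);              // callee's first argument  = b
    c.mov(Jit.R1, Jit.S0);              // callee's second argument = a
    var call0 = c.call();               // the callee is set later with setLabel()
    c.ret(Jit.R0);                      // pass the callee's result through
var inner = c.enter();                  // inner function: receives its arguments in S0, S1
    c.sub(Jit.R0, Jit.S0, Jit.S1);      // R0 = first argument - second argument
    c.ret(Jit.R0);

call0.setLabel(inner);
var code = c.generate();

System.println(code.run(3, 10));        // outer swaps the arguments, so inner computes 10 - 3
```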

ConstTarget

Set the label address with setLabel(). This is used when you want to store a label address as an immediate value in a register or in memory. There are probably not many occasions to use it. It could serve in place of a jump table, but there is no convenient way to build the table yet.

By the way, an immediate value can also be set with setValue(), but since Jit.IMM(100), or Jit.IMM(0.1) for floating-point numbers, can be used directly, there is not much point in doing so.

An example of using it for a jump table is shown later.

JumpTarget

Use setLabel() to set the address of a jump destination or of a function to call.

For example, when branching based on the result of comparison, the processing is as follows.

var c = new Jit.Compiler();
// Function entry point.
c.enter();
// S0 register value >= 3
var jump0 = c.ge(Jit.S0, Jit.IMM(3));
... // Code when the condition is false
var jump1 = c.jmp();
var label0 = c.label();
... // Code when the condition is true
var label1 = c.label();
...

jump0.setLabel(label0);
jump1.setLabel(label1);
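
Filling in the skeleton above, a complete sketch might look like the following, which returns the larger of two signed arguments. The choice of sgt() and the register usage are illustrative assumptions, and the code has not been run against a specific Kinx release.

```kinx
using Jit;

var c = new Jit.Compiler();
c.enter();
    var jump0 = c.sgt(Jit.S0, Jit.S1);  // branch when S0 > S1 (signed)
    c.mov(Jit.R0, Jit.S1);              // condition false: S1 is the maximum
    var jump1 = c.jmp();                // skip the code for the true case
    var label0 = c.label();
    c.mov(Jit.R0, Jit.S0);              // condition true: S0 is the maximum
    var label1 = c.label();
    c.ret(Jit.R0);

jump0.setLabel(label0);
jump1.setLabel(label1);
var code = c.generate();

System.println(code.run(10, 7));        // maximum of 10 and 7
System.println(code.run(-5, 7));        // maximum of -5 and 7
```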

JitCode

If generate() succeeds, it returns a JitCode object, whose methods are listed below. Note that at most three arguments can be passed (a limitation of the specification). Since this is an abstract assembler, the limit is needed to support various architectures. If you need more, allocate a local variable area and pass its start address. A sample is shown later.

| Method | Overview |
|---|---|
| JitCode#run(a1, a2, a3) | Receives the return value as an Integer. |
| JitCode#frun(a1, a2, a3) | Receives the return value as a Double. |
| JitCode#dump() | Outputs the generated assembly listing. |
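
As a minimal usage sketch, the following generates a one-instruction function, prints its listing with dump(), and then executes it with run(). The function body is an illustrative assumption, and the exact listing depends on the platform.

```kinx
using Jit;

var c = new Jit.Compiler();
c.enter();
    c.add(Jit.R0, Jit.S0, Jit.IMM(1));  // R0 = first argument + 1
    c.ret(Jit.R0);
var code = c.generate();

code.dump();                            // print the generated assembly listing
System.println(code.run(41));           // integer result via run(); computes 41 + 1
```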

Sample

Fibonacci sequence (recursive version)

Now, let's write the customary recursive Fibonacci calculation. It is exactly the same as the sample shown at the beginning.

var c = new Jit.Compiler();
var entry1 = c.enter();
    var jump0 = c.ge(Jit.S0, Jit.IMM(3));
    c.ret(Jit.S0);
    var l1 = c.label();
    c.sub(Jit.R0, Jit.S0, Jit.IMM(2));
    c.call(entry1);
    c.mov(Jit.S1, Jit.R0);
    c.sub(Jit.R0, Jit.S0, Jit.IMM(1));
    c.call(entry1);
    c.add(Jit.R0, Jit.R0, Jit.S1);
    c.ret(Jit.R0);

jump0.setLabel(l1);
var code = c.generate();

for (var i = 1; i <= 42; ++i) {
    var tmr = new SystemTimer();
    var r = code.run(i);
    System.println("[%8.3f] fib(%2d) = %d" %tmr.elapsed() %i %r);
}

The results are as follows.

[0.000] fib( 1) = 1
[0.000] fib( 2) = 2
[0.000] fib( 3) = 3
[0.000] fib( 4) = 5
[0.000] fib( 5) = 8
[0.000] fib( 6) = 13
[0.000] fib( 7) = 21
[0.000] fib( 8) = 34
[0.000] fib( 9) = 55
[0.000] fib(10) = 89
[0.000] fib(11) = 144
[0.000] fib(12) = 233
[0.000] fib(13) = 377
[0.000] fib(14) = 610
[0.000] fib(15) = 987
[0.000] fib(16) = 1597
[0.000] fib(17) = 2584
[0.000] fib(18) = 4181
[0.000] fib(19) = 6765
[0.000] fib(20) = 10946
[0.000] fib(21) = 17711
[0.000] fib(22) = 28657
[0.000] fib(23) = 46368
[0.000] fib(24) = 75025
[0.000] fib(25) = 121393
[0.001] fib(26) = 196418
[0.001] fib(27) = 317811
[0.001] fib(28) = 514229
[0.002] fib(29) = 832040
[0.002] fib(30) = 1346269
[0.004] fib(31) = 2178309
[0.006] fib(32) = 3524578
[0.009] fib(33) = 5702887
[0.016] fib(34) = 9227465
[0.035] fib(35) = 14930352
[0.042] fib(36) = 24157817
[0.066] fib(37) = 39088169
[0.119] fib(38) = 63245986
[0.181] fib(39) = 102334155
[0.289] fib(40) = 165580141
[0.476] fib(41) = 267914296
[0.773] fib(42) = 433494437

By the way, I measured fib(42) with Ruby, Python, PyPy, PHP, HHVM, Kinx, and Kinx(native) and compared the results. The listing above only measures the time of run() for the JIT library version, so for a fair comparison every entry uses the user time of the entire process, including script interpretation and JIT code generation.

Sorted from fastest to slowest, the results are as follows. Emitting native code directly with the JIT is remarkably fast. It was a pleasant surprise that Kinx(native) was faster than PyPy; it is about on par with HHVM. Among the script interpreters, Ruby has become fast. Knowing the 1.8 era, I find that deeply moving.

| Language | Version | User Time |
|---|---|---|
| Kinx(Jit-Lib) | 0.10.0 | 0.828 |
| HHVM | 3.21.0 | 2.227 |
| Kinx(native) | 0.10.0 | 2.250 |
| PyPy | 5.10.0 | 3.313 |
| PHP | 7.2.24 | 11.422 |
| Ruby | 2.5.1p57 | 14.877 |
| Kinx | 0.10.0 | 27.478 |
| Python | 2.7.15+ | 41.125 |

Here is the assembly listing generated by the JIT library. The output differs between Windows and Linux; this one is from Linux.

       0: 53 push rbx
       1: 41 57 push r15
       3: 41 56 push r14
       5: 48 8b df mov rbx, rdi
       8: 4c 8b fe mov r15, rsi
       b: 4c 8b f2 mov r14, rdx
       e: 48 83 ec 10 sub rsp, 0x10
      12: 48 83 fb 03 cmp rbx, 0x3
      16: 73 0d jae 0x25
      18: 48 89 d8 mov rax, rbx
      1b: 48 83 c4 10 add rsp, 0x10
      1f: 41 5e pop r14
      21: 41 5f pop r15
      23: 5b pop rbx
      24: c3 ret
      25: 48 8d 43 fe lea rax, [rbx-0x2]
      29: 48 89 fa mov rdx, rdi
      2c: 48 89 c7 mov rdi, rax
      2f: e8 cc ff ff ff call 0x0
      34: 49 89 c7 mov r15, rax
      37: 48 8d 43 ff lea rax, [rbx-0x1]
      3b: 48 89 fa mov rdx, rdi
      3e: 48 89 c7 mov rdi, rax
      41: e8 ba ff ff ff call 0x0
      46: 49 03 c7 add rax, r15
      49: 48 83 c4 10 add rsp, 0x10
      4d: 41 5e pop r14
      4f: 41 5f pop r15
      51: 5b pop rbx
      52: c3 ret

Const example

If I had to write an example of Const, it would look like this. It builds the jump table in local variables, so the table is recreated on every call, which I do not like. This could probably be solved by providing a separate interface that creates the table once and passes its address.

var c = new Jit.Compiler();
c.enter();
    c.mov(Jit.R1, Jit.IMM(-1));
    var jump0 = c.slt(Jit.S0, Jit.IMM(0));
    var jump1 = c.sgt(Jit.S0, Jit.IMM(3));
    var const0 = c.makeConst(Jit.VAR(0));
    var const1 = c.makeConst(Jit.VAR(1));
    var const2 = c.makeConst(Jit.VAR(2));
    var const3 = c.makeConst(Jit.VAR(3));
    // Get the address of the local variable at the offset of the S0 register (first argument) and store it in the R0 register.
    c.localp(Jit.R0, Jit.S0);
    // Get the value of the local variable itself.
    c.mov(Jit.R0, Jit.MEM1(Jit.R0));
    // Jump with the contents of local variables as addresses.
    c.ijmp(Jit.R0);
    var l0 = c.label();
    c.mov(Jit.R1, Jit.IMM(102));
    c.ret(Jit.R1);
    var l1 = c.label();
    c.mov(Jit.R1, Jit.IMM(103));
    c.ret(Jit.R1);
    var l2 = c.label();
    c.mov(Jit.R1, Jit.IMM(104));
    c.ret(Jit.R1);
    var l3 = c.label();
    c.mov(Jit.R1, Jit.IMM(105));
    var l4 = c.label();
    c.ret(Jit.R1);

// Set jump address before code generation.
jump0.setLabel(l4);
jump1.setLabel(l4);

var code = c.generate();
// Set const value after code generation.
const0.setLabel(l0);
const1.setLabel(l1);
const2.setLabel(l2);
const3.setLabel(l3);

for (var i = -1; i < 5; ++i) {
    var r = code.run(i);
    System.println(r);
}

The result is as follows.

-1
102
103
104
105
-1

The generated code looks like this. This time I tried the Windows version.

       0: 53 push rbx
       1: 56 push rsi
       2: 57 push rdi
       3: 48 8b d9 mov rbx, rcx
       6: 48 8b f2 mov rsi, rdx
       9: 49 8b f8 mov rdi, r8
       c: 4c 8b 4c 24 b0 mov r9, [rsp-0x50]
      11: 48 83 ec 50 sub rsp, 0x50
      15: 48 c7 c2 ff ff ff ff mov rdx, 0xffffffffffffffff
      1c: 48 83 fb 00 cmp rbx, 0x0
      20: 0f 8c 94 00 00 00 jl 0xba
      26: 48 83 fb 03 cmp rbx, 0x3
      2a: 0f 8f 8a 00 00 00 jg 0xba
      30: 49 b9 95 ff 57 61 89 01 00 00 mov r9, 0x1896157ff95
      3a: 4c 89 4c 24 20 mov [rsp+0x20], r9
      3f: 49 b9 a7 ff 57 61 89 01 00 00 mov r9, 0x1896157ffa7
      49: 4c 89 4c 24 28 mov [rsp+0x28], r9
      4e: 49 b9 b9 ff 57 61 89 01 00 00 mov r9, 0x1896157ffb9
      58: 4c 89 4c 24 30 mov [rsp+0x30], r9
      5d: 49 b9 cb ff 57 61 89 01 00 00 mov r9, 0x1896157ffcb
      67: 4c 89 4c 24 38 mov [rsp+0x38], r9
      6c: 48 8d 44 24 20 lea rax, [rsp+0x20]
      71: 48 6b db 08 imul rbx, rbx, 0x8
      75: 48 03 c3 add rax, rbx
      78: 48 8b 00 mov rax, [rax]
      7b: ff e0 jmp rax
      7d: 48 c7 c2 66 00 00 00 mov rdx, 0x66
      84: 48 89 d0 mov rax, rdx
      87: 48 83 c4 50 add rsp, 0x50
      8b: 5f pop rdi
      8c: 5e pop rsi
      8d: 5b pop rbx
      8e: c3 ret
      8f: 48 c7 c2 67 00 00 00 mov rdx, 0x67
      96: 48 89 d0 mov rax, rdx
      99: 48 83 c4 50 add rsp, 0x50
      9d: 5f pop rdi
      9e: 5e pop rsi
      9f: 5b pop rbx
      a0: c3 ret
      a1: 48 c7 c2 68 00 00 00 mov rdx, 0x68
      a8: 48 89 d0 mov rax, rdx
      ab: 48 83 c4 50 add rsp, 0x50
      af: 5f pop rdi
      b0: 5e pop rsi
      b1: 5b pop rbx
      b2: c3 ret
      b3: 48 c7 c2 69 00 00 00 mov rdx, 0x69
      ba: 48 89 d0 mov rax, rdx
      bd: 48 83 c4 50 add rsp, 0x50
      c1: 5f pop rdi
      c2: 5e pop rsi
      c3: 5b pop rbx
      c4: c3 ret

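The jmp rax at offset 7b is the key point. Once the table can be defined statically, this should work as a proper jump table (there is no easy way to do that right now...).

Example with four or more arguments

It is a little tedious, but to pass four or more arguments, store the values in the local variable area and pass its address (pointer) as an argument. The following example first goes through a hook function that sets the arguments in the local variable area. Note that every local variable is allocated as 8 bytes, so when accessing them directly with Jit.MEM1() and the like, the offset must be a multiple of 8.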

var c = new Jit.Compiler();
var entry1 = c.enter();
    c.mov(Jit.VAR(0), Jit.S0);
    c.mov(Jit.VAR(1), Jit.IMM(3));
    c.mov(Jit.VAR(2), Jit.IMM(2));
    c.mov(Jit.VAR(3), Jit.IMM(1));
    c.localp(Jit.R0);
    var call1 = c.call();
    c.ret(Jit.R0);
var entry2 = c.enter();
    c.mov(Jit.R1, Jit.S0);
    c.mov(Jit.S0, Jit.MEM1(Jit.R1, 0));
    var jump0 = c.ge(Jit.S0, Jit.MEM1(Jit.R1, 8));
    c.ret(Jit.S0);
    var l1 = c.label();
    c.sub(Jit.R3, Jit.S0, Jit.MEM1(Jit.R1, 16));
    c.mov(Jit.VAR(0), Jit.R3);
    c.mov(Jit.VAR(1), Jit.IMM(3));
    c.mov(Jit.VAR(2), Jit.IMM(2));
    c.mov(Jit.VAR(3), Jit.IMM(1));
    c.localp(Jit.R0);
    c.call(entry2);
    c.mov(Jit.S1, Jit.R0);
    c.sub(Jit.R3, Jit.S0, Jit.MEM1(Jit.R1, 24));
    c.mov(Jit.VAR(0), Jit.R3);
    c.mov(Jit.VAR(1), Jit.IMM(3));
    c.mov(Jit.VAR(2), Jit.IMM(2));
    c.mov(Jit.VAR(3), Jit.IMM(1));
    c.localp(Jit.R0);
    c.call(entry2);
    c.add(Jit.R0, Jit.R0, Jit.S1);
    c.ret(Jit.R0);

jump0.setLabel(l1);
call1.setLabel(entry2);
var code = c.generate();

for (var i = 1; i <= 42; ++i) {
    var tmr = new SystemTimer();
    var r = code.run(i);
    System.println("[%8.3f] fib(%2d) = %d" % tmr.elapsed() % i % r);
}

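The output is the same as before.

Double arguments and return values

I have not shown Double yet, so here it is. Let's go with Fibonacci again. I really do love Fibonacci, it seems; I had not noticed. This is a version that steps by 0.1.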

var c = new Jit.Compiler();
var entry1 = c.enter(Jit.ArgType.FP);
    c.mov(Jit.FR0, Jit.IMM(0.3));
    var jump0 = c.ge(Jit.FS0, Jit.FR0);
    c.ret(Jit.FS0);
    var l1 = c.label();
    c.mov(Jit.FR0, Jit.IMM(0.2));
    c.sub(Jit.FR0, Jit.FS0, Jit.FR0);
    c.call(entry1);
    c.mov(Jit.FS1, Jit.FR0);
    c.mov(Jit.FR0, Jit.IMM(0.1));
    c.sub(Jit.FR0, Jit.FS0, Jit.FR0);
    c.call(entry1);
    c.add(Jit.FR0, Jit.FR0, Jit.FS1);
    c.ret(Jit.FR0);

jump0.setLabel(l1);
var code = c.generate();

for (var i = 0.1; i < 3.5; i += 0.1) {
    var tmr = new SystemTimer();
    var r = code.frun(i);
    System.println("[%8.3f] fib(%3.1f) = %.1f" % tmr.elapsed() % i % r);
}

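Floating-point immediate values cannot be used directly in the comparison methods (I should just make that work), so they have to be stored in a register first.

Calling frun() lets you receive a Double value. The results are as follows.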

[   0.000] fib(0.1) = 0.1
[   0.000] fib(0.2) = 0.2
[   0.000] fib(0.3) = 0.3
[   0.000] fib(0.4) = 0.5
[   0.000] fib(0.5) = 0.8
[   0.000] fib(0.6) = 1.3
[   0.000] fib(0.7) = 2.1
[   0.000] fib(0.8) = 3.4
[   0.000] fib(0.9) = 5.5
[   0.000] fib(1.0) = 8.9
[   0.000] fib(1.1) = 14.4
[   0.000] fib(1.2) = 23.3
[   0.000] fib(1.3) = 37.7
[   0.000] fib(1.4) = 61.0
[   0.000] fib(1.5) = 98.7
[   0.000] fib(1.6) = 159.7
[   0.000] fib(1.7) = 258.4
[   0.000] fib(1.8) = 418.1
[   0.000] fib(1.9) = 676.5
[   0.000] fib(2.0) = 1094.6
[   0.000] fib(2.1) = 1771.1
[   0.000] fib(2.2) = 2865.7
[   0.000] fib(2.3) = 4636.8
[   0.000] fib(2.4) = 7502.5
[   0.000] fib(2.5) = 12139.3
[   0.001] fib(2.6) = 19641.8
[   0.001] fib(2.7) = 31781.1
[   0.002] fib(2.8) = 51422.9
[   0.003] fib(2.9) = 83204.0
[   0.004] fib(3.0) = 134626.9
[   0.006] fib(3.1) = 217830.9
[   0.015] fib(3.2) = 352457.8
[   0.020] fib(3.3) = 570288.7
[   0.027] fib(3.4) = 922746.5

The generated code is as follows. This is also the Windows version. A simple hook function comes first in order to pass the floating-point number. SLJIT does not allow a floating-point number to be specified as an argument at a function entry point, so the hook works around that.

In that sense, this library is more convenient than using SLJIT directly. The size needed for the local variable area is calculated automatically, and the code that saves and restores the preserved registers is emitted for exactly as many registers as needed.

       0: 53 push rbx
       1: 56 push rsi
       2: 57 push rdi
       3: 48 8b d9 mov rbx, rcx
       6: 48 8b f2 mov rsi, rdx
       9: 49 8b f8 mov rdi, r8
       c: 4c 8b 4c 24 d0 mov r9, [rsp-0x30]
      11: 48 83 ec 30 sub rsp, 0x30
      15: 0f 29 74 24 20 movaps [rsp+0x20], xmm6
      1a: f2 0f 10 03 movsd xmm0, qword [rbx]
      1e: 48 89 f2 mov rdx, rsi
      21: 49 89 f8 mov r8, rdi
      24: 48 89 c1 mov rcx, rax
      27: e8 0d 00 00 00 call 0x39
      2c: 0f 28 74 24 20 movaps xmm6, [rsp+0x20]
      31: 48 83 c4 30 add rsp, 0x30
      35: 5f pop rdi
      36: 5e pop rsi
      37: 5b pop rbx
      38: c3 ret
      39: 53 push rbx
      3a: 56 push rsi
      3b: 57 push rdi
      3c: 48 8b d9 mov rbx, rcx
      3f: 48 8b f2 mov rsi, rdx
      42: 49 8b f8 mov rdi, r8
      45: 4c 8b 4c 24 b0 mov r9, [rsp-0x50]
      4a: 48 83 ec 50 sub rsp, 0x50
      4e: 0f 29 74 24 20 movaps [rsp+0x20], xmm6
      53: f2 0f 11 6c 24 38 movsd [rsp+0x38], xmm5
      59: f2 0f 10 f0 movsd xmm6, xmm0
      5d: 49 b9 33 33 33 33 33 33 d3 3f mov r9, 0x3fd3333333333333
      67: 4c 89 4c 24 40 mov [rsp+0x40], r9
      6c: f2 0f 10 44 24 40 movsd xmm0, qword [rsp+0x40]
      72: 66 0f 2e f0 ucomisd xmm6, xmm0
      76: 73 17 jae 0x8f
      78: f2 0f 10 c6 movsd xmm0, xmm6
      7c: f2 0f 10 6c 24 38 movsd xmm5, qword [rsp+0x38]
      82: 0f 28 74 24 20 movaps xmm6, [rsp+0x20]
      87: 48 83 c4 50 add rsp, 0x50
      8b: 5f pop rdi
      8c: 5e pop rsi
      8d: 5b pop rbx
      8e: c3 ret
      8f: 49 b9 9a 99 99 99 99 99 c9 3f mov r9, 0x3fc999999999999a
      99: 4c 89 4c 24 40 mov [rsp+0x40], r9
      9e: f2 0f 10 44 24 40 movsd xmm0, qword [rsp+0x40]
      a4: f2 0f 10 e6 movsd xmm4, xmm6
      a8: f2 0f 5c e0 subsd xmm4, xmm0
      ac: f2 0f 11 e0 movsd xmm0, xmm4
      b0: 48 89 c1 mov rcx, rax
      b3: e8 81 ff ff ff call 0x39
      b8: f2 0f 10 e8 movsd xmm5, xmm0
      bc: 49 b9 9a 99 99 99 99 99 b9 3f mov r9, 0x3fb999999999999a
      c6: 4c 89 4c 24 40 mov [rsp+0x40], r9
      cb: f2 0f 10 44 24 40 movsd xmm0, qword [rsp+0x40]
      d1: f2 0f 10 e6 movsd xmm4, xmm6
      d5: f2 0f 5c e0 subsd xmm4, xmm0
      d9: f2 0f 11 e0 movsd xmm0, xmm4
      dd: 48 89 c1 mov rcx, rax
      e0: e8 54 ff ff ff call 0x39
      e5: f2 0f 58 c5 addsd xmm0, xmm5
      e9: f2 0f 10 6c 24 38 movsd xmm5, qword [rsp+0x38]
      ef: 0f 28 74 24 20 movaps xmm6, [rsp+0x20]
      f4: 48 83 c4 50 add rsp, 0x50
      f8: 5f pop rdi
      f9: 5e pop rsi
      fa: 5b pop rbx
      fb: c3 ret

In conclusion

JIT is interesting. If you combine this with a parser combinator, you can build a small language processor with a JIT. That might be a path worth pursuing.

There are probably two possible uses:

  1. When creating a Kinx library, use the JIT for the numerical calculation parts to speed them up.
  2. Host a DSL (domain-specific language) or your own homemade language, and use the JIT as the backend for code output.

See you next time.