Kinx Library: JIT Compiler Library

Introduction

This is the scripting language Kinx, which **"looks like JavaScript, has the brain (contents) of Ruby, (and the stability of AC/DC)"**. I made a library for JIT compilation.

Reference
Original motivation ... Scripting language Kinx (introduction)
All links to the individual articles are collected there.
Repository ... https://github.com/Kray-G/kinx
Pull requests and the like are welcome.

Everyone wants to try JIT. This time I wrapped SLJIT, which is also used for native in Kinx, in a library to make it easier to use. SLJIT itself has little documentation and I learned it by deciphering the source, so I considered writing up how to use SLJIT itself as a memo, but I am putting that off for now. I may do it somewhere later.

That said, this library is of course easier to use than raw SLJIT, so I think it is the better choice. **The host language is itself a scripting language**, so you can play with it casually.

What is it like?

First, here is a sample of what a program looks like, because the details that follow go on for a while before we would otherwise reach one.

using Jit;

var c = new Jit.Compiler();
var entry1 = c.enter();
    var jump0 = c.ge(Jit.S0, Jit.IMM(3));
    c.ret(Jit.S0);
    var l1 = c.label();
    c.sub(Jit.R0, Jit.S0, Jit.IMM(2));
    c.call(entry1);
    c.mov(Jit.S1, Jit.R0);
    c.sub(Jit.R0, Jit.S0, Jit.IMM(1));
    c.call(entry1);
    c.add(Jit.R0, Jit.R0, Jit.S1);
    c.ret(Jit.R0);

jump0.setLabel(l1);
var code = c.generate();

for (var i = 1; i <= 42; ++i) {
    var tmr = new SystemTimer();
    var r = code.run(i);
    System.println("[%8.3f] fib(%2d) = %d" % tmr.elapsed() % i % r);
}

Create a `Jit.Compiler` object, create a function entry with `enter()`, write code that manipulates registers, and return with `ret()`. To execute it, call `generate()` and then `run()`. You can also view the assembly listing by calling `generate()` and then `dump()`.

If you want to skip the details, jump to the Sample section, which also includes a benchmark against Ruby, Python, PyPy, and others.

SLJIT

What is SLJIT in the first place?

In a nutshell, it is an **abstracted assembler**: a library that solves the problem that assembly differs for each CPU and must be rewritten, by letting a single style of code support multiple environments. The currently supported platforms are as follows.

SLJIT supported platforms
- Intel-x86 32
- AMD-x86 64
- ARM 32 (ARM-v5, ARM-v7 and Thumb2 instruction sets)
- ARM 64
- PowerPC 32
- PowerPC 64
- MIPS 32 (III, R1)
- MIPS 64 (III, R1)
- SPARC 32

However, please note that the Kinx version of the JIT library introduced here supports 64-bit only, and I have only built and confirmed it on x64 Windows and x64 Linux.

Official(?) documentation

As far as I know, the following are the only helpful documents I could find.

https://zherczeg.github.io/sljit/
http://ftp.jaist.ac.jp/pub/NetBSD/NetBSD-current/src/sys/external/bsd/sljit/dist/doc/tutorial/sljit_tutorial.html

They are worth reading.

The GitHub repository is below.

https://github.com/zherczeg/sljit

Jit

Now for the JIT library as a Kinx library. It is more convenient than using SLJIT from C. Of course, using the C library directly gives you finer control, but this library gets you reasonably far.

using Jit

The Jit library is not built-in, so use the using directive to load it explicitly.

using Jit;

Jit object

The `Jit` object defines the methods for creating parameters and the compiler class.

Methods for Jit parameters

There are three kinds of Jit parameters: immediate values, registers, and memory accesses. They are used in the following forms.

Immediate value, memory access

Immediate values and memory accesses are expressed with the following methods. `Jit.VAR()` is a special method for using the local variable area. The local variable area is allocated automatically on the stack, and `Jit.VAR()` refers to that area.

Method	Remarks
`Jit.IMM(v)`	Immediate value. Written the same way for 64-bit integers and floating-point numbers; the type must match the destination register.
`Jit.VAR(n)`	Local variable area. Each variable is fixed at 8 bytes.
`Jit.MEM0(address)`	Takes an immediate value as the address. Currently unusable from a script, because a real address cannot be specified from a script.
`Jit.MEM1(r1, offset)`	Treats the register given as `r1` as an address and refers to the memory at `offset` bytes from it.
`Jit.MEM2(r1, r2, shift)`	Refers to the memory at `r1 + r2 * (bytes indicated by shift)`, where `shift` is 0 for 1 byte, 1 for 2 bytes, 2 for 4 bytes, and 3 for 8 bytes.
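The address arithmetic behind `Jit.VAR`, `Jit.MEM1`, and `Jit.MEM2` can be sketched in plain Python. This is only a model of the arithmetic described in the table, not the Kinx Jit API; the function names and the base address `0x1000` are hypothetical.

```python
# Plain-Python model of the effective-address arithmetic; not the Kinx Jit API.
WORD = 8  # each local variable slot is fixed at 8 bytes


def var_offset(n):
    """Byte offset of local variable n from the start of the local area."""
    return n * WORD


def mem1(r1, offset=0):
    """Address referred to by MEM1: register value plus a byte offset."""
    return r1 + offset


def mem2(r1, r2, shift):
    """Address referred to by MEM2: r1 + r2 * (1, 2, 4, or 8 bytes)."""
    if shift not in (0, 1, 2, 3):
        raise ValueError("shift must be 0, 1, 2, or 3")
    return r1 + r2 * (1 << shift)


base = 0x1000  # hypothetical register value used as an address
print(hex(mem1(base, 16)))               # 16 bytes past base
print(hex(mem2(base, 3, 3)))             # fourth 8-byte element from base
print(hex(mem1(base, var_offset(3))))    # same location via a slot number
```

With `shift` set to 3, `MEM2` indexes 8-byte elements, which lines up with the 8-byte local variable slots.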

Registers

The following registers can be used. The number of registers used in a function is calculated automatically and varies per function (the range delimited by `enter()`).

Register	Use
`Jit.R0` ～ `Jit.R5`	General-purpose register for temporary use. May be clobbered by a call to another function.
`Jit.S0` ～ `Jit.S5`	General-purpose register. Guaranteed to be preserved across a call to another function.
`Jit.FR0` ～ `Jit.FR5`	Floating-point register for temporary use. May be clobbered by a call to another function.
`Jit.FS0` ～ `Jit.FS5`	Floating-point register. Guaranteed to be preserved across a call to another function.

Floating-point registers are limited to 6 in total across `FR*` and `FS*`. If you use registers up to `FR4`, only `FS0` is available, and if you use registers up to `FR5`, no `FS*` register is available. The combinations are as follows.

`FR*` registers	`FS*` registers
(none)	`FS0`, `FS1`, `FS2`, `FS3`, `FS4`, `FS5`
`FR0`	`FS0`, `FS1`, `FS2`, `FS3`, `FS4`
`FR0`, `FR1`	`FS0`, `FS1`, `FS2`, `FS3`
`FR0`, `FR1`, `FR2`	`FS0`, `FS1`, `FS2`
`FR0`, `FR1`, `FR2`, `FR3`	`FS0`, `FS1`
`FR0`, `FR1`, `FR2`, `FR3`, `FR4`	`FS0`
`FR0`, `FR1`, `FR2`, `FR3`, `FR4`, `FR5`	(none)
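The table follows a simple rule: the `FR*` and `FS*` registers share a pool of 6. A minimal Python sketch of that rule (the function name `available_fs` is hypothetical, and this only models the table, not the library's internal allocator):

```python
# Model of the shared floating-point register pool described in the table.
MAX_FP_REGISTERS = 6


def available_fs(fr_used):
    """Return the FS registers still usable when FR0..FR(fr_used-1) are in use."""
    if not 0 <= fr_used <= MAX_FP_REGISTERS:
        raise ValueError("fr_used must be between 0 and 6")
    return [f"FS{i}" for i in range(MAX_FP_REGISTERS - fr_used)]


for used in range(MAX_FP_REGISTERS + 1):
    print(used, available_fs(used))
```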

Jit compiler

To emit Jit instructions, create a Jit compiler object.

var c = new Jit.Compiler();

The Jit compiler has the following methods.

Jit compiler method	Return value	Overview
`Jit.Compiler#label()`	label	Adds a label at the current location.
`Jit.Compiler#makeConst(reg, init)`	ConstTarget	Emits placeholder code so that an immediate value can be set after code generation.

`Jit.Compiler#localp(dst, offset)`		Emits code that gets the real address of a local variable and stores it in the register given by `dst`. `offset` is the local variable number.

`Jit.Compiler#enter(argType)`	label	Creates a function entry point. The argument types can be specified (optional).
`Jit.Compiler#fastEnter(reg)`	label	Creates a function entry point, but without emitting the extra prologue and epilogue; the return address is saved to `reg`.

`Jit.Compiler#ret(val)`		Emits return code that returns `val`. A floating-point `val` is returned in the `FR0` register; anything else is returned in the `R0` register.

`Jit.Compiler#f2i(dst, op1)`		Emits code that casts a double to int64_t. `dst` is a general-purpose register; `op1` is a floating-point register.
`Jit.Compiler#i2f(dst, op1)`		Emits code that casts an int64_t to double. `dst` is a floating-point register; `op1` is a general-purpose register.

`Jit.Compiler#mov(dst, op1)`		Emits code that assigns `op1` to `dst`. Floating-point and other types are distinguished automatically.

`Jit.Compiler#neg(dst, op1)`		Emits code that stores the sign-inverted `op1` in `dst`.
`Jit.Compiler#clz(dst, op1)`		Emits code that counts the leading 0 bits of `op1` and stores the count in `dst`.
`Jit.Compiler#add(dst, op1, op2)`		Emits code that stores the sum of `op1` and `op2` in `dst`.
`Jit.Compiler#sub(dst, op1, op2)`		Emits code that stores `op1` minus `op2` in `dst`.
`Jit.Compiler#mul(dst, op1, op2)`		Emits code that stores the product of `op1` and `op2` in `dst`.
`Jit.Compiler#div(dst, op1, op2)`		Floating-point numbers only. Emits code that stores `op1` divided by `op2` in `dst`.
`Jit.Compiler#div()`		Emits code that performs an unsigned division on the general-purpose registers and stores the result in the `R0` register.
`Jit.Compiler#sdiv()`		Emits code that performs a signed division on the general-purpose registers and stores the result in the `R0` register.
`Jit.Compiler#divmod()`		Emits code that performs an unsigned division on the general-purpose registers and stores the quotient in the `R0` register and the remainder in the `R1` register.
`Jit.Compiler#sdivmod()`		Emits code that performs a signed division on the general-purpose registers and stores the quotient in the `R0` register and the remainder in the `R1` register.

`Jit.Compiler#not(dst, op1)`		Emits code that stores the bitwise inversion of `op1` in `dst`.
`Jit.Compiler#and(dst, op1, op2)`		Emits code that stores the bitwise AND of `op1` and `op2` in `dst`.
`Jit.Compiler#or(dst, op1, op2)`		Emits code that stores the bitwise OR of `op1` and `op2` in `dst`.
`Jit.Compiler#xor(dst, op1, op2)`		Emits code that stores the bitwise XOR of `op1` and `op2` in `dst`.
`Jit.Compiler#shl(dst, op1, op2)`		Emits code that stores `op1` shifted left by `op2` bits in `dst`.
`Jit.Compiler#lshr(dst, op1, op2)`		Emits code that stores `op1` logically shifted right by `op2` bits in `dst`.
`Jit.Compiler#ashr(dst, op1, op2)`		Emits code that stores `op1` arithmetically shifted right by `op2` bits in `dst`.

`Jit.Compiler#call(label)`	JumpTarget	Emits code that calls a function defined with `enter()`. Returns a JumpTarget for setting the callee later. If `label` is specified, there is no need to set it later.
`Jit.Compiler#fastCall(label)`	JumpTarget	Emits code that calls a function defined with `fastEnter()`. Returns a JumpTarget for setting the callee later.

`Jit.Compiler#jmp(label)`	JumpTarget	Emits a `jmp` instruction. If `label` is specified, there is no need to set it later.
`Jit.Compiler#ijmp(dst)`	JumpTarget	Emits a `jmp` instruction. `dst` is a register holding the address, or an immediate value.

`Jit.Compiler#eq(op1, op2)`	JumpTarget	Emits code that tests `op1 == op2`. Returns a JumpTarget for specifying where to jump when the condition is true.
`Jit.Compiler#neq(op1, op2)`	JumpTarget	Emits code that tests `op1 != op2`. Returns a JumpTarget for specifying where to jump when the condition is true.
`Jit.Compiler#lt(op1, op2)`	JumpTarget	Emits code that tests `op1 < op2` as unsigned values. Returns a JumpTarget for specifying where to jump when the condition is true.
`Jit.Compiler#le(op1, op2)`	JumpTarget	Emits code that tests `op1 <= op2` as unsigned values. Returns a JumpTarget for specifying where to jump when the condition is true.
`Jit.Compiler#gt(op1, op2)`	JumpTarget	Emits code that tests `op1 > op2` as unsigned values. Returns a JumpTarget for specifying where to jump when the condition is true.
`Jit.Compiler#ge(op1, op2)`	JumpTarget	Emits code that tests `op1 >= op2` as unsigned values. Returns a JumpTarget for specifying where to jump when the condition is true.
`Jit.Compiler#slt(op1, op2)`	JumpTarget	Emits code that tests `op1 < op2` as signed values. Returns a JumpTarget for specifying where to jump when the condition is true.
`Jit.Compiler#sle(op1, op2)`	JumpTarget	Emits code that tests `op1 <= op2` as signed values. Returns a JumpTarget for specifying where to jump when the condition is true.
`Jit.Compiler#sgt(op1, op2)`	JumpTarget	Emits code that tests `op1 > op2` as signed values. Returns a JumpTarget for specifying where to jump when the condition is true.
`Jit.Compiler#sge(op1, op2)`	JumpTarget	Emits code that tests `op1 >= op2` as signed values. Returns a JumpTarget for specifying where to jump when the condition is true.

`Jit.Compiler#generate()`	JitCode	Generates the code.
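The unsigned and signed variants in the table (`ge` versus `sge`, `lshr` versus `ashr`) differ only in how the same 64 bits are interpreted. The following Python sketch models that difference with ordinary integers. It is not the Jit API; the function names simply mirror the method names, and shift counts are assumed to be in the range 0 to 63.

```python
# Model of unsigned vs signed interpretation of a 64-bit register value.
MASK64 = (1 << 64) - 1


def as_unsigned(v):
    """Interpret v as an unsigned 64-bit value."""
    return v & MASK64


def as_signed(v):
    """Interpret the low 64 bits of v as a two's-complement signed value."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v


def ge(a, b):   # unsigned comparison
    return as_unsigned(a) >= as_unsigned(b)


def sge(a, b):  # signed comparison
    return as_signed(a) >= as_signed(b)


def lshr(a, n):  # logical shift right: zero bits shifted in from the top
    return as_unsigned(a) >> n


def ashr(a, n):  # arithmetic shift right: the sign bit is replicated
    return as_unsigned(as_signed(a) >> n)


print(ge(-1, 3), sge(-1, 3))
print(hex(lshr(-8, 1)), hex(ashr(-8, 1)))
```

In this model a negative value compared with the unsigned `ge` looks like a very large number, so the signed variants are the ones to reach for when the operands can be negative.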

Jit.Compiler#enter(argType)

A function entry point is defined with the `enter` method. If `argType` is omitted, `Jit.ArgType.SW_SW_SW` is assumed. There can be at most 3 arguments (by specification), and `argType` specifies the type of each.

SW ... Signed Word (64bit)
UW ... Unsigned Word (64bit)
FP ... Floating Point (64bit)

In practice `SW` and `UW` behave the same, because the bits received in the register are identical, but they may be treated differently in the future. Note that trailing `SW` arguments can be omitted, so the following all have the same meaning.

Jit.ArgType.SW_SW_SW
Jit.ArgType.SW_SW
Jit.ArgType.SW

The registers used for passing arguments are fixed, as follows.

Caller

Type	1st argument	2nd argument	3rd argument
Integer	`Jit.R0`	`Jit.R1`	`Jit.R2`
Double	`Jit.FR0`	`Jit.FR1`	`Jit.FR2`

Callee

Type	1st argument	2nd argument	3rd argument
Integer	`Jit.S0`	`Jit.S1`	`Jit.S2`
Double	`Jit.FS0`	`Jit.FS1`	`Jit.FS2`

Note that the registers the caller sets differ from the registers in which the callee receives the arguments.

ConstTarget

Set a label address with `setLabel()`. Use it when you want to store a label address as an immediate value in a register or in memory. There are probably not many occasions to use it. I think it could stand in for a jump table, but I have not prepared a good mechanism for building the table.

You can also set an immediate value with `setValue()`, but since you can simply write `Jit.IMM(100)`, or `Jit.IMM(0.1)` for a floating-point number, there is not much point in using it.

An example of using it for a jump table will be described later.

JumpTarget

Set the jump destination, or the address for a function call, with `setLabel()`.

For example, branching on the result of a comparison looks like this.

var c = new Jit.Compiler();
// Function entry point.
c.enter();
// Jump if the S0 register value >= 3.
var jump0 = c.ge(Jit.S0, Jit.IMM(3));
... // Code when the condition is false
var jump1 = c.jmp();
var label0 = c.label();
... // Code when the condition is true
var label1 = c.label();
...

jump0.setLabel(label0);
jump1.setLabel(label1);
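The pattern above, emitting a jump first and filling in its destination later, is commonly called backpatching. The following is a minimal Python sketch of the idea. `TinyAssembler`, `Label`, and `JumpTarget.set_label` are hypothetical names chosen to echo the Kinx ones; this is not how the Jit library is implemented.

```python
# Toy backpatching assembler: jumps are emitted with an unknown target
# and resolved later, once the label position is known.
JUMP_OPS = {"jge", "jmp"}


class Label:
    def __init__(self, position):
        self.position = position  # index of the next instruction


class JumpTarget:
    def __init__(self, code, index):
        self._code = code
        self._index = index

    def set_label(self, label):
        op, _ = self._code[self._index]
        self._code[self._index] = (op, label.position)  # patch the target


class TinyAssembler:
    def __init__(self):
        self.code = []

    def emit(self, op, operand=None):
        self.code.append((op, operand))
        return len(self.code) - 1

    def label(self):
        return Label(len(self.code))

    def jump(self, op):
        return JumpTarget(self.code, self.emit(op))

    def unresolved(self):
        return [i for i, (op, t) in enumerate(self.code)
                if op in JUMP_OPS and t is None]


asm = TinyAssembler()
jump0 = asm.jump("jge")          # target not known yet
asm.emit("false-branch")
jump1 = asm.jump("jmp")          # skips the true branch
label0 = asm.label()
asm.emit("true-branch")
label1 = asm.label()
asm.emit("ret")

jump0.set_label(label0)
jump1.set_label(label1)
print(asm.code)
```

Forward jumps need this two-step form because the label position does not exist yet at the moment the jump is emitted.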

JitCode

If code generation with the `generate()` method succeeds, a JitCode object is returned. The methods of the JitCode object are as follows. Note that at most 3 arguments can be specified (by specification). This restriction comes from being an abstracted assembler that has to support various architectures. If you need more, allocate a local variable area and pass its start address instead. A sample appears later.

Method	Overview
`JitCode#run(a1, a2, a3)`	Receives the return value as an Integer.
`JitCode#frun(a1, a2, a3)`	Receives the return value as a Double.
`JitCode#dump()`	Outputs the generated assembly listing.

Sample

Fibonacci sequence (recursive version)

Now let's write the customary recursive Fibonacci. It is the same code shown as the sample at the beginning.

var c = new Jit.Compiler();
var entry1 = c.enter();
    var jump0 = c.ge(Jit.S0, Jit.IMM(3));
    c.ret(Jit.S0);
    var l1 = c.label();
    c.sub(Jit.R0, Jit.S0, Jit.IMM(2));
    c.call(entry1);
    c.mov(Jit.S1, Jit.R0);
    c.sub(Jit.R0, Jit.S0, Jit.IMM(1));
    c.call(entry1);
    c.add(Jit.R0, Jit.R0, Jit.S1);
    c.ret(Jit.R0);

jump0.setLabel(l1);
var code = c.generate();

for (var i = 1; i <= 42; ++i) {
    var tmr = new SystemTimer();
    var r = code.run(i);
    System.println("[%8.3f] fib(%2d) = %d" % tmr.elapsed() % i % r);
}

The result is as follows.

[   0.000] fib( 1) = 1
[   0.000] fib( 2) = 2
[   0.000] fib( 3) = 3
[   0.000] fib( 4) = 5
[   0.000] fib( 5) = 8
[   0.000] fib( 6) = 13
[   0.000] fib( 7) = 21
[   0.000] fib( 8) = 34
[   0.000] fib( 9) = 55
[   0.000] fib(10) = 89
[   0.000] fib(11) = 144
[   0.000] fib(12) = 233
[   0.000] fib(13) = 377
[   0.000] fib(14) = 610
[   0.000] fib(15) = 987
[   0.000] fib(16) = 1597
[   0.000] fib(17) = 2584
[   0.000] fib(18) = 4181
[   0.000] fib(19) = 6765
[   0.000] fib(20) = 10946
[   0.000] fib(21) = 17711
[   0.000] fib(22) = 28657
[   0.000] fib(23) = 46368
[   0.000] fib(24) = 75025
[   0.000] fib(25) = 121393
[   0.001] fib(26) = 196418
[   0.001] fib(27) = 317811
[   0.001] fib(28) = 514229
[   0.002] fib(29) = 832040
[   0.002] fib(30) = 1346269
[   0.004] fib(31) = 2178309
[   0.006] fib(32) = 3524578
[   0.009] fib(33) = 5702887
[   0.016] fib(34) = 9227465
[   0.035] fib(35) = 14930352
[   0.042] fib(36) = 24157817
[   0.066] fib(37) = 39088169
[   0.119] fib(38) = 63245986
[   0.181] fib(39) = 102334155
[   0.289] fib(40) = 165580141
[   0.476] fib(41) = 267914296
[   0.773] fib(42) = 433494437
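For cross-checking the values, the same recurrence can be written in plain Python. This mirrors the branch structure of the JIT code (return the argument itself when it is below 3, otherwise add the results for `n - 2` and `n - 1`); it is a reference sketch, not part of the benchmark.

```python
def fib(n):
    """Same recurrence as the JIT sample: fib(1) = 1, fib(2) = 2."""
    if n >= 3:                      # corresponds to the `ge` branch
        return fib(n - 2) + fib(n - 1)
    return n                        # corresponds to `ret(S0)`


for i in range(1, 11):
    print("fib(%2d) = %d" % (i, fib(i)))
```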

I also measured fib(42) with Ruby, Python, PyPy, PHP, HHVM, Kinx, and Kinx (native) and compared them. The JIT library version above only times `run()`, so for a fair comparison every entry, including script interpretation and JIT code generation, is measured by the user time of the whole process.

Sorted from fastest to slowest, the results are as follows. As expected, emitting native code directly with JIT is remarkably fast. Kinx (native) beating PyPy was a pleasant surprise; it is about on par with HHVM. Ruby has also become fast for a scripting language, which is moving for someone who remembers the 1.8 era.

Language	Version	User time (s)
Kinx(Jit-Lib)	0.10.0	0.828
HHVM	3.21.0	2.227
Kinx(native)	0.10.0	2.250
PyPy	5.10.0	3.313
PHP	7.2.24	11.422
Ruby	2.5.1p57	14.877
Kinx	0.10.0	27.478
Python	2.7.15+	41.125
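To read the table as relative speeds, the user times can be divided by the fastest entry. The numbers below are copied from the table; the script is only a convenience sketch for the arithmetic.

```python
# User times copied from the table above.
user_time = {
    "Kinx(Jit-Lib)": 0.828,
    "HHVM": 2.227,
    "Kinx(native)": 2.250,
    "PyPy": 3.313,
    "PHP": 11.422,
    "Ruby": 14.877,
    "Kinx": 27.478,
    "Python": 41.125,
}

baseline = user_time["Kinx(Jit-Lib)"]
ratios = {name: t / baseline for name, t in user_time.items()}

# Print fastest first, with the slowdown factor relative to the JIT library.
for name, t in sorted(user_time.items(), key=lambda item: item[1]):
    print("%-14s %7.3f  %5.1fx" % (name, t, ratios[name]))
```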

Here is the assembly listing generated by the JIT library. It differs between Windows and Linux; this one is from Linux.

       0:   53                                          push rbx
       1:   41 57                                       push r15
       3:   41 56                                       push r14
       5:   48 8b df                                    mov rbx, rdi
       8:   4c 8b fe                                    mov r15, rsi
       b:   4c 8b f2                                    mov r14, rdx
       e:   48 83 ec 10                                 sub rsp, 0x10
      12:   48 83 fb 03                                 cmp rbx, 0x3
      16:   73 0d                                       jae 0x25
      18:   48 89 d8                                    mov rax, rbx
      1b:   48 83 c4 10                                 add rsp, 0x10
      1f:   41 5e                                       pop r14
      21:   41 5f                                       pop r15
      23:   5b                                          pop rbx
      24:   c3                                          ret
      25:   48 8d 43 fe                                 lea rax, [rbx-0x2]
      29:   48 89 fa                                    mov rdx, rdi
      2c:   48 89 c7                                    mov rdi, rax
      2f:   e8 cc ff ff ff                              call 0x0
      34:   49 89 c7                                    mov r15, rax
      37:   48 8d 43 ff                                 lea rax, [rbx-0x1]
      3b:   48 89 fa                                    mov rdx, rdi
      3e:   48 89 c7                                    mov rdi, rax
      41:   e8 ba ff ff ff                              call 0x0
      46:   49 03 c7                                    add rax, r15
      49:   48 83 c4 10                                 add rsp, 0x10
      4d:   41 5e                                       pop r14
      4f:   41 5f                                       pop r15
      51:   5b                                          pop rbx
      52:   c3                                          ret
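The recursive calls in the listing are encoded as `call rel32`, where the 32-bit displacement is relative to the address of the next instruction. As a small sketch, the targets can be recomputed from the bytes in Python; the same applies to the 2-byte `jae` with its 8-bit displacement. The helper names are hypothetical.

```python
import struct


def call_target(address, encoding):
    """Target of a 5-byte `call rel32` (opcode e8) located at `address`."""
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("not a call rel32 encoding")
    (rel,) = struct.unpack("<i", encoding[1:5])   # signed little-endian
    return address + 5 + rel


def short_jump_target(address, encoding):
    """Target of a 2-byte conditional jump with an 8-bit displacement."""
    if len(encoding) != 2:
        raise ValueError("not a 2-byte jump encoding")
    (rel,) = struct.unpack("<b", encoding[1:2])
    return address + 2 + rel


print(hex(call_target(0x2F, bytes.fromhex("e8ccffffff"))))
print(hex(call_target(0x41, bytes.fromhex("e8baffffff"))))
print(hex(short_jump_target(0x16, bytes.fromhex("730d"))))
```

Both calls resolve to offset 0, the function's own entry point, and the `jae` resolves to offset 0x25, matching the disassembly.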

Const example

If I force an example of Const, it looks like this. It builds the jump table in local variables, so it has the drawback of rebuilding the table on every call. A separate interface that builds only the table and passes its address would probably solve this.

var c = new Jit.Compiler();
c.enter();
    c.mov(Jit.R1, Jit.IMM(-1));
    var jump0 = c.slt(Jit.S0, Jit.IMM(0));
    var jump1 = c.sgt(Jit.S0, Jit.IMM(3));
    var const0 = c.makeConst(Jit.VAR(0));
    var const1 = c.makeConst(Jit.VAR(1));
    var const2 = c.makeConst(Jit.VAR(2));
    var const3 = c.makeConst(Jit.VAR(3));
    // Get the address of the local variable at the offset given by the S0 register (first argument) and store it in the R0 register.
    c.localp(Jit.R0, Jit.S0);
    // Load the value of the local variable itself.
    c.mov(Jit.R0, Jit.MEM1(Jit.R0));
    // Jump, treating the contents of the local variable as an address.
    c.ijmp(Jit.R0);
    var l0 = c.label();
    c.mov(Jit.R1, Jit.IMM(102));
    c.ret(Jit.R1);
    var l1 = c.label();
    c.mov(Jit.R1, Jit.IMM(103));
    c.ret(Jit.R1);
    var l2 = c.label();
    c.mov(Jit.R1, Jit.IMM(104));
    c.ret(Jit.R1);
    var l3 = c.label();
    c.mov(Jit.R1, Jit.IMM(105));
    var l4 = c.label();
    c.ret(Jit.R1);

// Jump addresses are set before code generation.
jump0.setLabel(l4);
jump1.setLabel(l4);

var code = c.generate();
// Const values are set after code generation.
const0.setLabel(l0);
const1.setLabel(l1);
const2.setLabel(l2);
const3.setLabel(l3);

for (var i = -1; i < 5; ++i) {
    var r = code.run(i);
    System.println(r);
}

result.

The generated code looks like this. I ran this one on the Windows version.

       0:   53                                          push rbx
       1:   56                                          push rsi
       2:   57                                          push rdi
       3:   48 8b d9                                    mov rbx, rcx
       6:   48 8b f2                                    mov rsi, rdx
       9:   49 8b f8                                    mov rdi, r8
       c:   4c 8b 4c 24 b0                              mov r9, [rsp-0x50]
      11:   48 83 ec 50                                 sub rsp, 0x50
      15:   48 c7 c2 ff ff ff ff                        mov rdx, 0xffffffffffffffff
      1c:   48 83 fb 00                                 cmp rbx, 0x0
      20:   0f 8c 94 00 00 00                           jl 0xba
      26:   48 83 fb 03                                 cmp rbx, 0x3
      2a:   0f 8f 8a 00 00 00                           jg 0xba
      30:   49 b9 95 ff 57 61 89 01 00 00               mov r9, 0x1896157ff95
      3a:   4c 89 4c 24 20                              mov [rsp+0x20], r9
      3f:   49 b9 a7 ff 57 61 89 01 00 00               mov r9, 0x1896157ffa7
      49:   4c 89 4c 24 28                              mov [rsp+0x28], r9
      4e:   49 b9 b9 ff 57 61 89 01 00 00               mov r9, 0x1896157ffb9
      58:   4c 89 4c 24 30                              mov [rsp+0x30], r9
      5d:   49 b9 cb ff 57 61 89 01 00 00               mov r9, 0x1896157ffcb
      67:   4c 89 4c 24 38                              mov [rsp+0x38], r9
      6c:   48 8d 44 24 20                              lea rax, [rsp+0x20]
      71:   48 6b db 08                                 imul rbx, rbx, 0x8
      75:   48 03 c3                                    add rax, rbx
      78:   48 8b 00                                    mov rax, [rax]
      7b:   ff e0                                       jmp rax
      7d:   48 c7 c2 66 00 00 00                        mov rdx, 0x66
      84:   48 89 d0                                    mov rax, rdx
      87:   48 83 c4 50                                 add rsp, 0x50
      8b:   5f                                          pop rdi
      8c:   5e                                          pop rsi
      8d:   5b                                          pop rbx
      8e:   c3                                          ret
      8f:   48 c7 c2 67 00 00 00                        mov rdx, 0x67
      96:   48 89 d0                                    mov rax, rdx
      99:   48 83 c4 50                                 add rsp, 0x50
      9d:   5f                                          pop rdi
      9e:   5e                                          pop rsi
      9f:   5b                                          pop rbx
      a0:   c3                                          ret
      a1:   48 c7 c2 68 00 00 00                        mov rdx, 0x68
      a8:   48 89 d0                                    mov rax, rdx
      ab:   48 83 c4 50                                 add rsp, 0x50
      af:   5f                                          pop rdi
      b0:   5e                                          pop rsi
      b1:   5b                                          pop rbx
      b2:   c3                                          ret
      b3:   48 c7 c2 69 00 00 00                        mov rdx, 0x69
      ba:   48 89 d0                                    mov rax, rdx
      bd:   48 83 c4 50                                 add rsp, 0x50
      c1:   5f                                          pop rdi
      c2:   5e                                          pop rsi
      c3:   5b                                          pop rbx
      c4:   c3                                          ret

The key is `jmp rax` at offset 7b. If the table could be defined statically, this would work as a proper jump table (there is currently no easy way to do that).
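The control flow of the Const example can be modelled in plain Python, with a list of functions standing in for the table of label addresses. This is a sketch of what the script reads as, assuming the JIT code behaves as written; it is not output captured from Kinx.

```python
# Handlers stand in for the code at labels l0..l3.
def l0():
    return 102


def l1():
    return 103


def l2():
    return 104


def l3():
    return 105


def dispatch(s0):
    r1 = -1                       # mov R1, -1
    if s0 < 0 or s0 > 3:          # slt / sgt both jump to l4
        return r1                 # l4: ret R1
    table = [l0, l1, l2, l3]      # rebuilt on every call, like VAR(0)..VAR(3)
    return table[s0]()            # indirect jump through the table


print([dispatch(i) for i in range(-1, 5)])
```

The range check before the table lookup matters: without it, an out-of-range index would jump through whatever happens to be in memory next to the table.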

Example of 4 or more arguments

It is a little tedious, but if you want to pass 4 or more arguments, store the values in the local variable area and pass its address (a pointer) as an argument. In the following example, the call first goes through a hook function that sets the arguments in the local variable area. Since every local variable is allocated 8 bytes, note that the offset used when accessing them directly with `Jit.MEM1()` and the like must be a multiple of 8.

var c = new Jit.Compiler();
var entry1 = c.enter();
    c.mov(Jit.VAR(0), Jit.S0);
    c.mov(Jit.VAR(1), Jit.IMM(3));
    c.mov(Jit.VAR(2), Jit.IMM(2));
    c.mov(Jit.VAR(3), Jit.IMM(1));
    c.localp(Jit.R0);
    var call1 = c.call();
    c.ret(Jit.R0);
var entry2 = c.enter();
    c.mov(Jit.R1, Jit.S0);
    c.mov(Jit.S0, Jit.MEM1(Jit.R1, 0));
    var jump0 = c.ge(Jit.S0, Jit.MEM1(Jit.R1, 8));
    c.ret(Jit.S0);
    var l1 = c.label();
    c.sub(Jit.R3, Jit.S0, Jit.MEM1(Jit.R1, 16));
    c.mov(Jit.VAR(0), Jit.R3);
    c.mov(Jit.VAR(1), Jit.IMM(3));
    c.mov(Jit.VAR(2), Jit.IMM(2));
    c.mov(Jit.VAR(3), Jit.IMM(1));
    c.localp(Jit.R0);
    c.call(entry2);
    c.mov(Jit.S1, Jit.R0);
    c.sub(Jit.R3, Jit.S0, Jit.MEM1(Jit.R1, 24));
    c.mov(Jit.VAR(0), Jit.R3);
    c.mov(Jit.VAR(1), Jit.IMM(3));
    c.mov(Jit.VAR(2), Jit.IMM(2));
    c.mov(Jit.VAR(3), Jit.IMM(1));
    c.localp(Jit.R0);
    c.call(entry2);
    c.add(Jit.R0, Jit.R0, Jit.S1);
    c.ret(Jit.R0);

jump0.setLabel(l1);
call1.setLabel(entry2);
var code = c.generate();

for (var i = 1; i <= 42; ++i) {
    var tmr = new SystemTimer();
    var r = code.run(i);
    System.println("[%8.3f] fib(%2d) = %d" % tmr.elapsed() % i % r);
}

The output is the same as before.
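The argument-block technique can be sketched in Python with a byte buffer that plays the role of the local variable area: four 8-byte slots are packed into the buffer, and the callee reads them back at byte offsets 0, 8, 16, and 24. The helper names are hypothetical, and the buffer only models the memory layout, not real stack memory.

```python
import struct

WORD = 8  # every slot is 8 bytes, so offsets are multiples of 8


def pack_args(*values):
    """Lay out the arguments as consecutive signed 64-bit slots."""
    return bytearray(struct.pack("<%dq" % len(values), *values))


def load(block, offset):
    """Read one slot, like MEM1(pointer, offset)."""
    if offset % WORD:
        raise ValueError("offset must be a multiple of 8")
    return struct.unpack_from("<q", block, offset)[0]


def fib_entry(block):
    """Callee: receives a single 'pointer' and unpacks four values."""
    n = load(block, 0)
    limit = load(block, 8)
    step_a = load(block, 16)
    step_b = load(block, 24)
    if n >= limit:
        return (fib_entry(pack_args(n - step_a, limit, step_a, step_b)) +
                fib_entry(pack_args(n - step_b, limit, step_a, step_b)))
    return n


def fib(n):
    """Hook: builds the argument block, then calls the real function."""
    return fib_entry(pack_args(n, 3, 2, 1))


print([fib(i) for i in range(1, 11)])
```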

Double arguments and return values

I have not shown Double yet, so here it is. Let's use Fibonacci again (apparently I love Fibonacci more than I realized). This is a version that steps by 0.1.

var c = new Jit.Compiler();
var entry1 = c.enter(Jit.ArgType.FP);
    c.mov(Jit.FR0, Jit.IMM(0.3));
    var jump0 = c.ge(Jit.FS0, Jit.FR0);
    c.ret(Jit.FS0);
    var l1 = c.label();
    c.mov(Jit.FR0, Jit.IMM(0.2));
    c.sub(Jit.FR0, Jit.FS0, Jit.FR0);
    c.call(entry1);
    c.mov(Jit.FS1, Jit.FR0);
    c.mov(Jit.FR0, Jit.IMM(0.1));
    c.sub(Jit.FR0, Jit.FS0, Jit.FR0);
    c.call(entry1);
    c.add(Jit.FR0, Jit.FR0, Jit.FS1);
    c.ret(Jit.FR0);

jump0.setLabel(l1);
var code = c.generate();

for (var i = 0.1; i < 3.5; i += 0.1) {
    var tmr = new SystemTimer();
    var r = code.frun(i);
    System.println("[%8.3f] fib(%3.1f) = %.1f" % tmr.elapsed() % i % r);
}

Floating-point immediates cannot yet be used directly in the comparison methods (this should be fixed), so the value has to be stored in a register first.

You can receive a Double value by calling `frun()`. The result is as follows.

[   0.000] fib(0.1) = 0.1
[   0.000] fib(0.2) = 0.2
[   0.000] fib(0.3) = 0.3
[   0.000] fib(0.4) = 0.5
[   0.000] fib(0.5) = 0.8
[   0.000] fib(0.6) = 1.3
[   0.000] fib(0.7) = 2.1
[   0.000] fib(0.8) = 3.4
[   0.000] fib(0.9) = 5.5
[   0.000] fib(1.0) = 8.9
[   0.000] fib(1.1) = 14.4
[   0.000] fib(1.2) = 23.3
[   0.000] fib(1.3) = 37.7
[   0.000] fib(1.4) = 61.0
[   0.000] fib(1.5) = 98.7
[   0.000] fib(1.6) = 159.7
[   0.000] fib(1.7) = 258.4
[   0.000] fib(1.8) = 418.1
[   0.000] fib(1.9) = 676.5
[   0.000] fib(2.0) = 1094.6
[   0.000] fib(2.1) = 1771.1
[   0.000] fib(2.2) = 2865.7
[   0.000] fib(2.3) = 4636.8
[   0.000] fib(2.4) = 7502.5
[   0.000] fib(2.5) = 12139.3
[   0.001] fib(2.6) = 19641.8
[   0.001] fib(2.7) = 31781.1
[   0.002] fib(2.8) = 51422.9
[   0.003] fib(2.9) = 83204.0
[   0.004] fib(3.0) = 134626.9
[   0.006] fib(3.1) = 217830.9
[   0.015] fib(3.2) = 352457.8
[   0.020] fib(3.3) = 570288.7
[   0.027] fib(3.4) = 922746.5

The generated code is as follows. This is also the Windows version. To pass a floating-point number, there is a simple hook function at the beginning. SLJIT does not allow a floating-point number to be specified as an argument at a function entry point, so this works around that.

In that sense too, this library is nicer than using SLJIT directly: the required size of the local variable area is calculated automatically, and so is the number of preserved registers that need save and restore code.

       0:   53                                          push rbx
       1:   56                                          push rsi
       2:   57                                          push rdi
       3:   48 8b d9                                    mov rbx, rcx
       6:   48 8b f2                                    mov rsi, rdx
       9:   49 8b f8                                    mov rdi, r8
       c:   4c 8b 4c 24 d0                              mov r9, [rsp-0x30]
      11:   48 83 ec 30                                 sub rsp, 0x30
      15:   0f 29 74 24 20                              movaps [rsp+0x20], xmm6
      1a:   f2 0f 10 03                                 movsd xmm0, qword [rbx]
      1e:   48 89 f2                                    mov rdx, rsi
      21:   49 89 f8                                    mov r8, rdi
      24:   48 89 c1                                    mov rcx, rax
      27:   e8 0d 00 00 00                              call 0x39
      2c:   0f 28 74 24 20                              movaps xmm6, [rsp+0x20]
      31:   48 83 c4 30                                 add rsp, 0x30
      35:   5f                                          pop rdi
      36:   5e                                          pop rsi
      37:   5b                                          pop rbx
      38:   c3                                          ret
      39:   53                                          push rbx
      3a:   56                                          push rsi
      3b:   57                                          push rdi
      3c:   48 8b d9                                    mov rbx, rcx
      3f:   48 8b f2                                    mov rsi, rdx
      42:   49 8b f8                                    mov rdi, r8
      45:   4c 8b 4c 24 b0                              mov r9, [rsp-0x50]
      4a:   48 83 ec 50                                 sub rsp, 0x50
      4e:   0f 29 74 24 20                              movaps [rsp+0x20], xmm6
      53:   f2 0f 11 6c 24 38                           movsd [rsp+0x38], xmm5
      59:   f2 0f 10 f0                                 movsd xmm6, xmm0
      5d:   49 b9 33 33 33 33 33 33 d3 3f               mov r9, 0x3fd3333333333333
      67:   4c 89 4c 24 40                              mov [rsp+0x40], r9
      6c:   f2 0f 10 44 24 40                           movsd xmm0, qword [rsp+0x40]
      72:   66 0f 2e f0                                 ucomisd xmm6, xmm0
      76:   73 17                                       jae 0x8f
      78:   f2 0f 10 c6                                 movsd xmm0, xmm6
      7c:   f2 0f 10 6c 24 38                           movsd xmm5, qword [rsp+0x38]
      82:   0f 28 74 24 20                              movaps xmm6, [rsp+0x20]
      87:   48 83 c4 50                                 add rsp, 0x50
      8b:   5f                                          pop rdi
      8c:   5e                                          pop rsi
      8d:   5b                                          pop rbx
      8e:   c3                                          ret
      8f:   49 b9 9a 99 99 99 99 99 c9 3f               mov r9, 0x3fc999999999999a
      99:   4c 89 4c 24 40                              mov [rsp+0x40], r9
      9e:   f2 0f 10 44 24 40                           movsd xmm0, qword [rsp+0x40]
      a4:   f2 0f 10 e6                                 movsd xmm4, xmm6
      a8:   f2 0f 5c e0                                 subsd xmm4, xmm0
      ac:   f2 0f 11 e0                                 movsd xmm0, xmm4
      b0:   48 89 c1                                    mov rcx, rax
      b3:   e8 81 ff ff ff                              call 0x39
      b8:   f2 0f 10 e8                                 movsd xmm5, xmm0
      bc:   49 b9 9a 99 99 99 99 99 b9 3f               mov r9, 0x3fb999999999999a
      c6:   4c 89 4c 24 40                              mov [rsp+0x40], r9
      cb:   f2 0f 10 44 24 40                           movsd xmm0, qword [rsp+0x40]
      d1:   f2 0f 10 e6                                 movsd xmm4, xmm6
      d5:   f2 0f 5c e0                                 subsd xmm4, xmm0
      d9:   f2 0f 11 e0                                 movsd xmm0, xmm4
      dd:   48 89 c1                                    mov rcx, rax
      e0:   e8 54 ff ff ff                              call 0x39
      e5:   f2 0f 58 c5                                 addsd xmm0, xmm5
      e9:   f2 0f 10 6c 24 38                           movsd xmm5, qword [rsp+0x38]
      ef:   0f 28 74 24 20                              movaps xmm6, [rsp+0x20]
      f4:   48 83 c4 50                                 add rsp, 0x50
      f8:   5f                                          pop rdi
      f9:   5e                                          pop rsi
      fa:   5b                                          pop rbx
      fb:   c3                                          ret
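The 64-bit constants loaded into `r9` in the listing are the IEEE 754 bit patterns of the floating-point immediates from the script. A short Python check with the standard `struct` module decodes them; the helper name is hypothetical.

```python
import struct


def bits_to_double(bits):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Constants taken from the `mov r9, ...` instructions in the listing.
for bits in (0x3FD3333333333333, 0x3FC999999999999A, 0x3FB999999999999A):
    print(hex(bits), bits_to_double(bits))
```

The three patterns decode to 0.3, 0.2, and 0.1, the values passed to `Jit.IMM()` in the script.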

Conclusion

JIT is fun. If you implement a parser combinator and combine it with this, you could build a small language implementation with JIT. Maybe that is a path worth pursuing.

Perhaps there are two possible uses:

- When writing a Kinx library, JIT the numerical calculations and similar parts to speed them up.
- Host a DSL (domain-specific language) or a homebrew language, and use this as the backend code generator.

See you next time.