I implemented a compiler that converts Lisp to C (sometimes called a transpiler, but hereafter referred to as a compiler) in Go.
The deliverables are available on PLC.
In this article, I have summarized the process of implementing a Lisp compiler that handles only integers and being able to calculate Fibonacci numbers.
** Disclaimer ** I tried to make it with Go, but Go does not appear at all in the main story.
It's not difficult to make a simple compiler
It seems difficult to make your own compiler, but if it's not that difficult, maybe you can do it.
However, it is too reckless to make a compiler even though I am not familiar with C language itself.
Is there a language that seems to be less difficult?
So Lisp.
It's full of parentheses, but it's easy to parse.
In addition, the minimum required elements
All you need is this. Only nine. It's wonderful to be able to grasp the whole picture. This is not enough for debugging, so let's add print and progn.
Now, let's decide the output format.
8cc spits out the assembly, but [Human Resource Machine](https://store.steampowered.com/app/375820/Human_Resource_Machine/?l=japanese I only have the knowledge of playing with), so it seems better to stop.
Consider LLVM IR as a better candidate than assembly. It is troublesome to save the execution results one by one, but it seems that there is also a structure.
I tried to implement print and it was done. This may be possible.
Just in case, I will also try the gentler C output.
When I tried it, it was overwhelmingly convenient, and the amount of code was less than half of the LLVM IR output. I don't have that much strength Let's implement it with C output for the time being, and let's say that IR output is a development issue (not done here).
First, try compiling by hand to find out if there are any problems with the implementation policy or not, and then incorporate it into the compiler implementation.
Taking print
as an example, the input Lisp code
(print 1 2 3)
Then, the output C code is
#include<stdio.h>
int main(void) {
printf("%d %d %d\n", 1, 2, 3);
}
It will be. There is no problem with template output except for the 4th line, so there is nothing to think about.
The key to implementing print
is to arrange the format specifications according to the arguments, so you can implement it quickly.
This alone is obviously lacking in functionality, so let's create something like print
after doing something.
For example, if you print
the result of 1 + 2
,
(print (+ 1 2) (+ 3 4 5))
When this is compiled, the code excluding the template part is
int plus_0 = 1 + 2;
int plus_1 = 3 + 4 + 5;
printf("%d %d\n", plus_0, plus_1);
Implement so that Save the result of the calculation one by one and pass the saved result to print
.
By the way, I implemented +
, and I implemented it just by evaluating all the arguments and concatenating with +
.
This will result in a large number of variables being declared in the main function, leaving memory reserved. Originally it would be a place to implement GC, but I will leave this as a development issue as well.
Now that the implementation of print
is complete, let's move on to the implementation of progn
.
However, since it only evaluates the arguments in order, it is not particularly difficult, and the final evaluation value is the return value.
Now that we can run multiple processes, let's define define
and handle variables.
(progn
(define a 1)
(print a))
int a = 1;
printf("%d\n", a);
Currently, we only handle integers, so there is no problem as long as you define integer variables. Behind the scenes, record the variable declaration check so that it can be checked by the internal processing of the compiler.
Now that we can define variables, let's move on to implementing lambda
so that we can also define functions.
First, let's implement immediate execution of lambda
before considering the combination with define.
(print ((lambda (x) (+ x 1)) 2))
int lambda_0(int *args) {
int x = args[0];
int add_0 = x + 1;
return add_0;
}
int main(void) {
int lambda_0_args[] = {2};
printf("%d\n", lambda_0(lambda_0_args));
}
If you use lambda
, we will define it as a sequential function. I don't want to consider the number of arguments, so
We decide to receive it as an array and reassign it to variables as appropriate within the function. Where you are calling a function
Make sure to store the arguments in an array in advance.
Now that we have lambda
, let's implement the function definition in combination with define
.
However, just assigning the function defined in lambda
to the function pointer makes it look like that.
(define add1 (lambda (x) (+ x 1)))
int (*add1)(int*);
int lambda_0(int *args) {
int x = args[0];
int add_0 = x + 1;
return add_0;
}
int main(void) {
add1 = lambda_0;
}
Next, I want to enable conditional branching, so I will implement ʻeq and ʻif
. What you do with ʻeqis similar to
+, just concatenate with
&&instead of
+`.
(eq 1 2 3)
bool eq_0 = 1==2 && 1==3;
We will use this ʻeq to start implementing ʻif
.
(if (eq 1 2) 10 20)
int if_0;
int eq_0 = 1==2;
if (eq_0) {
if_0 = 10;
} else {
if_0 = 20;
}
Even if you convert it to C ʻif, it will work like that. You can recursively calculate the Fibonacci number by adding
- as well as
+ `here.
When I actually try it,
(progn
(define fib (lambda (n)
(if (eq n 0)
0
(if (eq n 1)
1
(+ (fib (- n 1)) (fib (- n 2)))))))
(print (fib 10)))
#include<stdio.h>
int (*fib)(int*);
int lambda_1(int *args){
int n = args[0];
int if_1;
int eq_2 = n == 0;
if (eq_2) {
if_1 = 0;
} else {
int if_3;
int eq_4 = n == 1;
if (eq_4) {
if_3 = 1;
} else {
int minus_5 = n;
minus_5 -= 1;
int fib_args_6[] = {
minus_5
};
int minus_7 = n;
minus_7 -= 2;
int fib_args_8[] = {
minus_7
};
int plus_9 = 0;
plus_9 += fib(fib_args_6);
plus_9 += fib(fib_args_8);
if_3 = plus_9;
}
if_1 = if_3;
}
return if_1;
}
int main(void) {
fib = lambda_1;
int fib_args_2[] = {
10
};
printf("%d\n" , fib(fib_args_2));
}
Redundant code like this is generated, but when you actually run it, it will calculate properly.
I made a compiler (transpiler) that converts lisp to C in a straightforward manner.
What was implemented in the process so far
https://github.com/kawakami-o3/PLC/tree/7d825a5ccbab45919c701ebb66cd8d2409c9f45d
You can get it from.
Anyway, aiming to move, I designed and implemented it so that the mounting difficulty would be as low as possible. I was able to implement it to the point where I could calculate the Fibonacci number with almost no stumbling blocks.
To summarize the few stumbling points,
if
--In the initial implementation, if was implemented as a function, and all the arguments were evaluated before the return value was decided.
--Recursive function
--I was checking ʻundefined, but I also experienced the promise that the assignment timing of
defineand the definition timing of
lambda` will be mixed up and an error will be thrown.This time I implemented Lisp that handles only integers, and even though it was Lisp, I couldn't do list processing properly.
Next time, we will review the data structure so that list processing can be performed, compile the lisp code of ʻeval, and implement a compiler that can run ʻeval
at runtime.
Thank you for reading.
Here, we will touch on the remaining ʻatom,
quote,
car,
cdr,
cons`.
The reason why I did not deal with it in the main part is that the return value is ʻintin the
lambdaimplemented in this article. This is because there was a fatal design mistake that it could not be used with
cdr or
cons`.
However, it's also unpleasant to move to the next goal without doing it at all, so I implemented it softly as mentioned here.
For ʻatom, I tried to evaluate the argument at compile time and initialized it with
0or
1` depending on whether it is an integer or not.
In quote
, I implemented it by limiting the arguments to integers and integer lists only.
If it is an integer
(quote 1)
int quote_0 = 1;
If it is an integer list, it will be converted to an integer array.
(quote (1 2 3))
int quote_0[] = {1,2,3};
For car
, we implemented it internally to return the first value of the integer array, just as we converted the integer list to an integer array with quote
.
cdr
implements to put the second and subsequent elements in the newly defined array for the integer list passed as an argument.
For cons
, I also created a new array to insert elements.
I decided to work on nested lists etc. in the next stage, and here I will deal only with one-dimensional integer lists.
The above is the implementation with the compiler covered in this article.
Recommended Posts