This article is a translation of Chapter I: A Primer on Go Assembly, with some additions.
This article assumes that the reader:
- understands the syntax of the Go language
- understands, in general terms, how the stack behaves when a subroutine is called
Environment:
$ go version
go version go1.10 linux/amd64
The assembly output by the Go compiler is an abstraction and does not map directly onto the actual hardware. The Go assembler translates this pseudo-assembly into machine code for the target hardware.
It may help to think of it as something like Java bytecode.
The biggest advantage of such an intermediate layer is that it makes porting to new architectures easier. For more information, see *The Design of the Go Assembler* by Rob Pike (https://talks.golang.org/2016/asm.slide#1).
The most important thing to know about Go assembly is that it does not correspond directly to the target hardware. Some instructions are tied directly to the machine, but others are not. Because of this, the compiler does not need an assembler pass in its pipeline. It can work with a semi-abstract instruction set instead. Instruction selection (here, the conversion from Go assembly to real machine instructions) then happens partly after code generation (the compiler's emission of Go assembly). For example, a MOV instruction in Go assembly might be turned into a clear or load instruction. Depending on the architecture, it might also remain a move instruction, possibly under a different name. Concepts common to all architectures, such as moving data to and from memory or calling and returning from subroutines, are abstracted. Hardware-specific instructions, on the other hand, are often represented as-is.
The Go assembler is the program that parses this pseudo-assembly and turns it into instructions that serve as input to the linker.
Consider the following code.
//go:noinline
func add(a, b int32) (int32, bool) { return a + b, true }
func main() { add(10, 32) }
(//go:noinline is a compiler directive that prevents the function from being inlined.) Let's compile this code to assembly.
$ GOOS=linux GOARCH=amd64 go tool compile -S direct_topfunc_call.go
0x0000 TEXT "".add(SB), NOSPLIT, $0-16
0x0000 FUNCDATA $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
0x0000 FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 MOVL "".b+12(SP), AX
0x0004 MOVL "".a+8(SP), CX
0x0008 ADDL CX, AX
0x000a MOVL AX, "".~r2+16(SP)
0x000e MOVB $1, "".~r3+20(SP)
0x0013 RET
0x0000 TEXT "".main(SB), $24-0
;; ...omitted stack-split prologue...
0x000f SUBQ $24, SP
0x0013 MOVQ BP, 16(SP)
0x0018 LEAQ 16(SP), BP
0x001d FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x001d FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x001d MOVQ $137438953482, AX
0x0027 MOVQ AX, (SP)
0x002b PCDATA $0, $0
0x002b CALL "".add(SB)
0x0030 MOVQ 16(SP), BP
0x0035 ADDQ $24, SP
0x0039 RET
;; ...omitted stack-split epilogue...
Dissecting add
0x0000 TEXT "".add(SB), NOSPLIT, $0-16
- 0x0000: the offset of the instruction, relative to the beginning of the function.
- TEXT "".add: the TEXT directive declares the "".add symbol as part of the .text section. It also indicates that the instructions that follow are the body of this function.
The empty string "" is replaced with the current package name at link time. In this case, the symbol becomes main.add.
- (SB): SB is a virtual register defined in Go assembly that holds the "static-base" pointer, i.e. the beginning of the program's address space.
"".add(SB) declares that the "".add symbol is located at a constant offset from the beginning of the address space, and that this offset is computed by the linker. In other words, it is a global function with a fixed address.
You can see this clearly with objdump:
$ objdump -j .text -t direct_topfunc_call | grep 'main.add'
000000000044d980 g F .text 000000000000000f main.add
Notes on the objdump command:
- -j .text: display only the .text section
- -t: display the symbol table
- 000000000044d980 g F .text 000000000000000f main.add: a global (g) function (F) symbol named main.add is located at address 0x44d980
All user-defined symbols are written as offsets from the pseudo-registers FP (arguments and locals) and SB (globals). SB can be thought of as the origin of memory, so the symbol foo(SB) can be thought of as the address of foo.
- NOSPLIT: tells the compiler not to insert the *stack-split* preamble, which checks whether the current stack needs to be grown.
The add function has no local variables and needs no stack frame of its own, so the current stack never needs to be extended for it. Checking for stack growth on every call would therefore waste CPU time. The compiler detects this automatically and sets the NOSPLIT flag itself. Stack growth is discussed in the Stacks section below.
- $0-16: $0 is the size in bytes of the stack frame allocated for this function. 16 is the size of the arguments (plus return values) passed in by the caller: three int32 values plus one bool, which is 13 bytes padded to 16 for alignment.
In general, the frame size is followed by the argument size, separated by a minus sign. The minus sign is only a separator and does not mean subtraction. For example, $24-8 means that the function has a 24-byte stack frame and is called with 8 bytes of arguments, which live in the caller's stack frame. If NOSPLIT is not specified for the TEXT, the argument size must be provided. For assembly functions with Go prototypes, go vet checks that the argument size is correct.
0x0000 FUNCDATA $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
0x0000 FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
The FUNCDATA and PCDATA directives contain information used by the garbage collector (GC).
0x0000 MOVL "".b+12(SP), AX
0x0004 MOVL "".a+8(SP), CX
Go's calling convention passes all arguments on the stack, using space pre-allocated in the caller's stack frame. It is therefore the caller's responsibility to grow (and later shrink back) its own stack appropriately. That way, arguments can be passed to the callee and the callee's return values can be passed back to the caller.
The Go compiler never generates PUSH/POP instructions. Instead, the stack is grown or shrunk by subtracting from or adding to SP, a pseudo-register that points to the top of the stack.
[UPDATE: We've discussed about this matter in issue #21: about SP register.]
The pseudo-register SP is used to refer to frame-local variables and arguments. It points to the highest address within the local stack frame, so references use negative offsets in the range [−framesize, 0), e.g. x-8(SP), y-4(SP).
The official documentation states that user-defined symbols are written as offsets from the FP register, but this does not hold for compiler-generated code. Modern Go compilers always refer to arguments and local variables via offsets from the stack pointer. This allows FP to be used as an additional general-purpose register on platforms with few registers, such as x86. See *Stack frame layout on x86-64* for more information. [UPDATE: We've discussed about this matter in issue #2: Frame pointer.]
"".b+12(SP) and "".a+8(SP) refer to the addresses 12 bytes and 8 bytes from the top of the stack, respectively. (Note that the stack grows from higher addresses toward lower ones, so positive offsets from SP point back toward data placed by the caller.)
.a and .b are arbitrary names given to the referenced locations. They have no effect on what the code does, but they are mandatory when using relative addressing on virtual registers.
The documentation for the pseudo frame pointer FP says:
The FP pseudo-register is a virtual frame pointer used to refer to function arguments. The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register. Thus 0(FP) is the first argument to the function, 8(FP) is the second (on a 64-bit machine), and so on. However, when referring to a function argument this way, it is necessary to place a name at the beginning, as in first_arg+0(FP) and second_arg+8(FP). (The meaning of the offset, offset from the frame pointer, is distinct from its use with SB, where it is an offset from the symbol.) The assembler enforces this convention, rejecting plain 0(FP) and 8(FP). The actual name is semantically irrelevant but should be used to document the argument's name.
Finally, there are two important points to note here.
- The first argument a is placed at 8(SP), not 0(SP). This is because the caller's CALL instruction stores the return address at 0(SP).
- Arguments are laid out in reverse order: the first argument (a) is the one closest to the top of the stack.
0x0008 ADDL CX, AX
0x000a MOVL AX, "".~r2+16(SP)
0x000e MOVB $1, "".~r3+20(SP)
ADDL adds two long words (4-byte values) and stores the result in its second operand. Here, AX and CX are added and the result is stored in AX.
The result is then stored at "".~r2+16(SP). This is the space the caller pre-allocated on its stack to receive the return value. Again, the name "".~r2 has no meaning for what the code does.
Since Go supports multiple return values, the constant true is also returned in this example. It is stored just like the first return value, only at a different offset: "".~r3+20(SP).
0x0013 RET
The final pseudo-instruction RET tells the Go assembler to insert whatever instruction is appropriate for returning from a subroutine on the target hardware.
In most cases, this pops the return address stored at 0(SP) and jumps to it.
The last instruction in a TEXT block must be some kind of jump, usually RET. If there is no jump instruction, the linker appends a jump-to-itself instruction, so that execution never falls through past the end of the TEXT block.
We have covered a lot of syntax and explanation, so here is a brief summary.
;; Declare global function symbol "".add (main.add once linked)
;; Do not insert the stack-split preamble
;; 0-byte stack frame, 16 bytes of arguments passed in
;; func add(a, b int32) (int32, bool)
0x0000 TEXT "".add(SB), NOSPLIT, $0-16
;; ...omitted FUNCDATA stuff...
0x0000 MOVL "".b+12(SP), AX ;; move second argument (b) from the caller's stack frame into AX
0x0004 MOVL "".a+8(SP), CX ;; move first argument (a) from the caller's stack frame into CX
0x0008 ADDL CX, AX ;; AX = CX + AX
0x000a MOVL AX, "".~r2+16(SP) ;; move the sum (in AX) into the caller's stack frame
0x000e MOVB $1, "".~r3+20(SP) ;; move the constant `true` into the caller's stack frame
0x0013 RET ;; jump to the return address stored at 0(SP)
The contents of the stack once main.add has finished executing can be visualized as follows.
| +-------------------------+ <-- 32(SP)
| | |
G | | |
R | | |
O | | main.main's saved |
W | | frame-pointer (BP) |
S | |-------------------------| <-- 24(SP)
| | [alignment] |
D | | "".~r3 (bool) = 1/true | <-- 21(SP)
O | |-------------------------| <-- 20(SP)
W | | |
N | | "".~r2 (int32) = 42 |
W | |-------------------------| <-- 16(SP)
A | | |
R | | "".b (int32) = 32 |
D | |-------------------------| <-- 12(SP)
S | | |
| | "".a (int32) = 10 |
| |-------------------------| <-- 8(SP)
| | |
| | |
| | |
\ | / | return address to |
\|/ | main.main + 0x30 |
- +-------------------------+ <-- 0(SP) (TOP OF STACK)
(diagram made with https://textik.com)
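The following simulation makes the diagram concrete. It is purely illustrative and is not how the CPU or the runtime actually executes the code. It treats a 32-byte array as the region from 0(SP) to 32(SP) as seen inside add, and replays add's loads and stores in little-endian byte order, as on amd64. The type stack and its methods loadL and storeL are hypothetical helpers named after the MOVL instructions.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// stack is a hypothetical model of the 32 bytes from 0(SP) to 32(SP) inside add.
type stack [32]byte

// loadL reads a 4-byte little-endian value, like MOVL off(SP), reg.
func (s *stack) loadL(off int) int32 { return int32(binary.LittleEndian.Uint32(s[off:])) }

// storeL writes a 4-byte little-endian value, like MOVL reg, off(SP).
func (s *stack) storeL(off int, v int32) { binary.LittleEndian.PutUint32(s[off:], uint32(v)) }

// simulatedAdd replays the body of "".add instruction by instruction.
func simulatedAdd(s *stack) {
	ax := s.loadL(12) // MOVL "".b+12(SP), AX
	cx := s.loadL(8)  // MOVL "".a+8(SP), CX
	ax += cx          // ADDL CX, AX
	s.storeL(16, ax)  // MOVL AX, "".~r2+16(SP)
	s[20] = 1         // MOVB $1, "".~r3+20(SP)
}

func main() {
	var s stack
	s.storeL(8, 10)  // a: written by main at its 0(SP), seen here at 8(SP)
	s.storeL(12, 32) // b: written by main at its 4(SP), seen here at 12(SP)
	simulatedAdd(&s)
	fmt.Println(s.loadL(16), s[20] == 1) // 42 true
}
```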
Dissecting main
Let's look at the main function again.
func main() { add(10, 32) }
0x0000 TEXT "".main(SB), $24-0
;; ...omitted stack-split prologue...
0x000f SUBQ $24, SP
0x0013 MOVQ BP, 16(SP)
0x0018 LEAQ 16(SP), BP
;; ...omitted FUNCDATA stuff...
0x001d MOVQ $137438953482, AX
0x0027 MOVQ AX, (SP)
;; ...omitted PCDATA stuff...
0x002b CALL "".add(SB)
0x0030 MOVQ 16(SP), BP
0x0035 ADDQ $24, SP
0x0039 RET
;; ...omitted stack-split epilogue...
0x0000 TEXT "".main(SB), $24-0
The directive works the same way as for the add function. Here, main has a 24-byte stack frame, takes no arguments, and returns no values.
0x000f SUBQ $24, SP
0x0013 MOVQ BP, 16(SP)
0x0018 LEAQ 16(SP), BP
Once again, Go's calling convention passes all function arguments on the stack.
By subtracting 24 bytes from SP, main reserves 24 bytes for its own stack frame (remember that the stack grows downward).
These 24 bytes are used as follows:
- 8 bytes (16(SP) to 24(SP)) store the current value of the frame pointer BP (MOVQ BP, 16(SP)). This allows the stack to be unwound, that is, the chain of callers to be followed, which is useful when debugging.
- 1 + 3 bytes (12(SP) to 16(SP)) are reserved to receive the second return value of add (a bool is 1 byte, plus 3 bytes of padding for alignment on the amd64 architecture).
- 4 bytes (8(SP) to 12(SP)) are reserved to receive the first return value of add (int32).
- 4 bytes (4(SP) to 8(SP)) are reserved for add's argument b (int32).
- 4 bytes (0(SP) to 4(SP)) are reserved for add's argument a (int32).
Finally, following the stack allocation, LEAQ computes the new address of the frame pointer and stores it in BP. BP now equals SP + 16, the slot where the old BP was just saved; LEAQ works like the x86 lea instruction.
0x001d MOVQ $137438953482, AX
0x0027 MOVQ AX, (SP)
The caller places the arguments for the callee at the top of the stack as a single 8-byte quadword.
The value may look meaningless at first glance, but 137438953482 is simply the two 4-byte values 10 and 32 packed together.
$ echo 'obase=2;137438953482' | bc
10000000000000000000000000000000001010
\____/\______________________________/
32 10
The upper bits (32-63) of 137438953482 hold 100000 (32), and the lower bits (0-31) hold 00000000000000000000000000001010 (10).
0x002b CALL "".add(SB)
The CALL instruction calls the add function, addressed as a relative offset from SB.
Note that CALL pushes the 8-byte return address onto the top of the stack. As a result, every SP-relative reference inside add is shifted by 8 bytes.
For example, "".a is referenced as 8(SP) instead of 0(SP).
0x0030 MOVQ 16(SP), BP
0x0035 ADDQ $24, SP
0x0039 RET
Finally, MOVQ 16(SP), BP restores the caller's frame pointer, ADDQ $24, SP releases main's stack frame, and RET ends the execution of main.
What add and main do here is an ordinary subroutine call.
As you read more assembly dumps of Go code, including code involving goroutines, these stack-management instructions will quickly become a familiar sight.
To recognize these patterns quickly, let's look at what they do and why.
Stacks
The number of goroutines in a Go program varies with the situation; in practice it can reach millions. To avoid running out of memory, the Go runtime takes a conservative approach when allocating goroutine stacks. Every goroutine initially gets 2 KB of stack space from the runtime (behind the scenes, stacks are actually allocated on the heap).
As a goroutine runs, it may need more than the 2 KB it was initially given. If nothing were done, it could overrun its stack and corrupt other memory. To prevent such a stack overflow, the runtime steps in when a goroutine is about to run out of stack. It allocates a new stack twice as large as the old one and copies the contents of the old stack into it. This process is called *stack-split*, and it lets goroutine stacks be sized efficiently and dynamically.
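The doubling policy can be modelled in a few lines. This is a simplified sketch that assumes a 2 KB starting size and one doubling per trip through the stack-split check. grownStackSize is a hypothetical name. The real runtime code (runtime.newstack and runtime.copystack) handles many details left out here, such as adjusting pointers into the old stack.

```go
package main

import "fmt"

// initialStack is the 2 KB initial goroutine stack size described above.
const initialStack = 2048

// grownStackSize is a hypothetical model of repeated stack growth: each time
// the prologue finds the stack too small, morestack doubles it and the
// prologue runs again. It returns the final size and the number of doublings.
func grownStackSize(current, needed int) (size, grows int) {
	size = current
	for size < needed {
		size *= 2 // allocate a stack twice as large and copy the old contents over
		grows++
	}
	return size, grows
}

func main() {
	size, grows := grownStackSize(initialStack, 5000)
	fmt.Println(size, grows) // 8192 2
}
```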
Splits
For *stack-split* to work, the compiler inserts a few instructions at the beginning and end of every function that might overflow its stack, so that a stack overflow can be detected.
As we saw earlier, this check is useless for functions that cannot overflow the stack. NOSPLIT tells the compiler that it does not need to insert it.
I omitted the *stack-split* code from the main function above; let's take a look at it now.
0x0000 TEXT "".main(SB), $24-0
;; stack-split prologue
0x0000 MOVQ (TLS), CX
0x0009 CMPQ SP, 16(CX)
0x000d JLS 58
0x000f SUBQ $24, SP
0x0013 MOVQ BP, 16(SP)
0x0018 LEAQ 16(SP), BP
;; ...omitted FUNCDATA stuff...
0x001d MOVQ $137438953482, AX
0x0027 MOVQ AX, (SP)
;; ...omitted PCDATA stuff...
0x002b CALL "".add(SB)
0x0030 MOVQ 16(SP), BP
0x0035 ADDQ $24, SP
0x0039 RET
;; stack-split epilogue
0x003a NOP
;; ...omitted PCDATA stuff...
0x003a CALL runtime.morestack_noctxt(SB)
0x003f JMP 0
- At the beginning of the function (prologue), the goroutine checks whether its stack is exhausted and, if so, jumps to the end of the function (epilogue).
- At the end of the function (epilogue), it triggers stack growth and then jumps back to the beginning of the function (prologue).
Note that this prologue/epilogue pair keeps looping until the stack is large enough.
Prologue
0x0000 MOVQ (TLS), CX ;; store current *g in CX
0x0009 CMPQ SP, 16(CX) ;; compare SP and g.stackguard0
0x000d JLS 58 ;; jumps to 0x3a if SP <= g.stackguard0
TLS is a virtual register maintained by the runtime. It holds a pointer to the current g, the data structure that tracks the entire state of a goroutine.
Let's check the definition of g in the runtime source code.
type g struct {
	stack stack // 16 bytes
	// stackguard0 is the stack pointer compared in the Go stack growth prologue.
	// It is normally stack.lo+StackGuard, but can be StackPreempt to trigger a preemption.
	// (Preemption: a multitasking system temporarily suspending a running task.)
	stackguard0 uintptr
	stackguard1 uintptr
	// ...omitted dozens of fields...
}
Since g.stack is 16 bytes, 16(CX) refers to g.stackguard0. This is the stack threshold maintained by the runtime. Comparing it with the stack pointer tells whether the goroutine has used up its stack space.
The stack grows toward lower addresses, so if SP <= stackguard0, the stack space is used up. In that case, the prologue jumps to the epilogue.
Epilogue
0x003a NOP
0x003a CALL runtime.morestack_noctxt(SB)
0x003f JMP 0
The epilogue is simple: it calls the runtime's stack-growth function and then jumps back to the prologue.
The NOP before the CALL exists so that the prologue does not jump directly onto a CALL instruction. On some platforms, jumping directly onto a CALL can cause problems. Explaining why would require going fairly deep, so I will skip it here. It is common practice to place a NOP right before the CALL and jump to the NOP instead.
[UPDATE: We've discussed about this matter in issue #4: Clarify "nop before call" paragraph.]
This has only been the tip of the iceberg.
The stack-growth mechanism is too detailed and complex to explain here, so I'd like to give it a dedicated chapter if I get the chance.
In this article, I explained Go assembly using a simple example.
We'll dig deeper into Go's internal implementation in the remaining chapters.