Backtrace using DWARF information and pyelftools

TL;DR

I [implemented a backtrace](https://gist.github.com/sumomoneko/d8415b14f8eddf74a3eb9bd6d521fab3) using gdb's Python extension and the pyelftools module. It may be useful in environments where gdb's backtrace command does not work as is.

Motivation

When developing for embedded systems,

  • The gdbserver that came with the ICE is somehow buggy, and gdb crashes on backtrace.
  • The target uses a thread implementation that gdb doesn't know, so a per-thread backtrace is not possible.
  • A binary-only gdbserver does know the thread structure, but as soon as you change the code around threads, the structure layout shifts and gdb dies.
  • I got a raw memory dump, but gdb for this platform can't read dump files in the first place.

Isn't this a fairly common situation? I was in trouble, thrown out onto the savanna with nothing but a serial port and LEDs, but thanks to How to walk debug information I became a Friend who can read debug information. Now that I'm used to it, I'd like to try doing the backtrace myself.

Preparation

Confirmation environment

To make it easy to reproduce, I confirmed everything on x64 Ubuntu 16.04 with gcc 5.4 and Python 3.5 rather than on an embedded target [^1].

[^1]: For embedded work, I use it in the field on RX.

About the explanation

The basics of DWARF are explained in great detail in How to walk debug information, so please refer to that. [This article](https://ja.osdn.net/projects/drdeamon64/wiki/DWARF%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB%E3%83%95%E3%82%A9%E3%83%BC%E3%83%9E%E3%83%83%E3%83%88) was also very helpful.

I also won't go into much detail about the gdb Python API. I'm not doing anything complicated with the gdb module, so the samples should be enough to follow along.

Install pyelftools

Decoding DWARF by hand is hard, so install pyelftools, a Python module that handles a good part of the decoding. I would have liked to isolate the environment with virtualenv or similar, but Python invoked from gdb does not look at the virtualenv's module path, so I installed it system-wide.

$ sudo pip3 install pyelftools

Alternatively, unpack the module somewhere and put the following at the beginning of your Python script:

import sys
sys.path.append("/home/oreore/virtualenv/python/site-packages")

This adds the path by hand [^2].

[^2]: If there is an elegant way to persuade gdb, please let me know.
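One possible variation, offered as a sketch rather than a confirmed fix for gdb: the standard library's site.addsitedir() adds a directory to sys.path and, unlike sys.path.append(), also processes any .pth files found there. The helper name add_site_packages is hypothetical, and the path is the same example path as above.

```python
import site
import sys


def add_site_packages(path: str) -> None:
    """Make packages under `path` importable (hypothetical helper)."""
    if path not in sys.path:
        # Unlike sys.path.append(), addsitedir() also honors *.pth files
        site.addsitedir(path)


# Example path only; point this at your own virtualenv's site-packages
add_site_packages("/home/oreore/virtualenv/python/site-packages")
```

Whether this is enough inside gdb's embedded interpreter depends on how that interpreter was built, so treat it as something to try.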

Clone & make the verification code

Let's use some handy code to see how it works. Here we use the blazing-fast compression / decompression library lz4 and its example application simpleBuffer. There are articles that explain what lz4 is, so please refer to those.

Now, clone and build as follows.

$ git clone https://github.com/lz4/lz4.git
$ cd lz4
$ CFLAGS="-g3 -fno-dwarf2-cfi-asm" make all

A small note on the build flags: with the gcc I tried here, gcc (Ubuntu 5.4.1-2ubuntu1~16.04) 5.4.1 20160904, gas CFI directives are used, and as a result the .debug_frame section that stores the stack frame information seems to be omitted; only its simplified version (?), .eh_frame, was emitted. pyelftools can also decode .eh_frame, but it takes a little more effort to use. So this time I ask the compiler to emit .debug_frame with the -fno-dwarf2-cfi-asm option. This may have changed in more recent Ubuntu / gcc.

Run backtrace with gdb and see the answer

First, run it with gdb, break at the appropriate part, and backtrace.

$ cd examples
$ gdb simpleBuffer
 :
 :
(gdb) b lz4.c:578
Breakpoint 1 at 0x4010a4: lz4.c:578. (11 locations)

(gdb) r
Starting program: /home/oreore/lz4/examples/simpleBuffer 

Breakpoint 1, LZ4_compress_generic (acceleration=1, dictIssue=<optimized out>, dict=<optimized out>, tableType=<optimized out>, 
    outputLimited=<optimized out>, maxOutputSize=<optimized out>, inputSize=57, dest=0x61e010 "", 
    source=0x41b4b8 "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", cctx=0x7fffffff9810) at lz4.c:578
578	    LZ4_putPosition(ip, cctx->hashTable, tableType, base);

(gdb) bt
#0  LZ4_compress_generic (acceleration=1, dictIssue=<optimized out>, dict=<optimized out>, 
    tableType=<optimized out>, outputLimited=<optimized out>, maxOutputSize=<optimized out>, 
    inputSize=57, dest=0x61e010 "", 
    source=0x41b4b8 "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", cctx=0x7fffffff9570)
    at lz4.c:578
#1  LZ4_compress_fast_extState (state=0x7fffffff9570, 
    source=0x41b4b8 "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", dest=0x61e010 "", 
    inputSize=57, maxOutputSize=73, acceleration=1) at lz4.c:739
#2  0x0000000000404722 in LZ4_compress_fast (
    source=0x41b4b8 "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", dest=0x61e010 "", 
    inputSize=57, maxOutputSize=73, acceleration=1) at lz4.c:760
#3  0x0000000000404776 in LZ4_compress_default (
    source=0x41b4b8 "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", dest=0x61e010 "", 
    inputSize=57, maxOutputSize=73) at lz4.c:771
#4  0x0000000000400957 in main () at simple_buffer.c:54

Let's reproduce this.

Implementation

The finished script is in the gist, so I will explain along with it.

Keep platform-specific parts separate

I'd like to start analyzing DWARF right away, but first let me organize the execution environment.

This time we write a Python script that runs inside x64 gdb, but running under gdb is not essential. Nor, given the nature of DWARF, does the target have to be x64. As long as the following functions are provided, it can run as a standalone Python script, independent of gdb and of the architecture.

  • Get the path of the file that contains the DWARF information
  • Read memory from a specified address
  • Get the program counter at the backtrace start point (where the program stopped)
  • Get the register values at the time of the stop
  • Set a value for the stack pointer

So let's first split these out as functions, so that the target can easily be swapped later. The implementation of each is explained below.

Get the file path that contains DWARF

Get the path of the unstripped executable. Here it takes the file name of the object running under gdb (gdb calls it the "inferior"). For embedded targets, this would be the pre-strip binary you saved before burning the ROM.

def get_elf_file_name() -> str:
    """
    Get the name of the file that contains the debug information.
    Under gdb, it is found in gdb.objfiles()[0].filename
    :return: Path of the file with debug information
    """
    import gdb
    return gdb.objfiles()[0].filename

Read memory from the specified address

uintptr_t ret = *reinterpret_cast<uintptr_t*>(addr);

The idea is to do the above with gdb's Python API. It is used to fetch the memory that a register points to. The memory that is needed is the stack area, so when you create your own dump, make sure that at least the stack area is included.

def read_uintptr_t(addr: int) -> int:
    """
    Read uintptr_t-sized data from memory
    uintptr_t ret = *reinterpret_cast<uintptr_t*>(addr);
    :param addr: address
    :return: data
    """
    import gdb
    uintptr_t = gdb.lookup_type('uintptr_t')
    return int(gdb.Value(addr).reinterpret_cast(uintptr_t.pointer()).dereference())
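When gdb is not available and all you have is a raw memory dump, the same interface can be backed by the dump bytes instead. The following is a minimal sketch under stated assumptions: the dump holds one contiguous region starting at a known base address, and the target is little-endian with 64-bit pointers. make_dump_reader and its parameters are illustrative names that are not part of the gist, and the dump contents are fabricated.

```python
import struct
from typing import Callable


def make_dump_reader(dump: bytes, base_addr: int) -> Callable[[int], int]:
    """Return a read_uintptr_t-like function backed by a raw memory dump."""
    def read_uintptr_t(addr: int) -> int:
        offset = addr - base_addr
        if offset < 0 or offset + 8 > len(dump):
            raise ValueError("address 0x{:x} is outside the dump".format(addr))
        # "<Q" = little-endian unsigned 64-bit, i.e. uintptr_t on x64
        return struct.unpack_from("<Q", dump, offset)[0]
    return read_uintptr_t


# Fabricated two-word stack region, for illustration only
stack_dump = struct.pack("<QQ", 0x1122334455667788, 0x0000000000404776)
read_mem = make_dump_reader(stack_dump, base_addr=0x7fffffffd560)
value = read_mem(0x7fffffffd568)
```

Because the reader has the same shape as read_uintptr_t(), it could be handed to the unwinding code in place of the gdb-backed version.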

Get the program counter of the backtrace start point (program stop location)

This just does (gdb) p $pc through the gdb Python API. If you make your own dump, I suppose you would save the address from which execution jumped into the exception handler.

def get_pc() -> int:
    """
    Get the program counter
    :return: PC value
    """
    import gdb
    return int(gdb.parse_and_eval("$pc"))

Get the register value at the time of stop

DWARF is designed for a variety of architectures. Registers are therefore not handled by architecture-specific names but by numbers. So which register has which number? The quick way is to look at the [gdb source code](https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/amd64-tdep.c;h=b589d93940f1f498177ba91273190dc9b0714370;hb=HEAD#l156).

Here is the finished product. The xmmN, stN and mmN registers are commented out; I cut that corner because the wide registers were a hassle [^3]. If you need the register numbers for another architecture, look for a similarly named file with the CPU name in the same area of the gdb source tree.

[^3]: I originally wrote this for debugging an embedded CPU, which has no such registers.

def get_registers() -> List[int]:
    """
    Collect register values from gdb in DWARF register number order
    https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/amd64-tdep.c;h=b589d93940f1f498177ba91273190dc9b0714370;hb=HEAD#l156
    :return: List of register values in DWARF register number order
    """
    import gdb
    reg_names = ["rax", "rdx", "rcx", "rbx", "rsi", "rdi", "rbp", "rsp",
                 "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
                 "rip",
                 # "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
                 # "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
                 None, None, None, None, None, None, None, None,
                 None, None, None, None, None, None, None, None,
                 # "st0", "st1", "st2", "st3", "st4", "st5", "st6", "st7",
                 None, None, None, None, None, None, None, None,
                 # "mm0", "mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm7",
                 None, None, None, None, None, None, None, None,
                 "eflags",
                 "es", "cs", "ss", "ds", "fs", "gs", None, None,

                 None, None, None, None,
                 None, None,
                 "mxcsr", "fctrl", "fstat"]

    ret = []
    for name in reg_names:
        if name is not None:
            val = int(gdb.parse_and_eval("${}".format(name)))
            ret.append(val)
        else:
            ret.append(-1)
    return ret

Set a value for the stack pointer

As you walk up the call chain, the stack has to be unwound. The register that holds the stack pointer is architecture-dependent, so the unwinding step is split out into its own function.

def unwind_stack(regs: List[int], cfa: int) -> None:
    """
    Set the address in the register that holds the stack pointer.
    On x64 this is $rsp. Its DWARF register number is 7, so the address is stored there.
    :param regs: Array of registers (in DWARF register number order)
    :param cfa: Address to set as the stack pointer
    :return: None
    """
    regs[7] = cfa

That's all for the gdb- and architecture-dependent parts.
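The literal 7 in unwind_stack() can also be derived from the same name list that get_registers() uses. A small sketch of that idea follows; DWARF_REG and REG_NAMES are hypothetical names, and only the first 17 entries of the list are repeated here.

```python
from typing import List

# First 17 DWARF register numbers for x64, in the same order as reg_names above
REG_NAMES = ["rax", "rdx", "rcx", "rbx", "rsi", "rdi", "rbp", "rsp",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
             "rip"]

# Register name -> DWARF register number
DWARF_REG = {name: number for number, name in enumerate(REG_NAMES)}


def unwind_stack(regs: List[int], cfa: int) -> None:
    """Same behavior as unwind_stack() above, without the magic number."""
    regs[DWARF_REG["rsp"]] = cfa


regs = [0] * len(REG_NAMES)
unwind_stack(regs, 0x7fffffffd570)
```

Porting to another architecture would then mostly mean swapping the name list.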

Overall flow

The overall flow of the backtrace process is as follows.

  1. Get the stop address and register (including stack pointer)
  2. Get function information from address
  3. Obtain the caller address and the register status immediately before the call from the stack frame information.
  4. Go back to 2 and repeat

This is written out plainly in main().

def main() -> None:
    """
    Show backtrace
    """
    with open(get_elf_file_name(), 'rb') as f:
        elf_file = ELFFile(f)
        if not elf_file.has_dwarf_info():
            print('file has no DWARF info')
            return

        dwarf_info = elf_file.get_dwarf_info()

        # The first iteration starts from the current stop position
        pc = get_pc()
        regs = get_registers()
        i = 0
        while True:
            #Get function information of stop position
            fi = get_func_info(dwarf_info, pc)
            if fi is None:
                break

            print("#{:<3}0x{:016x} in {}() at {}:{}".format(i,
                                                            fi["addr"],
                                                            fi["func_name"],
                                                            fi["path"],
                                                            fi["line"]))
            i += 1

            #Look at the stack frame and follow the caller
            pc, regs = get_prev_frame(fi["cu"], pc, regs, read_uintptr_t)
            if pc is None:
                break

"Getting the stop address, register (& stack pointer)" only calls the function explained in the previous section, so we will start with "Getting the function information including the address".

Get function information including address

There are two routes through DWARF for finding the function that contains a given address:

  • Scan all CUs / DIEs and look for an address range that contains it
  • Look up the DIE from the address

The latter is exactly the method meant for this purpose: according to [DWARFv4 6.1.2 Lookup by Address](http://www.dwarfstd.org/doc/DWARF4.pdf#%5B%7B%22num%22%3A347%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C0%2C792%2Cnull%5D), the lookup table is stored in the .debug_aranges section. Unfortunately, the pyelftools I used doesn't support .debug_aranges, so I will plainly search the CUs / DIEs.

def get_func_info(dwarf_info: DWARFInfo, addr: int) -> Optional[Dict]:
    """
    Get information on the function that contains the address given by addr
    :param dwarf_info: DWARF information
    :param addr: Program counter
    :return: Function information such as function name and address
    """

    #From each compile unit
    for cu in dwarf_info.iter_CUs():
        #While iterating the DIE
        for die in cu.iter_DIEs():
            try:
                #Looking for the DIE of the function,
                if die.tag == 'DW_TAG_subprogram':
                    #Find the address range occupied by the function
                    low_pc = die.attributes['DW_AT_low_pc'].value
                    high_pc_attr = die.attributes['DW_AT_high_pc']
                    high_pc_attr_class = describe_form_class(high_pc_attr.form)
                    if high_pc_attr_class == 'address':
                        high_pc = high_pc_attr.value
                    elif high_pc_attr_class == 'constant':
                        high_pc = low_pc + high_pc_attr.value
                    else:
                        print('Error: invalid DW_AT_high_pc class:{}\n'.format(high_pc_attr_class))
                        continue

                    #Bingo if the specified address matches this function range
                    if low_pc <= addr < high_pc:
                        ret = dict()
                        ret["addr"] = addr
                        ret["cu"] = cu
                        ret["func_name"] = die.attributes['DW_AT_name'].value.decode("utf-8")
                        ret["func_addr"] = low_pc
                        ret["offset_from_func"] = addr - low_pc
                        ret.update(get_file_line_from_address(cu, addr))
                        return ret
            except KeyError:
                continue
    return None

Iterate over the DIEs of every compile unit, looking for DIEs tagged DW_TAG_subprogram, which represent functions. How the address range occupied by a function is described is specified in [2.17 Code Addresses and Ranges](http://dwarfstd.org/doc/DWARF4.pdf#%5B%7B%22num%22%3A137%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C0%2C792%2Cnull%5D). A single contiguous range is given by two attributes as the half-open interval [DW_AT_low_pc, DW_AT_high_pc); when optimization or similar produces several non-contiguous ranges, they are given by DW_AT_ranges. For now, only the single-range case is handled. If DW_AT_ranges shows up, implement it then.

From here on, domain-specific terms such as attribute and class come up a lot, so let me sort them out first.

Compile Unit (CU)
The unit of compilation. For C, one source file.
DIE
A debugging information entry. DIEs form a tree of parent-child relationships, and there is one DIE tree per CU. There are various kinds, broadly distinguished by their tag (DW_TAG_*).
attribute (DW_AT_*)
An information element contained in a DIE, such as a name or an address range.
class
The meaning an attribute expresses: address, constant, string, and so on.
form (DW_FORM_*)
How the value is actually stored. DW_FORM_data2 is a 2-byte value, DW_FORM_data4 is a 4-byte value.

Using these terms, finding the start address of a function reads like this:

For the CU of main.c, walking down from the top DIE reaches the DIE of the main function. The tag of this DIE is DW_TAG_subprogram and its DW_AT_name attribute is "main". Its DW_AT_low_pc attribute holds a value of class address in the form DW_FORM_addr. That value is the start address of the main function.

Now that we know the start address, let's look at the end, the DW_AT_high_pc attribute. According to [2.17.2 Contiguous Address Range](http://dwarfstd.org/doc/DWARF4.pdf#%5B%7B%22num%22%3A137%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C0%2C792%2Cnull%5D), DW_AT_high_pc can be of class address or of class constant. If it is of class address, it is *the relocated address*, so you can think of it as an address in loaded memory [^4]. If it is of class constant, it is an offset from DW_AT_low_pc.

[^4]: I think ASLR is disabled when running under gdb, so it should be the same as the address assigned at link time (I'm not confident, because I haven't properly thought about shared libraries).

Now that you know the start and end addresses of this function, you can determine if this is the function you are looking for.
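The range test can be isolated from pyelftools as a small pure function, which makes the two DW_AT_high_pc cases easy to see. This is only a sketch: function_range and contains are hypothetical names, the class strings "address" and "constant" are the ones get_func_info() compares against, and the numbers are illustrative.

```python
from typing import Tuple


def function_range(low_pc: int, high_pc_value: int, high_pc_class: str) -> Tuple[int, int]:
    """Return the half-open address range [start, end) of a function."""
    if high_pc_class == "address":
        # DW_AT_high_pc is itself an address
        return low_pc, high_pc_value
    if high_pc_class == "constant":
        # DW_AT_high_pc is an offset from DW_AT_low_pc
        return low_pc, low_pc + high_pc_value
    raise ValueError("unsupported DW_AT_high_pc class: {}".format(high_pc_class))


def contains(address_range: Tuple[int, int], addr: int) -> bool:
    """True if addr falls inside the half-open range."""
    start, end = address_range
    return start <= addr < end


# Illustrative values: a function starting at 0x4046a3 that is 0xa1 bytes long
rng = function_range(0x4046a3, 0xa1, "constant")
```

Note that the end address is exclusive, so an address equal to the end belongs to whatever comes next, not to this function.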

Find the source file name / line number from the address

Now that we have the function name and address, what about the source file?

According to [6.2 Line Number Information](http://www.dwarfstd.org/doc/DWARF4.pdf#%5B%7B%22num%22%3A350%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C0%2C792%2Cnull%5D), the .debug_line section holds a table for converting addresses to source locations. The basic idea is a lookup table from an object address to the source file name, line number, and column number. However, built naively, such a table would be huge, several times the size of the object itself, as shown below.

| Address | Source file name | Line | Column |
|---|---|---|---|
| 0x00abcd | main.c | 10 | 8 |
| 0x00abce | main.c | 11 | 10 |
| 0x00abcf | main.c | 11 | 10 |
| 0x00abcg | main.c | 12 | 8 |
| 0x00abch | main.c | 12 | 8 |
| 0x00abci | main.c | 13 | 8 |

Therefore, DWARF uses the following two methods to reduce the storage size.

  • Simply omit duplicate lines
| Address | Source file name | Line | Column |
|---|---|---|---|
| 0x00abcd | main.c | 10 | 8 |
| 0x00abce | main.c | 11 | 10 |
| 0x00abcg | main.c | 12 | 8 |
| 0x00abci | main.c | 13 | 8 |

When a machine instruction spans several bytes, the rows for its remaining bytes are duplicates, so dropping them saves space.

  • Design a stack machine

Design your own stack machine without recording tables and use it to save recording size.

_People, people, people, people, people, people, people, people, people, people
> Design a stack machine <
 ̄Y^Y^Y^Y^Y^Y^Y^Y^Y^Y^Y ̄

That escalated quickly, but fortunately pyelftools runs this machine for us and expands it into the address / line number table, so I gratefully use it. The machine itself runs in pyelftools' LineProgram class. (Strictly speaking, the DWARF specification describes it as a state machine rather than a stack machine.)
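To give a feel for why a small program can replace a big table, here is a toy machine. It is not DWARF's actual line number program: the three opcodes are invented for illustration and the real opcode set is much richer. The principle is the same, though: the program updates a couple of registers (address, line) and emits a table row on demand.

```python
from typing import List, Tuple


def run_toy_line_program(program: List[Tuple[str, int]],
                         start_address: int) -> List[Tuple[int, int]]:
    """Expand a toy line program into (address, line) rows."""
    address, line = start_address, 1
    rows = []  # type: List[Tuple[int, int]]
    for op, arg in program:
        if op == "advance_pc":
            address += arg
        elif op == "advance_line":
            line += arg
        elif op == "copy":
            # Emit one row of the table from the current registers
            rows.append((address, line))
        else:
            raise ValueError("unknown opcode: {}".format(op))
    return rows


# Invented program; "copy" ignores its operand
program = [("advance_line", 9), ("copy", 0),
           ("advance_pc", 1), ("advance_line", 1), ("copy", 0),
           ("advance_pc", 2), ("advance_line", 1), ("copy", 0)]
rows = run_toy_line_program(program, 0x1000)
```

Each row costs only a few small deltas instead of a full address, file name, line and column, which is where the saving comes from.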

So, the function that gets the source file name and line number from an address looks like this.

def get_file_line_from_address(cu: CompileUnit, addr: int) -> Dict:
    """
    Find the source file name and line number for an address within a compile unit
    :param cu: compile unit information
    :param addr: Address in the object
    :return: Source file name and line number
    """

    top_die = cu.get_top_DIE()  # type: DIE

    #The compile-time directory.
    #The source path is shown relative to this directory
    if "DW_AT_comp_dir" in top_die.attributes.keys():
        comp_dir = top_die.attributes["DW_AT_comp_dir"].value.decode('utf-8')
    else:
        comp_dir = ""

    line_program = cu.dwarfinfo.line_program_for_CU(cu)  # type: LineProgram

    for entry in reversed(line_program.get_entries()):  # type: LineProgramEntry
        if entry.state is not None and not entry.state.end_sequence:
            if entry.state.address < addr:
                #The address range of entry contained the address you were looking for

                #Ask for the full path of the file
                fe = line_program["file_entry"][entry.state.file - 1]
                name = fe.name.decode('utf-8')
                if fe.dir_index != 0:
                    # The source is in a directory other than the compile-time directory (recorded as a relative path)
                    d = line_program["include_directory"][fe.dir_index - 1].decode('utf-8')
                else:
                    d = ""

                path = posixpath.normpath(posixpath.join(comp_dir, d, name))

                ret = dict()
                ret["path"] = path
                ret["line"] = entry.state.line
                return ret

    return dict()
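Once the rows are expanded, the lookup itself is just "find the last row at or before the target address". A sketch of that idea with the standard bisect module follows, over a hypothetical table that is already sorted by address; the addresses and line numbers are made up. Note one difference: this sketch treats an address exactly at the start of a row as belonging to that row, whereas the function above uses a strict < comparison and so resolves such an address to the preceding row.

```python
import bisect
from typing import List, Optional, Tuple


def find_line(rows: List[Tuple[int, int]], addr: int) -> Optional[int]:
    """Return the line of the last row whose address is <= addr, or None."""
    addresses = [row_addr for row_addr, _ in rows]
    # bisect_right gives the index of the first row that starts after addr
    index = bisect.bisect_right(addresses, addr) - 1
    if index < 0:
        return None
    return rows[index][1]


# Hypothetical (address, line) table, sorted by address
rows = [(0x1000, 10), (0x1001, 11), (0x1003, 12), (0x1005, 13)]
line = find_line(rows, 0x1002)
```

A binary search only pays off when the table is kept around between lookups; for a one-shot scan the linear version is just as good.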

Obtain the caller address and the register status immediately before the call from the stack frame information.

Now for the main event. To go back to the caller, we need the following information:

  • The caller's address
  • The caller's stack pointer
  • Where the registers that the calling convention promises to restore on return have been saved

The basic idea is, as usual, to have a huge table. For example:

| Address | Caller's stack pointer | Register r0 to restore | Register r1 to restore | Caller's address |
|---|---|---|---|---|
| 0x100 | .. | .. | .. | .. |
| 0x101 | .. | .. | .. | .. |
| 0x102 | .. | .. | .. | .. |
| 0x103 | .. | .. | .. | .. |

With such a huge table, the information needed to return to the caller can be recovered at any address, even though register and stack usage changes from moment to moment as the function executes.

A quick definition of terms: roughly speaking, the top of the stack at the moment of returning to the caller is called the CFA. To be exact, quoting 6.4 Call Frame Information:

An area of memory that is allocated on a stack called a “call frame.” The call frame is identified by an address on the stack. We refer to this address as the Canonical Frame Address or CFA. Typically, the CFA is defined to be the value of the stack pointer at the call site in the previous frame (which may be different from its value on entry to the current frame).

In other words, the memory area allocated on the stack is called a "call frame", and it is identified by an address on the stack called the CFA (Canonical Frame Address). Typically, the CFA is the value of the stack pointer just before the function call.

Note the word *typically*, but this rough understanding is enough here.

As usual, DWARF compresses this table using a stack machine, and the actual work is done by pyelftools. Writing down the details of the expansion would never end, so I will only give a feel for how to follow the expanded table.

First, here is the information for the LZ4_compress_fast() function recorded in the .debug_frame section, displayed with eu-readelf -w. The parts labelled Program are the code for the stack machine.

[    70] CIE length=20
   CIE_id:                   0
   version:                  3
   augmentation:             "zR"
   code_alignment_factor:    1
   data_alignment_factor:    -8
   return_address_register:  16
   Augmentation data:        0x3 (FDE address encoding udata4)

   Program:
     def_cfa r7 (rsp) at offset 8
     offset r16 (rip) at cfa-8
     nop
     nop
 :
 :
[   470] FDE length=44 cie=[    70]
   CIE_pointer:              1028
   initial_location:         0x00000000004046a3 <LZ4_compress_fast>
   address_range:            0xa1

   Program:
     advance_loc4 1 to 0x1
     def_cfa_offset 16
     offset r6 (rbp) at cfa-16
     advance_loc4 3 to 0x4
     def_cfa_register r6 (rbp)
     advance_loc4 156 to 0xa0
     def_cfa r7 (rsp) at offset 8
     nop
     nop
     nop
     nop
     nop
     nop
     nop

The address / register table is restored from this. Decoded with pyelftools, it expands into the following form.

# entries = cu.dwarfinfo.CFI_entries()
# entry.cie.header
Container({'version': 3,
           'code_alignment_factor': 1,
           'augmentation': b'',
           'length': 20,
           'data_alignment_factor': -8,
           'CIE_id': 4294967295,
           'return_address_register': 16})
# entry.get_decoded()
DecodedCallFrameTable(table=[{'pc':  0x4046A3,
                              'cfa': CFARule(reg=7, offset=8, expr=None),
                              16:    RegisterRule(OFFSET, -8)},
                             {'pc':  0x4046A4,
                              'cfa': CFARule(reg=7, offset=16, expr=None),
                              6:     RegisterRule(OFFSET, -16),
                              16:    RegisterRule(OFFSET, -8)},
                             {'pc':  0x4046A7,
                              'cfa': CFARule(reg=6, offset=16, expr=None),
                              6:     RegisterRule(OFFSET, -16),
                              16:    RegisterRule(OFFSET, -8)},
                             {'pc':  0x404743,
                              'cfa': CFARule(reg=7, offset=8, expr=None),
                              6:     RegisterRule(OFFSET, -16),
                              16:    RegisterRule(OFFSET, -8)}],
                      reg_order=[16, 6])

This can be summarized as the following table.

| Address | CFA | r6 ($rbp) | r16 ($rip, return_address_register) |
|---|---|---|---|
| 0x4046A3 | r7+8 | - | *(CFA-8) |
| 0x4046A4 | r7+16 | *(CFA-16) | *(CFA-8) |
| 0x4046A7 | r6+16 | *(CFA-16) | *(CFA-8) |
| 0x404743 | r7+8 | *(CFA-16) | *(CFA-8) |

Incidentally, register 16, $rip, is assigned here as return_address_register, the register that stores the return address, but depending on the architecture this may not correspond to an actual register.
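To show how the Program listings turn into rows, here is a toy interpreter for just the handful of instructions that appear in the dump above. It is a sketch, not how pyelftools implements it: expand_cfi and the tuple encoding are invented for illustration, the operands of offset are taken as already multiplied by data_alignment_factor (the way eu-readelf prints them), and the advance_loc4 instructions are written simply as advance_loc.

```python
from typing import Dict, List, Sequence, Tuple


def expand_cfi(initial_location: int,
               cie_program: Sequence[Tuple],
               fde_program: Sequence[Tuple]) -> List[Dict]:
    """Expand CIE + FDE instructions into one table row per address range."""
    pc = initial_location
    cfa = (None, 0)  # (register number, offset)
    saved = {}       # register number -> offset from the CFA
    rows = []

    for op, *args in list(cie_program) + list(fde_program):
        if op == "advance_loc":
            # Close the current row, then move on to the next address
            rows.append({"pc": pc, "cfa": cfa, "saved": dict(saved)})
            pc += args[0]
        elif op == "def_cfa":
            cfa = (args[0], args[1])
        elif op == "def_cfa_offset":
            cfa = (cfa[0], args[0])
        elif op == "def_cfa_register":
            cfa = (args[0], cfa[1])
        elif op == "offset":
            saved[args[0]] = args[1]
        elif op != "nop":
            raise ValueError("unsupported instruction: {}".format(op))

    # The last row stays open until the end of the function
    rows.append({"pc": pc, "cfa": cfa, "saved": dict(saved)})
    return rows


# Instructions transcribed from the CIE / FDE dump above
cie_program = [("def_cfa", 7, 8), ("offset", 16, -8)]
fde_program = [("advance_loc", 1), ("def_cfa_offset", 16), ("offset", 6, -16),
               ("advance_loc", 3), ("def_cfa_register", 6),
               ("advance_loc", 156), ("def_cfa", 7, 8)]
table = expand_cfi(0x4046a3, cie_program, fde_program)
```

The four resulting rows start at 0x4046a3, 0x4046a4, 0x4046a7 and 0x404743, the same addresses as in the decoded table.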

Let's actually use this table to restore the address of the caller of LZ4_compress_fast().

(gdb) bt
#0  LZ4_compress_generic (acceleration=1, dictIssue=<optimized out>, dict=<optimized out>, tableType=<optimized out>, 
    outputLimited=<optimized out>, maxOutputSize=<optimized out>, inputSize=57, dest=0x61e010 "", 
    source=0x41b4b8 "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", cctx=0x7fffffff9530) at lz4.c:578
#1  LZ4_compress_fast_extState (state=0x7fffffff9530, source=0x41b4b8 "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", dest=0x61e010 "", 
    inputSize=57, maxOutputSize=73, acceleration=1) at lz4.c:739
#2  0x0000000000404722 in LZ4_compress_fast (source=0x41b4b8 "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", dest=0x61e010 "", 
    inputSize=57, maxOutputSize=73, acceleration=1) at lz4.c:760
#3  0x0000000000404776 in LZ4_compress_default (source=0x41b4b8 "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", dest=0x61e010 "", 
    inputSize=57, maxOutputSize=73) at lz4.c:771
#4  0x0000000000400957 in main () at simple_buffer.c:54

First, move to the stack frame in which LZ4_compress_fast() is executing.

(gdb) frame 2
#2  0x0000000000404722 in LZ4_compress_fast (source=0x41b4b8 "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", dest=0x61e010 "", 
    inputSize=57, maxOutputSize=73, acceleration=1) at lz4.c:760
760	    int const result = LZ4_compress_fast_extState(ctxPtr, source, dest, inputSize, maxOutputSize, acceleration);

The execution address at this point is 0x404722, so searching the table above by address,

| Address | CFA | r6 ($rbp) | r16 ($rip, return_address) |
|---|---|---|---|
| 0x4046A7 | r6+16 | *(CFA-16) | *(CFA-8) |

this row matches. Since r6 == $rbp, the CFA is at $rbp + 16. And since r16 is saved at CFA-8, the **return_address** is found at:

(gdb) x $rbp+16-8
0x7fffffffd568:	0x00404776

That is where it should be. Since the value there indeed matches the execution address of frame #3, the caller's address has been restored.

There is a caveat here. The r6 used to locate the CFA is the current r6, as of executing 0x404722, not the r6 restored for the caller. Otherwise the computation would be circular, since restoring r6 itself depends on the CFA.

In other words, the calculation has to be done in this order:

  1. Find the location of the CFA using the current register values.
  2. Using the CFA found above, find the values the registers will have once the current function returns.
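The two steps can be traced with plain numbers. In the sketch below, the rbp value follows from the gdb session above ($rbp+16-8 is 0x7fffffffd568) and the return address 0x404776 is the one gdb printed; the current rsp and the saved rbp on the stack are made up for illustration. unwind_one_frame and the dictionary layout are hypothetical, not the gist's API.

```python
from typing import Dict, List

RBP, RSP, RIP = 6, 7, 16  # DWARF register numbers on x64

# The row that covers 0x404722: CFA = r6+16, r6 at CFA-16, r16 at CFA-8
row = {"cfa": (RBP, 16), "saved": {RBP: -16, RIP: -8}}

regs = [0] * 17
regs[RBP] = 0x7fffffffd560  # derived from the gdb output
regs[RSP] = 0x7fffffffd540  # hypothetical

# Stack contents: the saved rbp is hypothetical, the return address is from gdb
memory = {0x7fffffffd560: 0x7fffffffd590,
          0x7fffffffd568: 0x404776}


def unwind_one_frame(row: Dict, regs: List[int], memory: Dict[int, int]) -> List[int]:
    """Return the caller's registers for one frame."""
    # Step 1: locate the CFA using the *current* register values
    cfa_reg, cfa_offset = row["cfa"]
    cfa = regs[cfa_reg] + cfa_offset
    # Step 2: restore the saved registers relative to that CFA
    prev = list(regs)
    for reg, offset in row["saved"].items():
        prev[reg] = memory[cfa + offset]
    # The caller's stack pointer is the CFA itself
    prev[RSP] = cfa
    return prev


prev_regs = unwind_one_frame(row, regs, memory)
```

Swapping the two steps would read the saved rbp before the CFA is known, which is exactly the circularity described above.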

Based on the above, the following function does the work. Besides restoring the return address and registers, it also unwinds the stack.

def get_prev_frame(cu: CompileUnit,
                   addr: int,
                   regs: List[int],
                   read_mem: Callable[[int], int]) -> Tuple[Optional[int], Optional[List[int]]]:
    """
    Identify the stack frame from the execution address, registers and memory (mainly the stack),
    and restore the caller's address and the registers at that point
    :param cu: CU information
    :param addr: Execution address
    :param regs: List of registers indexed by DWARF register number
    :param read_mem: A function that reads memory. Each read is pointer (register) sized
    :return: Caller address and registers
    """

    if cu.dwarfinfo.has_CFI():
        entries = cu.dwarfinfo.CFI_entries()
    else:
        # There is no .debug_frame section
        entries = []

    for entry in entries:
        if "initial_location" not in entry.header.keys():
            continue
        start = entry.header.initial_location
        end = start + entry.header.address_range
        if not (start <= addr < end):
            continue
        dec = entry.get_decoded()
        for row in reversed(dec.table):
            if row["pc"] <= addr:
                #Restore the return address and its registers
                cfa_rule = row["cfa"]  # type: CFARule
                assert cfa_rule.expr is None, "DWARF expression not supported"
                cfa = regs[cfa_rule.reg] + cfa_rule.offset

                return_address_rule = row[entry.cie.header.return_address_register]  # type: RegisterRule
                assert return_address_rule.type == RegisterRule.OFFSET, "Not supported except OFFSET"
                return_address = cfa + return_address_rule.arg

                prev_regs = regs[:]
                for key, reg in row.items():
                    if isinstance(reg, RegisterRule):
                        assert reg.type == RegisterRule.OFFSET, "Not supported except OFFSET"
                        prev_regs[key] = read_mem(cfa + reg.arg)

                #Stack rewind
                unwind_stack(prev_regs, cfa)
                return read_mem(return_address), prev_regs
    return None, None

Operation check

Now that it's complete, let's run it. Place bt.py in the current directory and load the Python script from gdb. If you have moved between frames, the register values will have changed, so don't forget to go back to frame 0 first!

(gdb) frame 0
(gdb) source bt.py 
#0  0x00000000004010a4 in LZ4_compress_fast_extState() at /home/oreore/lz4/lib/lz4.c:575
#1  0x0000000000404722 in LZ4_compress_fast() at /home/oreore/lz4/lib/lz4.c:760
#2  0x0000000000404776 in LZ4_compress_default() at /home/oreore/lz4/lib/lz4.c:771
#3  0x0000000000400957 in main() at /home/oreore/lz4/examples/simple_buffer.c:54

It looks pretty good, but if you look closely, one frame is missing. This is because inlining is not handled. The LZ4_compress_generic() function is inlined into LZ4_compress_fast_extState(), but the script does not notice and reports it as LZ4_compress_fast_extState().

To interpret inlining you have to deal with DW_TAG_inlined_subroutine properly, but this margin is too small to write that, so I will stop here.

The parameters are not displayed either. That, too, takes a fair amount of work, such as looking at the .debug_loc section, so it is omitted as well.

In conclusion

I wanted to see more of Kemono Friends...