In this post, I describe how I implemented stack tracing in my hobby operating system kernel, including symbolic address resolution without GDB. It includes code snippets, an explanation of the pitfalls of __builtin_return_address, and a simple symbol map generator.

What even are the stack, stack traces, and symbols?

Stack

In many programming languages (including C, the one I used), the stack stores function return addresses, local variables, and the context needed when calling functions. It works on the LIFO (last-in, first-out) principle: the frame of the most recently called function is the first one to be discarded.

In the context of an operating system (hobby kernels especially), you are fully responsible for this stack and have to set it up on your own. When something fails, the stack is often the thing that tells you where it happened.

Stack trace

A stack trace (or call trace, backtrace) is a list of memory addresses, ideally with the matching symbols. These addresses tell the developer how the program (here, the kernel) ended up at the location where it crashed.

Typically, a stack trace is shown on a crash or kernel panic. It might appear on the screen, in the runtime logs, in debug output, etc.

DEBUG: [ FATAL   ][Kernel panic] Kernel panic.
...
DEBUG: [ FATAL   ][Kernel panic] kernel stack trace
  [trace frame 0x0] 0x1078457: debug_dumpStackTrace
  [trace frame 0x1] 0x1074140: kernel_panic
  [trace frame 0x2] 0x1079080: kernel_exit
  [trace frame 0x3] 0x1061516: _start

It is often also present in GNU/Linux kernel panic output.

Symbols

In a compiled binary executable (for example kernel.elf; in my case, krnl.tmp.elf), each function is assigned a memory address. The problem is that the address itself (0x1074140) does not tell you much. This is where symbols come in: they map memory addresses to function names.

Symbols are commonly thrown away when the binary is optimized or stripped, but it is still possible to generate them during the build and use them later to translate memory addresses to function names, even without a debugger. But more on that later.

How I implemented the stack trace

In an early kernel, you have no GDB, no debug symbols at runtime, no backtrace() or libunwind. If you need a stack trace, you have to pull it out with your bare hands, using the GCC built-in function __builtin_return_address().

/* Returns the return address at level x, or NULL if x is out of range. */
#define __wrap_return_address(x)    \
    ((x) >= 0 && (x) < 16 ? __builtin_return_address(x) : NULL)

As you can see, I wrapped the call in a macro, because:

I can always replace __builtin_return_address with something else, if needed (and possible),
the macro rejects out-of-range values of the x parameter.

But how do you dump it?

Reading the stack trace is useless if you can't do anything with it. The most useful (and probably the most understandable) thing to do is to dump it, so you can debug your kernel.

There is a simple way to do it: just iterate until you reach the end or the maximum depth you wanted, right?

for(int i = 0; i < depth; i++) {
    void* addr = __wrap_return_address(i);
    if(!addr) return;

    // ...
}

It makes sense, but the problem is that __builtin_return_address is not actually a function. It is a compiler built-in (an intrinsic), and its argument has to be a constant known at compile time, so:

GCC can replace the result statically (this is called "constant folding")
GCC can optimize the code
GCC can eliminate branches and decide the data type

The way to work around this is probably a switch statement that checks every number from zero to the maximum. I know, it looks goofy and bad, and no one wants that in their code. Because of this, I plan to replace it with frame pointer-based unwinding sometime in the future.

void* get_return_address(int i) {
    switch(i) {
        case 0: return __builtin_return_address(0);
        case 1: return __builtin_return_address(1);
        // ...
        case 15: return __builtin_return_address(15);
        default: return NULL;
    }
}

/**
 * @brief Dumps stack trace
 * 
 * @param depth how many addresses to dump
 * @param _fn_print pointer to function that is used as print
 */
void debug_dumpStackTrace(u8 depth, void (*_fn_print)(unsigned char*)) {
    // ...

    for(int i = 0; i < depth; i++) {
        void* addr = get_return_address(i);
        if(!addr || (uintptr_t) addr < 0x10000)
            break;

        // ...
    }
}
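
The frame pointer-based unwinding mentioned above could look roughly like the sketch below. It is not the kernel's code: the stack_frame_t layout (saved frame pointer followed by the return address) is an assumption that holds only when the code is built with frame pointers, and the names stack_frame_t and walk_frames are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed frame layout when frame pointers are enabled: the saved frame
 * pointer of the caller, followed by the return address into the caller. */
typedef struct stack_frame {
    struct stack_frame* next;   /* saved frame pointer of the caller */
    void*               ret;    /* return address into the caller */
} stack_frame_t;

/* Walks the frame chain starting at fp and stores up to max return
 * addresses in out. Returns the number of addresses stored. */
size_t walk_frames(const stack_frame_t* fp, void** out, size_t max) {
    size_t n = 0;

    while (fp != NULL && n < max) {
        if (fp->ret == NULL)
            break;

        out[n++] = fp->ret;

        /* On a downward-growing stack, each caller frame sits at a higher
         * address. Stop on anything else so a corrupted chain can't loop. */
        if (fp->next != NULL && (uintptr_t) fp->next <= (uintptr_t) fp)
            break;

        fp = fp->next;
    }

    return n;
}
```

In a kernel, the walk would start from something like (const stack_frame_t*) __builtin_frame_address(0), and it only gives meaningful results if the kernel is compiled with frame pointers (for example with -fno-omit-frame-pointer). Unlike the switch above, the depth here is an ordinary runtime value.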

Update your linker script

When trying to build, I ran into a linker error saying something like the program can't reach the __builtin_return_address function, or that something is not executable. To fix this, I needed to add this to the end of the SECTIONS list in my linker script:

.note.GNU-stack : {
        KEEP(*(.note.GNU-stack))
    }

Symbols, converting addresses to names

As mentioned, without symbols, 0x1074140 in a dump does not make much sense. This is why I decided to generate a custom symbol map from the linker output and use it at runtime.

Converting the addresses

After compiling the kernel, I generate a symbol map using nm:

ld -T config/linker.ld -o build/bin/krnl.tmp.elf $(find build/obj -name "*.o" -type f) \
    -m elf_i386 -nostdlib

nm -n build/bin/krnl.tmp.elf > build/kernel.map

Then, I use awk, cat, and echo to convert this symbol map into a C array.

cat build/kernel.map | awk '$2 == "T" { printf("    { 0x%s, \"%s\" },\n", $1, $3); }' > src/kernel/core/kernel.sym.entries

echo '#include "kernel/core/symbols.h"' > src/kernel/core/kernel.sym.c
echo 'symbol_t kernel_symbols[] = {' >> src/kernel/core/kernel.sym.c
cat src/kernel/core/kernel.sym.entries >> src/kernel/core/kernel.sym.c
echo '};' >> src/kernel/core/kernel.sym.c
echo 'size_t kernel_symbol_count = sizeof(kernel_symbols) / sizeof(kernel_symbols[0]);' >> src/kernel/core/kernel.sym.c

gcc -I src -m32 -std=gnu99 -ffreestanding -c src/kernel/core/kernel.sym.c -o build/obj/kernel/core/kernel.sym.c.o

The result is a generated C file that the script above compiles into the kernel. The symbol_t type it relies on and the generated file look like this:

typedef struct {
    uintptr_t addr;
    const char* name;
} symbol_t;

#include "kernel/core/symbols.h"
symbol_t kernel_symbols[] = {
    { 0x00100000, "__code" },
    { 0x00100000, "_code" },
    { 0x00100000, "code" },
    { 0x00100000, "__kernel_text_section_start" },
    // ...
};

size_t kernel_symbol_count = sizeof(kernel_symbols) / sizeof(kernel_symbols[0]);

After this, I can proceed to generate the image the way I always did.
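
One thing to be aware of: the awk filter above keeps only entries whose type column is T. nm marks local (static) text symbols with a lowercase t, so static functions never reach the table, and an address inside one of them would resolve to the nearest preceding global symbol. As a rough sketch of a filter that accepts both, here is the same parsing step written in C. The function name parse_nm_line and the 128-byte name buffer are assumptions made for this example; this is not part of the build script.

```c
#include <stdio.h>

/* Parses one line of `nm -n` output: "<hex address> <type> <name>".
 * Returns 1 and fills addr and name for text symbols ('T' is global,
 * 't' is local), and 0 for every other line.
 * name must have room for at least 128 bytes. */
int parse_nm_line(const char* line, unsigned long* addr, char* name) {
    char type;

    /* Lines without an address (e.g. undefined symbols) fail to match. */
    if (sscanf(line, "%lx %c %127s", addr, &type, name) != 3)
        return 0;

    return type == 'T' || type == 't';
}
```

A small host-side tool could call this for every line of build/kernel.map and print a table entry for each line where it returns 1.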

Symbol lookup

After building the symbol map and translating it into a C array, writing the debug_lookup function is quite easy.

const char* debug_lookup(uintptr_t addr) {
    const char* best_match = "??";
    uintptr_t best_addr = 0x00000000;

    for(size_t i = 0; i < kernel_symbol_count; i++) {
        if(kernel_symbols[i].addr <= addr && kernel_symbols[i].addr >= best_addr) {
            best_addr = kernel_symbols[i].addr;
            best_match = kernel_symbols[i].name;
        }
    }

    return best_match;
}

This function takes an address inside a function, searches the symbol map for the closest symbol at or below that address, and returns its name, or ?? if nothing is found.

Let's make some use of it. Remember me saying that a stack trace dump is useless if all we have is addresses and no idea which functions they belong to? Let's update debug_dumpStackTrace() to dump not only the addresses, but also the function names next to them.
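
Since nm -n sorts its output by address, the generated table should already be in ascending order, so the linear scan could later be swapped for a binary search. The sketch below is one way to do that, not the kernel's current code. The name lookup_sorted and passing the table as a parameter are choices made for this example, and it assumes the table really is sorted.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t addr;
    const char* name;
} symbol_t;

/* Returns the name of the last symbol whose address is <= addr,
 * or "??" if there is none. Assumes table is sorted by ascending address. */
const char* lookup_sorted(const symbol_t* table, size_t count, uintptr_t addr) {
    size_t lo = 0;
    size_t hi = count;              /* search window is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].addr <= addr)
            lo = mid + 1;           /* mid is a candidate, look further right */
        else
            hi = mid;               /* mid is past addr, look left */
    }

    /* lo is now the number of entries with an address <= addr */
    return lo == 0 ? "??" : table[lo - 1].name;
}
```

When several symbols share one address, this returns the last of them, which matches what the linear version does with its >= comparison.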

void debug_dumpStackTrace(u8 depth, void (*_fn_print)(unsigned char*)) {
    // ...

    for(int i = 0; i < depth; i++) {
        // ...

        _fn_print((unsigned char*) "  [trace frame 0x");
        _fn_print((unsigned char*) i_string);
        _fn_print((unsigned char*) "] 0x");
        _fn_print((unsigned char*) addr_string);
        _fn_print((unsigned char*) ": ");
        _fn_print((unsigned char*) debug_lookup((uintptr_t) addr));
        _fn_print((unsigned char*) "\n");
    }
}
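
The elided part of the loop has to turn i and addr into strings before they can be printed. How the kernel does that is not shown here. The following is only a minimal sketch of a hex formatter that needs no libc; the helper name u32_to_hex and its buffer convention are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: writes value as lowercase hex, without leading
 * zeros, into buf and NUL-terminates it. buf must hold at least 9 bytes
 * (8 hex digits plus the terminator). Returns buf. */
char* u32_to_hex(uint32_t value, char buf[9]) {
    static const char digits[] = "0123456789abcdef";
    char tmp[8];
    int n = 0;
    int i = 0;

    /* Collect digits from least to most significant. */
    do {
        tmp[n++] = digits[value & 0xF];
        value >>= 4;
    } while (value != 0);

    /* Copy them out in reverse so the most significant digit comes first. */
    while (n > 0)
        buf[i++] = tmp[--n];
    buf[i] = '\0';

    return buf;
}
```

With a helper like this, addr_string could be filled by passing (uint32_t) (uintptr_t) addr on a 32-bit target, and i_string by passing the loop index.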

Enhancing the kernel panic dump with a stack trace and function names

I already had a kernel panic callback implemented, so the only thing left to do was to call the function that dumps the stack trace.

#ifndef __KERNEL_PANIC_STACK_TRACE_DEPTH
#define __KERNEL_PANIC_STACK_TRACE_DEPTH 10
#endif

void kernel_panic(REGISTERS* reg, signed int exception) {
    // ...

    // --- stack trace ---
    debug_message("kernel stack trace\n", "Kernel panic", KERNEL_FATAL);
    // nested function: a GNU C extension, not standard C
    void _print(unsigned char* data) {
        puts(data);
        debug_append(data);
    }

    puts("\nKERNEL STACK TRACE:\n");
    debug_dumpStackTrace(__KERNEL_PANIC_STACK_TRACE_DEPTH, &_print);

    // ...
}

Now we have a beautiful and useful info dump.

DEBUG: [ FATAL   ][Kernel panic] Kernel panic.
...
DEBUG: [ FATAL   ][Kernel panic] kernel stack trace
  [trace frame 0x0] 0x1078457: debug_dumpStackTrace
  [trace frame 0x1] 0x1074140: kernel_panic
  [trace frame 0x2] 0x1079080: kernel_exit
  [trace frame 0x3] 0x1061516: _start

Future plans

Replace __builtin_return_address() with frame pointer unwinding
Improve symbol lookup performance using binary search
Allow mapping source file + line using DWARF or ELF debug info

Václav Hajšman @vhajsman

Implementing stack traces and symbol lookup in my kernel: debugging without GDB