🧠 Debugging Hard Faults on ARM Cortex-M

Hard Faults are the embedded developer’s equivalent of a kernel panic—sudden, catastrophic, and often cryptic. On ARM Cortex-M microcontrollers, they signal that something went very wrong: an invalid memory access, a divide-by-zero, or a corrupted stack.

In this post, we’ll explore:

What causes Hard Faults
How to extract useful information from the fault handler
How to decode the stack frame
Tools and techniques to prevent them

💥 What Triggers a Hard Fault?

A Hard Fault is a type of exception that occurs when the processor encounters a condition it cannot handle. Common causes include:

Dereferencing a null or invalid pointer
Executing code from a non-executable region
Stack overflows or corruption
Misaligned memory access
Other faults (BusFault, MemManage, UsageFault) whose handlers are not enabled, which escalate to a Hard Fault

🧰 Anatomy of a Hard Fault Handler

When a Hard Fault occurs, the processor pushes a stack frame onto the current stack (MSP or PSP), which includes:

R0–R3, R12: General-purpose registers
LR: Link register
PC: Program counter at the time of the fault
xPSR: Program status register
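
The basic frame is eight 32-bit words, stored in ascending address order starting at the stack pointer captured on exception entry. One way to make that layout explicit is a struct overlay. This is a sketch only: the type, field, and function names are made up for illustration (they are not from CMSIS), and it assumes the basic frame without floating-point state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic Cortex-M exception frame (no FPU state), lowest address first.
 * Type and field names are hypothetical, chosen for this sketch. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* return address stacked by the hardware */
    uint32_t xpsr;  /* program status register */
} stacked_frame_t;

/* The hardware stacks exactly eight words in the basic case. */
static_assert(sizeof(stacked_frame_t) == 32, "basic frame is 8 words");

/* Interpret the stack pointer captured on fault entry as a frame. */
static inline const stacked_frame_t *frame_from_sp(const uint32_t *sp) {
    return (const stacked_frame_t *)sp;
}
```

Named fields make it harder to mix up indices than raw array offsets do, at the cost of one cast.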

You can write a custom handler to extract this data:

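#include <stdint.h>
#include <stdio.h>

void hard_fault_handler_c(uint32_t *stacked_regs);

// Naked: no compiler prologue, so LR and SP are untouched when the assembly runs.
// TST with an immediate and the IT block need Thumb-2 (Cortex-M3 or later).
__attribute__((naked)) void HardFault_Handler(void) {
    __asm volatile (
        "TST lr, #4 \n"              // EXC_RETURN bit 2: 0 = MSP, 1 = PSP
        "ITE EQ \n"
        "MRSEQ r0, MSP \n"
        "MRSNE r0, PSP \n"
        "B hard_fault_handler_c \n"  // r0 = pointer to the stacked frame
    );
}

void hard_fault_handler_c(uint32_t *stacked_regs) {
    // Hardware stacking order: R0-R3, R12, LR, PC, xPSR
    uint32_t r0  = stacked_regs[0];
    uint32_t r1  = stacked_regs[1];
    uint32_t r2  = stacked_regs[2];
    uint32_t r3  = stacked_regs[3];
    uint32_t r12 = stacked_regs[4];
    uint32_t lr  = stacked_regs[5];
    uint32_t pc  = stacked_regs[6];
    uint32_t psr = stacked_regs[7];

    // Casts keep the format specifiers correct regardless of how uint32_t is defined
    printf("Hard Fault!\n");
    printf("R0 = 0x%08lX  R1 = 0x%08lX\n", (unsigned long)r0, (unsigned long)r1);
    printf("R2 = 0x%08lX  R3 = 0x%08lX\n", (unsigned long)r2, (unsigned long)r3);
    printf("R12 = 0x%08lX\n", (unsigned long)r12);
    printf("PC = 0x%08lX\n", (unsigned long)pc);
    printf("LR = 0x%08lX\n", (unsigned long)lr);
    printf("xPSR = 0x%08lX\n", (unsigned long)psr);
    // Add more logging or store to non-volatile memory

    // Halt here: returning would resume at the stacked PC and typically fault again
    for (;;) { }
}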

🔍 Decoding the Fault Address

Once you have the Program Counter (PC), you can map it back to the source code using your ELF file and addr2line:

arm-none-eabi-addr2line -e firmware.elf 0x08001234

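For a precise fault, this reports the file and line of the faulting instruction. For an imprecise bus fault (for example, a buffered write), the stacked PC can be a few instructions past the real culprit. If the PC is in a library or system call, start from the stacked LR, which usually holds the caller's return address, and trace back to your code.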

🧪 Diagnosing Common Fault Scenarios

🧷 Null Pointer Dereference

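If the stacked PC is 0x00000000 or another very low address, the code most likely branched through a null or uninitialized function pointer. A null data pointer usually looks different: the PC is a valid code address, and the faulting address (when one is reported) is near zero. Reads from address 0 may not fault at all on parts that map flash or the vector table there.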

🌀 Stack Overflow

If the stack pointer (SP) is outside the valid RAM range, or if the fault occurs deep in a recursive call, suspect a stack overflow. Compare SP against the stack region in your linker map file to confirm.
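
The range check can be sketched as two small helpers. The function names are hypothetical, and the stack bounds are assumed to come from linker symbols whose names vary by toolchain. The second helper reconstructs SP as it was just before the fault, assuming a basic eight-word frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when sp lies inside [stack_limit, stack_top].
 * stack_limit is the lowest valid stack address, stack_top the initial SP. */
static bool sp_in_stack_region(uint32_t sp, uint32_t stack_limit, uint32_t stack_top) {
    assert(stack_limit <= stack_top);
    return sp >= stack_limit && sp <= stack_top;
}

/* SP value just before the fault, given the address of the stacked frame.
 * Bit 9 of the stacked xPSR records whether the hardware inserted a
 * 4-byte pad to keep the frame 8-byte aligned. */
static uint32_t sp_before_fault(uint32_t frame_addr, uint32_t stacked_xpsr) {
    uint32_t sp = frame_addr + 32u;  /* skip the eight stacked words */
    if (stacked_xpsr & (1u << 9)) {
        sp += 4u;                    /* skip the alignment pad */
    }
    return sp;
}
```

In a handler, frame_addr would be the pointer passed in r0, cast to uint32_t.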

🧨 Invalid Memory Access

If the faulting address (reported in BFAR or MMFAR when the matching valid bit in CFSR is set) is in a peripheral or flash region, check for misconfigured pointers or DMA transfers.
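
On ARMv7-M cores (Cortex-M3/M4/M7), the faulting data address is only meaningful when the Configurable Fault Status Register (CFSR) marks it valid. The sketch below takes captured register values as plain arguments so the decision logic stays hardware-independent. The macro and function names are illustrative, not CMSIS names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Selected CFSR bit positions on ARMv7-M. */
#define CFSR_IACCVIOL    (1u << 0)   /* MemManage: instruction access violation */
#define CFSR_DACCVIOL    (1u << 1)   /* MemManage: data access violation */
#define CFSR_MMARVALID   (1u << 7)   /* MMFAR holds the faulting address */
#define CFSR_PRECISERR   (1u << 9)   /* BusFault: precise data bus error */
#define CFSR_IMPRECISERR (1u << 10)  /* BusFault: imprecise data bus error */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR holds the faulting address */

/* Picks the faulting data address from captured register values.
 * Returns false when neither address register is marked valid. */
static bool fault_address(uint32_t cfsr, uint32_t mmfar, uint32_t bfar,
                          uint32_t *addr_out) {
    assert(addr_out != 0);
    if (cfsr & CFSR_BFARVALID) {
        *addr_out = bfar;
        return true;
    }
    if (cfsr & CFSR_MMARVALID) {
        *addr_out = mmfar;
        return true;
    }
    return false;
}
```

On target, the three inputs would be read from 0xE000ED28 (CFSR), 0xE000ED34 (MMFAR), and 0xE000ED38 (BFAR) early in the handler, before anything can overwrite them.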

🛡️ Preventing Hard Faults

✅ Enable All Fault Handlers

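By default, the configurable faults on Cortex-M3 and later (MemManage, BusFault, UsageFault) are disabled and escalate to a Hard Fault. Enable them explicitly by setting the MEMFAULTENA, BUSFAULTENA, and USGFAULTENA bits in the System Handler Control and State Register (SHCSR).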
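
A minimal sketch of the enable step, assuming an ARMv7-M core (Cortex-M3 or later). The macro and function names are made up for illustration, and the register is passed in as a pointer so the logic can also be exercised off-target.

```c
#include <assert.h>
#include <stdint.h>

/* SHCSR enable bits for the configurable faults (ARMv7-M). */
#define SHCSR_MEMFAULTENA (1u << 16)
#define SHCSR_BUSFAULTENA (1u << 17)
#define SHCSR_USGFAULTENA (1u << 18)

/* Sets the three enable bits and leaves every other bit untouched.
 * On target, pass (volatile uint32_t *)0xE000ED24, the SHCSR address;
 * with CMSIS headers the same register is SCB->SHCSR. */
static void enable_configurable_faults(volatile uint32_t *shcsr) {
    assert(shcsr != 0);
    *shcsr |= SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}
```

Calling this early in startup, before any code that might fault, means each fault type reaches its own handler instead of collapsing into HardFault_Handler.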

This gives you more granular fault information.

🧵 Use Stack Canaries

Write known values at the limit of the stack (the lowest addresses, since Cortex-M stacks grow downward) and check them periodically. If a value has changed, the stack has overflowed.
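
The idea can be sketched as a pair of helpers. The pattern value, word count, and function names are arbitrary choices for this example, and the caller is assumed to know the lowest address of the stack region (typically from a linker symbol).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_CANARY_VALUE 0xDEADBEEFu  /* arbitrary pattern for this sketch */
#define STACK_CANARY_WORDS 4u           /* how many guard words to place */

/* Writes the pattern into the lowest words of the stack region.
 * stack_limit points at the lowest address; the stack grows down toward it. */
static void stack_canary_init(volatile uint32_t *stack_limit) {
    assert(stack_limit != 0);
    for (size_t i = 0; i < STACK_CANARY_WORDS; i++) {
        stack_limit[i] = STACK_CANARY_VALUE;
    }
}

/* Returns true while every guard word still holds the pattern. */
static bool stack_canary_ok(const volatile uint32_t *stack_limit) {
    for (size_t i = 0; i < STACK_CANARY_WORDS; i++) {
        if (stack_limit[i] != STACK_CANARY_VALUE) {
            return false;
        }
    }
    return true;
}
```

A periodic task or the idle loop can call stack_canary_ok() and log or reset when it returns false. A canary only catches overflows that actually write to the guard words, so it complements the fault handlers rather than replacing them.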

🧰 Static Analysis

Use tools like Cppcheck, Coverity, or Clang Static Analyzer to catch pointer misuse and undefined behavior before runtime.

🧪 Unit Tests with Fault Injection

Simulate invalid memory access or corrupted pointers in test environments to validate fault handling logic.
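
One hedged way to do this on a host machine is to keep the handler's decoding logic in a pure function and feed it a synthetic stack frame. The record type, magic value, and function names below are hypothetical. The frame uses the same word order the hardware stacks (R0-R3, R12, LR, PC, xPSR).

```c
#include <assert.h>
#include <stdint.h>

#define FAULT_RECORD_MAGIC 0xFA17C0DEu  /* arbitrary marker for a valid record */

/* Hypothetical record that a handler might persist; not from any library. */
typedef struct {
    uint32_t magic;
    uint32_t pc;
    uint32_t lr;
    uint32_t xpsr;
} fault_record_t;

/* Pure function: copies the interesting words out of a stacked frame,
 * so it can be tested without hardware. */
static void fault_record_capture(const uint32_t *stacked_regs, fault_record_t *out) {
    assert(stacked_regs != 0 && out != 0);
    out->magic = FAULT_RECORD_MAGIC;
    out->lr    = stacked_regs[5];
    out->pc    = stacked_regs[6];
    out->xpsr  = stacked_regs[7];
}

/* Host-side check: inject a synthetic frame with a zero PC and verify
 * what gets recorded. Returns 1 on success, 0 on failure. */
static int test_capture_zero_pc(void) {
    const uint32_t frame[8] = {0u, 0u, 0u, 0u, 0u,
                               0x08000455u, 0x00000000u, 0x01000000u};
    fault_record_t rec = {0};
    fault_record_capture(frame, &rec);
    return rec.magic == FAULT_RECORD_MAGIC
        && rec.pc == 0x00000000u
        && rec.lr == 0x08000455u
        && rec.xpsr == 0x01000000u;
}
```

The on-target handler then only has to obtain the frame pointer and call the same capture function, so the logic tested on the host is the logic that runs in the field.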

🧭 Final Thoughts

Hard Faults can feel like black holes in your firmware—silent until they crash everything. But with the right tools and a methodical approach, you can extract meaningful insights from even the most cryptic crashes.

Pro tip: Always log the fault context to non-volatile memory or a debug port. It’s your black box recorder when things go wrong in the field.

Kalvin McCallum @kalvin_mccallum