Running LLM llama.cpp Bare Metal on Raspberry Pi

For developers and hackers who enjoy squeezing maximum potential out of compact machines, running a large language model natively on a Raspberry Pi with llama.cpp is a rewarding challenge. This guide walks you through compiling llama.cpp from source, downloading a model, and running inference, all on the Pi itself.

Prerequisites

Hardware

  • Raspberry Pi 4, 5, or newer
  • 64-bit Raspberry Pi OS
  • 4GB RAM minimum (8GB+ recommended)
  • Heatsink or fan recommended for cooling
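
To confirm the basics before you start, you can check the architecture and memory:

# Should print "aarch64" on 64-bit Raspberry Pi OS
uname -m

# Total RAM should show at least 4GB
free -h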

Software

  • Git
  • CMake (v3.16+)
  • GCC or Clang
  • Python 3 (optional, for Python bindings)

Step-by-Step Guide

Install required tools

sudo apt update && sudo apt upgrade -y

# 👇 install the build dependencies and tools
sudo apt install -y git build-essential cmake python3-pip libcurl4-openssl-dev
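
If you want to confirm the toolchain is in place, check the versions (CMake should report 3.16 or newer):

git --version
cmake --version
gcc --version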

Clone and Build llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git

cd llama.cpp

cmake -B build
cmake --build build --config Release -j$(nproc)

This step takes a while; we're compiling the llama.cpp binaries from source.
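Once it finishes, the compiled tools land in build/bin:

# You should see llama-cli (among others) listed here
ls build/bin/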

Download a Quantized Model

mkdir -p models && cd models

wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_0.gguf

cd ..

We'll use TheBloke's TinyLlama-1.1B-Chat GGUF build (https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF) for testing; it's small enough to run comfortably in the Pi's RAM.
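
It's worth confirming the download completed; the Q4_0 file should be roughly 600 MB:

ls -lh models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf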

Run Inference

./build/bin/llama-cli \
  -m ./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  -p "Hello, Raspberry Pi!"
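
For longer generations you can cap the output length and match the thread count to the Pi's four cores (the -n and -t flags below are standard llama-cli options, but check --help on your build):

./build/bin/llama-cli \
  -m ./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  -p "Explain what a Raspberry Pi is in one sentence." \
  -n 128 \
  -t 4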

Optional: Python Bindings

Note: the Python bindings live in a separate project, llama-cpp-python.

git clone https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
python3 -m pip install -r requirements.txt
python3 -m pip install .
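
Alternatively, the bindings are published on PyPI, so a single pip install (which compiles llama.cpp for your Pi as part of the build) works too:

python3 -m pip install llama-cpp-python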

Use in Python:

from llama_cpp import Llama

# Load the quantized model (this can take a few seconds on a Pi)
llm = Llama(model_path="./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf")

# Run a completion and print just the generated text
output = llm("Hello from Python!", max_tokens=64)
print(output["choices"][0]["text"])
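Since TinyLlama-Chat is instruction-tuned, you may get better results from the bindings' OpenAI-style chat interface (a minimal sketch using llama-cpp-python's create_chat_completion):

from llama_cpp import Llama

llm = Llama(model_path="./models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf")

# Chat-style call; the response follows an OpenAI-like schema
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello from a Raspberry Pi."}]
)
print(resp["choices"][0]["message"]["content"])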

Conclusion

Running llama.cpp natively on a Raspberry Pi is a geeky thrill. It teaches you about compiler optimizations, quantized models, and pushing hardware to the edge—literally. Bonus points if you run it headless over SSH.
