Running llama.cpp in Docker on Raspberry Pi

Running large language models on a Raspberry Pi isn’t just possible—it’s fun. Whether you're a hacker exploring local AI, a developer prototyping LLM workflows, or just curious about how far you can push a Pi, this tutorial is for you.

We’ll show you how to build and run llama.cpp in Docker on an ARM-based Pi to get a full LLM experience in a tiny, reproducible container. No weird dependencies. No system pollution. Just clean, fast, edge-side inference.

If you are looking for a bare-metal installation on the Raspberry Pi instead, check this guide: https://rmauro.dev/running-llm-llama-cpp-natively-on-raspberry-pi/

Dockerfile

The following Dockerfile builds llama.cpp from source within an Ubuntu 22.04 base image. It includes all required dependencies and sets the container entrypoint to the compiled CLI binary.

FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

# Toolchain and libraries needed to build llama.cpp
RUN apt update && apt upgrade -y && \
    apt install -y --no-install-recommends \
    ca-certificates git build-essential cmake wget curl \
    libcurl4-openssl-dev && \
    apt clean && rm -rf /var/lib/apt/lists/*

# Fetch the llama.cpp sources
WORKDIR /opt
RUN git clone https://github.com/ggerganov/llama.cpp.git
WORKDIR /opt/llama.cpp

# Configure and compile a CPU-only release build using all available cores
RUN cmake -B build
RUN cmake --build build --config Release -j$(nproc)

# Use the compiled CLI binary as the container entrypoint
WORKDIR /opt/llama.cpp/build/bin
ENTRYPOINT ["./llama-cli"]

Build the Docker Image

Run the following command in the same directory as your Dockerfile to build the image:

docker build -t llama-cpp-pi .
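Compiling from source takes a while on a Pi, so let it run. Once it finishes, a quick sanity check confirms the image exists and the entrypoint responds (assuming the llama-cpp-pi tag used above):

docker images llama-cpp-pi
docker run --rm llama-cpp-pi --help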

Download a Quantized Model (on Host)

You need a quantized .gguf model to perform inference. Run this command from your host system:

mkdir -p models
wget -O models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_0.gguf

This creates a models directory and downloads a compact version of TinyLlama suitable for edge devices.
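Any other GGUF model that fits in the Pi's RAM can be dropped into the same folder. A quick check that the file landed where the container will expect it:

ls -lh models/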

Run Inference from Docker

Mount the models directory and run the container, specifying the model and prompt:

docker run --rm -it \
  -v $(pwd)/models:/models \
  llama-cpp-pi \
  -m /models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf -p "Hello from Docker!"
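llama-cli accepts more options than just the model and prompt. As a rough sketch (exact flags depend on the llama.cpp version you built), you can cap the number of generated tokens, match the thread count to the Pi's cores, and set the context size:

docker run --rm -it \
  -v $(pwd)/models:/models \
  llama-cpp-pi \
  -m /models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  -p "Explain Docker in one paragraph." \
  -n 128 -t 4 -c 2048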

To use a different model:

MODEL=your-model-name.gguf

docker run --rm -it \
  -v $(pwd)/models:/models \
  llama-cpp-pi \
  -m /models/$MODEL -p "Hello with custom model!"
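The build also produces llama-server next to llama-cli in the bin directory. As a sketch, you can override the image's entrypoint and expose the model over HTTP instead of running one-off prompts:

docker run --rm -it \
  -p 8080:8080 \
  -v $(pwd)/models:/models \
  --entrypoint ./llama-server \
  llama-cpp-pi \
  -m /models/tinyllama-1.1b-chat-v1.0.Q4_0.gguf \
  --host 0.0.0.0 --port 8080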

Conclusion

This Docker-based setup enables efficient deployment of llama.cpp on ARM-based devices like the Raspberry Pi.

It abstracts away system-level configuration while preserving the flexibility to swap models, test prompts, or integrate with other AI pipelines.
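For example, if you run the llama-server variant sketched above, other machines on your network can reach the model through its OpenAI-compatible API (the endpoint below assumes a recent llama.cpp build; replace <pi-address> with your Pi's IP):

curl http://<pi-address>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello from the Raspberry Pi."}]}'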

For developers, researchers, and students, this is an ideal workflow to explore the capabilities of local LLM inference.
