LocalLLMClient: A Swift Package for Local LLMs Using llama.cpp and MLX
Publish Date: May 15
Hi!
Recently, Large Language Models (LLMs) that can run locally on your device have been gaining a lot of attention. As someone who frequently works with Apple platforms, I wanted an easy way to use various local LLMs from Swift. After researching existing solutions and not finding one that fit my needs, I decided to create my own.
In this article, I'm introducing LocalLLMClient, a library that makes it simple to use local LLMs from Swift. Of course, you can use it in both macOS and iOS apps!
Key features:
Multiple Backends: Uses llama.cpp and Apple MLX under the hood, with the same interface for both
iOS/macOS Support: Runs on both iPhones and Macs
Streaming API: Leverages Swift Concurrency for a nice streaming experience
Multimodal Support: Handles not just text but images as well
I built this because I often find myself in situations where I want to use MLX for its faster performance, but still need llama.cpp for newer models that MLX doesn't support yet.
The library also supports VLMs (Vision Language Models) on both backends, allowing you to ask questions like "What's in this photo?" Even on iOS, the Qwen 2.5 VL 3B 4-bit model just barely runs on an iPhone 16 Pro.
The library is provided as a Swift Package. It's modularized so you can import only what you need (a sample dependency declaration follows the list below):
LocalLLMClient: Common interfaces
LocalLLMClientLlama: llama.cpp backend
LocalLLMClientMLX: Apple MLX backend
LocalLLMClientUtility: Utilities like LLM model downloaders
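For reference, pulling the package into a project looks roughly like this. This is only a sketch: the repository URL and branch are my assumptions, so check the project's README for the real coordinates.

// swift-tools-version: 5.10
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.iOS(.v16), .macOS(.v14)],
    dependencies: [
        // The repository URL and branch are assumptions; see the README.
        .package(url: "https://github.com/tattn/LocalLLMClient", branch: "main"),
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // Pick only the modules you need.
                .product(name: "LocalLLMClient", package: "LocalLLMClient"),
                .product(name: "LocalLLMClientLlama", package: "LocalLLMClient"),
                .product(name: "LocalLLMClientUtility", package: "LocalLLMClient"),
            ]
        ),
    ]
)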
Text Generation
Here's a simple example:
import LocalLLMClient
import LocalLLMClientLlama
import LocalLLMClientUtility

// Download the model (e.g., Gemma 3)
let ggufName = "gemma-3-4B-it-QAT-Q4_0.gguf"
let downloader = FileDownloader(source: .huggingFace(
    id: "lmstudio-community/gemma-3-4B-it-qat-GGUF",
    globs: [ggufName]
))
try await downloader.download { progress in
    print("Download progress: \(progress)")
}

// Initialize the client
let modelURL = downloader.destination.appending(component: ggufName)
let client = try await LocalLLMClient.llama(url: modelURL, parameter: .init(
    context: 4096,       // Text context size
    temperature: 0.7,    // Randomness (0.0-1.0)
    topK: 40,            // Top-K sampling
    topP: 0.9,           // Top-P (nucleus) sampling
    options: .init(responseFormat: .json)  // Response format
))

let prompt = """
Create the opening of an epic story where a cat is the protagonist.
Format as JSON like this:
{
    "title": "<title>",
    "content": "<content>",
}
"""

// Generate text
let input = LLMInput.chat([
    .system("You are a helpful assistant."),
    .user(prompt)
])
for try await text in try await client.textStream(from: input) {
    print(text, terminator: "")
}
Here's an example result:
{"title":"Shadow of the Moon's Claw","content":"Long ago, when humans still dreamed of stars, the world was ruled by cats. With their intelligence and grace, cats governed kingdoms across the land, keeping humans as their adorable pets. But even in the world of cats, conspiracies and secrets swirled. The 'Moon Shadow Cats,' a lineage dominated by powerful magic, plotted to use their powers to control the world. The protagonist, Mika, decides to embark on a journey to stop this conspiracy. She meets a legendary cat sage and acquires ancient knowledge and magical powers. However, Moon Shadow Cats' pursuers relentlessly chase Mika, starting a battle that will shake the world where cats and humans coexist. To stop the Moon Shadow Cats' conspiracy, Mika must accept her destiny and set off on an epic adventure to save the world."}
Multimodal
Here's an example that includes an image input:
import LocalLLMClient
import LocalLLMClientMLX
import LocalLLMClientUtility

// Download the model files (e.g., Qwen2.5 VL 3B)
let downloader = FileDownloader(source: .huggingFace(
    id: "mlx-community/Qwen2.5-VL-3B-Instruct-abliterated-4bit",
    globs: .mlx
))
try await downloader.download { progress in
    print("Download progress: \(progress)")
}

let client = try await LocalLLMClient.mlx(url: downloader.destination)

// Create input with an image
let input = LLMInput.chat([
    .user("Describe what's in this photo as a song", attachments: [.image(<image>)]),
])

// Get the text all at once without streaming
print(try await client.generateText(from: input))
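The <image> placeholder stands for your image value. Purely for illustration, loading a platform image to pass there might look like the following; the asset name is hypothetical, and whether the attachment accepts a UIImage/NSImage directly is an assumption worth verifying against the library's docs:

#if canImport(UIKit)
import UIKit
typealias PlatformImage = UIImage   // iOS
#else
import AppKit
typealias PlatformImage = NSImage   // macOS
#endif

// "stone_angel" is a hypothetical asset name; use your own test photo.
guard let photo = PlatformImage(named: "stone_angel") else {
    fatalError("Add a test image to the asset catalog first")
}
// photo would then go where <image> appears above, assuming the
// attachment API accepts a platform image.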
When I tested with a photo of a stone angel statue, I got this result:
Lyrics for this photo:
Wings folded on the tombstone
In the distant beyond
Binding love
Where wishes dwell
This song expresses the image of an angel with folded wings on a tombstone. It represents the wishes dwelling on the tombstone that connect to a distant love. The image portrays an angel figure carved on a gravestone with wishes dwelling within it.
Additional Features
The FileDownloader in LocalLLMClientUtility includes features like skipping downloads for models that are already stored, as well as background downloading, which lets iOS apps keep fetching models even while the app is in the background.
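I haven't looked at how FileDownloader implements this, but background downloading on iOS is generally built on URLSession's background configuration. Here's a generic sketch of that underlying technique, not the library's actual code; the session identifier and URL are placeholders:

import Foundation

// A delegate receives the file once the transfer finishes,
// even if the app was suspended in the meantime.
final class DownloadDelegate: NSObject, URLSessionDownloadDelegate {
    func urlSession(_ session: URLSession, downloadTask: URLSessionDownloadTask,
                    didFinishDownloadingTo location: URL) {
        // Move the temporary file to its final destination here.
        print("Finished downloading to \(location)")
    }
}

// A background session keeps transfers running while the app is suspended;
// iOS relaunches the app to deliver events for this identifier.
let config = URLSessionConfiguration.background(withIdentifier: "com.example.model-download")
config.sessionSendsLaunchEvents = true

let session = URLSession(configuration: config, delegate: DownloadDelegate(), delegateQueue: nil)
let url = URL(string: "https://example.com/model.gguf")!  // placeholder URL
session.downloadTask(with: url).resume()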
The Apple platforms I usually work with have a philosophy of prioritizing privacy and trying to run things on-device as much as possible, which I strongly support. Also, as someone who's afraid of accidentally burning through money when playing with AI services, I'm incredibly grateful to everyone working on developing, providing, and utilizing local LLMs. Thank you!
I'd be happy if LocalLLMClient becomes one of your options when you want to play with AI in Swift.