Julien Simon (@juliensimon), Chief Evangelist, Arcee.ai (https://www.arcee.ai)
Published: Feb 17

Local inference shootout: Llama.cpp vs. MLX on 10B and 32B Arcee SLMs

In this video, we run local inference on an Apple M3 MacBook with llama.cpp and MLX, two projects that optimize and accelerate small language models (SLMs) on consumer hardware such as Apple Silicon. For this purpose, we use two new Arcee open-source models distilled from DeepSeek-V3: Virtuoso Lite 10B and Virtuoso Medium v2 32B.
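If you want to follow along with the same checkpoints, here is a minimal sketch for fetching them ahead of time. The video uses the Hugging Face CLI; this uses the equivalent `huggingface_hub` Python call, and the repo IDs are assumptions based on the model names, so check the arcee-ai organization on the Hub.

```python
# Sketch (not from the video): download both Virtuoso checkpoints locally.
# Repo IDs are assumptions based on the model names.
from huggingface_hub import snapshot_download

for repo_id in ("arcee-ai/Virtuoso-Lite", "arcee-ai/Virtuoso-Medium-v2"):
    local_dir = snapshot_download(repo_id=repo_id)
    print(f"{repo_id} -> {local_dir}")
```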

First, we download the two models from the Hugging Face Hub with the Hugging Face CLI. Then, we go through the step-by-step installation procedure for llama.cpp and MLX. Next, we optimize and quantize the models to 4-bit precision for maximum acceleration. Finally, we run inference and compare the performance numbers. So, who’s fastest? Watch and find out!
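For readers who prefer Python to the command-line tools shown in the video, here is a hedged sketch of the quantize-and-generate steps on both runtimes. The `mlx_lm` functions (`convert`, `load`, `generate`) and the `llama_cpp` bindings are the libraries' Python APIs; the local paths, the GGUF filename, and the prompt are assumptions, and the GGUF itself is expected to come from llama.cpp's own conversion and quantization tools.

```python
# Sketch only: 4-bit quantization and generation with MLX, then inference on a
# 4-bit GGUF via the llama-cpp-python bindings. Paths/filenames are assumptions.
from mlx_lm import convert, load, generate
from llama_cpp import Llama

PROMPT = "Explain speculative decoding in two sentences."

# --- MLX: quantize the Hugging Face checkpoint to 4-bit, then generate ---
convert(
    "arcee-ai/Virtuoso-Lite",           # assumed repo ID
    mlx_path="virtuoso-lite-4bit-mlx",  # local output directory (assumption)
    quantize=True,                      # 4-bit quantization by default
)
model, tokenizer = load("virtuoso-lite-4bit-mlx")
generate(model, tokenizer, prompt=PROMPT, max_tokens=256, verbose=True)

# --- llama.cpp: run a pre-quantized 4-bit GGUF with full Metal offload ---
llm = Llama(
    model_path="virtuoso-lite-Q4_K_M.gguf",  # assumed name of the 4-bit GGUF
    n_gpu_layers=-1,                         # offload all layers to the Apple GPU
    n_ctx=4096,
)
out = llm(PROMPT, max_tokens=256)
print(out["choices"][0]["text"])
```

Both runtimes report generation speed: `verbose=True` in `mlx_lm.generate` prints tokens per second, and llama.cpp prints its timing statistics by default. Those are the numbers the shootout compares.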
