I recently found an interesting open-source tool called Ollama. It's a command-line application that lets you run Large Language Models (LLMs) on your own computer. I wanted to know more, so I tried it out over the weekend. Here's what I learned while using Ollama.
🔍 What Is Ollama?
Ollama is a lightweight yet powerful tool that lets you run LLMs like LLaMA, Mistral, DeepSeek, Starling, and others directly on your own computer. It runs in the background and provides both:
- A Command-Line Interface (CLI) for quick management and interactions
- An API that you can use in your own programs
The Advantage?
No dependency on the cloud. No API keys. Just LLMs running on your own machine.
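That API is worth a quick illustration. Once Ollama is installed and running (installation steps below), any program on your machine can talk to it over a local HTTP endpoint, which by default listens on http://localhost:11434. Here's a minimal Python sketch of the idea; the model name and prompt are just placeholders, and the model has to be pulled first:

```python
# Minimal sketch: ask a locally running Ollama server for a completion.
# Assumes the default endpoint (http://localhost:11434) and that the
# llama3.2:1b model has already been pulled -- adjust both as needed.
import json
import urllib.request

payload = {
    "model": "llama3.2:1b",
    "prompt": "Say hello in one short sentence.",
    "stream": False,  # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["response"])  # the model's reply as plain text
```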
💻 Installing Ollama
It was surprisingly easy to get started:
- Visit https://ollama.com/download
- Download OllamaSetup.exe (I'm on Windows).
- Just launch the installer.
Once installed, Ollama starts in the background, and we can run models using the CLI.
💡 Installers for Linux and macOS users can be found on the same page.
🛠️ Exploring the CLI
As soon as it was installed, I opened the Windows Command Prompt (CMD) and began exploring. Here's a summary of what I tried:
1. ollama
- This prints a helpful usage guide listing all available commands and flags.
2. ollama list
- This displays every model that is currently installed. If nothing appears, it simply means no models have been installed yet.
3. ollama run llama3.2:1b
- I went with the llama3.2:1b model. Why this model over the others? I'll explain later in this blog post, so read till the end.
- Ollama automatically pulled the model (which took a few seconds) and then started a chat session directly in the terminal.
- I started the conversation with a simple "hello" message. The model replied:
Hello. Is there something I can help you with, or would you like to chat?
- I then continued with a few more exchanges, and the model's responses were accurate and well-structured.
4. Exiting the Chat: /bye
- Simply type /bye to end the chat session.
5. ollama rm llama3.2:1b
- This removes the model from the system immediately, cleaning up and freeing disk space.
These are just the basic commands I tried first; there's a lot more we can do with Ollama. See the Ollama GitHub repository for further information.
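One small example of "more we can do": the same CLI commands can be scripted. Below is a rough Python sketch that wraps the CLI with subprocess. It assumes ollama is on your PATH and that passing a prompt as an argument to ollama run makes it answer once and exit, so treat it as a starting point rather than a polished recipe:

```python
# Rough sketch: drive the Ollama CLI from Python via subprocess.
# Assumes `ollama` is on PATH and the model is already pulled.
import subprocess

# Equivalent of `ollama list`: show which models are installed locally.
installed = subprocess.run(
    ["ollama", "list"], capture_output=True, text=True, check=True
)
print(installed.stdout)

# One-shot prompt: like opening `ollama run llama3.2:1b`, asking one
# question, and typing /bye straight after.
answer = subprocess.run(
    ["ollama", "run", "llama3.2:1b", "Give me a one-line fun fact."],
    capture_output=True, text=True, check=True,
)
print(answer.stdout.strip())
```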
⁉️ Why do I use the llama3.2:1b model?
After installing Ollama, I first ran ollama run llama3.2. For a simple "hello" prompt, it took a good one to three minutes to produce even this basic response:
Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
Why was it so slow? Because of my system's memory limitations. As the Ollama documentation puts it,
You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
And my system has only 4GB of RAM 😅
Since this subject was new to me, I did some research into the reasons behind these requirements.
This is what I discovered 👇
🔴 Main issue: Not enough RAM
- For a smooth experience, Llama 3.2 (3B parameters) requires around 6–8GB of RAM.
- The total RAM on my system is just 4GB.
- Windows 10 itself requires 2-3GB of RAM.
- As a result, the AI model has very little memory left.
🟦 What happens if there is not enough RAM?
- The computer starts to use "virtual memory" — fake RAM made from hard drive space.
- RAM is orders of magnitude faster than a hard drive.
- The model is constantly "swapped" between the hard drive and RAM, which creates a bottleneck that makes everything very slow. The quick sketch below puts rough numbers on this.
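Here's a quick back-of-the-envelope sketch in Python. The parameter counts, bytes-per-weight figures, and overhead are my own rough approximations, not values from the Ollama docs:

```python
# Back-of-the-envelope estimate (my own rough numbers, not from the docs):
# memory for the weights alone is roughly parameter count x bytes per weight.
def weights_gb(params_billions: float, bytes_per_weight: float) -> float:
    return params_billions * 1e9 * bytes_per_weight / (1024 ** 3)

for name, params in [("llama3.2:1b", 1.2), ("llama3.2 (3B)", 3.2)]:
    print(
        f"{name}: ~{weights_gb(params, 2):.1f} GB at 16-bit, "
        f"~{weights_gb(params, 0.5):.1f} GB at 4-bit quantization"
    )

# The OS, the runtime, and the context (KV cache) all need room on top of
# this, which is why a 3B model struggles on a 4 GB machine while the 1B
# model just about fits.
```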
So I switched to the smaller Llama 3.2 1B model, which runs much more smoothly than the default Llama 3.2. I also tried running other models, but they didn't work because of their system requirements.
💬 Final Thoughts
Without depending on cloud APIs or remote inference, Ollama provides a very developer-friendly way to explore and play with LLMs. It's a valuable tool if you're interested in building local-first applications or simply want to learn how LLMs work off the cloud.
The CLI is easy to use, and the setup went smoothly in my experience. Ollama is definitely worth a try, whether you're a developer building edge-native apps or a hobbyist learning AI.
Have you tried running LLMs locally? Which models have you explored, or what are you building with Ollama-like tools? Share your experience and leave a comment below!
Thank you for reading! ✨