Ollama: How to Easily Run LLMs Locally on Your Computer
Richa Parekh



Publish Date: Jun 26

I recently found an interesting open-source tool called Ollama. It's a command-line application that lets you run Large Language Models (LLMs) on your own computer. I wanted to know more about it, so I tried it out over the weekend. Here's what I learned from using Ollama.

🔍 What Is Ollama?

Ollama is a lightweight yet powerful tool that lets you run LLMs like LLaMA, Mistral, DeepSeek, Starling, and others directly on your own computer. It runs in the background and exposes both:

  • A Command-Line Interface (CLI) for quick management and interactions
  • An API that you can use in your own programs

The Advantage?

No cloud dependency. No API keys. Just LLMs that run on your own machine.
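
To give a sense of the API side, here's a minimal sketch of a request against the local HTTP endpoint (Ollama listens on http://localhost:11434 by default; the prompt here is made up, and the model name is just the one I use later in this post):

  curl http://localhost:11434/api/generate -d '{
    "model": "llama3.2:1b",
    "prompt": "Say hello in one sentence.",
    "stream": false
  }'

With "stream": false the server returns a single JSON object containing the full response, instead of streaming it piece by piece.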

💻 Installing Ollama

It was surprisingly easy to get started: I downloaded the Windows installer from the Ollama download page and ran it.

Once installed, Ollama starts in the background, and we can run models using the CLI.
💡 Installers for Linux and macOS users can be found on the same page.
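
A quick way to confirm the install worked (a small sketch; the exact output depends on the version you installed):

  ollama --version

This prints something like "ollama version is 0.x.x". You can also open http://localhost:11434 in a browser; if the background server is running, it replies with a short "Ollama is running" message.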

🛠️ Exploring the CLI

I opened the Windows Command Prompt (CMD) as soon as the install finished and began to explore. Here's a summary of what I tried:

1. ollama

  • This gives a useful usage guide with a list of all the available commands and flags.
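
Roughly, it looks like this (an abridged sketch of the help output; the exact commands and flags depend on your Ollama version):

  Usage:
    ollama [flags]
    ollama [command]

  Available Commands:
    serve    Start ollama
    pull     Pull a model from a registry
    run      Run a model
    list     List models
    show     Show information for a model
    rm       Remove a model
    help     Help about any command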

2. ollama list

  • This displays every model that is currently installed. If nothing appears, it simply means no models have been installed yet.
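
Once a model has been pulled, the output looks roughly like this (a sketch; the ID, size, and timestamp below are made up):

  NAME           ID              SIZE      MODIFIED
  llama3.2:1b    a1b2c3d4e5f6    1.3 GB    2 minutes ago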

3. ollama run llama3.2:1b

  • I used the llama3.2:1b model.

Why did I go with this model over the others? I'll explain later in this post, so read till the end.

  • Ollama started a chat session directly in the terminal after automatically pulling the model, which took a few seconds.
  • I started the conversation with a simple "hello" message. In response to my greeting, the model said:

Hello. Is there something I can help you with, or would you like to chat?


  • Then I continued with a few more exchanges, and the model's responses were accurate and well-structured.
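
Put together, the start of the session looked roughly like this in the terminal (a sketch; the pull output is abbreviated, and >>> is the chat prompt):

  ollama run llama3.2:1b
  pulling manifest
  verifying sha256 digest
  success
  >>> hello
  Hello. Is there something I can help you with, or would you like to chat?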

4. Exiting the Chat /bye

  • Simply type /bye to end a chat session.

5. ollama rm llama3.2:1b

  • This command cleans up and frees up disk space.
  • The model is immediately deleted from the system.

These are some of the commands I first tried with Ollama. They only cover the basics; there's a lot more you can do with Ollama. See the Ollama GitHub repository for further information.
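
For example, a few more commands from the usage guide (descriptions paraphrased from the help text; I haven't explored these in depth yet):

  ollama pull llama3.2:1b    (download a model without starting a chat)
  ollama show llama3.2:1b    (print details about a model, such as its parameters)
  ollama ps                  (list the models currently loaded in memory)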

⁉️ Why did I use the llama3.2:1b model?

Right after installing Ollama, I first tried ollama run llama3.2 (the 3B model). For a simple "hello" prompt, it took 1-3 minutes, sometimes longer, just to provide a basic response:
Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?

What is the reason for that? Well, it's due to the memory limitations of my system. As per the Ollama documentation:

You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

And my system has only 4GB of RAM 😅

Since this subject is new to me, I did some research into the reasons behind these particular requirements.

This is what I discovered 👇
🔴 Main issue: Not enough RAM

  • For a smooth experience, Llama 3.2 (3B parameters) requires around 6–8GB of RAM.
  • The total RAM on my system is just 4GB.
  • Windows 10 itself requires 2-3GB of RAM.
  • As a result, the AI model has very little memory left.

🟦 What happens if there is not enough RAM?

  • The computer starts to use "virtual memory" — fake RAM made from hard drive space.
  • RAM is orders of magnitude faster than a hard drive (and still far faster than an SSD).
  • The model is constantly "swapped" between the hard drive and RAM, which causes a bottleneck that makes everything very slow.
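
A rough back-of-envelope makes this concrete (approximate download sizes from the Ollama model library; treat them as ballpark figures):

  llama3.2 (3B, quantized)      ~2.0 GB of model weights
  llama3.2:1b (1B, quantized)   ~1.3 GB of model weights
  plus the chat context and the Ollama runtime on top

With Windows already using 2-3GB of my 4GB of RAM, the 3B model can't stay fully in memory and gets swapped constantly, while the 1B model just about fits.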

So I switched to the smaller Llama 3.2 1B model, which runs much more smoothly than the 3B model. I also tried running a few other models, but they didn't work due to their system requirements.

💬 Final Thoughts

Without depending on cloud APIs or remote inference, Ollama provides a very developer-friendly way to explore and play with LLMs. This tool is valuable if you're interested in developing local-first applications or simply want to learn how LLMs work off-cloud.

The CLI is easy to use, and the setup went smoothly in my experience. Ollama is definitely worth a try, whether you're a developer building edge-native apps or a hobbyist learning AI.

Have you tried running LLMs locally? Which models have you explored, or what are you building with Ollama-like tools? Share your experience and leave your comments below!

Thank you for reading! ✨

Comments (14 total)

  • Dotallio (Jun 26, 2025)

    Totally get the RAM struggle with local LLMs, I had a similar bottleneck running anything larger than a 3B model too.

    Have you found any tricks to make chat-style workflows smoother in the CLI, or do you just keep it basic?

    • Richa Parekh (Jun 26, 2025)

      Since I'm still learning the concepts and getting an understanding of how everything works, I'm sticking to the basics for now.

  • Praveen Rajamani (Jun 26, 2025)

    Thanks for being clear about the hardware limits. Many people try to run local LLMs, thinking it will just work, then get frustrated when it is slow or crashes. Posts like this help save a lot of time and confusion.

    • Richa Parekh (Jun 26, 2025)

      Appreciate that! I'm glad the post was helpful.

  • Solve Computer Science (Jun 26, 2025)

    Try the qwen2.5-coder model family. Yes, 4GB is insufficient to run anything useful. I'm trying qwen2.5-coder:14b-instruct-q2_K (so low quantization and a higher parameter count), and it's not bad at all. The speed and quality are decent, all things considered. You'll need about 20GB of RAM, however. Be aware that I got Chinese-only replies when running the 1.5B models of that family.

    • Richa Parekh (Jun 26, 2025)

      Thanks for the tip! I’ll definitely check out qwen2.5-coder

  • Alexander Ertli (Jun 26, 2025)

    Hey,

    Welcome to the genAI techspace.
    There's nothing wrong with using smaller models; I resort to them all the time.

    If you are interested, you could try a much smaller model like smollm2:135m or qwen:0.5b; they should be much more responsive with your hardware.

    Also, Ollama typically tries to run models on the GPU, or at least partially, if you have a compatible one.

    I hope this helps.

    • Richa Parekh (Jun 27, 2025)

      Yes, I will check out the smaller models. Thanks for the useful advice.

  • Nathan Tarbert (Jun 26, 2025)

    This is extremely impressive, love how you documented the process and called out the RAM struggle directly. Makes me wanna try it on my old laptop now

    • Richa Parekh (Jun 27, 2025)

      Thank you for the appreciation. Ollama is definitely worth a try.

  • Arindam Majumder (Jun 27, 2025)

    Ollama is great. You can also use Docker Model Runner for this.

    • Richa Parekh (Jun 27, 2025)

      Yeah, Ollama is a valuable tool. Thanks for sharing.
