🚀 Setting Up Ollama & Running DeepSeek R1 Locally for a Powerful RAG System
Ajmal Hasan (@ajmal_hasan) · Published Jan 28

🤖 Ollama

Ollama is a framework for running large language models (LLMs) locally on your machine. It lets you download, run, and interact with AI models without needing cloud-based APIs.

🔹 Example: ollama run deepseek-r1:1.5b – Runs DeepSeek R1 locally.

🔹 Why use it? Free, private, fast, and works offline.
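
Besides the CLI, Ollama also exposes a local HTTP API (on http://localhost:11434 by default), so you can query a model from code without any cloud service. A minimal sketch in Python, assuming the requests package is installed and the DeepSeek R1 model has already been pulled:

import requests

# Ask the locally running Ollama server for a one-off completion.
# "stream": False returns the whole answer in a single JSON payload.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:1.5b",
        "prompt": "Explain retrieval-augmented generation in one sentence.",
        "stream": False,
    },
)
print(resp.json()["response"])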


🔗 LangChain

LangChain is a Python/JS framework for building AI-powered applications by integrating LLMs with data sources, APIs, and memory.

🔹 Why use it? It helps connect LLMs to real-world applications like chatbots, document processing, and RAG.
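
To give a sense of how little glue code is needed, the langchain-community package ships an Ollama wrapper that turns a locally served model into a standard LangChain LLM. A minimal sketch, assuming Ollama is running and the model from the previous section has been pulled:

from langchain_community.llms import Ollama

# Wrap the locally served DeepSeek R1 model as a LangChain LLM
llm = Ollama(model="deepseek-r1:1.5b")

# invoke() sends a single prompt and returns the generated text
print(llm.invoke("Summarize what LangChain does in one sentence."))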


📄 RAG (Retrieval-Augmented Generation)

RAG is an AI technique that retrieves relevant external data (e.g., from PDFs or databases) and adds it to the LLM's prompt so the answer is grounded in that data.

🔹 Why use it? Improves accuracy and reduces hallucinations by referencing actual documents.

🔹 Example: AI-powered PDF Q&A system that fetches relevant document content before generating answers.
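
At its core, a RAG step is: retrieve the most relevant chunks, stuff them into the prompt, then generate. A minimal sketch of that idea, where retriever and llm are hypothetical stand-ins for the components built later in this post:

# Hypothetical components: `retriever` returns documents relevant to the
# question, `llm` is any language model (e.g. DeepSeek R1 via Ollama).
def answer_with_rag(question, retriever, llm):
    # 1. Retrieve: fetch the chunks most similar to the question
    docs = retriever.get_relevant_documents(question)
    context = "\n\n".join(doc.page_content for doc in docs)

    # 2. Augment: put the retrieved context into the prompt
    prompt = (
        "Use the following context to answer the question.\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

    # 3. Generate: the model answers grounded in the retrieved text
    return llm.invoke(prompt)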


⚡ DeepSeek R1

DeepSeek R1 is an open-source AI model optimized for reasoning, problem-solving, and factual retrieval.

🔹 Why use it? Strong logical capabilities, great for RAG applications, and can be run locally with Ollama.


🚀 How They Work Together

  • Ollama runs DeepSeek R1 locally.
  • LangChain connects the AI model to external data.
  • RAG enhances responses by retrieving relevant information.
  • DeepSeek R1 generates high-quality answers.

💡 Example Use Case: A Q&A system that allows users to upload a PDF and ask questions about it, powered by DeepSeek R1 + RAG + LangChain on Ollama! 🚀


🎯 Why Run DeepSeek R1 Locally?

| Benefit | Cloud-Based Models | Local DeepSeek R1 |
| --- | --- | --- |
| Privacy | ❌ Data sent to external servers | ✅ 100% local; data stays on your machine |
| Speed | ⏳ API latency & network delays | ⚡ Instant inference |
| Cost | 💰 Pay per API request | 🆓 Free after setup |
| Customization | ❌ Limited fine-tuning | ✅ Full model control |
| Deployment | 🌍 Cloud-dependent | 🔥 Works offline & on-premises |

🛠 Step 1: Installing Ollama

🔹 Download Ollama

Ollama is available for macOS, Linux, and Windows. Follow these steps to install it:

1️⃣ Go to the official Ollama download page

🔗 Download Ollama

2️⃣ Select your operating system (macOS, Linux, Windows)

3️⃣ Click on the Download button

4️⃣ Install it following the system-specific instructions



🛠 Step 2: Running DeepSeek R1 on Ollama

Once Ollama is installed, you can run DeepSeek R1 models.

🔹 Pull the DeepSeek R1 Model

To pull the DeepSeek R1 (1.5B parameter model), run:

ollama pull deepseek-r1:1.5b

This will download and set up the DeepSeek R1 model.

🔹 Running DeepSeek R1

Once the model is downloaded, you can interact with it by running:

ollama run deepseek-r1:1.5b

This initializes the model and opens an interactive prompt where you can type queries.
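
You can also send queries from code instead of the interactive prompt, for example with the ollama Python package (installed in Step 3 below). A minimal sketch, assuming the Ollama server is running in the background:

import ollama

# Send a single chat message to the locally running DeepSeek R1 model
response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "What is retrieval-augmented generation?"}],
)
print(response["message"]["content"])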



🛠 Step 3: Setting Up a RAG System Using Streamlit

Now that you have DeepSeek R1 running, let's integrate it into a retrieval-augmented generation (RAG) system using Streamlit.

🔹 Prerequisites

Before running the RAG system, make sure you have:

  • Python 3 installed
  • A Conda environment (recommended for package management)
  • The required Python packages:
pip install -U langchain langchain-community langchain-experimental
pip install streamlit
pip install pdfplumber
pip install sentence-transformers
pip install faiss-cpu
pip install ollama

For detailed setup, follow this guide:

🔗 Setting Up a Conda Environment for Python Projects


🛠 Step 4: Running the RAG System

🔹 Clone or Create the Project

1️⃣ Create a new project directory

mkdir rag-system && cd rag-system

2️⃣ Create a Python script (app.py)
Paste the following Streamlit-based script:

import streamlit as st
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import RetrievalQA

# Streamlit UI
st.title("📄 RAG System with DeepSeek R1 & Ollama")

uploaded_file = st.file_uploader("Upload your PDF file here", type="pdf")

if uploaded_file:
    with open("temp.pdf", "wb") as f:
        f.write(uploaded_file.getvalue())

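    # Load the saved PDF and extract its text page by page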
    loader = PDFPlumberLoader("temp.pdf")
    docs = loader.load()

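    # Split the text into semantically coherent chunks
    # (SemanticChunker uses an embedding model to find topic boundaries)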
    text_splitter = SemanticChunker(HuggingFaceEmbeddings())
    documents = text_splitter.split_documents(docs)

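    # Embed the chunks, index them in a FAISS vector store, and
    # retrieve the 3 most similar chunks for each query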
    embedder = HuggingFaceEmbeddings()
    vector = FAISS.from_documents(documents, embedder)
    retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})

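    # Connect to the locally running DeepSeek R1 model served by Ollama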
    llm = Ollama(model="deepseek-r1:1.5b")

    prompt = """
    Use the following context to answer the question.
    Context: {context}
    Question: {question}
    Answer:"""

    QA_PROMPT = PromptTemplate.from_template(prompt)

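    # Stuff the retrieved chunks into the prompt as {context} and
    # let the LLM generate the final answer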
    llm_chain = LLMChain(llm=llm, prompt=QA_PROMPT)
    combine_documents_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="context")

    qa = RetrievalQA(combine_documents_chain=combine_documents_chain, retriever=retriever)

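    # Take the user's question, run the RetrievalQA chain, and show the answer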
    user_input = st.text_input("Ask a question about your document:")

    if user_input:
        response = qa(user_input)["result"]
        st.write("**Response:**")
        st.write(response)

🛠 Step 5: Running the App

Once the script is ready, make sure the Ollama server is running in the background (the desktop app usually starts it automatically; otherwise run ollama serve in a separate terminal), then start your Streamlit app:

streamlit run app.py


CHECK GITHUB REPO FOR COMPLETE CODE
LEARN BASICS HERE


🎯 Final Thoughts

You have successfully set up Ollama and DeepSeek R1!

You can now build AI-powered RAG applications with local LLMs!

Try uploading PDFs and asking questions dynamically.

💡 Want to learn more? Follow my Dev.to blog for more development tutorials! 🚀

Comments (37 total)

  • Frulow · Jan 29, 2025

    Would have been better if you mentioned system requirements too

    • thunderduck eu · Jan 30, 2025

      It's a 1 GB file, and LLMs like to sit in your GPU, so a 2 GB graphics card should run it. Obviously it will not be as fast as a 4060 8 GB with lots of CUDA cores, but if you read other articles about this LLM, it's designed to work with fewer resources.

    • Terry W · Jan 31, 2025

      Also, the actual R1 model is the biggest one; anything smaller than the 400+ GB model is a distilled version of it, but they are of course near enough the same thing anyway.

    • Rohan Srivastava · Jan 31, 2025

      Yes, even on Linux the training and vectorization for a 12 MB file took almost 20 minutes, even on an r5a.4xlarge EC2 machine.

  • Futuritous · Jan 29, 2025

    My Laptop has 4 CPU cores, 16GB RAM with Intel integrated Graphics (Ubuntu) - will it work on my Laptop?

    • Abraham · Jan 29, 2025

      Yes, but not as fast as if you had a GPU. You will also need to use a 7B or smaller model.

    • thunderduck eu · Jan 30, 2025

      Try it. It’s a light weight model.

  • Futuritous · Jan 29, 2025

    Would love to try it.

  • יובל שמעוני · Jan 29, 2025

    Can you share a TypeScript version of that?

  • SAMIR HEMBROM · Jan 29, 2025

    I tried running it dunno why but it gave me garbage text back

    • Ajmal Hasan · Jan 29, 2025

      Use a higher-parameter version if your system supports it.

      • SAMIR HEMBROM · Jan 29, 2025

        Sadly I don't think I can; I have 8 GB of RAM.

        • thunderduck eu · Jan 30, 2025

          It’s a small model. And will rely on your gpu. 2gb of gpu power will be enough to get started. Obviously it won’t be as fast if you have a more modern card. I use a 4060 with 8gb of ram. Mainly because it has a lot of cuda cores and uses way less electricity.

  • maneamarius · Jan 29, 2025

    What are the hardware requirements?
    Why not start with this at the beginning of your guide?

    • Ajmal Hasan · Jan 29, 2025

      Any decent system will suffice (for example, I use a base-model MacBook M1). Choose the lightest model available if you don't have a high-end device.

      However, keep in mind that processing time and response quality will vary based on your system's specifications and the complexity of the model parameters. 🚀

      • maneamarius · Jan 29, 2025

        Not a good answer.
        You should put the recommended system requirements in your post, for each model.
        e.g. graphics cards needed, etc..
        Otherwise your post is incomplete.

        • Marcus Franke · Jan 30, 2025

          What about giving it a try before criticizing the author?

          As mentioned, the 1.5b model is rather small. The download is "just" 1.1 Gigabyte. I was able to run it on a MacBook Pro 2 with only 16GB of RAM, and it was answering with decent speed consuming about 4G RAM usage.

          The real limitation is the 1.5b model. I asked it to generate Rust code, and it admitted to not knowing it very well.

          I then switched to the deepseek-coder-v2 model with 16b parameters, and that's a download of 8.9 Gigabytes. RAM usage spiked to 8G, and the model is operating at a lower speed and uses less reasoning but instead started to emit code directly to my question.

          So, Ajmal's answer is that a decent system will be enough to generate your answers. I agree with this, as I would consider my Mac, due to RAM limitations, not as good, but decent. And, of course, it depends on what you are running besides the LLM. If your RAM is already filled up, you'll get into trouble.

          However, you do not need a 4090 and many Tensor Cores to run these models locally. Your mileage may vary, true. But overall, and to get a first impression, it will definitely work.

          Just give it a try, the text shows all the necessary steps to do this. Except for ollama serve you will find out by looking at the messages and the help.

      • squidbe · Jan 30, 2025

        @maneamarius , Asking what the system requirements should be for an LLM is like asking what the horsepower should be for a car: It depends. We're talking about tools with a wide range of applications, so the minimum requirements depend on an individual's desired outcomes.

        As they say, you attract more flies with honey than vinegar. Instead of criticizing a guy who's educating you and others for free, try asking him something like, "What are your system specs, and how many tokens per second are you getting?"

    • Shardul Vikram Singh · Jan 30, 2025

      I found this rule of thumb in a YouTube video by bycloud: if your GPU's VRAM is greater than (model_size * 1.2), then you can run that model.

  • Naseer Ahmad · Jan 29, 2025

    What if I want to use UTF8 txt files?

  • Ataliba Miguel · Jan 29, 2025

    Hi @ajmal_hasan, how to get around from the error: requests.exceptions.SSLError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /sentence-transformers/all-mpnet-base-v2/resolve/main/adapter_config.json (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (ssl.c:997)')))"), '(Request ID: edeffbec-e8a2-472e-9722-2c40df75aa94)')
    2025-01-29 21:55:58.668 Examining the path of torch.classes raised: Tried to instantiate class '
    path._path', but it does not exist! Ensure that it is registered via torch::class

  • OaKiToKi · Jan 29, 2025

    Just wanted to confirm what specs it can run -
    Ollama DeepSeek-R1:14B runs smoothly and quickly on a Ryzen 7 5700X, 64 GB RAM, and an RTX 3080 10 GB. The 32B and 70B run, but the 70B thinks at about one word a second while the 32B is slightly faster.

    I've used the 70B but had to let it run to provide info the next day (late at night). Just fyi if time is of no issue it will run Ollama and even the chatapp. Have not tried RAG but shouldn't be an issue.

  • Paul Levitt · Jan 30, 2025

    I’d double check your claim of DeepSeek R1 local deployments being “✅ 100% Local & Secure” - wouldn’t be the first to reach out to the wider net.

    I caveat this with; you are however 100% in control of a local model’s resource access.

    My apologies if this is what you meant; not explicitly called out so wasn’t aware

  • Mohamed Wajeeth · Jan 30, 2025

    Thank you!

  • Leo Calle · Jan 30, 2025

    thank you @ajmal_hasan for Sharing ,will give it a try 😀

  • Veerakumar · Jan 30, 2025

    bro really thank for making this tut bro.

  • SHRIRAMPRABHU J · Jan 31, 2025

    Hi, I have followed the above process, however I get this error - An error occurred: Ollama call failed with status code 500. Details: {"error":"llama runner process has terminated: exit status 2"}

    Can someone please assist me?

  • Abel C Dixon · Jan 31, 2025

    How can I enable support for image inference

  • Suman Tandukar · Jan 31, 2025

    I am struggling to download even the 1.5b model. I tried all of them except 671b; the download always resets at some point and restarts until it hits too many retries. Is anyone else facing the same issue?

  • frankDev96 · Jan 31, 2025

    Does this work on Windows and Mac machines?


  • Sv College · Feb 4, 2025

    I installed deepseek-r1:1.5b on my local machine. When I run ollama run deepseek-r1:1.5b it starts successfully, but it gives no response to my question; it just returns a blank line and asks for another question. What could be the cause, and could you suggest a way to solve this issue?

  • John O'Donahue · Feb 10, 2025

    Great article. Surprised how well it works locally. Thank you.

  • R. Félix Bengolea Lalor · Feb 11, 2025

    Hi, I have a problem deploying it: "Unable to deploy
    The app’s code is not connected to a remote GitHub repository. To deploy on Streamlit Community Cloud, please put your code in a GitHub repository and publish the current branch. Read more in our documentation."
    Is this necessary, or is it something I'm doing wrong?
    Thanks!

  • Will H. · Mar 7, 2025

    In your app.py, llm = Ollama(model="deepseek-r1:1.5b") is this the call using the locally installed model at all?

    • Ajmal Hasan · Mar 7, 2025

      Yes, but the local LLM must first be running in the background via Ollama.

  • Will H. · Mar 7, 2025

    Your article was impressive. I was able to follow it and got my locally installed 14b DeepSeek R1 running and answering questions on my 3080 GPU; I can definitely tell, because this GPU was running hard. Some feedback here:

    1. Some comments in app.py would be nice to newbies.
    2. pip install faiss-cpu is working for me, not pip install faiss-gpu, perhaps this is why the PDF training was slow?
    3. Some deprecated function calls in your app.py on the langchain model, I am sure you are aware of them... Overall, I did a ipconfig /release to my PC, was able to run your PDF and the Ollama 14b model offline, very impressive...