This article explores how to create a Retrieval-Augmented Generation (RAG) system using a self-hosted Gaia Node for LLM inference and embeddings, ChromaDB as a local vector database, and LangChain for orchestrating the entire process. This "local-first" approach offers significant advantages in terms of control, privacy, and potentially cost efficiency.
The Core Problem: Beyond Static Knowledge
LLMs are incredibly powerful, but their knowledge is typically limited to their training data. For real-world applications, you often need them to access and reason over your specific, up-to-date, or private data. This is where RAG comes in.
RAG systems work by:
- Retrieval: Searching a knowledge base (your documents) for information relevant to a user's query.
- Augmentation: Injecting this retrieved information into the LLM's prompt.
- Generation: The LLM then generates an answer based on its general knowledge and the provided context.
The key to efficient retrieval lies in vector databases and embeddings.
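In plain Python, the loop fits in a handful of lines. The sketch below takes the embedding model, the vector store lookup, and the LLM call as hypothetical callables (we build the real versions with Gaia Node, ChromaDB, and LangChain in the rest of this article):

def answer_with_rag(question, embed, search_vector_db, llm_generate):
    """Tie the three RAG steps together; the callables are supplied by your stack."""
    query_vector = embed(question)                         # 1. Retrieval: embed the query...
    relevant_chunks = search_vector_db(query_vector, k=3)  #    ...and fetch the closest chunks
    context = "\n\n".join(relevant_chunks)                 # 2. Augmentation: build a context block
    prompt = f"Use this context to answer:\n{context}\n\nQuestion: {question}"
    return llm_generate(prompt)                            # 3. Generation: the LLM writes the answer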
Our Toolkit: Gaia Node, ChromaDB, and LangChain
Let's quickly introduce the heroes of our setup:
- Gaia Node: Your personal, powerful LLM and embedding server. Gaia Node provides an OpenAI-compatible API, meaning tools and libraries designed for OpenAI's API (like LangChain) can seamlessly connect to your self-hosted models. This gives you unparalleled control over the models you use and where your data resides.
- ChromaDB: A lightweight, open-source vector database designed for simplicity and ease of use. It allows you to store numerical representations (embeddings) of your text data and perform rapid similarity searches. Crucially, ChromaDB supports local persistence, meaning your vector database can be saved to disk and reloaded, avoiding redundant processing.
- LangChain: A versatile framework that simplifies the development of LLM-powered applications. It provides high-level abstractions for common components (LLMs, Embeddings, Document Loaders, Vector Stores) and intelligent ways to chain them together to build complex workflows like our RAG system.
Project Goal: Question Answering on Your Documents
Our goal is to build a system that can answer questions about a text document (e.g., a "State of the Union" address). We will:
- Load and split the document into manageable chunks.
- Convert these chunks into numerical "embeddings" using Gaia Node.
- Store these embeddings in a ChromaDB vector store.
- When a question is asked, retrieve the most relevant chunks from ChromaDB.
- Pass these relevant chunks and the question to an LLM (also powered by Gaia Node) to generate a concise answer.
Diving into the Code (demo.py)
Let's walk through the Python script that orchestrates this entire process.
1. Configuration: Connecting to Gaia Node
The first step is to tell LangChain how to talk to your Gaia Node.
import os

# Configuration for Gaia Node
GAIA_NODE_URL = "https://0x5ee30a31554672a0c213ed38e8898de84c2bb34b.gaia.domains"  # Replace with your Gaia node URL
GAIA_API_KEY = "gaia"  # Replace with your Gaia API key

# Set up environment variables for OpenAI-compatible API
os.environ["OPENAI_API_BASE"] = f"{GAIA_NODE_URL}/v1"
os.environ["OPENAI_API_KEY"] = GAIA_API_KEY
By setting the OPENAI_API_BASE and OPENAI_API_KEY environment variables, LangChain's OpenAIEmbeddings and ChatOpenAI classes automatically direct their requests to your specified Gaia Node URL. This makes your custom node behave just like OpenAI's API from LangChain's perspective.
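Because the node speaks the OpenAI wire format, you can also talk to it with the plain openai Python client. A minimal sketch, assuming the openai package (v1.x) is installed and that your node serves the model names used later in this article:

import os
from openai import OpenAI

# Point the standard OpenAI client at the Gaia Node instead of api.openai.com
client = OpenAI(
    base_url=os.environ["OPENAI_API_BASE"],
    api_key=os.environ["OPENAI_API_KEY"],
)

resp = client.chat.completions.create(
    model="Llama-3-Groq-8B-Tool-Use-Q5_K_M",  # whichever chat model your node serves
    messages=[{"role": "user", "content": "Say hello!"}],
)
print(resp.choices[0].message.content)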
2. Connection Test: Verifying Gaia Node Connectivity
Before doing anything complex, it's vital to ensure your Gaia Node is up and running and responding correctly to both embedding and LLM requests.
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

def test_gaia_connection():
    print("Testing Gaia Node connection...")
    try:
        # Test embeddings
        embedding = OpenAIEmbeddings(
            base_url=os.environ["OPENAI_API_BASE"],
            api_key=os.environ["OPENAI_API_KEY"],
            model="Nomic-embed-text-v1.5"  # Specify your embedding model
        )
        test_text = "This is a test sentence."
        result = embedding.embed_query(test_text)
        print(f"✓ Embeddings working - dimension: {len(result)}")

        # Test LLM
        llm = ChatOpenAI(
            base_url=os.environ["OPENAI_API_BASE"],
            api_key=os.environ["OPENAI_API_KEY"],
            model="Llama-3-Groq-8B-Tool-Use-Q5_K_M"  # Specify your chat model
        )
        response = llm.invoke("Say hello!")
        # IMPORTANT: Access .content for the string output from AIMessage
        print(f"✓ LLM working - response: {response.content[:50]}...")
        return True
    except Exception as e:
        print(f"✗ Connection failed: {e}")
        return False
This function performs two critical checks:
- It initializes OpenAIEmbeddings (which will use your Gaia Node) and attempts to embed a simple sentence. This verifies the embedding endpoint.
- It then initializes ChatOpenAI (again, using your Gaia Node) and sends a basic "Say hello!" prompt. This verifies the chat completion endpoint (the reply is an AIMessage, so the text is read from response.content). A small usage guard is shown below.
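In practice you might gate the rest of the script on this check, for example:

import sys

if not test_gaia_connection():
    print("Aborting: fix GAIA_NODE_URL / GAIA_API_KEY before continuing.")
    sys.exit(1)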
3. Loading and Processing Documents: Preparing for Embeddings
Large documents need to be broken down into smaller, searchable pieces.
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_and_process_documents(file_path):
    loader = TextLoader(file_path)
    documents = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200  # Overlap helps maintain context across chunks
    )
    texts = text_splitter.split_documents(documents)
    print(f"Document split into {len(texts)} chunks")
    return texts
TextLoader simply loads the content of your document. RecursiveCharacterTextSplitter is a smart way to break down text: it tries to split on common delimiters (paragraphs, sentences, words) to keep meaningful chunks together. chunk_size limits the size of each piece, and chunk_overlap ensures some context from the end of one chunk is carried over to the beginning of the next, preventing information loss at split points.
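To see what chunk_size and chunk_overlap actually do, you can run the splitter on a short string with deliberately tiny settings (toy values, purely for illustration):

from langchain.text_splitter import RecursiveCharacterTextSplitter

toy_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
chunks = toy_splitter.split_text(
    "RAG systems retrieve relevant chunks, augment the prompt, and generate an answer."
)
for i, chunk in enumerate(chunks):
    print(f"chunk {i}: {chunk!r}")  # adjacent chunks share up to ~10 characters of overlap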
4. Creating/Loading Vector Database: Storing Embeddings with ChromaDB
This is where ChromaDB comes into play, storing the numerical essence of our document chunks.
from langchain_community.vectorstores import Chroma

def create_vector_database(texts, persist_directory='gaia_chroma_db'):
    print("Creating embeddings using Gaia Node...")
    embedding = OpenAIEmbeddings(  # Gaia Node generates these embeddings
        base_url=os.environ["OPENAI_API_BASE"],
        api_key=os.environ["OPENAI_API_KEY"],
        model="Nomic-embed-text-v1.5"
    )
    vectordb = Chroma.from_documents(
        documents=texts,
        embedding=embedding,
        persist_directory=persist_directory  # Data is saved here!
    )
    print(f"Vector database created with {len(texts)} documents")
    return vectordb, embedding

def load_existing_database(persist_directory='gaia_chroma_db'):
    print(f"Loading existing database from {persist_directory}")
    # Must use the SAME embedding model as during creation
    embedding = OpenAIEmbeddings(
        base_url=os.environ["OPENAI_API_BASE"],
        api_key=os.environ["OPENAI_API_KEY"],
        model="Nomic-embed-text-v1.5"
    )
    vectordb = Chroma(
        persist_directory=persist_directory,
        embedding_function=embedding
    )
    print("Database loaded successfully")
    return vectordb, embedding
- create_vector_database: This function takes your text chunks and uses OpenAIEmbeddings (powered by your Gaia Node) to transform each chunk into a high-dimensional vector. These vectors, along with their original text, are then stored in ChromaDB. The persist_directory argument is key for Chroma's local persistence feature: it creates a directory where your vector database will be saved.
- load_existing_database: If the persist_directory already exists from a previous run, this function efficiently loads the pre-computed database from disk. This saves a lot of time by avoiding the need to re-embed all documents. It's crucial that the embedding_function provided here matches the one used during creation, so Chroma knows how to interpret and compare the stored vectors. A quick direct-query sketch follows below.
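Before wiring the database into a chain, you can query it directly as a sanity check. A minimal sketch, assuming the database was created as above:

vectordb, embedding = load_existing_database('gaia_chroma_db')

# Return the 3 chunks whose embeddings are closest to the query embedding
hits = vectordb.similarity_search("What was said about the economy?", k=3)
for doc in hits:
    print(doc.page_content[:120], "...")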
5. Creating QA Chain: The RAG Orchestrator
LangChain's chains simplify complex LLM workflows. Here, we use RetrievalQA for our RAG pipeline.
from langchain.chains import RetrievalQA

def create_qa_chain(vectordb):
    print("Initializing QA chain with Gaia Node...")
    # Again, use ChatOpenAI for chat models!
    llm = ChatOpenAI(
        base_url=os.environ["OPENAI_API_BASE"],
        api_key=os.environ["OPENAI_API_KEY"],
        model="Llama-3-Groq-8B-Tool-Use-Q5_K_M",  # Your specific chat model
        temperature=0.7  # Controls creativity
    )
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # Combines all retrieved docs into one prompt
        retriever=vectordb.as_retriever(),  # ChromaDB becomes our retriever
        return_source_documents=True  # Get sources back with answer
    )
    print("QA chain created successfully")
    return qa
- LLM Initialization: We initialize our LLM using ChatOpenAI, ensuring it correctly communicates with the Gaia Node's chat completion endpoint. This was the final piece of the puzzle that resolved the 405 errors during actual query processing.
- vectordb.as_retriever(): This method turns our ChromaDB instance into a component that can perform similarity searches. When a query comes in, it asks Chroma: "Find me the document chunks that are most similar to this query." A variation with custom search parameters is sketched below.
- chain_type="stuff": This is a simple strategy for RetrievalQA. It takes all the retrieved document chunks, "stuffs" them into a single context, and then passes this combined context along with the user's question to the LLM.
- return_source_documents=True allows us to see which parts of the document the LLM used to formulate its answer, adding transparency.
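If you want more control over retrieval, the retriever accepts search parameters. A small drop-in variation inside create_qa_chain (the k=4 value is illustrative, not from the original script):

# Inside create_qa_chain: retrieve 4 chunks per query instead of the default
retriever = vectordb.as_retriever(search_kwargs={"k": 4})

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)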
6. Main Workflow: Putting It All Together
The main function ties everything together, handling the flow from database creation/loading to running queries.
def main():
    document_path = 'state_of_the_union.txt'
    persist_directory = 'gaia_chroma_db'

    try:
        # Check if database already exists for persistence
        if os.path.exists(persist_directory):
            print("Existing database found. Loading from disk...")
            vectordb, embedding = load_existing_database(persist_directory)
        else:
            print("Creating new database...")
            texts = load_and_process_documents(document_path)
            vectordb, embedding = create_vector_database(texts, persist_directory)
            vectordb.persist()  # Note: deprecated in recent Chroma, but harmless

        qa = create_qa_chain(vectordb)  # Initialize the QA chain

        queries = [
            "What did the president say about Ketanji Brown Jackson?",
            "What were the main economic points mentioned?",
            "What was said about international relations?"
        ]

        for query in queries:
            print(f"\nQuery: {query}")
            try:
                result = qa.invoke({"query": query})
                print(f"Answer: {result['result']}")
                if 'source_documents' in result:
                    print(f"\nSources used: {len(result['source_documents'])} documents")
                    for i, doc in enumerate(result['source_documents'][:2]):
                        print(f"Source {i+1}: {doc.page_content[:100]}...")
            except Exception as e:
                print(f"Error processing query: {e}")

        # Cleanup option
        # ... (code for deleting the database) ...

    except Exception as e:
        print(f"Error in main execution: {e}")
        print("Please check your Gaia Node configuration and ensure it's running")
The main function is the control center:
- It first checks if a ChromaDB instance already exists on disk. If so, it loads it, saving time. Otherwise, it loads documents, creates new embeddings, and builds a new ChromaDB.
- It then initializes the RetrievalQA chain using the created/loaded vector database.
- It iterates through a list of example queries, sending each to the qa chain. Crucially, we use qa.invoke({"query": query}) here, which is the updated and recommended way to call LangChain chains, addressing a deprecation warning.
- Finally, it prints the answer and the source documents used by the LLM, and provides an option to clean up the gaia_chroma_db directory. A minimal entry point for running all of this is sketched below.
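To run the script end to end, a standard entry point combined with the earlier connection test keeps failures early and obvious (a small addition, not part of the original listing):

if __name__ == "__main__":
    # Fail fast if the Gaia Node is unreachable or misconfigured
    if test_gaia_connection():
        main()
    else:
        print("Gaia Node connection test failed - check GAIA_NODE_URL and GAIA_API_KEY")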
ChromaDB Deep Dive: Your Local Vector Store
What is ChromaDB?
Chroma is an open-source vector database designed for simplicity and performance, particularly well-suited for applications involving embeddings. Think of it as a specialized database for "numerical fingerprints" (embeddings) of your data. When you have a piece of text (or an image, audio, etc.), an embedding model converts it into a long list of numbers (a vector). Chroma stores these vectors along with their original data, allowing you to quickly find other vectors that are "similar" in numerical space. This similarity often correlates to semantic similarity in the original data.
Key Features for this Project:
- Local Persistence: This is a standout feature for development. When you use persist_directory='gaia_chroma_db', Chroma saves all the embeddings and their associated metadata into files on your local disk. This means that once you've run the embedding process (which can be time-consuming for large documents), you don't have to do it again unless your documents change. Subsequent runs can load the database almost instantly.
- Ease of Use: Chroma's Python client is very intuitive (a standalone sketch follows this list). Integrating it with LangChain is straightforward, requiring only a few lines of code to create, load, and query your vector collections.
- Open-Source & Community Driven: Being open-source provides transparency, flexibility, and a growing community for support and contributions.
- Flexible Deployment: While powerful for local "in-process" use, Chroma can also be run as a separate server, allowing for more scalable deployments if your needs grow beyond a single application instance.
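Although this article only touches Chroma through LangChain, the same persistence model is visible through Chroma's own Python client. A minimal sketch, assuming the chromadb package is installed; the collection name and texts are illustrative:

import chromadb

# Open (or create) a persistent database in a local directory
client = chromadb.PersistentClient(path="chroma_demo")
collection = client.get_or_create_collection("demo_collection")

# Store documents with ids; here Chroma embeds them with its built-in default model
# (you could instead pass precomputed vectors via the `embeddings` argument)
collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "Gaia Node serves an OpenAI-compatible API.",
        "ChromaDB persists embeddings to local disk.",
    ],
)

results = collection.query(query_texts=["Where are embeddings stored?"], n_results=1)
print(results["documents"])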
Why Use ChromaDB Here?
For this demo, ChromaDB is an excellent choice because:
- It offers a quick and easy way to get started with vector databases without needing to set up complex infrastructure.
- Its local persistence feature greatly speeds up development by avoiding repeated embedding processes.
- It seamlessly integrates with LangChain's OpenAIEmbeddings, which we're powering with our custom Gaia Node. This creates a cohesive and efficient RAG pipeline entirely within your control.
By combining Gaia Node for custom LLM and embedding inference, ChromaDB for efficient local vector storage, and LangChain for seamless orchestration, we've built a robust and flexible RAG system. This setup empowers developers to create intelligent applications with greater control over their data, models, and infrastructure, moving towards a truly local-first AI development experience.
This approach is perfect for:
- Experimenting with different LLMs and embedding models on your own hardware.
- Building applications that require data privacy and don't want to send sensitive information to external APIs.
- Reducing API costs by running models locally.
Feel free to adapt this code, swap in different documents, or experiment with other LLM and embedding models available on your Gaia Node.
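For example, swapping in a PDF instead of a plain-text file only changes the loader. A minimal sketch, assuming the pypdf package is installed; report.pdf is just a placeholder filename:

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_and_process_pdf(file_path):
    # PyPDFLoader yields one Document per page; the same splitter works downstream
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    return text_splitter.split_documents(documents)

texts = load_and_process_pdf("report.pdf")  # placeholder filename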