Hands-on vectorization and embedding with AstraDB (DataStax/IBM) vector database.
Introduction
Welcome to a hands-on exploration of building a powerful Retrieval Augmented Generation (RAG) system! In this project, we’re diving into the capabilities of AstraDB, DataStax’s cutting-edge vector database (DataStax was recently acquired by IBM), to show you how to truly chat with your documents.
This post serves as a practical starting point, demonstrating the seamless integration of AstraDB with innovative tools like Docling, which transforms unstructured documents into machine-ingestible formats, and a robust Large Language Model (LLM) like Granite. Get ready to unlock new insights from your data by combining the power of vector search with advanced language understanding, all through a real-world, step-by-step experience.
AstraDB Provisioning
First things first, you need to provision an AstraDB (vectorized) database. Creating a test serverless database is free, and in a few seconds you can have your database on AWS, Azure, or GCP (IBM Cloud will be on the list very soon as well 😁).
After database creation, you’ll land on a screen with everything required to access the database.
You can create additional tokens, and code snippets for accessing your database are also provided.
I also installed the Astra CLI, in order to manipulate the database from the command line if needed.
curl -Ls "https://dtsx.io/get-astra-cli" | bash
astra --version
Test that everything is OK!
astra setup --token AstraCS:xxxxx
[OK] Configuration has been saved.
[OK] Setup completed.
[INFO] Enter 'astra help' to list available commands.
> astra db list
+-------------+--------------------------------------+-----------+-------+---+--------+
| Name        | id                                   | Regions   | Cloud | V | Status |
+-------------+--------------------------------------+-----------+-------+---+--------+
| aam_test_db | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | us-east-2 | aws   | ■ | ACTIVE |
+-------------+--------------------------------------+-----------+-------+---+--------+
Implementing the code
Data Preparation
To populate our database with meaningful content, we first needed to prepare our data. For this, we leveraged Docling, a powerful tool that allowed us to ingest a PDF document and convert its complex structure into a semantically rich JSON file. This structured JSON output then served as the foundation for populating our AstraDB collection, making the document’s content accessible and searchable for our RAG system.
- Prepare the environment.
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install docling
- Docling conversion application ⬇️
# docling_simple_converter.py
import os
import json  # Import the json module for JSON output

from docling.document_converter import DocumentConverter


def convert_documents_to_all_formats(input_folder, output_folder):
    """
    Recursively reads files from the input_folder, converts them to Markdown,
    JSON (using export_to_dict), and DocTags (using export_to_document_tokens)
    using DocumentConverter, and saves the output to the output_folder,
    maintaining the directory structure.
    """
    # Create the output folder if it doesn't exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
        print(f"Created output directory: {output_folder}")

    converter = DocumentConverter()

    # Walk through the input folder recursively
    for root, _, files in os.walk(input_folder):
        # Determine the relative path from the input_folder to the current root
        relative_path = os.path.relpath(root, input_folder)
        # Construct the corresponding output directory path
        current_output_dir = os.path.join(output_folder, relative_path)
        # Create the corresponding output subdirectory if it doesn't exist
        if not os.path.exists(current_output_dir):
            os.makedirs(current_output_dir)

        for file_name in files:
            input_file_path = os.path.join(root, file_name)
            base_file_name = os.path.splitext(file_name)[0]

            # Define output paths for Markdown, JSON, and DocTags
            output_markdown_file_path = os.path.join(current_output_dir, base_file_name + ".md")
            output_json_file_path = os.path.join(current_output_dir, base_file_name + ".json")
            output_doctags_file_path = os.path.join(current_output_dir, base_file_name + ".doctags.txt")  # DocTags is a markup string, so plain text suits it better than JSON

            print(f"Processing: {input_file_path}")
            try:
                # Convert the document
                doc = converter.convert(input_file_path).document

                # Export to Markdown
                markdown_output = doc.export_to_markdown()
                with open(output_markdown_file_path, 'w', encoding='utf-8') as f:
                    f.write(markdown_output)
                print(f"Successfully converted and saved Markdown to: {output_markdown_file_path}")

                # Export to JSON (using export_to_dict)
                json_dict_output = doc.export_to_dict()
                with open(output_json_file_path, 'w', encoding='utf-8') as f:
                    # Use json.dump for pretty printing JSON
                    json.dump(json_dict_output, f, indent=4, ensure_ascii=False)
                print(f"Successfully converted and saved JSON to: {output_json_file_path}")

                # Export to DocTags (export_to_document_tokens returns a markup string,
                # so write it out as-is rather than JSON-encoding it)
                doctags_output = doc.export_to_document_tokens()
                with open(output_doctags_file_path, 'w', encoding='utf-8') as f:
                    f.write(doctags_output)
                print(f"Successfully converted and saved DocTags to: {output_doctags_file_path}")
            except Exception as e:
                print(f"Error converting {input_file_path}: {e}")


if __name__ == "__main__":
    # Define input and output folders
    input_folder_path = "./input"
    output_folder_path = "./output"
    convert_documents_to_all_formats(input_folder_path, output_folder_path)
    print("\nConversion process completed.")
AstraDB Implementation
- Using the out-of-the-box sample provided by AstraDB, I wrote the code below to connect to my database. Beforehand, I created a “.env” file to store my connection information. At this point we also need to install the additional Python libraries.
pip install "astrapy>=2.0,<3.0"
pip install python-dotenv
# .env file
API_ENDPOINT="https://xxxxxxx-us-east-2.apps.astra.datastax.com"
APPLICATION_TOKEN="AstraCS:xxxxxxxx"
# astra_connector.py
import os

from astrapy import DataAPIClient, Database
from dotenv import load_dotenv

# Load API_ENDPOINT and APPLICATION_TOKEN from the .env file
load_dotenv()


def connect_to_database() -> Database:
    """
    Connects to a DataStax Astra database.
    This function retrieves the database endpoint and application token from the
    environment variables `API_ENDPOINT` and `APPLICATION_TOKEN`.

    Returns:
        Database: An instance of the connected database.

    Raises:
        RuntimeError: If the environment variables `API_ENDPOINT` or
            `APPLICATION_TOKEN` are not defined.
    """
    endpoint = os.environ.get("API_ENDPOINT")
    token = os.environ.get("APPLICATION_TOKEN")
    if not token or not endpoint:
        raise RuntimeError(
            "Environment variables API_ENDPOINT and APPLICATION_TOKEN must be defined"
        )

    # Create an instance of the `DataAPIClient` class
    client = DataAPIClient()

    # Get the database specified by your endpoint and provide the token
    database = client.get_database(endpoint, token=token)
    print(f"Successfully connected to database: {database.info().name}")
    return database
Create a Collection in the Database
Although this could be done either through the CLI or through the database administration interface, I created the collection using a small Python application.
Since this was a first-time, hands-on experience with AstraDB, the process took several iterations, which is why the script begins by dropping any existing collection: it makes experimentation and repeated runs easier.
Furthermore, during these adjustments I found it necessary to switch to an alternative embedding method, as I ran into challenges with AstraDB’s built-in vectorization capabilities (a sketch of that built-in approach follows the script below).
# astradb-create-collection.py
# Import the connect_to_database function from your astra_connector.py file
from astra_connector import connect_to_database
from astrapy.constants import VectorMetric
from astrapy.info import (
    CollectionDefinition,
    CollectionVectorOptions,
    VectorServiceOptions,  # Kept for reference, though we won't use a service/provider for explicit vectors
)


def main() -> None:
    print("Attempting to connect to the database...")
    database = connect_to_database()  # Establish connection using the helper function
    print("Database connection successful.")

    # IMPORTANT: Ensure this collection_name matches what you use in other scripts
    collection_name = "aam_quickstart_collection"

    # Drop the collection if it already exists, so the script can be re-run cleanly.
    try:
        print(f"Attempting to drop existing collection {collection_name}...")
        database.drop_collection(collection_name)
        print(f"Collection {collection_name} dropped.")
    except Exception as e:
        print(f"Could not drop collection (might not exist or error during drop): {e}")

    print(f"Attempting to create collection: {collection_name}...")
    try:
        # Define the vector dimension for the embeddings you will insert.
        # 'nomic-embed-text' produces 768-dimensional embeddings.
        VECTOR_DIMENSION = 768

        collection = database.create_collection(
            collection_name,
            definition=CollectionDefinition(
                vector=CollectionVectorOptions(
                    metric=VectorMetric.COSINE,  # Set the vector similarity metric
                    dimension=VECTOR_DIMENSION,  # Specify the dimension of the vectors you will insert
                )
            ),
        )
        print(f"Successfully created collection: {collection.full_name}")
        print(f"Collection '{collection_name}' is now configured for explicit vector insertion (dimension: {VECTOR_DIMENSION}).")
    except Exception as e:
        print(f"Error creating collection {collection_name}: {e}")
        print("If the collection already exists and could not be dropped, remove it manually and re-run the script.")


if __name__ == "__main__":
    main()
    print("\nCollection creation process finished.")
Populate the Collection with Data
With the data prepared, the next crucial step was to populate our newly created collection. This involved carefully mapping the extracted information onto the AstraDB document schema.
Disclaimer: from this point on, the project uses Ollama running locally with the Granite model. This choice provides a fully self-contained and flexible environment for experimentation; the sections below detail how Ollama is prepared and wired into the codebase.
pip install ollama
ollama pull nomic-embed-text
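Before wiring Ollama into the insertion script, a quick sanity check is worthwhile. This minimal sketch (a hypothetical helper, assuming the Ollama server is running locally) confirms that nomic-embed-text returns the 768-dimensional vectors our collection was created for:
# embedding_sanity_check.py (hypothetical helper, not part of the main pipeline)
import ollama

# Embed a trivial prompt and inspect the vector length
response = ollama.embeddings(model="nomic-embed-text", prompt="hello world")
vector = response["embedding"]
print(f"Embedding dimension: {len(vector)}")  # Expected: 768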
# quickstart_insert_to_collection.py
import json
import os
import re

import ollama  # Import ollama for embeddings

from astra_connector import connect_to_database
from astrapy.data_types import DataAPIDate

# Define the Ollama embedding model to use
OLLAMA_EMBEDDING_MODEL = "nomic-embed-text"


def sanitize_docling_data(data):
    """
    Sanitizes dictionary keys and list elements within the Docling JSON
    to remove or replace characters that are invalid for Astra DB document fields.
    Specifically, it renames keys starting with '$' to '_'.
    Result of several tests!
    """
    if isinstance(data, dict):
        new_data = {}
        for key, value in data.items():
            new_key = key
            # Only rename keys that start with '$' and are not the special '$vectorize' field
            if key.startswith('$') and key != '$vectorize':
                new_key = '_' + key[1:]
            new_data[new_key] = sanitize_docling_data(value)
        return new_data
    elif isinstance(data, list):
        return [sanitize_docling_data(element) for element in data]
    else:
        return data


def get_embedding(text: str) -> list[float]:
    """
    Generates an embedding for the given text using the specified Ollama model.
    """
    try:
        response = ollama.embeddings(model=OLLAMA_EMBEDDING_MODEL, prompt=text)
        return response['embedding']
    except ollama.ResponseError as e:
        print(f"Ollama Embedding API Error: {e}")
        print(f"Please ensure Ollama is running and model '{OLLAMA_EMBEDDING_MODEL}' is pulled.")
        return []
    except Exception as e:
        print(f"An unexpected error occurred during Ollama embedding generation: {e}")
        return []


def main() -> None:
    """
    Connects to the DataStax Astra database, retrieves a collection,
    reads a single Docling JSON file, extracts relevant metadata,
    generates explicit embeddings using Ollama, and inserts/replaces it as a concise
    document with the vector into the collection.
    """
    print("Attempting to connect to the database...")
    database = connect_to_database()
    print("Database connection successful.")

    collection_name = "aam_quickstart_collection"  # Ensure this matches your collection name
    print(f"Getting collection: {collection_name}...")
    collection = database.get_collection(collection_name)
    print(f"Successfully retrieved collection: {collection.full_name}")

    # Check Ollama embedding model availability
    try:
        ollama.show(OLLAMA_EMBEDDING_MODEL)
        print(f"Ollama embedding model '{OLLAMA_EMBEDDING_MODEL}' is available locally.")
    except ollama.ResponseError:
        print(f"Ollama embedding model '{OLLAMA_EMBEDDING_MODEL}' not found locally.")
        print(f"Please pull it using: ollama pull {OLLAMA_EMBEDDING_MODEL}")
        return
    except Exception as e:
        print(f"Could not connect to Ollama server for embedding model: {e}")
        print("Please ensure Ollama is running on your machine.")
        return

    data_file_path = "./output/2503.11576v1.json"  # <--- **UPDATE THIS PATH** with your actual file
    if not os.path.exists(data_file_path):
        print(f"Error: Docling JSON file not found at {data_file_path}")
        print("Please ensure 'data_file_path' is updated to a valid Docling JSON file path.")
        return

    print(f"Reading Docling data from: {data_file_path}...")
    try:
        with open(data_file_path, "r", encoding="utf8") as file:
            docling_document_data_raw = json.load(file)
        print("Successfully loaded Docling document data.")
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON from {data_file_path}: {e}")
        print("Please ensure your Docling JSON file is correctly formatted.")
        return
    except Exception as e:
        print(f"An error occurred while reading the file {data_file_path}: {e}")
        return

    print("Sanitizing Docling document data for extraction...")
    docling_document_data = sanitize_docling_data(docling_document_data_raw)
    print("Docling document data sanitized.")

    document_id = docling_document_data.get("name", os.path.splitext(os.path.basename(data_file_path))[0])

    # Collect the text blocks that will feed the embedding
    embedding_content_parts = []
    if "texts" in docling_document_data:
        print(f"DEBUG: 'texts' list found. Length: {len(docling_document_data['texts'])}")
        for i, text_block in enumerate(docling_document_data["texts"]):
            print(f"DEBUG: Processing text_block[{i}] - Label: '{text_block.get('label')}', Has 'text' key: {'text' in text_block}, Is dict: {isinstance(text_block, dict)}")
            if isinstance(text_block, dict) and 'text' in text_block:
                text_content = text_block["text"].strip()
                print(f"DEBUG: Text content (first 100 chars): '{text_content[:100]}'")
                if text_content:
                    embedding_content_parts.append(text_content)
            else:
                print(f"DEBUG: Skipping text_block[{i}] - Not a dict or missing 'text' key. Block content: {text_block}")
            if len(" ".join(embedding_content_parts)) > 4000:
                print(f"DEBUG: Reached content length limit ({len(' '.join(embedding_content_parts))} chars). Stopping text extraction.")
                break
    else:
        print("DEBUG: 'texts' key not found in docling_document_data after sanitization. This is unexpected.")

    embedding_content = " ".join(embedding_content_parts)

    MAX_EMBEDDING_LENGTH = 8000
    if len(embedding_content) > MAX_EMBEDDING_LENGTH:
        embedding_content = embedding_content[:MAX_EMBEDDING_LENGTH] + "..."

    print(f"Content for embedding (first 500 chars): '{embedding_content[:500]}...'")
    print(f"Full content length for embedding: {len(embedding_content)} characters.")

    print(f"Generating embedding for document '{document_id}'...")
    document_embedding = get_embedding(embedding_content)
    if not document_embedding:
        print(f"Skipping insertion for '{document_id}' due to failed embedding generation.")
        return

    # Try to pick up the abstract as a human-readable summary
    summary_text = ""
    if "texts" in docling_document_data:
        for text_block in docling_document_data["texts"]:
            if text_block.get("label") == "section_header" and text_block.get("text", "").strip().lower() == "abstract":
                abstract_index = docling_document_data["texts"].index(text_block)
                if abstract_index + 1 < len(docling_document_data["texts"]):
                    abstract_content_block = docling_document_data["texts"][abstract_index + 1]
                    if abstract_content_block.get("label") == "text" and "text" in abstract_content_block:
                        summary_text = abstract_content_block["text"].strip()
                break
    if not summary_text and embedding_content:
        summary_text = embedding_content[:500] + "..." if len(embedding_content) > 500 else embedding_content

    genres = []
    due_date = None
    # Try to parse a date such as '17 Mar 2025' from the first text block (e.g., an arXiv header)
    if "texts" in docling_document_data and len(docling_document_data["texts"]) > 0:
        header_text = docling_document_data["texts"][0].get("text", "")
        date_match = re.search(r'\d{1,2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}', header_text)
        if date_match:
            try:
                from datetime import datetime
                parsed_date = datetime.strptime(date_match.group(0), '%d %b %Y').strftime('%Y-%m-%d')
                due_date = DataAPIDate.from_string(parsed_date)
            except ValueError:
                pass

    document_to_insert = {
        "_id": document_id,
        "document_name": docling_document_data.get("name", None),
        "summary": summary_text,  # This is for display
        "genres": genres,
        "due_date": due_date,
        "$vector": document_embedding,  # Store the explicit embedding here
    }

    print(f"Attempting to insert/replace document '{document_id}' into the collection...")
    try:
        collection.replace_one({"_id": document_id}, document_to_insert, upsert=True)
        print(f"Successfully inserted/replaced document with ID: {document_id}")
    except Exception as e:
        print(f"Error inserting/replacing document: {e}")


if __name__ == "__main__":
    main()
    print("\nDocument insertion process finished.")
- The output ⬇️
python quickstart_insert_to_collection.py
Attempting to connect to the database...
Successfully connected to database: aam_test_db
Database connection successful.
Getting collection: aam_quickstart_collection...
Successfully retrieved collection: default_keyspace.aam_quickstart_collection
Reading Docling data from: ./output/2503.11576v1.json...
Successfully loaded Docling document data.
Sanitizing Docling document data for extraction...
Docling document data sanitized.
Attempting to insert/replace document '2503.11576v1' into the collection...
Successfully inserted/replaced document with ID: 2503.11576v1
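Before running the fuller query script below, an even quicker sanity check is to count the documents in the collection. Here is a minimal sketch (a hypothetical helper reusing astra_connector; astrapy's count_documents requires an upper bound):
# count_check.py (hypothetical quick check, not part of the main pipeline)
from astra_connector import connect_to_database

database = connect_to_database()
collection = database.get_collection("aam_quickstart_collection")
# upper_bound caps the server-side count; 100 is plenty for this test
print(collection.count_documents({}, upper_bound=100))  # Expected: 1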
- To verify that everything is OK, run the following code ⬇️
# quickstart_find.py
import ollama  # Used to embed query text, since the collection stores explicit vectors

from astra_connector import connect_to_database
from astrapy.data_types import DataAPIDate

OLLAMA_EMBEDDING_MODEL = "nomic-embed-text"


def get_embedding(text: str) -> list[float]:
    """
    Generates an embedding for the given text using the specified Ollama model.
    """
    response = ollama.embeddings(model=OLLAMA_EMBEDDING_MODEL, prompt=text)
    return response['embedding']


def main() -> None:
    """
    Connects to the DataStax Astra database, retrieves a collection,
    and performs various queries (filtered, vector, combined) on the
    Docling document metadata.
    """
    print("Attempting to connect to the database...")
    database = connect_to_database()
    print("Database connection successful.")

    collection_name = "aam_quickstart_collection"
    print(f"Getting collection: {collection_name}...")
    collection = database.get_collection(collection_name)
    print(f"Successfully retrieved collection: {collection.full_name}")

    # --- Query 1: Filter on a non-vector field ---
    print("\nFinding documents with a due_date after 2025-01-01 (if available)...")
    try:
        date_filter = DataAPIDate.from_string("2025-01-01")
        date_cursor = collection.find({"due_date": {"$gt": date_filter}})
        found_any = False
        for document in date_cursor:
            print(f"Document '{document.get('document_name', 'N/A')}' has due_date: {document.get('due_date', 'N/A')}")
            found_any = True
        if not found_any:
            print("No documents found matching the date filter.")
    except Exception as e:
        print(f"Error during filtered query: {e}")

    # --- Query 2: Perform a vector search to find the closest match to a search string ---
    # The collection stores explicit vectors, so we embed the query text ourselves and
    # sort on $vector (sorting on $vectorize only works with server-side embedding).
    print("\nUsing vector search to find a document about vision-language models...")
    try:
        query_vector = get_embedding("a document about vision language models for document conversion")
        single_vector_match = collection.find_one(sort={"$vector": query_vector})
        if single_vector_match:
            print(f"Closest match: '{single_vector_match.get('document_name', 'N/A')}' "
                  f"with summary: '{single_vector_match.get('summary', 'N/A')[:100]}...'")
        else:
            print("No vector match found.")
    except Exception as e:
        print(f"Error during single vector search: {e}")

    # --- Query 3: Combine a vector search, a Python-side filter, and a projection ---
    print("\nUsing vector search and then Python-side filtering to find 3 documents with '2503' in their name "
          "that are closest to 'AI for document understanding', returning just name and summary...")
    try:
        query_vector = get_embedding("AI for document understanding and information extraction")
        vector_search_results = collection.find(
            {},  # No database-side filter for document_name
            sort={"$vector": query_vector},
            limit=10,  # Fetch more results to filter locally
            projection={"document_name": True, "summary": True, "_id": False}
        )
        found_documents = []
        for document in vector_search_results:
            document_name = document.get('document_name', '')
            # Apply the '2503' filter in Python
            if "2503" in document_name:
                found_documents.append(document)
                if len(found_documents) >= 3:  # Limit to 3 documents
                    break
        if found_documents:
            for document in found_documents:
                print(f"Found: {document.get('document_name', 'N/A')}, Summary: {document.get('summary', 'N/A')[:100]}...")
        else:
            print("No documents found matching the combined criteria after local filtering.")
    except Exception as e:
        print(f"Error during combined query: {e}")


if __name__ == "__main__":
    main()
    print("\nQuery process finished.")
python quickstart_find.py
Attempting to connect to the database...
Successfully connected to database: aam_test_db
Database connection successful.
Getting collection: aam_quickstart_collection...
Successfully retrieved collection: default_keyspace.aam_quickstart_collection
Finding documents with a due_date after 2025-01-01 (if available)...
No documents found matching the date filter.
Using vector search to find a document about vision-language models...
Closest match: '2503.11576v1' with summary: '...'
Using vector search and then Python-side filtering to find 3 documents with '2503' in their name that are closest to 'AI for document understanding', returning just name and summary...
Found: 2503.11576v1, Summary: ...
Query process finished.
Exploiting RAG Capabilities and Chatting with the Content Using an LLM
- At this stage, we can implement a chat application using Granite 3.3 with Ollama locally. This application will connect directly to our RAG system, leveraging AstraDB to retrieve relevant document content, and then use the LLM to answer user questions based on that retrieved information.
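If you haven’t already pulled the Granite model, do it now (granite3.3 is the tag assumed by the script below):
ollama pull granite3.3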
# rag_chat_app.py
import ollama

from astra_connector import connect_to_database

OLLAMA_EMBEDDING_MODEL = "nomic-embed-text"
OLLAMA_LLM_MODEL = "granite3.3"


def get_embedding(text: str) -> list[float]:
    """
    Generates an embedding for the given text using the specified Ollama model.
    """
    try:
        response = ollama.embeddings(model=OLLAMA_EMBEDDING_MODEL, prompt=text)
        return response['embedding']
    except ollama.ResponseError as e:
        print(f"Ollama Embedding API Error: {e}")
        print(f"Please ensure Ollama is running and model '{OLLAMA_EMBEDDING_MODEL}' is pulled.")
        return []
    except Exception as e:
        print(f"An unexpected error occurred during Ollama embedding generation: {e}")
        return []


# Function to retrieve relevant documents from Astra DB using vector search
def get_relevant_documents(query_text: str, collection, limit: int = 3) -> str:
    """
    Performs a vector search on the Astra DB collection using an explicit query vector
    to find documents semantically similar to the query text.

    Args:
        query_text: The user's query string.
        collection: The Astra DB collection object.
        limit: The maximum number of relevant documents to retrieve.

    Returns:
        A concatenated string of summaries from the retrieved documents,
        or an empty string if no relevant documents are found.
    """
    print(f"\nSearching Astra DB for relevant documents for: '{query_text}'...")
    query_embedding = get_embedding(query_text)
    if not query_embedding:
        print("Failed to generate embedding for the query. Cannot perform vector search.")
        return ""

    try:
        # Perform vector search using the $vector field and the explicit query_embedding
        results_cursor = collection.find(
            {},  # No additional filters
            sort={"$vector": query_embedding},  # <--- CORRECTED: Use $vector with the explicit embedding
            limit=limit,
            projection={"document_name": True, "summary": True}  # Retrieve name and summary
        )
        relevant_texts = []
        for i, doc in enumerate(results_cursor):
            doc_name = doc.get('document_name', 'Unnamed Document')
            summary = doc.get('summary', 'No summary available.')
            relevant_texts.append(f"Document {i+1} ({doc_name}): {summary}")
        if relevant_texts:
            print(f"Found {len(relevant_texts)} relevant documents.")
            return "\n".join(relevant_texts)
        else:
            print("No relevant documents found in Astra DB.")
            return ""
    except Exception as e:
        print(f"Error during Astra DB vector search: {e}")
        return ""


# Function to generate a response using the Ollama LLM
def generate_response(prompt: str, model_name: str = OLLAMA_LLM_MODEL) -> str:
    """
    Generates a response using the specified Ollama LLM.
    """
    print(f"\nGenerating response with Ollama LLM '{model_name}'...")
    try:
        response = ollama.chat(model=model_name, messages=[
            {'role': 'user', 'content': prompt},
        ])
        return response['message']['content']
    except ollama.ResponseError as e:
        print(f"Ollama API Error: {e}")
        print("Please ensure Ollama is running and the LLM is available.")
        return "Sorry, I couldn't connect to the Ollama model or there was an issue with the response."
    except Exception as e:
        print(f"An unexpected error occurred during Ollama generation: {e}")
        return "An unexpected error occurred while generating the response."


def main() -> None:
    """
    Main function for the RAG chat application.
    Connects to Astra DB and provides a chat interface to query the database
    and get responses from Ollama.
    """
    print("Starting Astra DB RAG Chat Application...")

    # 1. Connect to Astra DB
    try:
        database = connect_to_database()
        # <--- CORRECTED: Ensure collection name matches your setup
        collection = database.get_collection("aam_quickstart_collection")
        print("Successfully connected to Astra DB collection.")
    except Exception as e:
        print(f"Failed to connect to Astra DB: {e}")
        print("Please ensure your Astra DB environment variables are set and the collection exists.")
        return

    # 2. Check Ollama models availability
    try:
        ollama.show(OLLAMA_EMBEDDING_MODEL)
        print(f"Ollama embedding model '{OLLAMA_EMBEDDING_MODEL}' is available locally.")
    except ollama.ResponseError:
        print(f"Ollama embedding model '{OLLAMA_EMBEDDING_MODEL}' not found locally.")
        print(f"Please pull it using: ollama pull {OLLAMA_EMBEDDING_MODEL}")
        return
    except Exception as e:
        print(f"Could not connect to Ollama server for embedding model: {e}")
        print("Please ensure Ollama is running on your machine.")
        return

    try:
        ollama.show(OLLAMA_LLM_MODEL)
        print(f"Ollama LLM '{OLLAMA_LLM_MODEL}' is available locally.")
    except ollama.ResponseError:
        print(f"Ollama LLM '{OLLAMA_LLM_MODEL}' not found locally.")
        print(f"Please pull it using: ollama pull {OLLAMA_LLM_MODEL}")
        return
    except Exception as e:
        print(f"Could not connect to Ollama server for LLM: {e}")
        print("Please ensure Ollama is running on your machine.")
        return

    print("\nType your queries. Type 'exit' to quit.")

    # 3. Start the chat loop
    while True:
        user_query = input("\nYour query: ")
        if user_query.lower() == 'exit':
            print("Exiting chat application. Goodbye!")
            break

        # Retrieve relevant documents from AstraDB
        context = get_relevant_documents(user_query, collection)

        if not context:
            print("Could not retrieve relevant context from the database. Trying to answer without context.")
            llm_prompt = user_query
        else:
            llm_prompt = (
                "You are an AI assistant specialized in document understanding. "
                "Use the following document excerpts to answer the user's question. "
                "If the information is not in the provided excerpts, state that you cannot answer based on the given context.\n\n"
                f"Document Excerpts:\n{context}\n\n"
                f"Question: {user_query}\n"
                "Answer:"
            )

        response = generate_response(llm_prompt, OLLAMA_LLM_MODEL)
        print(f"\nAI Response: {response}")


if __name__ == "__main__":
    main()
- And here is the result of the sample chat/test ⬇️
python rag_chat_app.py
Starting Astra DB RAG Chat Application...
Successfully connected to database: aam_test_db
Successfully connected to Astra DB collection.
Ollama embedding model 'nomic-embed-text' is available locally.
Ollama LLM 'granite3.3' is available locally.
Type your queries. Type 'exit' to quit.
Your query: what is smoldocling
Searching Astra DB for relevant documents for: 'what is smoldocling'...
Found 1 relevant documents.
Generating response with Ollama LLM 'granite3.3'...
AI Response: SmolDocling is an ultra-compact vision-language model designed for end-to-end document conversion. It generates DocTags, a new universal markup format that captures all page elements in their full context with location. Unlike other models, SmolDocling offers an end-to-end conversion for accurately capturing content, structure, and spatial location of document elements in a 256M parameters vision-language model. It demonstrates robust performance in reproducing various document features across diverse document types and comes with novel publicly sourced datasets for charts, tables, equations, and code recognition. SmolDocling competes with larger models while significantly reducing computational requirements. The model is currently available, and the datasets will be made publicly available soon.
Your query: exit
Exiting chat application. Goodbye!
Conclusion
In conclusion, this project has guided us through the essential steps of setting up a functional RAG system. From transforming raw documents with Docling, to efficiently populating and querying a vector-enabled AstraDB, and finally integrating a local Ollama instance with Granite for intelligent conversational AI, we’ve demonstrated a complete pipeline for interacting with your data. This hands-on journey underscores the power of combining specialized tools to build sophisticated, context-aware applications.
Thanks for reading! 🤟
Useful links
- AstraDB: https://www.datastax.com/products/datastax-astra
- astrapy: https://github.com/datastax/astrapy
- astra-cli: https://github.com/datastax/astra-cli
- Docling: https://docling-project.github.io/
- Manage Astra DB with the Astra CLI: https://docs.datastax.com/en/astra-cli/managing.html