Refactoring RAG PDFBot: Modular Design with LangChain, Streamlit and ChromaDB
Zarrar Shaikh



Publish Date: Jul 5

🔗 If you're new to this project, start with the original guide here: Building a RAG-powered PDF Chatbot


In real-world production systems, it’s common practice to split responsibilities into multiple well-defined modules. Instead of cramming everything into a single file, code is grouped based on functionality - making it easier to debug, scale, and maintain. In this version of the RAG PDFBot, we’re simulating that same structure.

In this post, we'll walk through evolving the original chatbot into a modular, production-style app using LangChain, ChromaDB, and Streamlit.

Working demo

👆 Here's a quick look at what you'll be building in this guide.

📦 Source Code: Zlash65/rag-bot-chroma


🧱 What's New in the Modular Version

| Area | Original Version | Modular Edition |
| --- | --- | --- |
| File Structure | One big `app.py` file | Multiple logical modules (chat, sidebar, LLM, PDF, vectorstore, config) |
| PDF Parser | PyPDF2 | Switched to pypdf |
| Embedding Store | FAISS | Switched to ChromaDB (for learning & experimentation) |
| LLM Chains | Simple `load_qa_chain` | LangChain `RetrievalQA` with structured prompts |
| Prompting | Static prompt template | Modular prompt with system/human roles |
| Dev Tools | None | Built-in vectorstore inspector |

🔁 From FAISS to ChromaDB

Both FAISS and ChromaDB are popular options for storing and searching vector embeddings.

⚠️ In this version, we're switching to ChromaDB - not because FAISS isn't good, but to experiment with a different vector database and learn its tradeoffs.

| Feature | FAISS | ChromaDB |
| --- | --- | --- |
| Persistence | In-memory by default (manual saving/loading via `.save_local()` / `.load_local()`) | Persistent by default (creates a chroma directory and auto-saves) |
| Setup Complexity | Simple for in-memory use; more manual steps for persistence | Plug-and-play with auto-persistence |
| Metadata Support | Stores metadata, but querying/filtering support is limited | Rich metadata filtering and querying |
| Built-in Filtering | Minimal (not intuitive for metadata-based filtering) | Native filtering with conditions on metadata |
| Performance | Highly optimized for similarity search at scale (especially with GPU) | Good performance, but not optimized for billion-scale datasets |
| Indexing Options | Multiple indexing algorithms (Flat, IVF, HNSW, etc.) | Abstracted away; you don't control indexing |

Use FAISS if:

  • You want high-performance similarity search.
  • You’re comfortable managing manual persistence.
  • You’re deploying on-device or at scale, especially with GPU acceleration.

Use ChromaDB if:

  • You want auto-persistence with minimal setup.
  • You need metadata filtering (e.g., retrieve only documents from a specific source).
  • You're in rapid prototyping mode and want a simple dev experience.
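Metadata filtering is the feature that most changes day-to-day usage. With LangChain's Chroma wrapper it's a `filter` argument on `similarity_search` (e.g. `vs.similarity_search(query, k=4, filter={"source": "report.pdf"})`). The dependency-free sketch below uses a fake word-overlap score instead of real embeddings, purely to illustrate the filter-then-rank idea Chroma performs natively:

```python
# Toy illustration of metadata filtering in a vector search.
# Real code: vs.similarity_search(query, k=4, filter={"source": "report.pdf"})
# The "similarity" here is a fake word-overlap score, not real embeddings.

def word_overlap(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

def filtered_search(docs, query, where, k=2):
    """docs: list of {"text": str, "metadata": dict}. Keep only docs whose
    metadata matches `where`, then rank by the toy similarity score."""
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == val for key, val in where.items())
    ]
    candidates.sort(key=lambda d: word_overlap(query, d["text"]), reverse=True)
    return candidates[:k]

docs = [
    {"text": "revenue grew fast", "metadata": {"source": "report.pdf"}},
    {"text": "revenue shrank", "metadata": {"source": "notes.pdf"}},
]
hits = filtered_search(docs, "revenue", where={"source": "report.pdf"})
```

With FAISS, this kind of "only documents from a specific source" query typically means filtering results after the search, or maintaining separate indexes.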

Code Snippet: ChromaDB Setup

from langchain.vectorstores import Chroma  # in newer LangChain: `from langchain_chroma import Chroma`

def create_chroma_vectorstore(chunks, embedding):
    # Unlike FAISS, Chroma persists automatically once persist_directory is set
    vectorstore = Chroma.from_texts(
        texts=chunks,
        embedding=embedding,
        persist_directory="./data/chroma_store"
    )
    return vectorstore

🔍 VectorStore Inspector

One of the highlights of this version is the new vectorstore inspector.

In the previous version, the vectorstore was a black box. Now, we can:

  • Run ad-hoc test queries
  • See matching chunks returned from Chroma
  • Visually debug which documents were used for answering

Code Snippet: Vector Inspector

import streamlit as st

def inspect_vectorstore(vs):
    st.subheader("🔬 Vectorstore Inspector")
    query = st.text_input("Enter a test query")
    if query:
        # Returns the top-k chunks most similar to the query (k=4 by default)
        results = vs.similarity_search(query)
        for i, doc in enumerate(results):
            st.markdown(f"**Result {i+1}**")
            st.code(doc.page_content.strip())

Example:

VectorStore Inspector


🧠 Improved Prompt & Chain Logic

In the original version, we used:

load_qa_chain(llm, chain_type="stuff", prompt=...)

That worked - but now we’re using LangChain's RetrievalQA with a cleaner, modular prompt built using ChatPromptTemplate.

Code Snippet: New Chain Logic

from langchain.chains import RetrievalQA
from langchain.prompts import ChatPromptTemplate

def get_qa_chain(llm, retriever):
    # The "stuff" chain injects retrieved documents via the {context} variable,
    # so the prompt must reference it alongside {question}.
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant. Use the provided context to answer.\n\nContext:\n{context}"),
        ("human", "{question}")
    ])
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type="stuff",
        chain_type_kwargs={"prompt": prompt}
    )

Why it’s better:

  • We separate system and user roles clearly
  • It's easier to extend for follow-up questions or history
  • It aligns better with modern LLM chat paradigms
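Because the prompt is role-based, adding conversation history mostly means splicing prior turns between the system message and the newest human message. In LangChain you'd express this with a `MessagesPlaceholder` inside `ChatPromptTemplate`; the dependency-free sketch below makes the shape of the resulting message list explicit:

```python
# Toy sketch of role-based prompt assembly with chat history.
# In LangChain: ChatPromptTemplate.from_messages([..., MessagesPlaceholder("history"), ...])

SYSTEM = "You are a helpful assistant. Use the provided context to answer."

def build_messages(history, question, context):
    """history: list of (role, text) tuples from earlier turns."""
    messages = [("system", f"{SYSTEM}\n\nContext:\n{context}")]
    messages.extend(history)           # prior ("human", ...) / ("ai", ...) turns
    messages.append(("human", question))
    return messages

msgs = build_messages(
    history=[("human", "What is the report about?"), ("ai", "Quarterly sales.")],
    question="Summarize the key numbers.",
    context="Revenue grew 12% QoQ...",
)
# msgs[0] is the system message; the last entry is the newest question
```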

🧩 UI & Handler Logic: Cleaner, Separated, Smarter

The user interface behavior is mostly the same - but under the hood, it's been broken into logical handlers:

| File | Role |
| --- | --- |
| `sidebar_handler.py` | Handles model selection, API key input, PDF upload, and utility buttons |
| `chat_handler.py` | Handles rendering chat bubbles, input box, and chat history download |
| `llm_handler.py` | Manages chain and prompt setup for different model providers |
| `vectorstore_handler.py` | Embeds and stores PDF chunks into ChromaDB |
| `pdf_handler.py` | Extracts and chunks text from uploaded PDFs |
| `developer_mode.py` | Adds optional vectorstore inspector |
| `config.py` | Holds model metadata and keys from `.env` |
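To see how the modules fit together, here's a hypothetical, heavily simplified version of the pipeline `app.py` composes — the responsibilities mirror the table above, but the function names, signatures, and bodies here are illustrative stubs, not the repo's actual code:

```python
# Hypothetical wiring of the handler modules (names/signatures are illustrative).
# The point of the refactor: each stage is a small function you can test alone.

def extract_and_chunk(pdf_text, size=100):
    """pdf_handler's job: split raw text into chunks (toy fixed-size splitter)."""
    return [pdf_text[i:i + size] for i in range(0, len(pdf_text), size)]

def embed_and_store(chunks):
    """vectorstore_handler's job: embed chunks and store them (stubbed here)."""
    return {"chunks": chunks}  # stand-in for a Chroma vectorstore

def answer(store, question):
    """llm_handler's job: run the QA chain (stubbed here)."""
    return f"Answering {question!r} over {len(store['chunks'])} chunks"

def run_pipeline(pdf_text, question):
    chunks = extract_and_chunk(pdf_text)
    store = embed_and_store(chunks)
    return answer(store, question)
```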

🎛️ Smarter UI Behavior with disabled Components

Previous Version

We used conditions like:

if not model_provider:
    return

This meant entire sections of the UI wouldn't render at all until something was selected.

Example

RAG PDFBot V1

Current Version

In this version, all components are always rendered, but disabled until their prerequisites are met.

Why this matters:

  • The UI feels more responsive and intuitive
  • Users can "see" what steps are required
  • No jumping around or missing UI elements

Example:

RAG PDFBot V2

  • The model select dropdown is active only after choosing a provider
  • The PDF uploader is active only after choosing a model
  • The chat input is shown but disabled until PDFs are submitted

This approach improves clarity, especially for new users.
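The gating itself is plain state logic; in Streamlit you pass the result to each widget's `disabled` parameter (which `st.selectbox`, `st.file_uploader`, and `st.chat_input` all accept, e.g. `st.selectbox(..., disabled=provider is None)`). A dependency-free sketch of the cascade described above:

```python
def ui_state(provider=None, model=None, pdfs_submitted=False):
    """Return which controls are enabled given the current selections.
    Mirrors the cascade: provider -> model -> PDF upload -> chat input."""
    return {
        "model_select": provider is not None,
        "pdf_uploader": provider is not None and model is not None,
        "chat_input": pdfs_submitted,
    }
```

Keeping this logic in one function (rather than scattered `if ...: return` guards) is what lets every widget stay on screen while clearly signaling what's unlocked next.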


🚀 Want to Try It?

You can find the full source code here 👉 Zlash65/rag-bot-chroma

git clone https://github.com/Zlash65/rag-bot-chroma.git
cd rag-bot-chroma

python3 -m venv venv
source venv/bin/activate

pip3 install -r requirements.txt

Create a .env file for your API keys:

GROQ_API_KEY=your-groq-key
GOOGLE_API_KEY=your-google-key
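`config.py` reads these keys at startup. A minimal sketch using only the standard library — note that `os.getenv` alone doesn't parse a `.env` file, so the real app presumably loads it first (e.g. with python-dotenv's `load_dotenv()`); the `require_key` helper is an illustrative addition, not necessarily in the repo:

```python
import os

# Assumes the .env values have already been exported into the environment
# (e.g. via python-dotenv's load_dotenv() at app startup).
GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY", "")

def require_key(name):
    """Fail fast with a clear message if a needed key is missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```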

Then launch the app:

streamlit run app.py

💭 Final Thoughts

This version of RAG PDFBot isn’t just a refactor - it’s a learning step toward building production-grade RAG apps. With ChromaDB, internal tools, modular code, and more intuitive UI, it's easier to maintain and extend.

Still learning?

👉 Start here: Building a RAG-powered PDF Chatbot

Then come back and modularize like a pro.

Happy building! 🛠️
