Refactoring RAG PDFBot: Modular Design with LangChain, Streamlit and ChromaDB
Zarrar Shaikh



Publish Date: Jul 5

🔗 If you're new to this project, start with the original guide here: Building a RAG-powered PDF Chatbot


In real-world production systems, it’s common practice to split responsibilities into multiple well-defined modules. Instead of cramming everything into a single file, code is grouped based on functionality - making it easier to debug, scale, and maintain. In this version of the RAG PDFBot, we’re simulating that same structure.

In this post, we'll walk through evolving the original chatbot into a modular, production-style app using LangChain, ChromaDB, and Streamlit.

Working demo

👆 Here's a quick look at what you'll be building in this guide.

📦 Source Code: Zlash65/rag-bot-chroma


🧱 What's New in the Modular Version

| Area | Original Version | Modular Edition |
| --- | --- | --- |
| File Structure | One big `app.py` file | Multiple logical modules (chat, sidebar, LLM, PDF, vectorstore, config) |
| PDF Parser | PyPDF2 | Switched to pypdf |
| Embedding Store | FAISS | Switched to ChromaDB (for learning & experimentation) |
| LLM Chains | Simple `load_qa_chain` | LangChain `RetrievalQA` with structured prompts |
| Prompting | Static prompt template | Modular prompt with system/human roles |
| Dev Tools | None | Built-in vectorstore inspector |

🔁 From FAISS to ChromaDB

Both FAISS and ChromaDB are popular options for storing and searching vector embeddings.

⚠️ In this version, we're switching to ChromaDB - not because FAISS isn't good, but to experiment with a different vector database and learn its tradeoffs.

| Feature | FAISS | ChromaDB |
| --- | --- | --- |
| Persistence | In-memory by default (manual saving/loading via `.save_local()` / `.load_local()`) | Persistent by default (creates a chroma directory and auto-saves) |
| Setup Complexity | Simple for in-memory use; more manual steps for persistence | Plug-and-play with auto-persistence |
| Metadata Support | Stores metadata, but querying/filtering support is limited | Rich metadata filtering and querying |
| Built-in Filtering | Minimal (not intuitive for metadata-based filtering) | Native filtering with conditions on metadata |
| Performance | Highly optimized for similarity search at scale (especially with GPU) | Good performance, but not optimized for billion-scale datasets |
| Indexing Options | Multiple indexing algorithms (Flat, IVF, HNSW, etc.) | Abstracted away; you don't control indexing |

Use FAISS if:

  • You want high-performance similarity search.
  • You’re comfortable managing manual persistence.
  • You’re deploying on-device or at scale, especially with GPU acceleration.

Use ChromaDB if:

  • You want auto-persistence with minimal setup.
  • You need metadata filtering (e.g., retrieve only documents from a specific source).
  • You're in rapid prototyping mode and want a simple dev experience.
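Metadata filtering is the feature that most changes day-to-day usage. With LangChain's Chroma wrapper it's a `filter` argument on `similarity_search` (e.g. `vs.similarity_search(query, k=4, filter={"source": "report.pdf"})`). The dependency-free sketch below uses a fake word-overlap score instead of real embeddings, purely to illustrate the filter-then-rank idea Chroma performs natively:

```python
# Toy illustration of metadata filtering in a vector search.
# Real code: vs.similarity_search(query, k=4, filter={"source": "report.pdf"})
# The "similarity" here is a fake word-overlap score, not real embeddings.

def word_overlap(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

def filtered_search(docs, query, where, k=2):
    """docs: list of {"text": str, "metadata": dict}. Keep only docs whose
    metadata matches `where`, then rank by the toy similarity score."""
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == val for key, val in where.items())
    ]
    candidates.sort(key=lambda d: word_overlap(query, d["text"]), reverse=True)
    return candidates[:k]

docs = [
    {"text": "revenue grew fast", "metadata": {"source": "report.pdf"}},
    {"text": "revenue shrank", "metadata": {"source": "notes.pdf"}},
]
hits = filtered_search(docs, "revenue", where={"source": "report.pdf"})
```

With FAISS, this kind of "only documents from a specific source" query typically means filtering results after the search, or maintaining separate indexes.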

Code Snippet: ChromaDB Setup

from langchain.vectorstores import Chroma  # in newer LangChain: `from langchain_chroma import Chroma`

def create_chroma_vectorstore(chunks, embedding):
    # Unlike FAISS, Chroma persists automatically once persist_directory is set
    vectorstore = Chroma.from_texts(
        texts=chunks,
        embedding=embedding,
        persist_directory="./data/chroma_store"
    )
    return vectorstore

🔍 VectorStore Inspector

One of the highlights of this version is the new vectorstore inspector.

In the previous version, the vectorstore was a black box. Now, we can:

  • Run ad-hoc test queries
  • See matching chunks returned from Chroma
  • Visually debug which documents were used for answering

Code Snippet: Vector Inspector

import streamlit as st

def inspect_vectorstore(vs):
    st.subheader("🔬 Vectorstore Inspector")
    query = st.text_input("Enter a test query")
    if query:
        # Returns the top-k chunks most similar to the query (k=4 by default)
        results = vs.similarity_search(query)
        for i, doc in enumerate(results):
            st.markdown(f"**Result {i+1}**")
            st.code(doc.page_content.strip())

Example:

VectorStore Inspector


🧠 Improved Prompt & Chain Logic

In the original version, we used:

load_qa_chain(llm, chain_type="stuff", prompt=...)

That worked - but now we’re using LangChain's RetrievalQA with a cleaner, modular prompt built using ChatPromptTemplate.

Code Snippet: New Chain Logic

from langchain.chains import RetrievalQA
from langchain.prompts import ChatPromptTemplate

def get_qa_chain(llm, retriever):
    # The "stuff" chain injects retrieved documents via the {context} variable,
    # so the prompt must reference it alongside {question}.
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant. Use the provided context to answer.\n\nContext:\n{context}"),
        ("human", "{question}")
    ])
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type="stuff",
        chain_type_kwargs={"prompt": prompt}
    )

Why it’s better:

  • We separate system and user roles clearly
  • It's easier to extend for follow-up questions or history
  • It aligns better with modern LLM chat paradigms
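Because the prompt is role-based, adding conversation history mostly means splicing prior turns between the system message and the newest human message. In LangChain you'd express this with a `MessagesPlaceholder` inside `ChatPromptTemplate`; the dependency-free sketch below makes the shape of the resulting message list explicit:

```python
# Toy sketch of role-based prompt assembly with chat history.
# In LangChain: ChatPromptTemplate.from_messages([..., MessagesPlaceholder("history"), ...])

SYSTEM = "You are a helpful assistant. Use the provided context to answer."

def build_messages(history, question, context):
    """history: list of (role, text) tuples from earlier turns."""
    messages = [("system", f"{SYSTEM}\n\nContext:\n{context}")]
    messages.extend(history)           # prior ("human", ...) / ("ai", ...) turns
    messages.append(("human", question))
    return messages

msgs = build_messages(
    history=[("human", "What is the report about?"), ("ai", "Quarterly sales.")],
    question="Summarize the key numbers.",
    context="Revenue grew 12% QoQ...",
)
# msgs[0] is the system message; the last entry is the newest question
```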

🧩 UI & Handler Logic: Cleaner, Separated, Smarter

The user interface behavior is mostly the same - but under the hood, it's been broken into logical handlers:

| File | Role |
| --- | --- |
| `sidebar_handler.py` | Handles model selection, API key input, PDF upload, and utility buttons |
| `chat_handler.py` | Handles rendering chat bubbles, input box, and chat history download |
| `llm_handler.py` | Manages chain and prompt setup for different model providers |
| `vectorstore_handler.py` | Embeds and stores PDF chunks into ChromaDB |
| `pdf_handler.py` | Extracts and chunks text from uploaded PDFs |
| `developer_mode.py` | Adds optional vectorstore inspector |
| `config.py` | Holds model metadata and keys from `.env` |
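To see how the modules fit together, here's a hypothetical, heavily simplified version of the pipeline `app.py` composes — the responsibilities mirror the table above, but the function names, signatures, and bodies here are illustrative stubs, not the repo's actual code:

```python
# Hypothetical wiring of the handler modules (names/signatures are illustrative).
# The point of the refactor: each stage is a small function you can test alone.

def extract_and_chunk(pdf_text, size=100):
    """pdf_handler's job: split raw text into chunks (toy fixed-size splitter)."""
    return [pdf_text[i:i + size] for i in range(0, len(pdf_text), size)]

def embed_and_store(chunks):
    """vectorstore_handler's job: embed chunks and store them (stubbed here)."""
    return {"chunks": chunks}  # stand-in for a Chroma vectorstore

def answer(store, question):
    """llm_handler's job: run the QA chain (stubbed here)."""
    return f"Answering {question!r} over {len(store['chunks'])} chunks"

def run_pipeline(pdf_text, question):
    chunks = extract_and_chunk(pdf_text)
    store = embed_and_store(chunks)
    return answer(store, question)
```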

🎛️ Smarter UI Behavior with disabled Components

Previous Version

We used conditions like:

if not model_provider:
    return

This meant entire sections of the UI wouldn't render at all until something was selected.

Example

RAG PDFBot V1

Current Version

In this version, all components are always rendered, but disabled until their prerequisites are met.

Why this matters:

  • The UI feels more responsive and intuitive
  • Users can "see" what steps are required
  • No jumping around or missing UI elements

Example:

RAG PDFBot V2

  • The model select dropdown is active only after choosing a provider
  • The PDF uploader is active only after choosing a model
  • The chat input is shown but disabled until PDFs are submitted

This approach improves clarity, especially for new users.
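The gating itself is plain state logic; in Streamlit you pass the result to each widget's `disabled` parameter (which `st.selectbox`, `st.file_uploader`, and `st.chat_input` all accept, e.g. `st.selectbox(..., disabled=provider is None)`). A dependency-free sketch of the cascade described above:

```python
def ui_state(provider=None, model=None, pdfs_submitted=False):
    """Return which controls are enabled given the current selections.
    Mirrors the cascade: provider -> model -> PDF upload -> chat input."""
    return {
        "model_select": provider is not None,
        "pdf_uploader": provider is not None and model is not None,
        "chat_input": pdfs_submitted,
    }
```

Keeping this logic in one function (rather than scattered `if ...: return` guards) is what lets every widget stay on screen while clearly signaling what's unlocked next.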


🚀 Want to Try It?

You can find the full source code here 👉 Zlash65/rag-bot-chroma

git clone https://github.com/Zlash65/rag-bot-chroma.git
cd rag-bot-chroma

python3 -m venv venv
source venv/bin/activate

pip3 install -r requirements.txt

Create a .env file for your API keys:

GROQ_API_KEY=your-groq-key
GOOGLE_API_KEY=your-google-key
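`config.py` reads these keys at startup. A minimal sketch using only the standard library — note that `os.getenv` alone doesn't parse a `.env` file, so the real app presumably loads it first (e.g. with python-dotenv's `load_dotenv()`); the `require_key` helper is an illustrative addition, not necessarily in the repo:

```python
import os

# Assumes the .env values have already been exported into the environment
# (e.g. via python-dotenv's load_dotenv() at app startup).
GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY", "")

def require_key(name):
    """Fail fast with a clear message if a needed key is missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```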

Then launch the app:

streamlit run app.py

💭 Final Thoughts

This version of RAG PDFBot isn’t just a refactor - it’s a learning step toward building production-grade RAG apps. With ChromaDB, internal tools, modular code, and more intuitive UI, it's easier to maintain and extend.

Still learning?

👉 Start here: Building a RAG-powered PDF Chatbot

Then come back and modularize like a pro.

Happy building! 🛠️
