🔗 If you're new to this project, start with the original guide here: Building a RAG-powered PDF Chatbot
In real-world production systems, it’s common practice to split responsibilities into multiple well-defined modules. Instead of cramming everything into a single file, code is grouped based on functionality - making it easier to debug, scale, and maintain. In this version of the RAG PDFBot, we’re simulating that same structure.
In this post, we'll walk through evolving the original chatbot into a modular, production-style app using LangChain, ChromaDB, and Streamlit.
👆 Here's a quick look at what you'll be building in this guide.
📦 Source Code: Zlash65/rag-bot-chroma
🧱 What's New in the Modular Version
| Area | Original Version | Modular Edition |
|---|---|---|
| File Structure | One big `app.py` file | Multiple logical modules (chat, sidebar, LLM, PDF, vectorstore, config) |
| PDF Parser | PyPDF2 | Switched to `pypdf` |
| Embedding Store | FAISS | Switched to ChromaDB (for learning & experimentation) |
| LLM Chains | Simple `load_qa_chain` | LangChain `RetrievalQA` with structured prompts |
| Prompting | Static prompt template | Modular prompt with system/human roles |
| Dev Tools | None | Built-in vectorstore inspector |
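The parser switch is mostly a drop-in change, since `pypdf` is the maintained successor to PyPDF2. As a rough sketch of what the extraction and chunking step might look like (the function name and chunking parameters here are illustrative, not taken from the repo):

```python
from pypdf import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def extract_and_chunk(pdf_files, chunk_size=1000, chunk_overlap=200):
    """Extract raw text from uploaded PDFs and split it into overlapping chunks."""
    text = ""
    for pdf in pdf_files:
        reader = PdfReader(pdf)
        for page in reader.pages:
            # extract_text() can return None for image-only pages
            text += page.extract_text() or ""

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    return splitter.split_text(text)
```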
🔁 From FAISS to ChromaDB
Both FAISS and ChromaDB are popular options for storing and searching vector embeddings.
⚠️ In this version, we're switching to ChromaDB - not because FAISS isn't good, but to experiment with a different vector database and learn its tradeoffs.
| Feature | FAISS | ChromaDB |
|---|---|---|
| Persistence | In-memory by default (requires manual saving/loading with `.save_local()` / `.load_local()`) | Persistent by default (creates a `chroma` directory and auto-saves) |
| Setup Complexity | Simple for in-memory; more manual steps for persistence | Plug-and-play with auto-persistence |
| Metadata Support | Stores metadata, but querying/filtering support is limited | Rich metadata filtering and querying support |
| Built-in Filtering | Minimal (not intuitive for metadata-based filtering) | Native filtering with conditions on metadata |
| Performance | Highly optimized for similarity search at scale (especially with GPU) | Good performance, but not optimized for billion-scale datasets |
| Indexing Options | Multiple indexing algorithms (Flat, IVF, HNSW, etc.) | Abstracted away - we don't control indexing |
Use FAISS if:
- You want high performance similarity search.
- You’re comfortable managing manual persistence (see the sketch after this list).
- You’re deploying on-device or at scale, especially with GPU acceleration.
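Here's what that manual persistence looks like in a minimal FAISS sketch - the directory name is illustrative, and newer LangChain versions may require extra arguments when loading:

```python
from langchain.vectorstores import FAISS

# Build the index in memory, then persist it explicitly
vectorstore = FAISS.from_texts(texts=chunks, embedding=embedding)
vectorstore.save_local("faiss_index")

# Later: reload it from disk (must use the same embedding model)
vectorstore = FAISS.load_local("faiss_index", embeddings=embedding)
```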
Use ChromaDB if:
- You want auto-persistence with minimal setup.
- You need metadata filtering (e.g., retrieve only documents from a specific source).
- You're in rapid prototyping mode and want a simple dev experience.
Code Snippet: ChromaDB Setup

```python
from langchain.vectorstores import Chroma

def create_chroma_vectorstore(chunks, embedding):
    # Chroma auto-persists to the given directory - no manual save step
    vectorstore = Chroma.from_texts(
        texts=chunks,
        embedding=embedding,
        persist_directory="./data/chroma_store",
    )
    return vectorstore
```
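Because Chroma persists automatically, reopening the store later - and filtering by metadata - takes only a few lines. A hedged sketch: the `source` filter key assumes chunks were stored with that metadata field, which may not match the repo's actual schema:

```python
# Reopen a previously persisted store (must use the same embedding model)
vectorstore = Chroma(
    persist_directory="./data/chroma_store",
    embedding_function=embedding,
)

# Retrieve only chunks whose metadata matches a condition
results = vectorstore.similarity_search(
    "What are the key findings?",
    k=4,
    filter={"source": "report.pdf"},
)
```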
🔍 VectorStore Inspector
One of the highlights of this version is the new vectorstore inspector.
In the previous version, the vectorstore was a black box. Now, we can:
- Run ad-hoc test queries
- See matching chunks returned from Chroma
- Visually debug which documents were used for answering
Code Snippet: Vector Inspector

```python
import streamlit as st

def inspect_vectorstore(vs):
    st.subheader("🔬 Vectorstore Inspector")
    query = st.text_input("Enter a test query")
    if query:
        results = vs.similarity_search(query)
        for i, doc in enumerate(results):
            st.markdown(f"**Result {i+1}**")
            st.code(doc.page_content.strip())
```
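By default, `similarity_search` returns the top four matches; pass `k=...` to inspect more (or fewer) chunks per query.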
🧠 Improved Prompt & Chain Logic
In the original version, we used:
```python
load_qa_chain(llm, chain_type="stuff", prompt=...)
```
That worked - but now we’re using LangChain's `RetrievalQA` with a cleaner, modular prompt built using `ChatPromptTemplate`.
Code Snippet: New Chain Logic

```python
from langchain.chains import RetrievalQA
from langchain.prompts import ChatPromptTemplate

def get_qa_chain(llm, retriever):
    # The "stuff" chain injects the retrieved documents into {context},
    # so the prompt must include that variable
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant. Use the provided context to answer.\n\n{context}"),
        ("human", "{question}"),
    ])
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type="stuff",
        chain_type_kwargs={"prompt": prompt},
    )
```
Why it’s better:
- We separate system and user roles clearly
- It's easier to extend for follow-up questions or history
- It aligns better with modern LLM chat paradigms
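As a minimal usage sketch (the question string is illustrative), wiring the chain to a Chroma retriever looks like this:

```python
retriever = vectorstore.as_retriever()
qa_chain = get_qa_chain(llm, retriever)

# RetrievalQA expects its input under the "query" key
response = qa_chain({"query": "What does the document say about pricing?"})
print(response["result"])
```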
🧩 UI & Handler Logic: Cleaner, Separated, Smarter
The user interface behavior is mostly the same - but under the hood, it's been broken into logical handlers:
| File | Role |
|---|---|
| `sidebar_handler.py` | Handles model selection, API key input, PDF upload, and utility buttons |
| `chat_handler.py` | Handles rendering chat bubbles, input box, and chat history download |
| `llm_handler.py` | Manages chain and prompt setup for different model providers |
| `vectorstore_handler.py` | Embeds and stores PDF chunks into ChromaDB |
| `pdf_handler.py` | Extracts and chunks text from uploaded PDFs |
| `developer_mode.py` | Adds optional vectorstore inspector |
| `config.py` | Holds model metadata and keys from `.env` |
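With the handlers split out, the entry point can stay small. This is a hypothetical sketch of how `app.py` might wire everything together - the function names and return values below are illustrative, not taken from the repo:

```python
import streamlit as st

from sidebar_handler import render_sidebar      # hypothetical: returns llm, embedding, uploaded PDFs
from pdf_handler import extract_and_chunk       # hypothetical: pypdf extraction + chunking
from vectorstore_handler import create_chroma_vectorstore
from llm_handler import get_qa_chain
from chat_handler import render_chat            # hypothetical: chat bubbles + input box

st.set_page_config(page_title="RAG PDFBot")

llm, embedding, pdf_files = render_sidebar()
if pdf_files:
    chunks = extract_and_chunk(pdf_files)
    vectorstore = create_chroma_vectorstore(chunks, embedding)
    render_chat(get_qa_chain(llm, vectorstore.as_retriever()))
```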
🎛️ Smarter UI Behavior with `disabled` Components
Previous Version
We used conditions like:
```python
if not model_provider:
    return
```
This meant entire sections of the UI wouldn’t render at all until something was selected.
Current Version
In this version, all components are always rendered, but disabled until their prerequisites are met.
Why this matters:
- The UI feels more responsive and intuitive
- Users can "see" what steps are required
- No jumping around or missing UI elements
Example:
- The model select dropdown is active only after choosing a provider
- The pdf uploader is active only after choosing a model
- The chat input is shown but disabled until PDFs are submitted
This approach improves clarity, especially for new users.
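Here's a minimal sketch of the pattern using Streamlit's `disabled` parameter (widget labels and model names are illustrative):

```python
import streamlit as st

provider = st.selectbox("Provider", ["Groq", "Google"], index=None)

# Each widget stays visible but inert until its prerequisite is met
model = st.selectbox(
    "Model",
    ["llama3-70b", "gemini-pro"],
    index=None,
    disabled=provider is None,
)
pdf_files = st.file_uploader(
    "Upload PDFs",
    type="pdf",
    accept_multiple_files=True,
    disabled=model is None,
)
user_question = st.chat_input(
    "Ask a question about your PDFs",
    disabled=not pdf_files,
)
```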
🚀 Want to Try It?
You can find the full source code here 👉 Zlash65/rag-bot-chroma
```bash
git clone https://github.com/Zlash65/rag-bot-chroma.git
cd rag-bot-chroma

python3 -m venv venv
source venv/bin/activate

pip3 install -r requirements.txt
```
Create a `.env` file for your API keys:
```
GROQ_API_KEY=your-groq-key
GOOGLE_API_KEY=your-google-key
```
Then launch the app:
```bash
streamlit run app.py
```
💭 Final Thoughts
This version of RAG PDFBot isn’t just a refactor - it’s a learning step toward building production-grade RAG apps. With ChromaDB, internal tooling, modular code, and a more intuitive UI, it’s easier to maintain and extend.
Still learning?
👉 Start here: Building a RAG-powered PDF Chatbot
Then come back and modularize like a pro.
Happy building! 🛠️