Building an AI Chatbot with LangChain, FastAPI & Streamlit
Zarrar Shaikh


In this comprehensive guide, we’ll build a robust, modular AI chatbot from scratch on top of free-to-use LLM providers - Groq (serving open-source models like LLaMA) and Google’s Gemini. We’ll also wire in optional web search to enhance the chatbot’s responses.

Agentic AI Chatbot demo GIF

Throughout this project, we’ll learn how to:

  • Dynamically select AI models
  • Create responsive AI agents
  • Develop structured backend services
  • Design an intuitive frontend user interface

By the end of this guide, we’ll have a solid understanding of how to integrate AI models into practical applications and how to maintain code modularity, enabling easy extension and future-proofing of our projects.


Key Technologies Explained

  • LLM (Large Language Model): Advanced AI models capable of understanding context and generating human-like text.
  • Groq: Provides extremely fast inference for open-source models like LLaMA. It’s free to use and ideal for rapid prototyping.
  • Gemini: Google’s advanced LLM with strong performance in reasoning and dialogue tasks.
  • LangChain: Simplifies working with LLMs by providing tools for prompts, memory management, and chaining models with external tools and APIs.
  • LangGraph: A LangChain extension for structuring and managing stateful AI agents, enabling complex decision-making flows.
  • FastAPI: A modern, high-performance Python framework for quickly building robust APIs. It’s intuitive, fast, and easy to learn.
  • Streamlit: Lets us build interactive, attractive web apps quickly using nothing but Python.

Why Groq and Gemini and not OpenAI?

We chose Groq and Gemini instead of OpenAI primarily because OpenAI models are not available for free. Groq and Gemini, on the other hand, provide free access to advanced, capable language models like LLaMA and Google’s own Gemini models, making them perfect for experimenting, learning, and prototyping without worrying about burning through our pocket money and living like a hermit for the rest of the month.


Project Overview

System Design

Agentic AI Chatbot System Design

Code Structure

You can find the source code here - Zlash65/agentic-ai-chatbot-example

agentic-ai-chatbot-example/
├── .env
├── requirements.txt
├── main.py                     # FastAPI entry point
├── agents/                     # Phase 1 - AI logic
│   ├── llm_provider.py         # Select LLM provider
│   ├── tools.py                # Tavily search tool
│   └── ai_agent.py             # Build LangGraph agent
├── backend/                    # Phase 2 - Backend API
│   ├── config.py               # Load env vars
│   ├── schema.py               # Pydantic schema
│   └── router.py               # /chat endpoint
├── frontend/                   # Phase 3 - Streamlit UI
│   └── streamlit_app.py        # Streamlit chat UI

Phase 1 - AI Agent Configuration

llm_provider.py

This code snippet dynamically chooses the appropriate AI model provider (Groq or Gemini). We structured this code to be modular, allowing easy future addition of more LLM providers:

from langchain_groq import ChatGroq
from langchain_google_genai import ChatGoogleGenerativeAI
from backend.config import GROQ_API_KEY, GOOGLE_API_KEY

def get_llm(model_provider: str, model_name: str):
    if model_provider == "groq":
        if not GROQ_API_KEY:
            raise ValueError("GROQ_API_KEY is not set")
        return ChatGroq(model=model_name, api_key=GROQ_API_KEY)

    elif model_provider == "gemini":
        if not GOOGLE_API_KEY:
            raise ValueError("GOOGLE_API_KEY is not set")
        return ChatGoogleGenerativeAI(model=model_name, api_key=GOOGLE_API_KEY)
    else:
        raise ValueError(f"Invalid model provider: {model_provider}")
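A quick way to sanity-check this function (hypothetical usage, not part of the repo - assumes GROQ_API_KEY is set in our .env):

from agents.llm_provider import get_llm

# Grab a Groq-hosted LLaMA model and fire off a single test prompt
llm = get_llm("groq", "llama-3.1-8b-instant")
print(llm.invoke("Say hello in one word.").content)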

tools.py

We conditionally include a web search tool. This gives users the flexibility to enable or disable web searching as needed:

from langchain_community.tools.tavily_search import TavilySearchResults

def get_tools(allow_search: bool):
    tools = []
    if allow_search:
        tools.append(TavilySearchResults())
    return tools
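One subtle detail: TavilySearchResults reads the TAVILY_API_KEY environment variable under the hood, which is why we load it in config.py (Phase 2) even though we never pass it explicitly here.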

ai_agent.py

This function is the core engine of our AI chatbot. It actually calls the LLM (Groq or Gemini), optionally enables tools like web search, and returns the AI-generated message:

from langgraph.prebuilt import create_react_agent
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from agents.llm_provider import get_llm
from agents.tools import get_tools

def get_response_from_agent(
    model_provider: str,
    model_name: str,
    allow_search: bool,
    user_messages: list[str],
    system_prompt: str
) -> str:
    llm = get_llm(model_provider, model_name)
    tools = get_tools(allow_search)

    agent = create_react_agent(model=llm, tools=tools)

    # Stateless for now: only the system prompt and the
    # latest user message are sent (see the note below)
    state = {
        "messages": [
            SystemMessage(content=system_prompt),
            HumanMessage(content=user_messages[-1])
        ]
    }

    response = agent.invoke(state)
    messages = response.get("messages", [])

    ai_messages = [msg.content for msg in messages if isinstance(msg, AIMessage)]
    return ai_messages[-1] if ai_messages else "No response from AI"

Why only the last user message?

  • We’re simulating a stateless, simple interaction to keep responses fast and reduce LLM token usage.
  • Passing all previous messages would let the LLM retain context across turns.
  • We can extend this later by feeding in the full HumanMessage and AIMessage history for memory support - see the sketch below.
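
Here’s a minimal sketch of that extension - a hypothetical helper (not in the repo) that rebuilds the full message history, assuming we track each turn as a (user, assistant) pair:

from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

def build_state_with_history(system_prompt: str, history: list[tuple[str, str]], latest_message: str) -> dict:
    # history holds (user_text, ai_text) tuples from earlier turns
    messages = [SystemMessage(content=system_prompt)]
    for user_text, ai_text in history:
        messages.append(HumanMessage(content=user_text))
        messages.append(AIMessage(content=ai_text))
    messages.append(HumanMessage(content=latest_message))
    return {"messages": messages}

Swapping this state into agent.invoke() would give the agent conversation memory, at the cost of a larger prompt (and more tokens) per call.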

Phase 2 - Backend Setup (FastAPI)

config.py

Securely load environment variables from the .env file:

from dotenv import load_dotenv
import os

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # unused in this guide
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")
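The matching .env file would look something like this (values are placeholders - use your own keys):

GROQ_API_KEY=your-groq-api-key
GOOGLE_API_KEY=your-google-api-key
TAVILY_API_KEY=your-tavily-api-key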

schema.py

Define structured input from the frontend:

from pydantic import BaseModel
from typing import List

class ChatRequest(BaseModel):
    model_name: str
    model_provider: str
    system_prompt: str
    messages: List[str]
    allow_search: bool
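A request body that validates against this schema looks like:

{
  "model_name": "llama-3.1-8b-instant",
  "model_provider": "groq",
  "system_prompt": "You are a helpful assistant.",
  "messages": ["What is LangGraph?"],
  "allow_search": true
}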

router.py

from fastapi import APIRouter, HTTPException
from backend.schema import ChatRequest
from agents.ai_agent import get_response_from_agent

router = APIRouter()

ALLOWED_PROVIDERS = ["groq", "gemini"]
ALLOWED_MODELS = [
    "llama-3.1-8b-instant",
    "llama3-70b-8192",
    "gemini-2.0-flash",
    "gemini-2.5-flash"
]

@router.post("/chat")
def chat(request: ChatRequest):
    model_provider = request.model_provider.lower()
    model_name = request.model_name
    allow_search = request.allow_search

    if model_provider not in ALLOWED_PROVIDERS:
        raise HTTPException(
            status_code=400,
            detail=f"Invalid model provider: {request.model_provider}. "
                   f"Must be one of {ALLOWED_PROVIDERS}."
        )
    if model_name not in ALLOWED_MODELS:
        raise HTTPException(
            status_code=400,
            detail=f"Invalid model name: {model_name}. "
                   f"Must be one of {ALLOWED_MODELS}."
        )

    response = get_response_from_agent(
        model_provider=model_provider,
        model_name=model_name,
        allow_search=allow_search,
        user_messages=request.messages,
        system_prompt=request.system_prompt
    )

    return {"response": response}
  • ALLOWED_PROVIDERS & ALLOWED_MODELS - allow-lists that reject unsupported providers and models with a 400 error.
  • We define a POST endpoint at /chat; FastAPI auto‑validates the body against ChatRequest.
  • The validated data flows into get_response_from_agent(), which handles model init, tools, and response building.
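
For completeness, main.py (the FastAPI entry point from our code structure) only needs to create the app and mount this router. A minimal sketch of what it contains:

from fastapi import FastAPI
from backend.router import router

app = FastAPI(title="Agentic AI Chatbot")
app.include_router(router)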

Phase 3 - Frontend Setup (Streamlit)

streamlit_app.py

Find the full code here.

Dynamic Model Selection

To prevent invalid provider‑model combinations, we dynamically update model selections based on the chosen provider:

MODEL_OPTIONS = {
    "Groq": ["llama-3.1-8b-instant", "llama3-70b-8192"],
    "Gemini": ["gemini-2.0-flash", "gemini-2.5-flash"]
}

model_provider = st.selectbox("🔌 Model Provider", list(MODEL_OPTIONS.keys()))
model_choices = MODEL_OPTIONS[model_provider]

if "model_name" in st.session_state and st.session_state.model_name not in model_choices:
    st.session_state.model_name = model_choices[0]

model_name = st.selectbox("🧠 Model Name", model_choices, key="model_name")
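The glue between the UI and the backend is a plain HTTP call to our /chat endpoint. Here’s a rough sketch of that part (the full file in the repo also manages chat history and session state), assuming the backend runs locally on port 8000:

import requests
import streamlit as st

API_URL = "http://localhost:8000/chat"  # assumes the Phase 2 backend is running

allow_search = st.checkbox("🌐 Allow web search")
user_input = st.chat_input("Ask me anything...")

if user_input:
    payload = {
        "model_name": model_name,              # from the selectbox above
        "model_provider": model_provider.lower(),
        "system_prompt": "You are a helpful assistant.",
        "messages": [user_input],
        "allow_search": allow_search,
    }
    resp = requests.post(API_URL, json=payload)
    st.chat_message("assistant").write(resp.json().get("response", "No response"))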

Running Our Application

# Terminal tab 1 - backend
source .venv/bin/activate
uvicorn main:app --reload
# Terminal tab 2 - frontend
source .venv/bin/activate
streamlit run frontend/streamlit_app.py

Final Chatbot in Action

Agentic AI Chatbot


🧠 Wrapping Up

By following this detailed guide, we’ve learned how to:

  • Dynamically integrate multiple powerful AI providers
  • Construct structured backend services using FastAPI
  • Create user‑friendly interfaces with Streamlit
  • Keep our code modular and easy to extend

🔧 What We Can Explore Next

Now that we have a solid foundation, here are a few ideas to level up:

  • Add context memory - retain previous questions and answers for richer conversations.
  • Enable streaming responses - stream long responses token by token for a smoother user experience.
  • Plug in more tools - let the AI do math, call APIs, or browse docs via LangChain tools.
  • Deploy our app - share it with the world on Render, Railway, or Hugging Face Spaces.

🚀 Final Thoughts

This project isn’t just a chatbot - it’s a starter kit for any LLM‑powered product. Whether you’re prototyping an assistant, automating research, or experimenting with multi‑agent workflows, this architecture gives you room to grow.

So tweak it, break it, improve it - and most importantly, have fun while learning.

Happy building! 💻✨
