🧠 Meet TalkLLM — A Local AI Assistant in React

Mahmudur Rahman (@mahmud-r-farhan) · Apr 20

Over the weekend, I built TalkLLM — a minimal, privacy-focused AI chat app that runs directly in the browser using WebLLM.

It uses Meta’s Llama 3.1 model via MLC AI’s WebLLM engine — meaning you get:

  • ✅ Fast, local response generation

  • 🛡️ Full privacy (no API keys, no cloud)

  • ⚡ A smooth chat interface with history, loading state, and error handling


🛠️ What is WebLLM?

WebLLM lets you run large language models in-browser using WebGPU + WebAssembly — no servers or Python required.

It’s powered by MLC LLM, a compiler stack for machine-learning models, and WebLLM brings that stack to your frontend.

You just need a compatible browser, and you can run powerful models like Llama 2/3, Phi, or Mistral — completely client-side.
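
A quick way to confirm support before loading a model (a minimal sketch; this helper is my own, not from the TalkLLM source):

// Feature-detect WebGPU before trying to initialize a model.
// (Illustrative sketch; not part of the TalkLLM codebase.)
async function hasWebGPU() {
  if (!("gpu" in navigator)) return false;           // WebGPU API not exposed
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;                           // null means no usable GPU
}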


⚙️ Tech Stack

  • React (with hooks)

  • @mlc-ai/web-llm

  • Llama 3.1 8B Instruct

  • Sass for lightweight styling


📦 Project Setup

🔗 GitHub Repo: https://github.com/mahmud-r-farhan/TalkLLM

🖥️ Requirements

  • ✅ A WebGPU-compatible browser

  • ✅ WebAssembly support

  • ✅ Node.js v16+

  • ✅ npm or yarn

📂 Installation

git clone https://github.com/mahmud-r-farhan/TalkLLM
cd TalkLLM
npm install
npm run dev


The app will run locally at http://localhost:5173.


🔍 Key Features in the Code

  • Local model initialization using CreateMLCEngine()

  • Loading indicators during model setup and generation

  • Clean useCallback()-wrapped sendMessageToLlm() function (see the sketch after this list)

  • Input validation (no empty prompts)

  • UI blocking during loading states
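
Putting those pieces together, here's roughly how the init and send path can look. A hedged sketch: CreateMLCEngine() and sendMessageToLlm() are named above, but the hook structure and state names are my own illustration, not the repo's exact code.

import { useState, useCallback, useEffect, useRef } from "react";
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const selectedModel = "Llama-3.1-8B-Instruct-q4f32_1-MLC";

function useTalkLlm() {
  const engineRef = useRef(null);
  const [loading, setLoading] = useState(true);
  const [messages, setMessages] = useState([
    { role: "system", content: "Hello, I am TalkLLM. How can I assist you today?" },
  ]);

  // Load the model once on mount; initProgressCallback reports download/compile progress.
  useEffect(() => {
    CreateMLCEngine(selectedModel, {
      initProgressCallback: (report) => console.log(report.text),
    }).then((engine) => {
      engineRef.current = engine;
      setLoading(false); // unblock the UI once the model is ready
    });
  }, []);

  // useCallback keeps a stable function reference across renders.
  const sendMessageToLlm = useCallback(async (prompt) => {
    if (!prompt.trim() || !engineRef.current) return; // no empty prompts; block while loading
    const next = [...messages, { role: "user", content: prompt }];
    setMessages(next);
    setLoading(true);
    const reply = await engineRef.current.chat.completions.create({ messages: next });
    setMessages([...next, reply.choices[0].message]);
    setLoading(false);
  }, [messages]);

  return { messages, loading, sendMessageToLlm };
}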

🧠 Chat Memory Example

const [messages, setMessages] = useState([
  { role: "system", content: "Hello, I am TalkLLM. How can I assist you today?" }
]);


The model keeps context using a messages array, just like the OpenAI API format.
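
Because the format matches OpenAI's, streaming works the same way too. A small sketch (stream: true is WebLLM's documented option; the variable names are mine):

// Streaming variant: render tokens as they arrive instead of waiting for the full reply.
// (Sketch: `engine` is the instance returned by CreateMLCEngine.)
const chunks = await engine.chat.completions.create({
  messages,      // the same history array shown above
  stream: true,  // request an async iterable of deltas
});

let reply = "";
for await (const chunk of chunks) {
  reply += chunk.choices[0]?.delta?.content ?? "";
}
console.log(reply); // the full assistant message, assembled from deltas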


🧪 Bonus: Want to Use a Different Model?

WebLLM supports multiple models. To switch, update this line:

const selectedModel = "Llama-3.1-8B-Instruct-q4f32_1-MLC";


Check out the model catalog for available variants.
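
You can also list the options programmatically; WebLLM ships a prebuilt app config containing every supported model ID (a small sketch):

import { prebuiltAppConfig } from "@mlc-ai/web-llm";

// Log every model ID that WebLLM can load out of the box.
for (const record of prebuiltAppConfig.model_list) {
  console.log(record.model_id);
}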


🙋‍♂️ Why Build This?

I wanted a way to explore LLMs without vendor lock-in. And with WebGPU maturing, it felt like the perfect time to experiment with truly local AI.

This app is a proof-of-concept — and a great starting point if you’re building privacy-first AI tools or offline chat experiences.


📚 Resources

  • TalkLLM repo: https://github.com/mahmud-r-farhan/TalkLLM
  • WebLLM (MLC AI): https://github.com/mlc-ai/web-llm

🙌 Final Thoughts

WebLLM is changing the game. Whether you're building privacy-focused apps or internal tools, or just want to tinker with LLMs, it opens up huge opportunities.


Follow for more!
