How to use LLM in Browser using WebLLM
Gopi Krishna Suvanam



Publish Date: Feb 19

The rise of large language models (LLMs) like GPT-4 and Llama has transformed the AI landscape, but most of these models run on powerful cloud servers. What if you could run an LLM directly in your browser without relying on external APIs? This is where WebLLM comes in.

What is WebLLM?

WebLLM is an open-source project that enables running large language models entirely in the browser using WebGPU. This means you can execute LLMs like Llama 3, Mistral, and Gemma locally on your machine without requiring API calls to external servers.
Jump to notebook

Why Use WebLLM?

🔒 Privacy

Since WebLLM runs on your device, no data is sent to external servers, making it ideal for privacy-conscious applications.

⚡ Low Latency

Because there’s no network round trip to an API server, responses begin as soon as your hardware generates them; latency depends on your GPU rather than on a remote server or your connection.

🌍 Offline Capability

WebLLM allows running AI-powered apps without an internet connection once the model is downloaded.

💰 Cost Savings

Since there are no paid API calls to hosted models (such as OpenAI’s or Hugging Face’s), WebLLM can significantly reduce the running costs of AI applications.

How Does WebLLM Work?

WebLLM leverages WebGPU, the browser’s modern GPU compute and graphics API, to run models efficiently on your local GPU. It builds on MLC LLM, which compiles and optimizes models so they can execute in the browser.
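Before loading a model, it’s worth checking that WebGPU is actually available in the visitor’s browser. A minimal sketch — the `hasWebGPU` and `describeGPU` helper names are illustrative, not part of the WebLLM API:

```javascript
// Check for the WebGPU layer that WebLLM builds on.
// Accepts a navigator-like object so it can be tested outside the browser.
function hasWebGPU(nav = globalThis.navigator) {
  return Boolean(nav && nav.gpu);
}

async function describeGPU() {
  if (!hasWebGPU()) {
    return "WebGPU is not available in this browser.";
  }
  // requestAdapter() resolves to null when no suitable GPU is found
  const adapter = await navigator.gpu.requestAdapter();
  return adapter ? "WebGPU adapter acquired." : "No GPU adapter available.";
}
```

Calling `describeGPU()` before initializing WebLLM lets you show a friendly fallback message instead of a cryptic loading failure.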

Supported Models

WebLLM currently supports:
✅ Llama 3 (Meta AI)
✅ Mistral (Open-weight LLM)
✅ Gemma (Google’s lightweight LLM)
✅ StableLM (Stability AI)
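The exact model ids (including quantization suffixes like `q4f16_1`) come from the library’s prebuilt configuration and can change between releases — in the browser you can read them from `prebuiltAppConfig.model_list` exported by `@mlc-ai/web-llm`. A sketch of filtering that list; `findModelIds` and the sample ids are illustrative:

```javascript
// In the browser you would get the real list from the library:
//   import { prebuiltAppConfig } from "https://esm.run/@mlc-ai/web-llm";
//   const modelList = prebuiltAppConfig.model_list;
// A small sample list keeps this sketch self-contained.
const sampleModelList = [
  { model_id: "Llama-3-8B-Instruct-q4f32_1-MLC" },
  { model_id: "Mistral-7B-Instruct-v0.3-q4f16_1-MLC" },
  { model_id: "gemma-2-2b-it-q4f16_1-MLC" },
];

// Hypothetical helper: return model ids matching a keyword, case-insensitively.
function findModelIds(modelList, keyword) {
  return modelList
    .map((m) => m.model_id)
    .filter((id) => id.toLowerCase().includes(keyword.toLowerCase()));
}

console.log(findModelIds(sampleModelList, "llama"));
// → ["Llama-3-8B-Instruct-q4f32_1-MLC"]
```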

Getting Started with WebLLM


1️⃣ Add WebLLM to Your JavaScript Project

You can integrate WebLLM via a CDN or npm package:

<script type="module">
  import { CreateMLCEngine } from 'https://esm.run/@mlc-ai/web-llm';

  async function main() {
    // Model ids come from WebLLM's prebuilt list and may change between releases
    const engine = await CreateMLCEngine('Llama-3-8B-Instruct-q4f32_1-MLC');
    const reply = await engine.chat.completions.create({
      messages: [{ role: 'user', content: 'What is WebLLM?' }],
    });
    console.log(reply.choices[0].message.content);
  }

  main();
</script>
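For longer answers you’ll usually want to stream tokens as they arrive rather than wait for the full reply. Assuming the OpenAI-style `chat.completions` API with `stream: true`, here is a sketch; `collectStream` is a hypothetical helper that joins the streamed deltas:

```javascript
// Accumulate streamed completion chunks into the full reply text,
// invoking onToken for each delta so the UI can update incrementally.
async function collectStream(chunks, onToken = () => {}) {
  let text = "";
  for await (const chunk of chunks) {
    const delta = chunk.choices?.[0]?.delta?.content ?? "";
    text += delta;
    onToken(delta);
  }
  return text;
}

// Browser usage (sketch):
//   const chunks = await engine.chat.completions.create({
//     messages: [{ role: 'user', content: 'What is WebLLM?' }],
//     stream: true,
//   });
//   const full = await collectStream(chunks, (t) => console.log(t));
```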

2️⃣ Running WebLLM in Scribbler (JavaScript Notebook)

If you prefer notebooks (like Jupyter but for JavaScript), you can try this in Scribbler:

const webllm = await import('https://esm.run/@mlc-ai/web-llm');
const engine = await webllm.CreateMLCEngine('Mistral-7B-Instruct-v0.3-q4f16_1-MLC');
const reply = await engine.chat.completions.create({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
});
reply.choices[0].message.content;

3️⃣ Deploying a Chatbot with WebLLM

Want to build a chatbot with WebLLM? Here’s a minimal setup:

<input id="prompt" placeholder="Ask me anything...">
<button id="send" disabled>Send</button>
<p id="output"></p>

<script type="module">
  import { CreateMLCEngine } from 'https://esm.run/@mlc-ai/web-llm';
  let engine;

  async function setup() {
    // Loading the model takes a while; keep the button disabled until it's ready
    engine = await CreateMLCEngine('gemma-2-2b-it-q4f16_1-MLC');
    document.getElementById('send').disabled = false;
  }

  async function runChat() {
    const input = document.getElementById('prompt').value;
    const reply = await engine.chat.completions.create({
      messages: [{ role: 'user', content: input }],
    });
    document.getElementById('output').innerText = reply.choices[0].message.content;
  }

  // Module scripts are not in the global scope, so attach the click handler
  // explicitly instead of using an inline onclick attribute
  document.getElementById('send').addEventListener('click', runChat);
  setup();
</script>
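Model weights are large (hundreds of megabytes), so the first load can take minutes; showing download progress keeps users from assuming the page is broken. `CreateMLCEngine` accepts an `initProgressCallback` option; the `formatProgress` helper below is illustrative, not part of the library:

```javascript
// Turn a WebLLM init progress report ({ progress, text, ... }) into a status line.
function formatProgress(report) {
  const pct = Math.round((report.progress ?? 0) * 100);
  return `Loading model: ${pct}% - ${report.text ?? ""}`;
}

// Browser usage (sketch):
//   const engine = await CreateMLCEngine('gemma-2-2b-it-q4f16_1-MLC', {
//     initProgressCallback: (report) => {
//       document.getElementById('output').innerText = formatProgress(report);
//     },
//   });
```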

Performance Considerations

WebLLM requires a modern GPU and browser to run efficiently. It works best on:

  1. Google Chrome (latest)
  2. Edge (WebGPU enabled)
  3. Firefox Nightly (WebGPU experimental)

WebGPU is enabled by default in recent Chrome and Edge releases. On older versions, you can turn it on via:

chrome://flags/#enable-webgpu

Future of WebLLM

As WebGPU adoption grows, WebLLM could power offline AI assistants, interactive AI websites, and even AI-enhanced games. Future enhancements may include multi-modal AI (text + images) and custom fine-tuned models for specific applications.

Final Thoughts

WebLLM is a game-changer for AI development, allowing LLMs to run entirely in the browser without cloud dependency. Whether you're building chatbots, AI-enhanced web apps, or offline AI tools, WebLLM makes it possible.

🚀 Ready to try WebLLM? Drop a comment if you have questions or want more tutorials! 🙌
