Why I Switched from LLMs to Tiny, Instant Voice NLU for Kai Lite — Week 2
Emily Lin


Publish Date: Aug 18

This is part of my journey building the Kai ecosystem—a fully local, offline-first, emotionally intelligent AI assistant.
This week, I’m sharing what actually happened as I tried (and failed, and retried) to build voice command understanding for Kai Lite, my mobile-first companion app.

🤖 My Two AI Collaborators

ChatGPT: Idea generator and architecture partner. I used it for feature planning, prompt design, and exploring approaches.

Claude: My “implementation sidekick.” Every time I got stuck on code, Claude helped debug, re-architect, and refactor.

Key moment:

Claude told me, “LLMs are nice, but too slow for instant mobile use. You’ll wait 2–3 seconds per command.”

That advice changed my approach—speed (and flow) beat size.

🧑‍💻 Attempt #1: Pattern Rules (Fast, but… Messy)

I started with classic rule-based parsing:

  • Regexes for matching intent (add event, check calendar, etc.)
  • Lots of if/else spaghetti in my voice_command_parser.dart

The result?

  • It only worked for exact commands
  • I kept adding more and more patterns, forgetting what I’d written before (lol)

Session log snippet with Claude:

"remind me to go fishing tomorrow at 3pm"
→ Matched: go to (.*)  ← WRONG!
→ Intent: navigation 
→ Result: Error/confusion
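The mis-match is easy to reproduce. Here's a quick Python sketch (simplified stand-ins for my Dart regexes, not the actual patterns) showing why an overly permissive navigation rule fires before the reminder rule ever gets a chance:

```python
import re

# Simplified stand-ins for the rule-based patterns -- order matters,
# and the navigation rule is far too permissive.
PATTERNS = [
    ("navigation",   re.compile(r"\bgo (?:to )?(.*)")),
    ("calendar_add", re.compile(r"\bremind me to (.*)")),
]

def classify(text):
    # First pattern that matches wins -- the root of the bug
    for intent, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return intent, match.group(1)
    return "unknown", None

intent, slot = classify("remind me to go fishing tomorrow at 3pm")
# The loose "go ..." rule matches inside "go fishing tomorrow at 3pm",
# so the whole command is misread as navigation.
```

Every new pattern risks shadowing an older one like this, which is why the if/else spaghetti kept collapsing.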

🚀 Attempt #2: Tiny, On-Device NLU (What Actually Works!)

With Claude’s push, I rebuilt the whole flow:
Architecture Overview

[User Speaks]
     ↓
[Whisper-tiny → Text] 
     ↓
[Intent Classifier: calendar_add, calendar_view, etc.] 
     ↓
[Entity Extractor: date, time, title] 
     ↓
[SmartVoiceParser → Structured Command] 
     ↓
[Local Calendar API → Event Created]

This architecture ensures:

✅ Speed: No network latency, instant feedback.
✅ Privacy: No audio or text leaves the device.
✅ Reliability: Not dependent on internet or third-party APIs.
✅ Simplicity: Small models focused on specific tasks.

How It Works (Step-by-Step)

  1. Voice input via Whisper-tiny:
final text = await WhisperTinyEN.transcribe(audio);
  2. Intent classification:
final intent = await IntentClassifier.classify(text);

Example output:

{ "intent": "calendar_add", "confidence": 0.87 }
  3. Entity extraction:
final entities = await EntityExtractor.extract(text);

Example output:

{ "title": "go fishing", "date": "tomorrow", "time": "3:00 PM" }
  4. Smart voice command assembly:
final command = SmartVoiceParser.parse(text);

Returns one object, e.g.:

{
  "intent": "calendar_add",
  "slots": {
    "title": "go fishing",
    "date": "tomorrow",
    "time": "3:00 PM"
  }
}
  5. Calendar event created, instantly and offline.
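Conceptually, the assembly step just glues the two model outputs into one object. Here's a hedged sketch of that logic in Python for brevity (the classifier and extractor are stubbed with canned outputs standing in for the on-device models, and the confidence floor is my own illustrative number):

```python
def classify_intent(text):
    # Stub standing in for the on-device intent classifier
    return {"intent": "calendar_add", "confidence": 0.87}

def extract_entities(text):
    # Stub standing in for the on-device entity extractor
    return {"title": "go fishing", "date": "tomorrow", "time": "3:00 PM"}

CONFIDENCE_FLOOR = 0.6  # illustrative threshold, not a tuned value

def parse(text):
    """Assemble intent + entities into one structured command."""
    result = classify_intent(text)
    if result["confidence"] < CONFIDENCE_FLOOR:
        # Low confidence: bail out rather than create a wrong event
        return {"intent": "unknown", "slots": {}}
    return {"intent": result["intent"], "slots": extract_entities(text)}

command = parse("remind me to go fishing tomorrow at 3pm")
```

The nice property is that the calendar layer only ever sees one well-formed command object, never raw text.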

Before vs After: Real Example

Pattern system failure:

"remind me to go fishing tomorrow at 3pm"
→ navigation intent (wrong)
→ confusion or error

Smart NLU success:

"remind me to go fishing tomorrow at 3pm"
→ calendar_add (confidence: 0.8+)
→ Title: "go fishing"
→ Time: "3:00 PM"
→ Date: "tomorrow"
→ Event created 🎉

Processing time: ~250–300 ms (on-device, fully offline)

🛠️ Technical Highlights

  • Whisper-tiny for fast, offline voice-to-text (39 MB)
  • BERT-tiny + intent head (~21 MB) for intent classification
  • Dateparser-light (~1 MB) for fuzzy dates (“next Friday”, “this weekend”)
  • The whole stack fits in under 60 MB and feels instant on a modern phone
  • Fully local: no cloud, zero data leaves device
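The fuzzy-date step is the part I'd never have gotten right with regexes alone. As a rough illustration of what a dateparser-style resolver does (this is my own toy Python sketch, not the library's actual API), relative phrases resolve against "now":

```python
from datetime import datetime, timedelta

# Toy resolver for a handful of relative phrases; a real fuzzy-date
# library handles far more ("next Friday", "in two weeks", ...).
RELATIVE_DAYS = {"today": 0, "tomorrow": 1}

def resolve_date(phrase, now):
    phrase = phrase.strip().lower()
    if phrase in RELATIVE_DAYS:
        return (now + timedelta(days=RELATIVE_DAYS[phrase])).date()
    if phrase == "this weekend":
        # Resolve to the coming Saturday (weekday 5, Monday is 0)
        days_ahead = (5 - now.weekday()) % 7
        return (now + timedelta(days=days_ahead)).date()
    raise ValueError(f"can't resolve: {phrase}")

now = datetime(2025, 8, 18, 10, 0)  # a Monday
resolve_date("tomorrow", now)       # -> 2025-08-19
```

The key point: the resolver's output is a concrete date, so the calendar layer never has to interpret "tomorrow" itself.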

🪲 Bugs & Iterations

  • Early “smart” versions failed almost as much as my old rules
  • Six rounds of real-world testing and log reviews to get to “it just works”
  • “What’s my calendar like?” sometimes still triggers as an event… and honestly, I kind of love the bug now
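That lingering "What's my calendar like?" bug is a classic near-tie between calendar_view and calendar_add. One guard I'm experimenting with (sketched in Python here; the margin value and the example scores are my own illustrative guesses, not tuned numbers) is to refuse to pick when the top two intents score too close together:

```python
def pick_intent(scores, margin=0.15):
    """Return the top intent, or None when the top two are too close to call."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    if top[1] - runner_up[1] < margin:
        # Near-tie: ask the user to confirm instead of guessing
        return None
    return top[0]

# "What's my calendar like?" scores both calendar intents highly,
# so the guard punts to a confirmation prompt instead of misfiring.
pick_intent({"calendar_view": 0.48, "calendar_add": 0.41, "navigation": 0.11})
```

Returning None here feeds a "did you want to add or view?" follow-up, which beats silently creating a phantom event.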

🔑 Lessons

  • Don’t over-engineer: Tiny, purpose-built NLU is better than a “mini-LLM” for command/slot tasks
  • Speed is UX: Even 2 seconds of lag kills the magic
  • Privacy: Everything is processed right on-device—no API, no server, no cloud
  • ChatGPT and Claude are amazing for rapid iteration and brainstorming—even for solo devs

💬 Wrap-up

Building Kai Lite this way taught me that “small” can be smarter, and that gentle, local AI is viable for real daily use.
