Why I Switched from LLMs to Tiny, Instant Voice NLU for Kai Lite — Week 2
Emily Lin


Publish Date: Aug 18

This is part of my journey building the Kai ecosystem—a fully local, offline-first, emotionally intelligent AI assistant.
This week, I’m sharing what actually happened as I tried (and failed, and retried) to build voice command understanding for Kai Lite, my mobile-first companion app.

🤖 My Two AI Collaborators

ChatGPT: Idea generator and architecture partner. I used it for feature planning, prompt design, and exploring approaches.

Claude: My “implementation sidekick.” Every time I got stuck on code, Claude helped debug, re-architect, and refactor.

Key moment:

Claude told me, “LLMs are nice, but too slow for instant mobile use. You’ll wait 2–3 seconds per command.”

That advice changed my approach—speed (and flow) beat size.

🧑‍💻 Attempt #1: Pattern Rules (Fast, but… Messy)

I started with classic rule-based parsing:

  • Regexes for matching intent (add event, check calendar, etc.)
  • Lots of if/else spaghetti in my voice_command_parser.dart

The result?

  • It only worked for exact commands
  • I kept adding more and more patterns, forgetting what I’d written before (lol)

Session log snippet with Claude:

"remind me to go fishing tomorrow at 3pm"
→ Matched: go to (.*)  ← WRONG!
→ Intent: navigation 
→ Result: Error/confusion
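The mis-match is easy to reproduce. Here's a quick Python sketch (simplified stand-ins for my Dart regexes, not the actual patterns) showing why an overly permissive navigation rule fires before the reminder rule ever gets a chance:

```python
import re

# Simplified stand-ins for the rule-based patterns -- order matters,
# and the navigation rule is far too permissive.
PATTERNS = [
    ("navigation",   re.compile(r"\bgo (?:to )?(.*)")),
    ("calendar_add", re.compile(r"\bremind me to (.*)")),
]

def classify(text):
    # First pattern that matches wins -- the root of the bug
    for intent, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return intent, match.group(1)
    return "unknown", None

intent, slot = classify("remind me to go fishing tomorrow at 3pm")
# The loose "go ..." rule matches inside "go fishing tomorrow at 3pm",
# so the whole command is misread as navigation.
```

Every new pattern risks shadowing an older one like this, which is why the if/else spaghetti kept collapsing.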

🚀 Attempt #2: Tiny, On-Device NLU (What Actually Works!)

With Claude’s push, I rebuilt the whole flow:
Architecture Overview

[User Speaks]
     ↓
[Whisper-tiny → Text] 
     ↓
[Intent Classifier: calendar_add, calendar_view, etc.] 
     ↓
[Entity Extractor: date, time, title] 
     ↓
[SmartVoiceParser → Structured Command] 
     ↓
[Local Calendar API → Event Created]

This architecture ensures:

✅ Speed: No network latency, instant feedback.
✅ Privacy: No audio or text leaves the device.
✅ Reliability: Not dependent on internet or third-party APIs.
✅ Simplicity: Small models focused on specific tasks.

How It Works (Step-by-Step)

  1. Voice input via Whisper-tiny:
final text = await WhisperTinyEN.transcribe(audio);
  2. Intent classification:
final intent = await IntentClassifier.classify(text);

Example output:

{ "intent": "calendar_add", "confidence": 0.87 }
  3. Entity extraction:
final entities = await EntityExtractor.extract(text);

Example output:

{ "title": "go fishing", "date": "tomorrow", "time": "3:00 PM" }
  4. Smart voice command assembly:
final command = SmartVoiceParser.parse(text);

Returns one object, e.g.:

{
  "intent": "calendar_add",
  "slots": {
    "title": "go fishing",
    "date": "tomorrow",
    "time": "3:00 PM"
  }
}
  5. Calendar event created, instantly and offline.
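Conceptually, the assembly step just glues the two model outputs into one object. Here's a hedged sketch of that logic in Python for brevity (the classifier and extractor are stubbed with canned outputs standing in for the on-device models, and the confidence floor is my own illustrative number):

```python
def classify_intent(text):
    # Stub standing in for the on-device intent classifier
    return {"intent": "calendar_add", "confidence": 0.87}

def extract_entities(text):
    # Stub standing in for the on-device entity extractor
    return {"title": "go fishing", "date": "tomorrow", "time": "3:00 PM"}

CONFIDENCE_FLOOR = 0.6  # illustrative threshold, not a tuned value

def parse(text):
    """Assemble intent + entities into one structured command."""
    result = classify_intent(text)
    if result["confidence"] < CONFIDENCE_FLOOR:
        # Low confidence: bail out rather than create a wrong event
        return {"intent": "unknown", "slots": {}}
    return {"intent": result["intent"], "slots": extract_entities(text)}

command = parse("remind me to go fishing tomorrow at 3pm")
```

The nice property is that the calendar layer only ever sees one well-formed command object, never raw text.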

Before vs After: Real Example

Pattern system failure:

"remind me to go fishing tomorrow at 3pm"
→ navigation intent (wrong)
→ confusion or error

Smart NLU success:

"remind me to go fishing tomorrow at 3pm"
→ calendar_add (confidence: 0.8+)
→ Title: "go fishing"
→ Time: "3:00 PM"
→ Date: "tomorrow"
→ Event created 🎉

Processing time: ~250–300 ms (on-device, fully offline)

🛠️ Technical Highlights

  • Whisper-tiny for fast, offline voice-to-text (39 MB)
  • BERT-tiny + intent head (~21 MB) for intent classification
  • Dateparser-light (~1 MB) for fuzzy dates (“next Friday”, “this weekend”)
  • The whole stack fits in under 60 MB and feels instant on a modern phone
  • Fully local: no cloud, zero data leaves device
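The fuzzy-date step is the part I'd never have gotten right with regexes alone. As a rough illustration of what a dateparser-style resolver does (this is my own toy Python sketch, not the library's actual API), relative phrases resolve against "now":

```python
from datetime import datetime, timedelta

# Toy resolver for a handful of relative phrases; a real fuzzy-date
# library handles far more ("next Friday", "in two weeks", ...).
RELATIVE_DAYS = {"today": 0, "tomorrow": 1}

def resolve_date(phrase, now):
    phrase = phrase.strip().lower()
    if phrase in RELATIVE_DAYS:
        return (now + timedelta(days=RELATIVE_DAYS[phrase])).date()
    if phrase == "this weekend":
        # Resolve to the coming Saturday (weekday 5, Monday is 0)
        days_ahead = (5 - now.weekday()) % 7
        return (now + timedelta(days=days_ahead)).date()
    raise ValueError(f"can't resolve: {phrase}")

now = datetime(2025, 8, 18, 10, 0)  # a Monday
resolve_date("tomorrow", now)       # -> 2025-08-19
```

The key point: the resolver's output is a concrete date, so the calendar layer never has to interpret "tomorrow" itself.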

🪲 Bugs & Iterations

  • Early “smart” versions failed almost as much as my old rules
  • Six rounds of real-world testing and log reviews to get to “it just works”
  • “What’s my calendar like?” sometimes still triggers as an event… and honestly, I kind of love the bug now
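That lingering "What's my calendar like?" bug is a classic near-tie between calendar_view and calendar_add. One guard I'm experimenting with (sketched in Python here; the margin value and the example scores are my own illustrative guesses, not tuned numbers) is to refuse to pick when the top two intents score too close together:

```python
def pick_intent(scores, margin=0.15):
    """Return the top intent, or None when the top two are too close to call."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    if top[1] - runner_up[1] < margin:
        # Near-tie: ask the user to confirm instead of guessing
        return None
    return top[0]

# "What's my calendar like?" scores both calendar intents highly,
# so the guard punts to a confirmation prompt instead of misfiring.
pick_intent({"calendar_view": 0.48, "calendar_add": 0.41, "navigation": 0.11})
```

Returning None here feeds a "did you want to add or view?" follow-up, which beats silently creating a phantom event.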

🔑 Lessons

  • Don’t over-engineer: Tiny, purpose-built NLU is better than a “mini-LLM” for command/slot tasks
  • Speed is UX: Even 2 seconds of lag kills the magic
  • Privacy: Everything is processed right on-device—no API, no server, no cloud
  • ChatGPT and Claude are amazing for rapid iteration and brainstorming—even for solo devs

💬 Wrap-up

Building Kai Lite this way taught me that “small” can be smarter, and that gentle, local AI is viable for real daily use.
