This is part of my journey building the Kai ecosystem—a fully local, offline-first, emotionally-intelligent AI assistant.
This week, I’m sharing what actually happened as I tried (and failed, and retried) to build voice command understanding for Kai Lite, my mobile-first companion app.
🤖 My Two AI Collaborators
ChatGPT: Idea generator and architecture partner. I used it for feature planning, prompt design, and exploring approaches.
Claude: My “implementation sidekick.” Every time I got stuck on code, Claude helped debug, re-architect, and refactor.
Key moment:
Claude told me, “LLMs are nice, but too slow for instant mobile use. You’ll wait 2–3 seconds per command.”
That advice changed my approach—speed (and flow) beat size.
🧑‍💻 Attempt #1: Pattern Rules (Fast, but… Messy)
I started with classic rule-based parsing:
- Regexes for matching intent (add event, check calendar, etc.)
- Lots of if/else spaghetti in my voice_command_parser.dart
The result?
- It only worked for exact commands
- I kept adding more and more patterns, forgetting what I’d written before (lol)
Session log snippet with Claude:
"remind me to go fishing tomorrow at 3pm"
→ Matched: go to (.*) ← WRONG!
→ Intent: navigation
→ Result: Error/confusion
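The failure mode is easy to reproduce. Here’s a toy version of the rule table (the exact patterns in my voice_command_parser.dart were messier, but the bug is the same): rules are checked in order, a greedy navigation pattern fires on a word in the middle of the sentence, and the reminder rule never runs.

```dart
/// Toy rule table, checked in insertion order — a simplified stand-in
/// for the real patterns in voice_command_parser.dart.
final rules = <String, RegExp>{
  'navigation': RegExp(r'\bgo (?:to )?(.*)'), // fires on any "go ..."
  'reminder': RegExp(r'remind me to (.*)'),   // never reached
};

String? classify(String input) {
  for (final rule in rules.entries) {
    final match = rule.value.firstMatch(input);
    if (match != null) {
      return '${rule.key}: ${match.group(1)}'; // first match wins — the bug
    }
  }
  return null;
}

void main() {
  print(classify('remind me to go fishing tomorrow at 3pm'));
  // → navigation: fishing tomorrow at 3pm (should have been a reminder)
}
```

Reordering the rules just moves the problem — some other phrase will collide with some other pattern. That’s what finally convinced me the approach doesn’t scale.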
🚀 Attempt #2: Tiny, On-Device NLU (What Actually Works!)
With Claude’s push, I rebuilt the whole flow:
Architecture Overview
[User Speaks]
↓
[Whisper-tiny → Text]
↓
[Intent Classifier: calendar_add, calendar_view, etc.]
↓
[Entity Extractor: date, time, title]
↓
[SmartVoiceParser → Structured Command]
↓
[Local Calendar API → Event Created]
This architecture ensures:
✅ Speed: No network latency, instant feedback.
✅ Privacy: No audio or text leaves the device.
✅ Reliability: Not dependent on internet or third-party APIs.
✅ Simplicity: Small models focused on specific tasks.
How It Works (Step-by-Step)
- Voice input via Whisper-tiny:
final text = await WhisperTinyEN.transcribe(audio);
- Intent classification:
final intent = await IntentClassifier.classify(text);
Example output:
{ "intent": "calendar_add", "confidence": 0.87 }
- Entity extraction:
final entities = await EntityExtractor.extract(text);
Example output:
{ "title": "go fishing", "date": "tomorrow", "time": "3:00 PM" }
- Smart voice command assembly:
final command = SmartVoiceParser.parse(text);
Returns one object, e.g.:
{
"intent": "calendar_add",
"slots": {
"title": "go fishing",
"date": "tomorrow",
"time": "3:00 PM"
}
}
- Calendar event created, instantly and offline.
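The steps above chain into a single async call. Here’s the shape of that chain with stubbed stages — the stubs return canned values, while in the app each one is backed by a real model (Whisper-tiny, BERT-tiny, etc.); the function names and signatures here are illustrative, not the real Kai Lite API.

```dart
// Stubbed pipeline stages — canned values standing in for the models.
Future<String> transcribe(List<int> audio) async =>
    'remind me to go fishing tomorrow at 3pm';

Future<String> classifyIntent(String text) async => 'calendar_add';

Future<Map<String, String>> extractEntities(String text) async =>
    {'title': 'go fishing', 'date': 'tomorrow', 'time': '3:00 PM'};

/// One structured command out: intent plus slots, ready for the calendar API.
Future<Map<String, Object>> parseVoiceCommand(List<int> audio) async {
  final text = await transcribe(audio);
  final intent = await classifyIntent(text);
  final slots = await extractEntities(text);
  return {'intent': intent, 'slots': slots};
}

void main() async {
  final command = await parseVoiceCommand(<int>[]);
  print(command);
}
```

The nice property of this shape: each stage is swappable and testable on its own, which is exactly what made the six rounds of iteration (below) bearable.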
Before vs After: Real Example
Pattern system failure:
"remind me to go fishing tomorrow at 3pm"
→ navigation intent (wrong)
→ confusion or error
Smart NLU success:
"remind me to go fishing tomorrow at 3pm"
→ calendar_add (confidence: 0.8+)
→ Title: "go fishing"
→ Time: "3:00 PM"
→ Date: "tomorrow"
→ Event created 🎉
Processing time: ~250–300 ms (on-device, fully offline)
🛠️ Technical Highlights
- Whisper-tiny for fast, offline voice-to-text (39 MB)
- BERT-tiny + intent head (~21 MB) for intent classification
- Dateparser-light (~1 MB) for fuzzy dates (“next Friday”, “this weekend”)
- Everything runs in under 60 MB and feels instant on a modern phone
- Fully local: no cloud, zero data leaves device
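The fuzzy-date piece is less magic than it sounds: it resolves relative words against a reference date. This is a simplified sketch of the core idea, not dateparser-light’s actual API:

```dart
/// Minimal fuzzy-date resolver in the spirit of dateparser-light —
/// a simplified sketch, not the library's real interface.
DateTime? resolveFuzzyDate(String phrase, DateTime now) {
  final p = phrase.toLowerCase().trim();
  if (p == 'today') return now;
  if (p == 'tomorrow') return now.add(const Duration(days: 1));

  const weekdays = {
    'monday': 1, 'tuesday': 2, 'wednesday': 3, 'thursday': 4,
    'friday': 5, 'saturday': 6, 'sunday': 7,
  };
  final next = RegExp(r'next (\w+)').firstMatch(p);
  if (next != null) {
    final target = weekdays[next.group(1)];
    if (target != null) {
      // Days until the next occurrence of that weekday (always in the future).
      var delta = (target - now.weekday) % 7;
      if (delta == 0) delta = 7;
      return now.add(Duration(days: delta));
    }
  }
  return null; // unrecognized — better to ask the user than to guess
}

void main() {
  final ref = DateTime(2024, 6, 10); // a Monday, as the reference date
  print(resolveFuzzyDate('tomorrow', ref));    // → June 11
  print(resolveFuzzyDate('next friday', ref)); // → June 14
}
```

Returning null for unrecognized phrases matters: a wrong date on a calendar event is worse than a follow-up question.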
🪲 Bugs & Iterations
- Early “smart” versions failed almost as much as my old rules
- Six rounds of real-world testing and log reviews to get to “it just works”
- “What’s my calendar like?” sometimes still triggers as an event… and honestly, I kind of love the bug now
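If I ever decide to fix that bug, a cheap guard would probably do it — rejecting question-shaped utterances before they reach the calendar_add path. Purely illustrative, not what ships in Kai Lite today:

```dart
/// Cheap heuristic: question-shaped utterances shouldn't create events.
/// Illustrative only — not the classifier Kai Lite actually uses.
bool looksLikeQuery(String text) {
  final t = text.toLowerCase().trim();
  const questionStarts = ['what', 'when', 'how', 'do i', 'is there'];
  return t.endsWith('?') || questionStarts.any((q) => t.startsWith(q));
}

void main() {
  print(looksLikeQuery("what's my calendar like?")); // true → calendar_view
  print(looksLikeQuery('remind me to go fishing'));  // false → calendar_add
}
```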
🔑 Lessons
- Don’t over-engineer: Tiny, purpose-built NLU is better than a “mini-LLM” for command/slot tasks
- Speed is UX: Even 2 seconds of lag kills the magic
- Privacy: Everything is processed right on-device—no API, no server, no cloud
- ChatGPT and Claude are amazing for rapid iteration and brainstorming—even for solo devs
💬 Wrap-up
Building Kai Lite this way taught me that “small” can be smarter, and gentle, local AI is possible for real daily use.