Supportly – Real-Time Voice & Video Agent for Customer Support
Supportly is a plug-and-play real-time voice & video support module that developers can integrate into any web application. It falls under the following challenge categories:
Business Automation – The voice agent records interactions between support agents and customers and saves them to a database. After each session, it generates a summary of the conversation, which is automatically sent to the customer's email address.
Real-Time Performance – Provides live transcription during support calls.
The project empowers support teams to offer on-demand human assistance while using AssemblyAI's streaming API to transcribe every call live.
Supportly - Video Support Call Scheduling Platform
A modern video call customer support application built with React Router v7, TypeScript, and Tailwind CSS. This platform allows customers to easily schedule video calls with support teams to resolve issues and get product assistance.
🚀 Features
Customer Features
Easy Session Booking: Schedule video support sessions with a simple form
Real-time Video Calls: High-quality video calls with screen sharing capabilities
Session Management: View upcoming and completed sessions
Profile Management: Update personal information and preferences
Session History: Track all past sessions with ratings and feedback
Admin/Support Team Features
Admin Dashboard: Comprehensive overview of all support sessions
Team Management: Manage support team members and their availability
Schedule Management: Set available time slots and manage bookings
Session Analytics: Track performance metrics and customer satisfaction
Technical Features
🎥 Video Call Integration: Browser-based video calls (no additional software required)
The Supportly application uses AssemblyAI's streaming transcription service to provide real-time speech-to-text functionality during video support sessions. The integration involves:
Audio Processing: Capturing audio from the user's microphone using the Web Audio API (a minimal capture sketch follows this list)
Real-time Streaming: Sending audio chunks to AssemblyAI via WebSocket
Live Transcription: Receiving and displaying transcripts in real-time
Multi-user Support: Managing separate transcription sessions for each user
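Before any of that pipeline runs, the browser has to capture microphone audio. The getUserMedia() call itself isn't shown in the snippets below, so here is a minimal capture sketch; the constraint values are illustrative assumptions:

```javascript
// Minimal capture sketch -- the real component stores this stream in
// localStreamRef (see the VideoCall component below); constraints are assumed.
const stream = await navigator.mediaDevices.getUserMedia({
  audio: { echoCancellation: true, noiseSuppression: true },
  video: true,
});
```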
Architecture Components
1. AssemblyAI Configuration (config/assembyai.js)
The main configuration class that handles the AssemblyAI streaming connection:
```javascript
// config/assembyai.js
const { AssemblyAI } = require("assemblyai");

class AssemblyAIConfig {
  constructor() {
    try {
      this.client = new AssemblyAI({
        apiKey: process.env.ASSEMBLYAI_API_KEY,
      });
      this.transcriber = null;
      this.isConnected = false;
      this.isConnecting = false;
    } catch (error) {
      console.error(error);
    }
  }

  async run() {
    try {
      // Prevent multiple concurrent connection attempts
      if (this.isConnecting || this.isConnected) {
        console.log("Connection already in progress or established...");
        return;
      }
      this.isConnecting = true;

      this.transcriber = this.client.streaming.transcriber({
        sampleRate: 16_000,
        formatTurns: true,
      });

      // Set up event handlers
      this.transcriber.on("open", ({ id }) => {
        console.log(`Session opened with ID: ${id}`);
        this.isConnected = true;
        this.isConnecting = false;
      });

      this.transcriber.on("error", (error) => {
        console.error("Transcriber error:", error);
        this.isConnected = false;
        this.isConnecting = false;
      });

      await this.transcriber.connect();
      console.log("Starting streaming...");
    } catch (error) {
      console.error("Error in run():", error);
      this.isConnected = false;
      this.isConnecting = false;
    }
  }

  transcribe(callBack) {
    this.transcriber.on("turn", (turn) => {
      if (!turn.transcript) {
        return;
      }
      callBack(turn.transcript);
    });
  }
}

module.exports = AssemblyAIConfig;
```
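For orientation, a minimal usage sketch of this class (hypothetical, assuming the class is exported from config/assembyai.js and ASSEMBLYAI_API_KEY is set in the environment):

```javascript
// Hypothetical usage -- not from the Supportly codebase.
const AssemblyAIConfig = require("./config/assembyai");

async function demo() {
  const assemblyai = new AssemblyAIConfig();
  await assemblyai.run(); // opens the streaming session

  // Log every formatted turn AssemblyAI returns
  assemblyai.transcribe((transcript) => console.log("Turn:", transcript));

  // Later, feed 16kHz PCM16 audio buffers:
  // assemblyai.transcriber.sendAudio(pcm16Buffer);
}

demo().catch(console.error);
```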
2. WebSocket Manager (config/websocket.js)
Manages the connection between clients and handles AssemblyAI instances for each user:
```javascript
// config/websocket.js
const AssemblyAIConfigClass = require("./assembyai");

class WebSocketManager {
  constructor() {
    this.io = null;
    this.userTranscribers = new Map(); // Store an AssemblyAI instance per user
  }

  async connect(io) {
    io.on("connection", async (socket) => {
      // Create a new AssemblyAI instance for this user
      const assemblyai = new AssemblyAIConfigClass();
      this.userTranscribers.set(socket.id, assemblyai);

      socket.on("start-transcription", async () => {
        // socket.user is populated by auth middleware (not shown here)
        console.log(`Starting transcription for ${socket.user.email}`);
        const assemblyai = this.userTranscribers.get(socket.id);
        if (assemblyai) {
          // Check if already running to prevent duplicate starts
          if (assemblyai.isConnected || assemblyai.isConnecting) {
            console.log("Transcription already running or starting...");
            return;
          }
          try {
            await assemblyai.run();
            assemblyai.transcribe((transcript) => {
              console.log(`Transcription for ${socket.user.email}:`, transcript);
              // Emit the transcription to all users in the session;
              // sessionId identifies the call room the socket joined (set elsewhere)
              socket.to(sessionId).emit("transcription", transcript);
            });
            console.log("Transcription started successfully");
          } catch (error) {
            console.error("Error starting transcription:", error);
          }
        }
      });

      socket.on("audio-chunk", async (audioBlob) => {
        const assemblyai = this.userTranscribers.get(socket.id);
        if (assemblyai) {
          try {
            assemblyai.transcriber.sendAudio(Buffer.from(audioBlob));
          } catch (error) {
            console.error("Error processing audio chunk:", error);
          }
        }
      });

      socket.on("disconnect", async () => {
        // Clean up the transcription when the user disconnects
        const assemblyai = this.userTranscribers.get(socket.id);
        if (assemblyai) {
          await assemblyai.safeClose();
          this.userTranscribers.delete(socket.id);
        }
      });
    });
  }
}

module.exports = WebSocketManager;
```
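For context, a minimal sketch of how this manager might be bootstrapped (port and CORS settings are assumptions; it presumes WebSocketManager is exported from config/websocket.js as above):

```javascript
// Hypothetical server bootstrap -- assumes a plain Node HTTP server with Socket.IO.
const http = require("http");
const { Server } = require("socket.io");
const WebSocketManager = require("./config/websocket");

const server = http.createServer();
const io = new Server(server, { cors: { origin: "*" } }); // permissive CORS for local dev

const manager = new WebSocketManager();
manager.connect(io);

server.listen(3000, () => console.log("Socket server listening on :3000"));
```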
3. Audio Processing (public/audio-processor.js)
Web Audio API worklet for processing audio in real-time:
```javascript
// public/audio-processor.js
const MAX_16BIT_INT = 32767;

class AudioProcessor extends AudioWorkletProcessor {
  process(inputs) {
    try {
      const input = inputs[0];
      if (!input) throw new Error("No input");

      const channelData = input[0];
      if (!channelData) throw new Error("No channelData");

      // Convert Float32 audio data to Int16 for AssemblyAI
      const float32Array = Float32Array.from(channelData);
      const int16Array = Int16Array.from(
        float32Array.map((n) => n * MAX_16BIT_INT)
      );
      const buffer = int16Array.buffer;

      // Send the processed audio to the main thread
      this.port.postMessage({ audio_data: buffer });
      return true;
    } catch (error) {
      console.error(error);
      return false;
    }
  }
}

registerProcessor("audio-processor", AudioProcessor);
```
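One caveat in the conversion above: Float32 samples can occasionally land slightly outside [-1, 1], which would overflow Int16. A clamped variant is safer; this is a suggested alternative, not the project's code:

```javascript
// Defensive Float32 -> Int16 conversion with clamping (suggested variant,
// not from the Supportly source).
function floatTo16BitPCM(float32Array) {
  const int16Array = new Int16Array(float32Array.length);
  for (let i = 0; i < float32Array.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Array[i]));
    int16Array[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return int16Array;
}
```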
4. Video Call Component
The React component that handles the UI and audio processing:
```tsx
import { useEffect, useRef, useState } from "react";

export default function VideoCall() {
  // Note: localStreamRef, audioContextRef and socketRef are additional refs
  // created elsewhere in the full component; only the transcription logic
  // is shown here.
  const audioWorkletNodeRef = useRef<AudioWorkletNode | null>(null);
  const audioBufferQueueRef = useRef<Int16Array>(new Int16Array(0));
  const [transcripts, setTranscripts] = useState<Array<{
    id: number;
    text: string;
    timestamp: Date;
    speaker: string;
  }>>([]);
  const [currentTranscript, setCurrentTranscript] = useState("");

  // Set up the audio processor for real-time transcription
  const setupAudioProcessor = async () => {
    try {
      if (!localStreamRef.current) return;

      // Create an audio context at the 16kHz sample rate AssemblyAI expects
      audioContextRef.current = new AudioContext({
        sampleRate: 16000,
        latencyHint: "balanced",
      });

      // Load the audio processor worklet
      await audioContextRef.current.audioWorklet.addModule("/audio-processor.js");

      // Create the audio worklet node
      audioWorkletNodeRef.current = new AudioWorkletNode(
        audioContextRef.current,
        "audio-processor"
      );

      // Handle processed audio data
      audioWorkletNodeRef.current.port.onmessage = (event) => {
        const { audio_data } = event.data;

        // Merge with the previous buffer
        const newBuffer = new Int16Array(audio_data);
        audioBufferQueueRef.current = mergeBuffers(
          audioBufferQueueRef.current,
          newBuffer
        );

        // Send audio chunks once the buffer reaches a sufficient size
        const CHUNK_SIZE = 1600; // 100ms at 16kHz
        while (audioBufferQueueRef.current.length >= CHUNK_SIZE) {
          const chunk = audioBufferQueueRef.current.slice(0, CHUNK_SIZE);
          audioBufferQueueRef.current =
            audioBufferQueueRef.current.slice(CHUNK_SIZE);

          // Send to the server via WebSocket
          socketRef.current?.emit("audio-chunk", chunk.buffer);
        }
      };

      // Connect the audio source to the processor
      const source = audioContextRef.current.createMediaStreamSource(
        localStreamRef.current
      );
      source.connect(audioWorkletNodeRef.current);
      audioWorkletNodeRef.current.connect(audioContextRef.current.destination);

      // Start transcription
      socketRef.current?.emit("start-transcription");
      console.log("Audio processor setup completed");
    } catch (error) {
      console.error("Error setting up audio processor:", error);
    }
  };

  // Handle incoming transcriptions
  useEffect(() => {
    if (socketRef.current) {
      socketRef.current.on("transcription", (transcript: string) => {
        console.log("Received transcription:", transcript);

        // Update the current live transcript
        setCurrentTranscript(transcript);

        // Add to the transcript history if it's a complete sentence
        if (
          transcript.trim().endsWith(".") ||
          transcript.trim().endsWith("?") ||
          transcript.trim().endsWith("!")
        ) {
          setTranscripts((prev) => [
            ...prev,
            {
              id: Date.now(),
              text: transcript,
              timestamp: new Date(),
              speaker: "Speaker", // Could be enhanced to identify speakers
            },
          ]);
          setCurrentTranscript(""); // Clear the current transcript
        }
      });
    }
  }, []);

  function mergeBuffers(lhs: Int16Array, rhs: Int16Array) {
    const merged = new Int16Array(lhs.length + rhs.length);
    merged.set(lhs, 0);
    merged.set(rhs, lhs.length);
    return merged;
  }

  // ...call UI (video tiles, transcript panel) omitted...
}
```
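One design detail worth noting: because the server enables formatTurns: true, finalized turns arrive punctuated, which is what lets the component use sentence-ending punctuation (., ?, !) as its heuristic for promoting the live transcript into the history list.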
Data Flow
Audio Capture: User's microphone audio is captured via getUserMedia()
Audio Processing: Raw audio is processed through Web Audio API worklet
Format Conversion: Float32 audio is converted to Int16 format at 16kHz sample rate
Chunking: Audio is buffered and sent in chunks via WebSocket
Server Processing: Node.js server receives audio chunks and forwards to AssemblyAI
Transcription: AssemblyAI processes audio and returns transcripts
Broadcasting: Transcripts are broadcast to all participants in the session
UI Update: Frontend displays live and completed transcripts
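A quick sanity check on the chunking numbers (simple arithmetic from the CHUNK_SIZE constant above, not measurements):

```javascript
// Each chunk: 1600 samples / 16,000 samples-per-second = 100 ms of audio.
// Int16 samples are 2 bytes, so each WebSocket message carries 3200 bytes,
// i.e. roughly 10 messages and ~32 kB of audio per second per speaker.
const SAMPLE_RATE = 16_000;
const CHUNK_SIZE = 1_600;
console.log(`chunk duration: ${(CHUNK_SIZE / SAMPLE_RATE) * 1000} ms`); // 100 ms
console.log(`chunk size: ${CHUNK_SIZE * 2} bytes`); // 3200 bytes
```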
Key Features
Real-time Transcription
Live Updates: Transcripts appear as users speak
Turn-based: Uses AssemblyAI's formatTurns: true for better sentence structure
Low Latency: Optimized audio processing for minimal delay
Multi-user Support
Isolated Sessions: Each user gets their own AssemblyAI transcriber instance
Concurrent Processing: Multiple users can speak simultaneously
Session Management: Proper cleanup when users disconnect (a possible safeClose() implementation is sketched after this list)
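The safeClose() call in the WebSocket manager isn't shown in the post; a plausible implementation on AssemblyAIConfig might look like this (an assumption, not the project's actual code):

```javascript
// Hypothetical safeClose() on AssemblyAIConfig -- the method is called in
// config/websocket.js but its body isn't shown in the post.
async safeClose() {
  try {
    if (this.transcriber && this.isConnected) {
      await this.transcriber.close();
    }
  } catch (error) {
    console.error("Error closing transcriber:", error);
  } finally {
    this.isConnected = false;
    this.isConnecting = false;
    this.transcriber = null;
  }
}
```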
Audio Optimization
16kHz Sample Rate: Optimized for speech recognition