Voice of Voiceless is a cutting-edge Streamlit application designed to bridge communication gaps for deaf and hard-of-hearing individuals through ultra-fast real-time speech transcription, emotional tone detection, and sentiment analysis. Built specifically for the AssemblyAI Voice Agents Challenge, this application demonstrates the transformative potential of sub-300ms voice processing in accessibility-critical scenarios.
The application serves as more than just a transcription tool—it's a comprehensive communication assistant that provides visual feedback about not just what is being said, but how it's being said, creating a richer understanding of conversations for users who cannot hear audio cues.
Challenge Category
This submission targets the Real-Time Voice Performance category, with a laser focus on:
Optimizing for accessibility-critical use cases where speed matters most
Demonstrating technical excellence in real-time audio processing
Creating innovative speed-dependent applications for communication accessibility
Key Features
The application delivers a comprehensive suite of accessibility-focused features (a minimal streaming sketch follows this list):
Ultra-Fast Transcription: Sub-300ms latency using AssemblyAI's Universal-Streaming API
Multi-Speaker Support: Real-time speaker identification and visual distinction
Emotional Intelligence: Live tone detection (happy, sad, angry, calm, excited, neutral)
Sentiment Analysis: Real-time sentiment scoring with visual indicators
Accessibility-First Design: WCAG 2.1 AA compliant interface with high contrast modes
Performance Monitoring: Live latency tracking and system optimization
Visual Alert System: Flash notifications for important audio events
Adaptive Interface: Customizable text sizes, color schemes, and accessibility preferences
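To show what the streaming path looks like in code, here is a minimal sketch built on the assemblyai Python SDK's real-time transcriber. The callbacks, sample rate, and MicrophoneStream helper follow the SDK's documented interface, but the project's actual Universal-Streaming integration, buffering, and error handling may differ.

```python
# Minimal real-time transcription sketch using the assemblyai SDK.
# Not the project's actual pipeline; shown only to illustrate the streaming flow.
import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

def on_data(transcript: aai.RealtimeTranscript):
    # Partial transcripts arrive while the speaker is still talking;
    # final transcripts are emitted once an utterance is complete.
    if not transcript.text:
        return
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(f"[final]   {transcript.text}")
    else:
        print(f"[partial] {transcript.text}", end="\r")

def on_error(error: aai.RealtimeError):
    print(f"Streaming error: {error}")

transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=on_error,
)
transcriber.connect()

# Stream raw microphone audio until interrupted (requires microphone access).
microphone = aai.extras.MicrophoneStream(sample_rate=16_000)
try:
    transcriber.stream(microphone)
finally:
    transcriber.close()
```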
Demo
Live Application
The Voice of Voiceless application can be run locally using Streamlit. The interface provides an intuitive, accessibility-focused experience with real-time updates and comprehensive visual feedback systems.
Screenshots
Main Interface - Real-Time Transcription
The primary interface features a clean, high-contrast design with large, readable text and clear visual indicators for connection status and performance metrics.
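As a rough illustration (not the project's exact layout), status and performance indicators of this kind can be rendered with standard Streamlit elements; the labels and values below are placeholders, not live data:

```python
# Illustrative status header; values are placeholders, not real measurements.
import streamlit as st

status_col, latency_col, confidence_col = st.columns(3)
status_col.metric("Connection", "Connected")
latency_col.metric("Latency", "240 ms", delta="-15 ms")
confidence_col.metric("Transcription confidence", "94%")
```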
Accessibility Controls Panel
The sidebar provides comprehensive accessibility controls, including the following (a Streamlit sketch of these controls follows the list):
High contrast mode toggle
Scalable text size adjustment (12-28px)
Visual alert preferences
Audio quality settings
Performance monitoring options
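A sidebar like this can be assembled from standard Streamlit widgets. The widget labels, defaults, and session-state key below are illustrative assumptions, not the project's exact code:

```python
# Illustrative sketch of the accessibility sidebar using standard Streamlit widgets.
import streamlit as st

with st.sidebar:
    st.header("Accessibility Settings")

    high_contrast = st.checkbox("High contrast mode", value=False)
    text_size = st.slider("Text size (px)", min_value=12, max_value=28, value=18)
    visual_alerts = st.checkbox("Visual alerts for audio events", value=True)
    audio_quality = st.selectbox("Audio quality", ["Low latency", "Balanced", "High accuracy"])
    show_metrics = st.checkbox("Show performance metrics", value=True)

# Persist the choices so the main view can react to them on the next rerun.
st.session_state["accessibility"] = {
    "high_contrast": high_contrast,
    "text_size": text_size,
    "visual_alerts": visual_alerts,
    "audio_quality": audio_quality,
    "show_metrics": show_metrics,
}
```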
Sentiment and Tone Analysis
Real-time emotional intelligence display with live tone labels, sentiment scores, and confidence indicators.
VoiceAccess is a cutting-edge Streamlit application designed to help deaf and hard-of-hearing individuals by providing ultra-fast real-time speech transcription, tone detection, and sentiment analysis. Built with AssemblyAI's Universal-Streaming API, it delivers sub-300ms latency for critical accessibility applications.
🎯 Challenge Category: Real-Time Voice Performance
This project focuses on creating the fastest, most responsive voice experience possible using AssemblyAI's Universal-Streaming technology, specifically designed for accessibility-critical use cases where sub-300ms latency matters most.
✨ Key Features
🎭 Advanced Audio Intelligence
Tone Detection: Real-time emotional tone analysis (happy, sad, angry, calm, etc.)
Sentiment Analysis: Live sentiment scoring with visual indicators
Speaker Diarization: Automatic speaker identification and separation (see the sketch after this list)
Confidence Scoring: Reliability metrics for all audio intelligence features
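For the diarization feature, the sketch below uses AssemblyAI's documented (non-streaming) transcription API with speaker labels; the real-time speaker separation in the app itself may be wired differently, and the audio filename is a placeholder.

```python
# Hypothetical diarization sketch via AssemblyAI's batch transcription API.
import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

config = aai.TranscriptionConfig(speaker_labels=True)
transcript = aai.Transcriber().transcribe("meeting_recording.wav", config)

for utterance in transcript.utterances:
    # Each utterance carries a speaker tag ("A", "B", ...) plus its text.
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```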
♿ Accessibility-First Design
High Contrast Mode: Enhanced visibility for users with visual impairments
The tone and sentiment analysis behind these audio-intelligence features is implemented with lightweight keyword heuristics:

```python
def _extract_sentiment(self, transcript) -> Dict[str, Any]:
    """Real-time sentiment analysis with confidence scoring."""
    text = transcript.text.lower()
    positive_words = ['good', 'great', 'excellent', 'happy', 'love', 'amazing']
    negative_words = ['bad', 'terrible', 'awful', 'hate', 'sad', 'angry']

    positive_count = sum(1 for word in positive_words if word in text)
    negative_count = sum(1 for word in negative_words if word in text)

    if positive_count > negative_count:
        sentiment_score = min(0.8, positive_count * 0.3)
        sentiment_label = 'positive'
    elif negative_count > positive_count:
        sentiment_score = max(-0.8, -negative_count * 0.3)
        sentiment_label = 'negative'
    else:
        sentiment_score = 0.0
        sentiment_label = 'neutral'

    return {
        'label': sentiment_label,
        'score': sentiment_score,
        'confidence': 0.75
    }

def _detect_tone(self, text: str) -> Dict[str, Any]:
    """Multi-dimensional tone detection."""
    tone_patterns = {
        'excited': ['!', 'wow', 'amazing', 'incredible', 'fantastic'],
        'calm': ['okay', 'fine', 'sure', 'alright', 'peaceful'],
        'angry': ['damn', 'hell', 'angry', 'mad', 'furious'],
        'sad': ['sad', 'depressed', 'down', 'unhappy', 'crying'],
        'happy': ['happy', 'joy', 'cheerful', 'glad', 'delighted']
    }

    tone_scores = {}
    for tone, patterns in tone_patterns.items():
        score = sum(1 for pattern in patterns if pattern in text.lower())
        tone_scores[tone] = score

    max_tone = max(tone_scores.items(), key=lambda x: x[1])
    return {
        'tone': max_tone[0] if max_tone[1] > 0 else 'neutral',
        'confidence': min(0.9, max_tone[1] * 0.3),
        'scores': tone_scores
    }
```
Performance Optimization
VoiceAccess implements comprehensive performance monitoring and optimization:
```python
class PerformanceMonitor:
    def __init__(self):
        self.thresholds = {
            'max_latency_ms': 300,
            'max_cpu_percent': 80.0,
            'max_memory_percent': 85.0,
            'min_accuracy': 0.85
        }

    def _check_performance_alerts(self, metrics: PerformanceMetrics):
        """Real-time performance monitoring with alerts."""
        if metrics.latency_ms > self.thresholds['max_latency_ms']:
            self._add_alert(
                'high_latency',
                f"High latency detected: {metrics.latency_ms:.0f}ms",
                'warning'
            )
        if metrics.cpu_percent > self.thresholds['max_cpu_percent']:
            self._add_alert(
                'high_cpu',
                f"High CPU usage: {metrics.cpu_percent:.1f}%",
                'warning'
            )

    def _calculate_performance_score(self, metrics: List[PerformanceMetrics]) -> float:
        """Comprehensive performance scoring algorithm."""
        scores = []

        # Latency score (lower is better)
        latencies = [m.latency_ms for m in metrics if m.latency_ms > 0]
        if latencies:
            avg_latency = sum(latencies) / len(latencies)
            latency_score = max(0, 100 - (avg_latency / self.thresholds['max_latency_ms']) * 100)
            scores.append(latency_score)

        return sum(scores) / len(scores) if scores else 0.0
```
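The PerformanceMetrics container referenced above is not shown in the listing. A plausible shape for it, sampled with psutil, could look like the sketch below; the field names beyond latency_ms and cpu_percent, and the use of psutil itself, are assumptions.

```python
# Hypothetical PerformanceMetrics container and sampler; not the project's exact code.
from dataclasses import dataclass, field
import time

import psutil

@dataclass
class PerformanceMetrics:
    latency_ms: float
    cpu_percent: float
    memory_percent: float
    timestamp: float = field(default_factory=time.time)

def sample_metrics(latency_ms: float) -> PerformanceMetrics:
    # psutil.cpu_percent(interval=None) reports utilization since the previous call.
    return PerformanceMetrics(
        latency_ms=latency_ms,
        cpu_percent=psutil.cpu_percent(interval=None),
        memory_percent=psutil.virtual_memory().percent,
    )
```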
Accessibility-First Design
WCAG 2.1 AA Compliance
VoiceAccess was built from the ground up with accessibility as a primary concern, not an afterthought:
```python
class AccessibilityFeatures:
    def __init__(self):
        # WCAG 2.1 AA compliant color schemes
        self.high_contrast_colors = {
            'background': '#000000',
            'text': '#ffffff',
            'primary': '#ffffff',
            'success': '#00ff00',
            'warning': '#ffff00',
            'error': '#ff0000'
        }

    def validate_color_contrast(self, foreground: str, background: str) -> Dict[str, Any]:
        """WCAG 2.1 color contrast validation."""
        contrast_ratio = self._calculate_contrast_ratio(foreground, background)
        return {
            'contrast_ratio': contrast_ratio,
            'aa_normal': contrast_ratio >= 4.5,
            'aa_large': contrast_ratio >= 3.0,
            'aaa_normal': contrast_ratio >= 7.0,
            'wcag_level': 'AAA' if contrast_ratio >= 7.0 else 'AA' if contrast_ratio >= 4.5 else 'Fail'
        }
```
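The `_calculate_contrast_ratio` helper is not included in the listing above. A standalone sketch of the standard WCAG 2.1 relative-luminance calculation it would rely on might look like this; the function names and the hex-only input format are assumptions.

```python
# Hypothetical sketch of a WCAG 2.1 contrast-ratio helper; the project's actual
# _calculate_contrast_ratio implementation may differ.
def _relative_luminance(hex_color: str) -> float:
    """Relative luminance per WCAG 2.1 for a '#rrggbb' color."""
    hex_color = hex_color.lstrip('#')
    channels = [int(hex_color[i:i + 2], 16) / 255.0 for i in (0, 2, 4)]
    # Linearize each sRGB channel before applying the luminance weights.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def _calculate_contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1:1 to 21:1."""
    l1 = _relative_luminance(foreground)
    l2 = _relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

# Example: white on black yields the maximum ratio of 21:1.
assert round(_calculate_contrast_ratio('#ffffff', '#000000'), 1) == 21.0
```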
Visual Accessibility Features
The application provides comprehensive visual accessibility options; a small CSS-injection sketch follows this list:
High Contrast Mode: Switches to white-on-black color scheme with enhanced contrast ratios
Scalable Typography: Font sizes from 12px to 28px with optimal line spacing
Visual Alert System: Flash notifications replace audio cues for important events
Color-Blind Friendly Palettes: Alternative color schemes for various types of color vision deficiency
Focus Management: Clear visual focus indicators for keyboard navigation
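In Streamlit, options like scalable typography and high contrast are typically applied by injecting CSS into the page. The helper name, CSS selectors, and colors below are illustrative assumptions, not the project's exact implementation.

```python
# Illustrative sketch: applying text-size and contrast preferences via injected CSS.
import streamlit as st

def apply_accessibility_styles(text_size_px: int, high_contrast: bool) -> None:
    background = "#000000" if high_contrast else "#ffffff"
    foreground = "#ffffff" if high_contrast else "#111111"
    st.markdown(
        f"""
        <style>
        /* Scale transcript text and keep comfortable line spacing. */
        .stMarkdown p {{
            font-size: {text_size_px}px;
            line-height: 1.6;
            color: {foreground};
        }}
        .stApp {{
            background-color: {background};
        }}
        </style>
        """,
        unsafe_allow_html=True,
    )

apply_accessibility_styles(text_size_px=20, high_contrast=True)
```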
Keyboard Navigation
Complete keyboard accessibility ensures the application works for users who cannot use a mouse:
```python
def create_focus_management(self):
    """Comprehensive keyboard navigation implementation."""
    focus_script = """
    document.addEventListener('keydown', function(e) {
        if (e.target.tagName !== 'INPUT' && e.target.tagName !== 'TEXTAREA') {
            switch(e.key.toLowerCase()) {
                case ' ':
                    // Space for start/stop recording
                    const recordButton = document.querySelector('[data-testid="baseButton-secondary"]');
                    if (recordButton) {
                        recordButton.click();
                        e.preventDefault();
                    }
                    break;
                case 's':
                    // S for settings panel
                    const settingsSection = document.querySelector('.stSidebar');
                    if (settingsSection) {
                        settingsSection.scrollIntoView();
                        e.preventDefault();
                    }
                    break;
            }
        }
    });
    """
    # Hand the script back to the caller for injection into the page.
    return focus_script
```
Performance Metrics
Latency Achievements
VoiceAccess consistently achieves sub-300ms transcription latency through several optimization strategies:
Optimized Audio Pipeline: Minimal buffering with efficient preprocessing
Streamlined API Integration: Direct WebSocket connection to AssemblyAI Universal-Streaming
Resilient Connection Handling: Intelligent reconnection with exponential backoff, as shown below

```python
def _reconnect(self):
    """Intelligent reconnection with exponential backoff."""
    max_retries = 3
    retry_delay = 2

    for attempt in range(max_retries):
        logger.info(f"Reconnection attempt {attempt + 1}/{max_retries}")
        self.disconnect()
        time.sleep(retry_delay)

        if self.connect():
            logger.info("Reconnection successful")
            return

        retry_delay *= 2  # Exponential backoff

    logger.error("Failed to reconnect after maximum retries")
```
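The latency figures themselves can only come from timing each round trip. A minimal sketch of the kind of measurement that could feed the PerformanceMonitor is shown below; the class name, method names, and rolling-window size are assumptions.

```python
# Hypothetical latency tracking: measure the gap between sending an audio chunk
# and receiving its transcript.
import time
from collections import deque

class LatencyTracker:
    def __init__(self, window: int = 50):
        self.samples = deque(maxlen=window)  # rolling window of recent measurements
        self._sent_at = None

    def mark_audio_sent(self) -> None:
        self._sent_at = time.perf_counter()

    def mark_transcript_received(self) -> float:
        latency_ms = (time.perf_counter() - self._sent_at) * 1000.0
        self.samples.append(latency_ms)
        return latency_ms

    @property
    def average_ms(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

tracker = LatencyTracker()
tracker.mark_audio_sent()
time.sleep(0.12)  # stand-in for the network round trip
print(f"{tracker.mark_transcript_received():.0f} ms")  # well under the 300 ms budget
```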
Installation and Setup
Quick Start Guide
VoiceAccess provides multiple installation paths to accommodate different system configurations:
Automatic Installation (Recommended):

```bash
python install_dependencies.py
```
Minimal Installation (For systems with dependency issues):
Looking ahead, planned enhancements include API development: a RESTful API for third-party integrations.
The VoiceAccess project represents a significant step forward in making real-time communication accessible to everyone, demonstrating how cutting-edge AI technology can be harnessed to create meaningful social impact while achieving technical excellence in performance and accessibility.
This is an incredible example of how real-time AI can be used to promote accessibility and inclusion. The sub-300ms transcription, emotional tone detection, and sentiment analysis are impressive features, especially for users who rely on visual communication. The focus on WCAG compliance and user-friendly design shows a strong commitment to usability. Looking forward to seeing how this evolves in the future—great work!
Yes, that is the other half of the communication loop. We would need to incorporate a text-to-speech model to generate speech from text or sign language. I haven't covered that, as it's outside the scope of this competition; however, in a real-world scenario the two go hand in hand to create a complete application.
This is truly an inspiring and impactful project 🎉 The focus on real-time transcription under 300ms latency and accessibility-first design is exactly the kind of innovation we need to empower the deaf and hard-of-hearing community. I especially appreciate the attention to emotional tone detection and multi-modal feedback—it adds a whole new layer of inclusivity. Kudos for integrating WCAG 2.1 AA compliance and offering a performance dashboard as well. 💡
Truly inspiring work sir
Voice of Voiceless is a brilliant example of using tech for real social impact. The focus on accessibility, real-time communication, and emotional context shows both empathy and innovation. Looking forward to seeing this evolve. Great job!