Amazon Transcribe provides automatic speech recognition (ASR) with support for speaker diarization—the process of labeling individual speakers in audio recordings.
🛠️ Prerequisites
- ✅ AWS Account
- ✅ AWS CLI or SDK installed and configured
- ✅ An S3 bucket to store audio files
- ✅ Audio file in a supported format (e.g., `.wav`, `.mp3`, `.flac`)
📤 Step 1: Upload Audio to Amazon S3
```bash
aws s3 cp your_audio_file.wav s3://your-bucket-name/
```
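If you prefer to stay in Python, the same upload can be done with boto3's `upload_file`. The sketch below is a hypothetical helper, `upload_audio`. It takes an S3 client as a parameter so it is easy to test, and it returns the `s3://` URI that the transcription job expects. The bucket and file names are the placeholders used in this guide.

```python
import os
from typing import Optional


def upload_audio(s3_client, local_path: str, bucket: str, key: Optional[str] = None) -> str:
    """Hypothetical helper: upload a local audio file and return its s3:// URI."""
    # Default the object key to the file's base name, like `aws s3 cp file s3://bucket/`.
    key = key or os.path.basename(local_path)
    # boto3's S3 client provides upload_file(Filename, Bucket, Key).
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Usage (requires AWS credentials):
# import boto3
# uri = upload_audio(boto3.client("s3"), "your_audio_file.wav", "your-bucket-name")
# uri == "s3://your-bucket-name/your_audio_file.wav"
```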
🧠 Step 2: Start Transcription Job with Speaker Diarization Enabled
```bash
aws transcribe start-transcription-job \
  --transcription-job-name "diarization-job-001" \
  --language-code "en-US" \
  --media MediaFileUri=s3://your-bucket-name/your_audio_file.wav \
  --output-bucket-name your-output-bucket \
  --settings ShowSpeakerLabels=true,MaxSpeakerLabels=5
```
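📌 `ShowSpeakerLabels=true` enables speaker diarization.
📌 `MaxSpeakerLabels=5` sets the maximum number of speakers Transcribe will try to distinguish. When `ShowSpeakerLabels` is enabled, `MaxSpeakerLabels` must also be set.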
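The `--settings` shorthand corresponds to the `Settings` dictionary in the boto3 API. The sketch below is a hypothetical helper that builds that dictionary and checks the speaker count before a request is sent. It assumes `MaxSpeakerLabels` accepts values from 2 to 30, the documented range at the time of writing; confirm the range against the current API reference.

```python
def diarization_settings(max_speakers: int = 5) -> dict:
    """Hypothetical helper: Settings dict for a diarization-enabled transcription job."""
    # Assumed valid range for MaxSpeakerLabels (2-30); check the current API reference.
    if not 2 <= max_speakers <= 30:
        raise ValueError("MaxSpeakerLabels must be between 2 and 30")
    # ShowSpeakerLabels turns diarization on; MaxSpeakerLabels caps the speaker count.
    return {"ShowSpeakerLabels": True, "MaxSpeakerLabels": max_speakers}


# Equivalent of the CLI flag --settings ShowSpeakerLabels=true,MaxSpeakerLabels=5:
# diarization_settings(5) == {"ShowSpeakerLabels": True, "MaxSpeakerLabels": 5}
```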
⏳ Step 3: Check Transcription Job Status
```bash
aws transcribe get-transcription-job \
  --transcription-job-name "diarization-job-001"
```
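The status appears in `TranscriptionJob.TranscriptionJobStatus`. It moves from `QUEUED` through `IN_PROGRESS` to either `COMPLETED` or `FAILED`. Once it is `COMPLETED`, the transcription JSON is available in your S3 output bucket.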
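To avoid re-running the status command by hand, you can poll `get_transcription_job` from Python. The sketch below is a hypothetical helper, `wait_for_job`. It sleeps between checks and returns the finished `TranscriptionJob` record, whose `Transcript.TranscriptFileUri` points at the output JSON. It raises an error if the job fails or if the (arbitrary, adjustable) timeout passes.

```python
import time


def wait_for_job(transcribe_client, job_name: str, poll_seconds: float = 10.0,
                 timeout_seconds: float = 3600.0) -> dict:
    """Hypothetical helper: poll until the job completes; return the TranscriptionJob dict."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name
        )["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]  # QUEUED, IN_PROGRESS, COMPLETED or FAILED
        if status == "COMPLETED":
            return job
        if status == "FAILED":
            reason = job.get("FailureReason", "unknown reason")
            raise RuntimeError(f"{job_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{job_name} still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Usage (requires AWS credentials):
# import boto3
# job = wait_for_job(boto3.client("transcribe"), "diarization-job-001")
# print(job["Transcript"]["TranscriptFileUri"])
```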
📄 Step 4: View Diarized Transcription Output
Sample excerpt from the output JSON:
```json
{
  "results": {
    "speaker_labels": {
      "segments": [
        {
          "speaker_label": "spk_0",
          "start_time": "0.0",
          "end_time": "2.5"
        }
      ]
    },
    "items": [
      {
        "start_time": "0.0",
        "end_time": "0.7",
        "alternatives": [
          {
            "confidence": "1.0",
            "content": "Hello"
          }
        ],
        "type": "pronunciation",
        "speaker_label": "spk_0"
      }
    ]
  }
}
```
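The `speaker_labels.segments` array is enough for simple per-speaker statistics. The sketch below totals speaking time per label. Note that the times are strings in the output, so they are converted to floats before subtracting. The three-segment sample (including `spk_1`) is made-up data shaped like the excerpt above, not real Transcribe output.

```python
import json
from collections import defaultdict


def speaking_time(transcript: dict) -> dict:
    """Sum segment durations (seconds) per speaker label."""
    totals = defaultdict(float)
    for seg in transcript["results"]["speaker_labels"]["segments"]:
        # start_time/end_time are strings in the output JSON, so convert them first.
        totals[seg["speaker_label"]] += float(seg["end_time"]) - float(seg["start_time"])
    return dict(totals)


# Hypothetical sample shaped like the excerpt above.
sample = json.loads("""
{"results": {"speaker_labels": {"segments": [
  {"speaker_label": "spk_0", "start_time": "0.0", "end_time": "2.5"},
  {"speaker_label": "spk_1", "start_time": "2.5", "end_time": "4.0"},
  {"speaker_label": "spk_0", "start_time": "4.0", "end_time": "5.0"}]}}}
""")
print(speaking_time(sample))  # {'spk_0': 3.5, 'spk_1': 1.5}
```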
🐍 Optional: Python Script to Start Job
```python
import boto3

transcribe = boto3.client('transcribe')

transcribe.start_transcription_job(
    TranscriptionJobName='diarization-job-001',
    LanguageCode='en-US',
    Media={'MediaFileUri': 's3://your-bucket-name/your_audio_file.wav'},
    OutputBucketName='your-output-bucket',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 5
    }
)
```
📝 Optional: Convert Output to Readable Text
Example post-processed output:
```text
Speaker 1: Hello, how are you?
Speaker 2: I'm doing well, thanks. And you?
Speaker 1: I'm great!
```
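To get this format, write a script that walks the `items` array in order and groups consecutive words by `speaker_label` (`spk_0`, `spk_1`, ...). The script then renames each label to a friendlier name such as "Speaker 1", and can use the timestamps if you want to show when each turn starts.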
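A minimal version of such a script is sketched below. It assumes each `pronunciation` item carries a `speaker_label`, as in the Step 4 excerpt. If your output lacks that field, you would need to map items to speakers through `speaker_labels.segments` instead. Punctuation items are appended to the current line without a space, and speakers are numbered in order of first appearance. `to_dialogue` is a hypothetical name.

```python
def to_dialogue(transcript: dict) -> str:
    """Group consecutive words by speaker into 'Speaker N: ...' lines."""
    names = {}   # spk_* label -> "Speaker N", numbered by first appearance
    lines = []   # [speaker_name, text] pairs, one per speaker turn
    for item in transcript["results"]["items"]:
        word = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Attach punctuation to the previous word without a space.
            if lines:
                lines[-1][1] += word
            continue
        label = item.get("speaker_label", "unknown")
        name = names.setdefault(label, f"Speaker {len(names) + 1}")
        if lines and lines[-1][0] == name:
            lines[-1][1] += " " + word   # same speaker keeps talking
        else:
            lines.append([name, word])   # speaker change starts a new turn
    return "\n".join(f"{name}: {text}" for name, text in lines)


# Usage, assuming the output JSON was downloaded as diarization-job-001.json:
# import json
# with open("diarization-job-001.json") as f:
#     print(to_dialogue(json.load(f)))
```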
🧩 Notes
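- This guide covers batch transcription jobs. Amazon Transcribe streaming also supports speaker partitioning (the `ShowSpeakerLabel` parameter of `StartStreamTranscription`), but it is configured differently.
- Accuracy depends on audio quality, how distinct the speakers' voices are, and how much speech overlaps.
- Language support for diarization varies between batch and streaming modes. Check the supported-languages table in the Amazon Transcribe documentation before relying on it for languages other than English.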
🤖 Bonus: Create a Transcriber Agent using LangChain and AWS
You can automate the transcription and diarization process using a LangChain agent!
🧩 Requirements
- `langchain`
- `boto3`
- `openai` (for natural language post-processing or QA)
📦 Install Dependencies
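```bash
# initialize_agent is a legacy LangChain API, so stay on a pre-1.0 LangChain release
pip install "langchain<1.0" langchain-community boto3 openai
```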
🤖 Sample LangChain Agent Setup
```python
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
import boto3
import uuid

# Tool to trigger transcription job
def start_transcription_job(file_uri):
    transcribe = boto3.client('transcribe')
    # Job names must be unique, so add a random suffix instead of reusing one fixed name
    job_name = f"LangChainDiarizationJob-{uuid.uuid4().hex[:8]}"
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode="en-US",
        # Agents sometimes wrap arguments in quotes or whitespace; strip them
        Media={'MediaFileUri': file_uri.strip().strip("'\"")},
        OutputBucketName='your-output-bucket',
        Settings={
            'ShowSpeakerLabels': True,
            'MaxSpeakerLabels': 5
        }
    )
    return f"Started transcription job: {job_name}"

# Register tool with LangChain
tools = [
    Tool(
        name="AWSTranscribeDiarizer",
        func=start_transcription_job,
        description="Start a diarization transcription job using AWS Transcribe given an S3 audio URL"
    )
]

# Initialize agent with OpenAI and tools (the OpenAI LLM reads OPENAI_API_KEY from the environment)
llm = OpenAI(temperature=0)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# Run agent with a prompt
agent.run("Transcribe the file at s3://your-bucket-name/your_audio.wav with speaker labels")
```
🧠 What This Agent Does
- Accepts a prompt to trigger AWS Transcribe
- Starts diarization on a given audio URL
- Can be extended to fetch and format output, or even generate summaries!
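As a first extension, the agent could get a second tool that reports on a job it started. The sketch below is a hypothetical function, `check_transcription_status`, written to take a single string argument as the zero-shot agent's tools do. It returns a short sentence the LLM can relay, including the transcript URI once the job completes. The optional `client` parameter is only there to make the function testable without AWS.

```python
def check_transcription_status(job_name: str, client=None) -> str:
    """Hypothetical agent tool: report a job's status and, when done, its transcript URI."""
    if client is None:
        import boto3  # imported lazily so the function can be tested with a fake client
        client = boto3.client("transcribe")
    # Agents sometimes pass the name with quotes or whitespace around it.
    name = job_name.strip().strip("'\"")
    job = client.get_transcription_job(TranscriptionJobName=name)["TranscriptionJob"]
    status = job["TranscriptionJobStatus"]
    if status == "COMPLETED":
        return f"{name} is COMPLETED. Transcript: {job['Transcript']['TranscriptFileUri']}"
    if status == "FAILED":
        return f"{name} FAILED: {job.get('FailureReason', 'no reason given')}"
    return f"{name} is {status}; check again later."
```

You would register it next to the existing tool, for example as `Tool(name="AWSTranscribeStatus", func=check_transcription_status, description="Check the status of an AWS Transcribe job given its job name")`, where the tool name and description are illustrative.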