Who Said What? Build a Smart Transcriber Agent with AWS & LangChain
Chandrani Mukherjee

About: As a Sr. Solution Enterprise Architect and MS in AI/ML from Liverpool John Moors University , UK, I have been a key contributor to global organizations like Mphasis AI, McKesson, First Abu Dhabi Bank

Publish Date: Jul 21

Amazon Transcribe provides automatic speech recognition (ASR) with support for speaker diarization, the process of labeling which speaker said each part of an audio recording.


🛠️ Prerequisites

  • ✅ AWS Account
  • ✅ AWS CLI or SDK installed and configured
  • ✅ An S3 bucket to store audio files
  • ✅ Audio file in supported format (e.g., .wav, .mp3, .flac)

📤 Step 1: Upload Audio to Amazon S3

aws s3 cp your_audio_file.wav s3://your-bucket-name/

🧠 Step 2: Start Transcription Job with Speaker Diarization Enabled

aws transcribe start-transcription-job \
  --transcription-job-name "diarization-job-001" \
  --language-code "en-US" \
  --media MediaFileUri=s3://your-bucket-name/your_audio_file.wav \
  --output-bucket-name your-output-bucket \
  --settings ShowSpeakerLabels=true,MaxSpeakerLabels=5

📌 ShowSpeakerLabels=true enables speaker diarization

📌 MaxSpeakerLabels=5 sets an upper limit on the number of speakers to identify; it must be provided whenever ShowSpeakerLabels is enabled


⏳ Step 3: Check Transcription Job Status

aws transcribe get-transcription-job \
  --transcription-job-name "diarization-job-001"

Once the job status becomes COMPLETED, the transcription JSON is available in your S3 output bucket.
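
Checking the status by hand gets tedious for longer recordings, so here is a minimal polling sketch in Python. It assumes `client` is a boto3 Transcribe client (for example `boto3.client('transcribe')`); the function name, the polling interval, and the timeout are illustrative choices, not something Amazon Transcribe prescribes.

```python
import time


def wait_for_transcription_job(client, job_name, poll_seconds=10, timeout_seconds=1800):
    """Poll get_transcription_job until the job completes, fails, or times out.

    `client` is assumed to be a boto3 Transcribe client.
    Returns the final TranscriptionJob dict on success.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        job = response["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job
        if status == "FAILED":
            reason = job.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Job {job_name} failed: {reason}")
        # Any other status (QUEUED, IN_PROGRESS) means the job is still running
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_name} is still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

A call such as `wait_for_transcription_job(boto3.client('transcribe'), "diarization-job-001")` would block until the job from Step 2 finishes.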


📄 Step 4: View Diarized Transcription Output

Sample excerpt from the output JSON:

{
  "results": {
    "speaker_labels": {
      "segments": [
        {
          "speaker_label": "spk_0",
          "start_time": "0.0",
          "end_time": "2.5"
        }
      ]
    },
    "items": [
      {
        "start_time": "0.0",
        "end_time": "0.7",
        "alternatives": [
          {
            "confidence": "1.0",
            "content": "Hello"
          }
        ],
        "type": "pronunciation",
        "speaker_label": "spk_0"
      }
    ]
  }
}
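
To make the structure of `speaker_labels` concrete, the sketch below totals how long each speaker talks. It relies only on the `segments` fields shown in the excerpt; the function name is hypothetical, and the second and third segments in the sample are invented for illustration.

```python
from collections import defaultdict


def speaking_time_by_speaker(transcript):
    """Sum the duration of each speaker's segments, in seconds."""
    totals = defaultdict(float)
    for segment in transcript["results"]["speaker_labels"]["segments"]:
        # Timestamps are encoded as strings, so convert before subtracting
        duration = float(segment["end_time"]) - float(segment["start_time"])
        totals[segment["speaker_label"]] += duration
    return dict(totals)


sample = {
    "results": {
        "speaker_labels": {
            "segments": [
                {"speaker_label": "spk_0", "start_time": "0.0", "end_time": "2.5"},
                # The next two segments are made up for this example
                {"speaker_label": "spk_1", "start_time": "2.5", "end_time": "4.0"},
                {"speaker_label": "spk_0", "start_time": "4.0", "end_time": "5.0"},
            ]
        }
    }
}

print(speaking_time_by_speaker(sample))  # {'spk_0': 3.5, 'spk_1': 1.5}
```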

🐍 Optional: Python Script to Start Job

import boto3

transcribe = boto3.client('transcribe')

transcribe.start_transcription_job(
    TranscriptionJobName='diarization-job-001',
    LanguageCode='en-US',
    Media={'MediaFileUri': 's3://your-bucket-name/your_audio_file.wav'},
    OutputBucketName='your-output-bucket',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 5
    }
)

📝 Optional: Convert Output to Readable Text

Example post-processed output:

Speaker 1: Hello, how are you?
Speaker 2: I'm doing well, thanks. And you?
Speaker 1: I'm great!

You can write a script to process the JSON and reformat it into readable dialogue using speaker labels and timestamps.
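
One possible version of such a script is sketched below. It assumes every word in `results.items` carries a `speaker_label`, as in the excerpt from Step 4; if your output lacks that field, you would instead match each word's timestamps against the `speaker_labels` segments. The function name and the "Speaker N" numbering (by order of first appearance) are illustrative choices.

```python
def format_dialogue(transcript):
    """Group consecutive words by speaker into 'Speaker N: text' lines."""
    turns = []  # each entry is [speaker_label, text]
    for item in transcript["results"]["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timestamps; attach it to the preceding word
            if turns:
                turns[-1][1] += content
            continue
        speaker = item.get("speaker_label", "unknown")
        if turns and turns[-1][0] == speaker:
            turns[-1][1] += " " + content
        else:
            turns.append([speaker, content])

    # Number speakers from 1 in order of first appearance (spk_0 -> Speaker 1, ...)
    names = {}
    lines = []
    for speaker, text in turns:
        name = names.setdefault(speaker, f"Speaker {len(names) + 1}")
        lines.append(f"{name}: {text}")
    return "\n".join(lines)
```

After loading the output file with `json.load`, passing the resulting dict to `format_dialogue` returns text in the style of the example above.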


🧩 Notes

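  • This walkthrough uses batch transcription jobs; Amazon Transcribe streaming also offers speaker partitioning, but through a separate streaming API.
  • The accuracy depends on the quality of the audio and clarity of speaker voices.
  • Diarization support depends on the language and on batch versus streaming mode, so check the Amazon Transcribe supported-languages table for your language code.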


🤖 Bonus: Create a Transcriber Agent using LangChain and AWS

You can automate the transcription and diarization process using a LangChain agent!

🧩 Requirements

  • langchain
  • boto3
  • openai (for natural language post-processing or QA)

📦 Install Dependencies

pip install langchain boto3 openai

🤖 Sample LangChain Agent Setup

import uuid

import boto3
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI

# Note: these imports and initialize_agent belong to the legacy LangChain agent API,
# which newer LangChain releases deprecate; pin a version that still ships them.

# Tool to trigger a transcription job with speaker diarization
def start_transcription_job(file_uri):
    transcribe = boto3.client('transcribe')
    # Job names must be unique within an AWS account, so append a random suffix
    job_name = f"LangChainDiarizationJob-{uuid.uuid4().hex[:8]}"
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode="en-US",
        # The agent may pad its tool input with whitespace
        Media={'MediaFileUri': file_uri.strip()},
        OutputBucketName='your-output-bucket',
        Settings={
            'ShowSpeakerLabels': True,
            'MaxSpeakerLabels': 5
        }
    )
    return f"Started transcription job: {job_name}"

# Register tool with LangChain
tools = [
    Tool(
        name="AWSTranscribeDiarizer",
        func=start_transcription_job,
        description="Start a diarization transcription job using Amazon Transcribe given the S3 URI of an audio file"
    )
]

# Initialize agent with OpenAI and tools (the OpenAI wrapper reads OPENAI_API_KEY from the environment)
llm = OpenAI(temperature=0)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# Run agent with a prompt
agent.run("Transcribe the file at s3://your-bucket-name/your_audio.wav with speaker labels")

🧠 What This Agent Does

  • Accepts a prompt to trigger AWS Transcribe
  • Starts diarization on a given audio URL
  • Can be extended to fetch and format output, or even generate summaries!
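
As one way to approach the "fetch output" extension, the sketch below builds a second tool function that downloads a finished transcript from S3. It assumes the job was started with `OutputBucketName` and no `OutputKey`, in which case the transcript object is named after the job (`<job name>.json`), and that the output JSON has a `results.transcripts` list holding the full text (a field not shown in the Step 4 excerpt). The names `fetch_transcript` and `make_fetch_tool_func` are hypothetical, and `s3_client` is assumed to be `boto3.client('s3')`.

```python
import json


def fetch_transcript(s3_client, bucket, job_name):
    """Download and parse the transcript JSON that a finished job wrote to S3.

    Assumes the default object key "<job name>.json" (no OutputKey was set).
    """
    response = s3_client.get_object(Bucket=bucket, Key=f"{job_name}.json")
    return json.loads(response["Body"].read())


def make_fetch_tool_func(s3_client, bucket):
    """Return a single-argument function suitable for a LangChain Tool's func."""
    def fetch(job_name):
        # Agents may pad tool input with whitespace, so normalize it first
        transcript = fetch_transcript(s3_client, bucket, job_name.strip())
        # Assumed field: results.transcripts[0].transcript holds the full text
        return transcript["results"]["transcripts"][0]["transcript"]
    return fetch
```

The returned function could be registered next to `AWSTranscribeDiarizer` in the same way, with `func=make_fetch_tool_func(boto3.client('s3'), 'your-output-bucket')` and a description that tells the agent to pass a job name.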
