How to create a video transcription with ffmpeg and Whisper

Requirements

ffmpeg
whisper
Python 3.10+ (for Whisper)

Installation

macOS

# Install Homebrew if you don't have it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install ffmpeg
brew install ffmpeg

# Install Python (if needed)
brew install python

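# Install Whisper (if pip refuses with "externally-managed-environment", as recent Homebrew Python may, run these inside a virtual environment)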
pip3 install --upgrade pip
pip3 install git+https://github.com/openai/whisper.git

Windows

# Install Chocolatey if you don't have it
# Run in PowerShell as administrator:
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

# Install ffmpeg
choco install ffmpeg

# Install Python (from python.org)
# Make sure to check "Add Python to PATH" during installation

# Install Whisper
pip install -U openai-whisper

Linux (Debian/Ubuntu)

# Install ffmpeg
sudo apt update && sudo apt install ffmpeg

# Install Python and pip
sudo apt install python3 python3-pip

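# Install Whisper (if pip refuses with "externally-managed-environment", as on recent Debian/Ubuntu releases, run this inside a virtual environment)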
pip3 install git+https://github.com/openai/whisper.git
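After installing, it can help to confirm that the tools are actually on your PATH before moving on. The following is a minimal sketch, not part of the original instructions; the function names check_tool and check_tools are made up for illustration.

```shell
#!/bin/sh
# Report whether a single command is available on PATH.
# Prints "<name>: ok" or "<name>: missing"; returns non-zero when missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: ok"
    else
        echo "$1: missing"
        return 1
    fi
}

# Check every tool given as an argument.
# Returns 0 only when none of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        check_tool "$tool" || missing=$((missing + 1))
    done
    [ "$missing" -eq 0 ]
}
```

Usage: check_tools ffmpeg whisper python3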

Transcription Steps

Extract audio from video using ffmpeg

   ffmpeg -i input_video.mp4 -vn -acodec mp3 output.mp3
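The extraction step can be wrapped in a small function that derives the audio filename from the video filename. This is a sketch under assumptions: the function names and the FFMPEG variable are invented here, and it writes 16 kHz mono WAV instead of MP3 (Whisper resamples its input to 16 kHz mono anyway, so the MP3 command above works just as well).

```shell
#!/bin/sh
# ffmpeg binary to call; set FFMPEG=echo to print the arguments instead of running them.
FFMPEG="${FFMPEG:-ffmpeg}"

# Swap the file extension for .wav (hypothetical helper).
audio_name_for() {
    printf '%s\n' "${1%.*}.wav"
}

# Extract the audio track of the video in $1 next to the original file.
extract_audio() {
    out=$(audio_name_for "$1")
    # -vn drops the video stream, -ac 1 downmixes to mono, -ar 16000 resamples to 16 kHz
    "$FFMPEG" -i "$1" -vn -ac 1 -ar 16000 "$out"
}
```

Usage: extract_audio input_video.mp4 (writes input_video.wav)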

Transcribe audio with Whisper

   whisper output.mp3 --language English --model small --output_format txt
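If you have several audio files, the same command can be run in a loop. A minimal sketch, assuming the files already exist; transcribe_all and the WHISPER variable are names made up for this example.

```shell
#!/bin/sh
# whisper binary to call; set WHISPER=echo to print the arguments instead of running them.
WHISPER="${WHISPER:-whisper}"

# Transcribe each audio file with the given model.
# $1 is the model name, the remaining arguments are audio files.
transcribe_all() {
    model=$1
    shift
    for audio in "$@"; do
        if [ ! -f "$audio" ]; then
            echo "skipping $audio: file not found" >&2
            continue
        fi
        "$WHISPER" "$audio" --language English --model "$model" --output_format txt
    done
}
```

Usage: transcribe_all small output.mp3 interview.mp3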

Model Options

tiny: Fastest, lowest accuracy (~1 GB VRAM)
base: Fast, decent accuracy (~1 GB VRAM)
small: Balanced speed/accuracy (~2 GB VRAM)
medium: Good accuracy (~5 GB VRAM)
large: Best accuracy (~10 GB VRAM)
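One way to use the list above is to pick the largest model that fits the memory you have. This is only a rough sketch based on the approximate figures listed; pick_model is a hypothetical helper, and it takes whole gigabytes only.

```shell
#!/bin/sh
# Print the largest model whose approximate memory need fits in $1 gigabytes.
# tiny and base need about the same memory, so base is returned at the low end;
# choose tiny by hand when speed matters more than accuracy.
pick_model() {
    gb=$1
    if [ "$gb" -ge 10 ]; then
        echo large
    elif [ "$gb" -ge 5 ]; then
        echo medium
    elif [ "$gb" -ge 2 ]; then
        echo small
    else
        echo base
    fi
}
```

Usage: whisper output.mp3 --model "$(pick_model 6)" --output_format txt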

Output Formats

txt: Plain text transcript
srt: Standard subtitle format
vtt: Web Video Text Tracks format
json: Detailed JSON with timestamps
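The srt and vtt formats differ mainly in how timestamps are written: SRT uses a comma before the milliseconds (00:00:04,200), WebVTT uses a dot (00:00:04.200). The sketch below illustrates that difference; format_timestamp is a made-up helper, not something Whisper provides.

```shell
#!/bin/sh
# Format a millisecond count as HH:MM:SS<sep>mmm.
# $1 is the time in milliseconds, $2 is the separator ("," for SRT, "." for VTT).
format_timestamp() {
    ms=$1
    sep=${2:-,}
    printf '%02d:%02d:%02d%s%03d\n' \
        $((ms / 3600000)) $((ms / 60000 % 60)) $((ms / 1000 % 60)) "$sep" $((ms % 1000))
}
```

Usage: format_timestamp 4200 prints 00:00:04,200 and format_timestamp 4200 . prints 00:00:04.200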

Additional Options

--task translate: Translates non-English speech into English text
--language en: Sets the spoken language explicitly instead of relying on auto-detection (skips the detection step and avoids misdetection)
--model: Selects the model size (tiny/base/small/medium/large)
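These options can be combined. As a sketch, the function below translates foreign-language audio into English subtitles. The function name, the WHISPER variable, and the choice of the medium model are assumptions made for this example; --output_dir sets the directory the subtitle file is written to.

```shell
#!/bin/sh
# whisper binary to call; set WHISPER=echo to print the arguments instead of running them.
WHISPER="${WHISPER:-whisper}"

# Translate the speech in $1 into English subtitles.
# $2 is the source language code (for example de), $3 the output directory (default: current).
translate_to_english() {
    "$WHISPER" "$1" --task translate --language "$2" \
        --model medium --output_format srt --output_dir "${3:-.}"
}
```

Usage: translate_to_english interview.mp3 de subs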

Source: macos.gadgethacks.com
Source: dev.to

Mark Kop @heymarkkop
