🧩 The problem
Having a YouTube mirror can be beneficial in case something happens to your main channel. I have mirrors of my Git repositories, and self-host other stuff as well. So, why not do this for video content?
In this post I will show you how to create a simple YouTube mirror you can host through Jekyll and a standard web server.
No, I'm not going to talk about PeerTube, although that can also be an alternative. My solution is still centralized and dumb, but it works. The advantage is that it runs on a static site, so no database is involved and there is very little to maintain.
⚠️ Warning
⚠️⚠️ Before continuing, please only mirror content you have permission to! ⚠️⚠️
✅ The solution
yt-dlp is of course the tool of choice for this job. It's incredibly powerful because of the hundreds of options available. This first version is a shell script that downloads the following elements:
| File | Type | Notes |
|---|---|---|
| video | WebM | widely used format, supported by all major modern browsers |
| thumbnail | PNG | used as the video cover (poster) |
| subtitles | VTT & embedded in the video | just in case |
| title | TXT | for debugging; will be useful in the next iterations |
| description | TXT | might be useful in the future |
🖥️ Script
First of all, create a Python virtual environment:
python3 -m venv .venv
This is the script:
#!/usr/bin/env bash
set -euo pipefail
URL="${1}"
DST_DIR='/srv/http/videos'
# YouTube changes frequently, so keep yt-dlp up to date on every run.
. .venv/bin/activate
pip install -U yt-dlp
pushd "${DST_DIR}"
# Reason to recode to webm:
# https://github.com/yt-dlp/yt-dlp/issues/775#issuecomment-904271715
yt-dlp "${URL}" \
    --verbose \
    --fixup detect_or_warn \
    --prefer-ffmpeg \
    --sub-langs "en,it" \
    --write-subs \
    --embed-subs \
    --write-auto-sub \
    --prefer-free-formats \
    --no-call-home \
    --no-overwrites \
    --recode-video webm \
    --add-metadata \
    --write-thumbnail \
    --convert-thumbnails png \
    --exec "cat > ${DST_DIR}/%(id)s/title.txt << 'EOF'
%(title)s
EOF" \
    --exec "cat > ${DST_DIR}/%(id)s/description.txt << 'EOF'
%(description)s
EOF" \
    --output "${DST_DIR}/%(id)s/%(id)s.%(ext)s"
popd
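Take special care when quoting the title and description: they can contain any character (quotes, $, backticks), so getting the quoting right is very tricky! Alternatively, you can use "here-documents" as I did in the example: with a quoted delimiter ('EOF') the shell performs no expansion inside the body, although a description containing a line that is exactly EOF would still end it early.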
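If the heredoc trick feels fragile, one option is to skip the shell for metadata altogether. The following is a minimal Python sketch, not part of the original script: the function name write_metadata is hypothetical, and it assumes a yt-dlp info dict with an "id" key and optional "title" and "description" keys.

```python
from pathlib import Path


def write_metadata(info: dict, dst_dir: Path) -> Path:
    """Write title.txt and description.txt for one video.

    info is assumed to be a yt-dlp info dict: "id" is required,
    "title" and "description" may be missing or None.
    """
    video_dir = Path(dst_dir) / info["id"]
    video_dir.mkdir(parents=True, exist_ok=True)
    # No shell is involved, so quotes, "$", backticks or a line reading
    # "EOF" are written verbatim without any escaping.
    for filename, key in (("title.txt", "title"), ("description.txt", "description")):
        text = info.get(key) or ""
        # The heredoc version also ends each file with a newline.
        (video_dir / filename).write_text(text + "\n", encoding="utf-8")
    return video_dir
```

For a single video, such a dict could come from YoutubeDL().extract_info(url, download=False) in the yt_dlp package; channel URLs return a playlist-type dict whose "entries" would need to be walked instead.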
You can pass a single video URL or the channel URL as input:
./mirror_youtube.sh "video or channel URL"
A canonical video URL on YouTube might be:
https://www.youtube.com/watch?v=aaa000
In this example the video ID is aaa000. This means the downloaded files will be placed inside /srv/http/videos/aaa000. Specifically, the video, thumbnail and other elements will be named:
/srv/http/videos/aaa000/aaa000.webm
/srv/http/videos/aaa000/aaa000.png
/srv/http/videos/aaa000/aaa000.en.vtt
/srv/http/videos/aaa000/aaa000.it.vtt
/srv/http/videos/aaa000/description.txt
/srv/http/videos/aaa000/title.txt
respectively. Titles and video descriptions are saved by separate commands spawned with --exec.
If we keep this naming consistent, we can later create Jekyll pages using just a layout and some front matter.
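To make the naming convention concrete, here is a small Python sketch that derives the video ID from a canonical watch URL and lists the paths the script is expected to produce. DST_DIR and SUB_LANGS mirror the values in the shell script; the function names are hypothetical. It only handles watch?v= URLs (not youtu.be short links), and a subtitle file only exists if YouTube provides that language.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import parse_qs, urlparse

DST_DIR = Path("/srv/http/videos")  # same value as in the shell script
SUB_LANGS = ("en", "it")            # same as --sub-langs "en,it"


def video_id_from_url(url: str) -> str:
    """Return the v= query parameter of a canonical watch URL."""
    ids = parse_qs(urlparse(url).query).get("v")
    if not ids:
        raise ValueError(f"no video ID in {url!r}")
    return ids[0]


def expected_files(video_id: str, dst_dir: Path = DST_DIR) -> list[Path]:
    """List the files the script should produce for one video."""
    base = dst_dir / video_id
    files = [base / f"{video_id}.webm", base / f"{video_id}.png"]
    files += [base / f"{video_id}.{lang}.vtt" for lang in SUB_LANGS]
    files += [base / "description.txt", base / "title.txt"]
    return files
```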
🕸️ Serving the files
For the moment I avoid tracking the YouTube files in my blog repository. I think it's unnecessary and would create a huge repo even if using Git LFS.
I simply serve these files on a separate domain using a normal web server. This is an example Apache configuration:
<IfModule mod_ssl.c>
    <VirtualHost *:443>
        UseCanonicalName on
        KeepAlive On
        RewriteEngine on
        ServerName your.video.assets.domain
        Alias /icons/ "/usr/share/apache2/icons/"
        DocumentRoot "/srv/http/videos"
        <Directory "/srv/http/videos">
            Options -ExecCGI -Includes
            Options +Indexes +SymLinksIfOwnerMatch
            #
            # Don't mix +/- prefixed and unprefixed keywords here: an
            # unprefixed keyword discards any +/- keywords before it.
            #
            IndexOptions FancyIndexing NameWidth=* SuppressDescription Charset=UTF-8 VersionSort FoldersFirst
            ReadmeName footer.html
            IndexIgnore header.html footer.html
            #
            # AllowOverride controls what directives may be placed in .htaccess files.
            # It can be "All", "None", or any combination of the keywords:
            # AllowOverride FileInfo AuthConfig Limit
            #
            AllowOverride All
            #
            # Controls who can get stuff from this server.
            #
            Require all granted
        </Directory>
        SSLCompression off
        Include /etc/letsencrypt/options-ssl-apache.conf
        SSLCertificateFile /etc/letsencrypt/live/your.video.assets.domain/fullchain.pem
        SSLCertificateKeyFile /etc/letsencrypt/live/your.video.assets.domain/privkey.pem
    </VirtualHost>
</IfModule>
I prefer to keep directory listings enabled, since they act as a sort of debug output.
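Directory listings are fine for eyeballing, but the consistent layout also makes a scripted check easy. This hypothetical Python helper walks the video directory and reports video folders that are missing the video, the poster or title.txt; which files count as required is an assumption you may want to adjust.

```python
from __future__ import annotations

from pathlib import Path

# Assumed minimum set of files per video; extend with subtitles if needed.
REQUIRED_SUFFIXES = (".webm", ".png")
REQUIRED_NAMES = ("title.txt",)


def incomplete_videos(dst_dir: Path) -> dict[str, list[str]]:
    """Map each video directory name (the ID) to its missing files."""
    missing: dict[str, list[str]] = {}
    # Only subdirectories are videos; files like footer.html are skipped.
    for video_dir in sorted(p for p in Path(dst_dir).iterdir() if p.is_dir()):
        video_id = video_dir.name
        wanted = [video_id + s for s in REQUIRED_SUFFIXES] + list(REQUIRED_NAMES)
        absent = [name for name in wanted if not (video_dir / name).is_file()]
        if absent:
            missing[video_id] = absent
    return missing
```

Calling incomplete_videos(Path("/srv/http/videos")) after a run, for instance from the same systemd unit that runs the download, would list the directories worth a second look.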
📄 Jekyll
Let's see the Jekyll HTML layout now:
---
layout: page
---
{% if page.comment %}
{{ page.comment }}
{% endif %}
{% if page.embed_html %}
{{ page.embed_html }}
{% endif %}
{% if page.backup_url and page.backup_url != '' %}
{% if page.embed_html %}
<em>Backup</em>
{% endif %}
<video
controls
width="560"
height="315"
preload="none"
{% if page.poster_image %}poster="{{ page.poster_image }}"{% endif %}>
<source src="{{ page.backup_url }}" type="video/webm" />
<p>Your browser doesn't support HTML5 video. Here is a <a href="{{ page.backup_url }}">link</a> instead.</p>
</video>
{% endif %}
<p><a href="{{ page.canonical_url }}">Original video on YouTube</a></p>
In short, the video is embedded with plain HTML5 elements, and the thumbnail is used as the poster image, just like on YouTube. This example page, called 7Rh8hWMmZeA.md, might help you understand better:
---
layout: youtube_mirror
title: ! 'fastmeshapi: a fast, persistent Meshtastic web app (part 3 - packet table)'
permalink: /courses/solvecomputerscience-youtube-channel/7Rh8hWMmZeA/
description: ! 'fastmeshapi: a fast, persistent Meshtastic web app (part 3 - packet table)'
enable_markdown: true
lang: 'en'
backup_url: ! 'https://your.video.assets.domain/video/SolveComputerScience/7Rh8hWMmZeA/7Rh8hWMmZeA.webm'
poster_image: ! 'https://your.video.assets.domain/video/SolveComputerScience/7Rh8hWMmZeA/7Rh8hWMmZeA.png'
is_youtube_mirror: true
canonical_url: ! 'https://www.youtube.com/watch?v=7Rh8hWMmZeA'
---
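The canonical URL pointing to YouTube is very important: it tells search engines that YouTube hosts the original, so the mirror page isn't flagged as possible duplicate content. Note that the layout above only uses it for a plain link; the rel=canonical link in the page head has to come from your theme or from a plugin such as jekyll-seo-tag, which reads the canonical_url front-matter key.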
🎯 Result
The whole system works, but for now the markdown files need to be created manually. The download script, on the other hand, can be run by a systemd timer, for example once a day.
Incidentally, if you change a video's title or description between runs, title.txt and description.txt are updated automatically the next time the script runs.
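Until that part is automated, the markdown files can at least be generated from title.txt. This is a minimal Python sketch, assuming the front-matter keys of the example page above; ASSETS_BASE, PERMALINK_PREFIX, render_page and write_page are hypothetical names, with the constants copied from that example. Note that the example's backup_url includes a /video/SolveComputerScience/ prefix, while the Apache DocumentRoot shown earlier serves /srv/http/videos at the domain root, so set ASSETS_BASE to whatever your server actually exposes. Values are written as YAML single-quoted scalars, where a literal ' is escaped by doubling it.

```python
from pathlib import Path

# Hypothetical values copied from the example page; adjust to your setup.
ASSETS_BASE = "https://your.video.assets.domain/video/SolveComputerScience"
PERMALINK_PREFIX = "/courses/solvecomputerscience-youtube-channel"


def yaml_quote(text: str) -> str:
    """Return text as a YAML single-quoted scalar (' is escaped as '')."""
    return "'" + text.replace("'", "''") + "'"


def render_page(video_id: str, title: str, lang: str = "en") -> str:
    """Build the front matter of one youtube_mirror page."""
    quoted_title = yaml_quote(title.strip())  # title.txt ends with a newline
    media = f"{ASSETS_BASE}/{video_id}/{video_id}"
    canonical = f"https://www.youtube.com/watch?v={video_id}"
    lines = [
        "---",
        "layout: youtube_mirror",
        f"title: ! {quoted_title}",
        f"permalink: {PERMALINK_PREFIX}/{video_id}/",
        f"description: ! {quoted_title}",  # the example reuses the title
        "enable_markdown: true",
        f"lang: {yaml_quote(lang)}",
        f"backup_url: ! {yaml_quote(media + '.webm')}",
        f"poster_image: ! {yaml_quote(media + '.png')}",
        "is_youtube_mirror: true",
        f"canonical_url: ! {yaml_quote(canonical)}",
        "---",
    ]
    return "\n".join(lines) + "\n"


def write_page(video_dir: Path, pages_dir: Path) -> Path:
    """Create <pages_dir>/<id>.md from <video_dir>/title.txt."""
    video_id = video_dir.name
    title = (video_dir / "title.txt").read_text(encoding="utf-8")
    page = pages_dir / f"{video_id}.md"
    page.write_text(render_page(video_id, title), encoding="utf-8")
    return page
```

With the example title, render_page("7Rh8hWMmZeA", ...) reproduces the front matter of 7Rh8hWMmZeA.md shown above.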
🚀 Future updates
In the next posts I will show you how to improve automation for this use case. We'll transform the shell script into Python and use the YouTube feeds.
🎉 Conclusion
If you are interested in the source code, you can find it in its repository.
You can comment here and check my YouTube channel.