Content moderation is critical for any platform that hosts user-generated content. Whether you're running a social app, a marketplace, or a community site, you need to filter out inappropriate, violent, or malicious media without compromising performance.
In this guide, we’ll walk through how to build a scalable image moderation pipeline using:
- ✅ Virus scanning with ClamAV
- 🖼️ Multi-size conversion via Sharp
- 🧠 NSFW detection via Google Vision API or NudeNet
- 🔐 Private file storage in MinIO
- 📦 Asynchronous job handling with BullMQ
- 🧬 Known-bad image matching via perceptual hashing
Let’s build it, step by step.
📥 Step 1: Upload Quarantine
Problem: Don’t immediately expose uploaded images to the public or other users. What if they’re dangerous or explicit?
✅ Solution: Save to a "Quarantine" Bucket
// Save original image temporarily before moderation
await minio.putObject('quarantine', tempFileName, fileBuffer);
The quarantine bucket should be private. Use an ACL or prefix-based policy to restrict access.
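For reference, here's a minimal MinIO client setup with a private quarantine bucket. This is a sketch: the endpoint, port, and credentials are placeholders for a local deployment and should be adapted to yours.

import { Client } from 'minio';

const minio = new Client({
  endPoint: 'localhost',
  port: 9000,
  useSSL: false,
  accessKey: process.env.MINIO_ACCESS_KEY,
  secretKey: process.env.MINIO_SECRET_KEY,
});

// Buckets are private by default; only attach a public-read policy to the public bucket, never to quarantine
if (!(await minio.bucketExists('quarantine'))) {
  await minio.makeBucket('quarantine');
}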
🔁 Flow
graph LR
A[User Upload] --> B[MinIO Quarantine Bucket]
B --> C[BullMQ Queue]
C --> D[Worker: virus scan, moderation, hash check]
D -->|Clean| E[Public Bucket + DB Update]
D -->|Inappropriate| F[Reject + Notify User]
🛡️ Step 2: Virus Scanning with ClamAV
Before processing any media, scan it for malware using the open-source antivirus engine ClamAV.
✅ Setup (Docker)
clamav:
  image: clamav/clamav:stable
  ports:
    - "3310:3310"
✅ Code Integration
import NodeClam from 'clamscan';
import { Readable } from 'stream';
// Connect to the clamd daemon exposed by the Docker container on port 3310
const clamscan = await new NodeClam().init({ clamdscan: { host: 'localhost', port: 3310 } });
const { isInfected, viruses } = await clamscan.scanStream(Readable.from(fileBuffer));
if (isInfected) throw new Error(`Infected file: ${viruses.join(', ')}`);
🔐 This prevents attackers from uploading files containing trojans or backdoors.
🧠 Step 3: ML-Based Content Classification
Use a deep learning model to detect inappropriate content: nudity, violence, hate, etc.
✅ Option A: Google Vision API (SafeSearch)
import vision from '@google-cloud/vision';
const client = new vision.ImageAnnotatorClient();
const [result] = await client.safeSearchDetection({ image: { content: buffer } });
const safe = result.safeSearchAnnotation;
if (safe.adult === 'VERY_LIKELY' || safe.violence === 'LIKELY') {
throw new Error('Image flagged as unsafe');
}
Requires a GCP account + service account key file.
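SafeSearch returns likelihood strings rather than numeric scores, so strict equality checks like the one above only match a single value. A small helper using the standard likelihood ordering makes "at least LIKELY" checks easier; this is a sketch, not part of the Vision client itself:

const LIKELIHOOD = ['UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'];
const atLeast = (value, min) => LIKELIHOOD.indexOf(value) >= LIKELIHOOD.indexOf(min);

if (atLeast(safe.adult, 'LIKELY') || atLeast(safe.violence, 'LIKELY')) {
  throw new Error('Image flagged as unsafe');
}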
✅ Option B: Self-hosted (NudeNet / DeepStack)
Run it via Docker (the exact image name and port depend on which NudeNet/DeepStack build you use):
docker run -p 5000:5000 nudenet-server
Request:
// Build a multipart request from the buffer (the field name depends on the server you run)
const formData = new FormData();
formData.append('image', new Blob([buffer]), 'upload.jpg');

const res = await fetch('http://localhost:5000/classify', {
  method: 'POST',
  body: formData,
});
const data = await res.json();

// Response shape varies by server; here we assume a list of { label, prob } predictions
if (data.predictions.some((p) => p.prob > 0.9)) {
  throw new Error('Image contains nudity');
}
🧬 Step 4: Known-Bad-Image Fingerprinting
For known illegal content (e.g. CSAM), use image hashing to detect re-uploads of previously flagged material.
🧩 Perceptual Hashing (pHash / aHash / dHash)
These algorithms create “visual fingerprints” of images. Two visually similar images produce similar hashes—even if they’re resized, compressed, or recolored.
import { imageHash } from 'image-hash';
// image-hash uses a callback API; wrap it in a promise and pass the buffer with its MIME type
const hash = await new Promise((resolve, reject) =>
  imageHash({ ext: 'image/jpeg', data: buffer }, 16, true, (err, h) => (err ? reject(err) : resolve(h)))
);
if (denylist.has(hash)) throw new Error('Image in blocklist');
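Exact set membership only catches identical fingerprints. To also catch near-duplicates (resized, recompressed, lightly edited), compare Hamming distance against the denylist instead. A sketch, assuming hex-encoded hashes and a hypothetical denylistHashes array:

// Count differing bits between two equal-length hex hashes
function hammingDistance(a, b) {
  let dist = 0;
  for (let i = 0; i < a.length; i++) {
    let xor = parseInt(a[i], 16) ^ parseInt(b[i], 16);
    while (xor) { dist += xor & 1; xor >>= 1; }
  }
  return dist;
}

// Treat anything within a few bits of a known-bad hash as a match
if (denylistHashes.some((bad) => hammingDistance(hash, bad) <= 5)) {
  throw new Error('Image matches blocklist');
}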
🚨 PhotoDNA
PhotoDNA is the de facto standard for matching known CSAM. It is not open source; you can apply to Microsoft for access or use third-party moderation platforms such as:
- Microsoft Content Moderator
- Thorn’s CSAM API (restricted access)
🧵 Step 5: Asynchronous Moderation Pipeline
You shouldn't block the user’s HTTP request while processing everything. Instead:
1. Accept the upload → push job (queue setup is sketched after this list)
// Reference the quarantined object by key; avoid serializing raw buffers into Redis
const job = await queue.add('moderate-image', { userId, objectKey: tempFileName });
return res.status(202).json({ jobId: job.id });
2. Background worker handles it
import { Worker } from 'bullmq';

const worker = new Worker('moderate-image', async (job) => {
  // 1. Fetch the original from the quarantine bucket using job.data.objectKey
  // 2. Virus scan (ClamAV)
  // 3. ML moderation (SafeSearch / NudeNet)
  // 4. pHash check against the denylist
  // 5. sharp → resize & convert to WebP
  // 6. Upload variants to the public bucket and update the DB
}, { connection: { host: 'localhost', port: 6379 } });
3. Track job status
const job = await queue.getJob(jobId);
return res.json({ status: await job?.getState(), progress: job?.progress });
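For completeness, the queue used above and the worker share the same Redis connection. A minimal setup sketch, assuming a local Redis on the default port:

import { Queue } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };
const queue = new Queue('moderate-image', { connection });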
🎨 Step 6: Image Transformation with Sharp
Resize + convert to WebP for fast delivery:
import sharp from 'sharp';

const sizes = [
  { name: 'thumb', w: 64, h: 64 },
  { name: 'medium', w: 256, h: 256 },
  { name: 'full', w: 1024, h: 1024 },
];

// Keep each variant paired with its name so it can be uploaded under a predictable key
const processed = await Promise.all(sizes.map(async ({ name, w, h }) => ({
  name,
  buffer: await sharp(originalBuffer)
    .resize(w, h)
    .webp({ quality: 80 })
    .toBuffer(),
})));
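Once processing succeeds, each variant goes to the public bucket. A minimal sketch, assuming a public bucket and an avatars/<userId>/<size>.webp key convention (both are naming choices, not requirements):

await Promise.all(processed.map(({ name, buffer }) =>
  minio.putObject('public', `avatars/${userId}/${name}.webp`, buffer, buffer.length, {
    'Content-Type': 'image/webp', // served directly by MinIO or your CDN
  })
));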
🧠 Moderation Policy & Thresholds
Every product has different tolerance levels.
| Label | Threshold | Action |
| --- | --- | --- |
| Adult Nudity | > 90% | Reject |
| Violence | > 80% | Quarantine |
| Self-Harm | > 70% | Manual Review |
Store this in a config or DB so you can update without redeploying.
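For example, a config object mirroring the table above might look like this; labels, keys, and values are illustrative, not a fixed schema:

// Illustrative moderation policy; load it from a DB or config service so ops can tune it live
const moderationPolicy = {
  adult_nudity: { threshold: 0.9, action: 'reject' },
  violence:     { threshold: 0.8, action: 'quarantine' },
  self_harm:    { threshold: 0.7, action: 'manual_review' },
};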
🧼 Cleanup + Fallbacks
If a job fails:
- Remove uploaded objects
- Set error fields in DB
- Optionally alert admins
worker.on('failed', async (job, err) => {
  // Remove the quarantined original so rejected or failed uploads don't linger
  // (also clean up any variants already written to the public bucket, if your worker uploads incrementally)
  if (job?.data?.objectKey) {
    await minio.removeObject('quarantine', job.data.objectKey);
  }
  // Update the DB so the user sees the error and can retry
  await users.updateOne(
    { _id: job?.data?.userId },
    { $unset: { pendingAvatarJobId: "" }, $set: { avatarJobError: err.message } }
  );
});
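For transient failures (e.g. the moderation service is briefly unreachable), BullMQ can retry with backoff before the failed handler fires. A sketch of the job options at enqueue time:

await queue.add('moderate-image', { userId, objectKey }, {
  attempts: 3,                                   // give transient errors a chance to recover
  backoff: { type: 'exponential', delay: 5000 }, // wait 5s, 10s, 20s between attempts
  removeOnComplete: true,                        // keep Redis tidy once moderation succeeds
});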
📌 Recap
✅ Virus Scanning
✅ ML Content Classification (Cloud & Local)
✅ Known-bad Hash Detection
✅ Background Processing with BullMQ
✅ Web-Optimized Resizing
✅ Quarantine Workflow
✅ Policy-based Moderation
✅ User Feedback + Retry Handling
🏁 Final Words
What we’ve built here follows the same pattern that large platforms like Facebook, Discord, and Reddit apply at scale. You:
- Protect your users
- Protect your platform’s reputation
- Stay compliant with legal standards
Want to go further?
- Add human moderation dashboards
- Add appeal/review workflows
- Add watermarking or steganographic IDs