Content Moderation in Node.js: Building a Scalable Image Moderation Pipeline with MinIO, BullMQ, ClamAV, DeepStack & Hashing 🧬

Content moderation is critical in user-generated platforms. Whether you're running a social app, marketplace, or community site, you need to filter out inappropriate, violent, or malicious media—without compromising on performance.

In this guide, we’ll walk through how to build a scalable image moderation pipeline using:

  • ✅ Virus scanning with ClamAV
  • 🖼️ Multi-size conversion via Sharp
  • 🧠 NSFW detection via Google Vision API or NudeNet
  • 🔐 Private file storage in MinIO
  • 📦 Asynchronous job handling with BullMQ
  • 🧬 Known-bad image matching via perceptual hashing

Let’s build it, step by step.


📥 Step 1: Upload Quarantine

Problem: you shouldn't immediately expose uploaded images to the public or other users. What if they're dangerous or explicit?

✅ Solution: Save to a "Quarantine" Bucket

// Save original image temporarily before moderation
await minio.putObject('quarantine', tempFileName, fileBuffer);

The quarantine bucket should be private. Use an ACL or prefix-based policy to restrict access.
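
For reference, here is a minimal sketch of the MinIO client the snippet above assumes; the endpoint, port, and credential variables are placeholders for your own deployment. Buckets are private by default, so the main rule is simply to never attach a public read policy to `quarantine`.

import { Client } from 'minio';

// Placeholder connection settings; adjust to your deployment
const minio = new Client({
  endPoint: 'localhost',
  port: 9000,
  useSSL: false,
  accessKey: process.env.MINIO_ACCESS_KEY,
  secretKey: process.env.MINIO_SECRET_KEY,
});

// Ensure the private quarantine bucket exists (and never give it a public policy)
if (!(await minio.bucketExists('quarantine'))) {
  await minio.makeBucket('quarantine');
}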

🔁 Flow

graph LR
A[User Upload] --> B[MinIO Quarantine Bucket]
B --> C[BullMQ Queue]
C --> D[Worker: virus scan, moderation, hash check]
D -->|Clean| E[Public Bucket + DB Update]
D -->|Inappropriate| F[Reject + Notify User]
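
To make the flow concrete, here is a minimal sketch of the upload endpoint that kicks everything off, assuming Express with multer's in-memory storage. The route path, field name, and `req.user` shape are illustrative assumptions; `minio` is the client from above and `queue` is the BullMQ queue set up in Step 5.

import express from 'express';
import multer from 'multer';
import { randomUUID } from 'crypto';

const app = express();
const upload = multer({ storage: multer.memoryStorage() });

app.post('/upload', upload.single('image'), async (req, res) => {
  const tempFileName = `${randomUUID()}-${req.file.originalname}`;

  // Step 1: park the original in the private quarantine bucket
  await minio.putObject('quarantine', tempFileName, req.file.buffer);

  // Step 5: hand the object reference to the moderation queue
  const job = await queue.add('moderate-image', { userId: req.user?.id, objectName: tempFileName });

  res.status(202).json({ jobId: job.id });
});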

🛡️ Step 2: Virus Scanning with ClamAV

Before processing any media, scan it for malware using the open-source antivirus engine ClamAV.

✅ Setup (Docker)

clamav:
  image: clamav/clamav:stable
  ports:
    - "3310:3310"

✅ Code Integration

import NodeClam from 'clamscan';
import { Readable } from 'stream';

// Connect to the clamd daemon exposed by the Docker container on port 3310
const clamscan = await new NodeClam().init({ clamdscan: { host: 'localhost', port: 3310 } });
const { isInfected, viruses } = await clamscan.scanStream(Readable.from(fileBuffer));

if (isInfected) throw new Error(`Infected file: ${viruses.join(', ')}`);

🔐 This catches uploads that match known malware signatures, such as trojans or backdoors, before they go any further.


🧠 Step 3: ML-Based Content Classification

Use a deep learning model to detect inappropriate content: nudity, violence, hate, etc.

✅ Option A: Google Vision API (SafeSearch)

import vision from '@google-cloud/vision';

const client = new vision.ImageAnnotatorClient();
const [result] = await client.safeSearchDetection({ image: { content: buffer } });
const safe = result.safeSearchAnnotation;

if (safe.adult === 'VERY_LIKELY' || safe.violence === 'LIKELY') {
  throw new Error('Image flagged as unsafe');
}

Requires a GCP account + service account key file.
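
If you'd rather not rely on the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, the client can be pointed at the key file explicitly (the path below is a placeholder):

const client = new vision.ImageAnnotatorClient({
  keyFilename: '/path/to/service-account.json',
});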

✅ Option B: Self-hosted (NudeNet / DeepStack)

Run your classifier container via Docker (the image name below is a placeholder for whichever NudeNet or DeepStack build you use):

docker run -p 5000:5000 nudenet-server

Request:

// Node 18+ ships global fetch, FormData and Blob
const formData = new FormData();
formData.append('image', new Blob([buffer]), 'upload.jpg');

const res = await fetch('http://localhost:5000/classify', {
  method: 'POST',
  body: formData,
});
const data = await res.json();

// The response shape depends on the model/server you run; adjust field names accordingly
if (data.predictions.some((p) => p.prob > 0.9)) {
  throw new Error('Image contains nudity');
}

🧬 Step 4: Known-Bad-Image Fingerprinting

For known illegal content (e.g. CSAM), use image hashing to detect re-uploads of previously flagged images.

🧩 Perceptual Hashing (pHash / aHash / dHash)

These algorithms create “visual fingerprints” of images. Two visually similar images produce similar hashes—even if they’re resized, compressed, or recolored.

import { imageHash } from 'image-hash';
import { promisify } from 'util';

const hashImage = promisify(imageHash);

// image-hash expects buffers wrapped with their mime type (jpeg assumed here); 16 bits, hex output
const hash = await hashImage({ ext: 'image/jpeg', data: buffer }, 16, true);
if (denylist.has(hash)) throw new Error('Image in blocklist');
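
An exact `Set` lookup only catches byte-for-byte identical hashes. In practice you compare hashes by Hamming distance so near-duplicates still match. The sketch below assumes hex-encoded hashes, and the cut-off of 10 bits is an illustrative value you'd tune against your own data.

// Count how many bits differ between two hex-encoded hashes of equal length
function hammingDistance(hashA, hashB) {
  let distance = 0;
  for (let i = 0; i < hashA.length; i++) {
    let xor = parseInt(hashA[i], 16) ^ parseInt(hashB[i], 16);
    while (xor) {
      distance += xor & 1;
      xor >>= 1;
    }
  }
  return distance;
}

// Treat anything within 10 differing bits as a match against the denylist
const isKnownBad = [...denylist].some((bad) => hammingDistance(hash, bad) <= 10);
if (isKnownBad) throw new Error('Image matches a blocklisted hash');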

🚨 PhotoDNA

PhotoDNA is widely used to match known CSAM. It's not open source; you can apply to Microsoft for access or go through third-party moderation platforms such as:

  • Microsoft Content Moderator
  • Thorn’s CSAM API (restricted access)

🧵 Step 5: Asynchronous Moderation Pipeline

You shouldn't block the user’s HTTP request while processing everything. Instead:

1. Accept the upload → push job

// Pass a reference to the quarantined object rather than the raw buffer,
// since job payloads are JSON-serialized (queue setup is sketched after step 3)
const job = await queue.add('moderate-image', { userId, objectName: tempFileName });
return res.status(202).json({ jobId: job.id });

2. Background worker handles it

import { Worker } from 'bullmq';

const worker = new Worker('moderate-image', async (job) => {
  // 1. virus scan (ClamAV)
  // 2. NSFW / violence classification
  // 3. pHash denylist check
  // 4. sharp → webp & multi-size resize
  // 5. upload clean variants to MinIO
  // 6. update the DB record
}, { connection: { host: 'localhost', port: 6379 } }); // point at your Redis instance

3. Track job status

const job = await queue.getJob(jobId);
// getState() is async in BullMQ; progress is a plain property, not a method
return res.json({ status: await job?.getState(), progress: job?.progress });
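
All three snippets above assume a BullMQ queue backed by Redis. A minimal setup sketch (host and port are placeholders for your Redis instance):

import { Queue } from 'bullmq';

// Shared queue instance used by the upload route and the status endpoint
const queue = new Queue('moderate-image', {
  connection: { host: 'localhost', port: 6379 },
});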

🎨 Step 6: Image Transformation with Sharp

Resize + convert to WebP for fast delivery:

import sharp from 'sharp';

const sizes = [
  { name: 'thumb', w: 64, h: 64 },
  { name: 'medium', w: 256, h: 256 },
  { name: 'full', w: 1024, h: 1024 },
];

// Keep each variant's name alongside its buffer so it can be stored under a sensible key
const processed = await Promise.all(sizes.map(async ({ name, w, h }) => ({
  name,
  buffer: await sharp(originalBuffer)
    .resize(w, h)
    .webp({ quality: 80 })
    .toBuffer(),
})));
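
Once the variants are generated, the worker can publish them to a world-readable bucket. A hedged sketch, where the `images` bucket name and the `${userId}/...` key layout are illustrative assumptions:

// Publish each clean variant under its own key with the right content type
await Promise.all(processed.map(({ name, buffer }) =>
  minio.putObject('images', `${userId}/${name}.webp`, buffer, buffer.length, {
    'Content-Type': 'image/webp',
  })
));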

🧠 Moderation Policy & Thresholds

Every product has different tolerance levels.

| Label        | Threshold | Action        |
| ------------ | --------- | ------------- |
| Adult Nudity | > 90%     | Reject        |
| Violence     | > 80%     | Quarantine    |
| Self-Harm    | > 70%     | Manual Review |

Store this in a config or DB so you can update without redeploying.
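
As a starting point, that can be as simple as a keyed object; the label names, thresholds, and actions below are illustrative assumptions, not a recommended policy.

// Illustrative policy map; load this from config or the DB in a real deployment
const moderationPolicy = {
  adultNudity: { threshold: 0.9, action: 'reject' },
  violence:    { threshold: 0.8, action: 'quarantine' },
  selfHarm:    { threshold: 0.7, action: 'manual_review' },
};

function decideAction(label, score) {
  const rule = moderationPolicy[label];
  if (!rule || score < rule.threshold) return 'allow';
  return rule.action;
}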


🧼 Cleanup + Fallbacks

If a job fails:

  • Remove uploaded objects
  • Set error fields in DB
  • Optionally alert admins

worker.on('failed', async (job, err) => {
  if (!job) return;

  // Remove any objects this job already wrote to MinIO
  for (const f of job.data.uploads ?? []) {
    await minio.removeObject('avatars', f.name);
  }

  // Record the failure on the user document
  await users.updateOne(
    { _id: job.data.userId },
    { $unset: { pendingAvatarJobId: "" }, $set: { avatarJobError: err.message } }
  );
});

📌 Recap

✅ Virus Scanning
✅ ML Content Classification (Cloud & Local)
✅ Known-bad Hash Detection
✅ Background Processing with BullMQ
✅ Web-Optimized Resizing
✅ Quarantine Workflow
✅ Policy-based Moderation
✅ User Feedback + Retry Handling


🏁 Final Words

What we've built here follows the same broad pattern that platforms like Facebook, Discord, and Reddit apply at scale: quarantine, scan, classify, hash-match, then publish. You:

  • Protect your users
  • Protect your platform’s reputation
  • Stay compliant with legal standards

Want to go further?

  • Add human moderation dashboards
  • Add appeal/review workflows
  • Add watermarking or steganographic IDs
