Content Moderation in Node.js: Building a Scalable Image Moderation Pipeline with MinIO, BullMQ, ClamAV, DeepStack & Hashing 🧬

Content moderation is critical in user-generated platforms. Whether you're running a social app, marketplace, or community site, you need to filter out inappropriate, violent, or malicious media—without compromising on performance.

In this guide, we’ll walk through how to build a scalable image moderation pipeline using:

  • ✅ Virus scanning with ClamAV
  • 🖼️ Multi-size conversion via Sharp
  • 🧠 NSFW detection via Google Vision API or NudeNet
  • 🔐 Private file storage in MinIO
  • 📦 Asynchronous job handling with BullMQ
  • 🧬 Known-bad image matching via perceptual hashing

Let’s build it, step by step.


📥 Step 1: Upload Quarantine

Problem: you shouldn't immediately expose uploaded images to the public or other users. What if they're dangerous or explicit?

✅ Solution: Save to a "Quarantine" Bucket

// Save original image temporarily before moderation
await minio.putObject('quarantine', tempFileName, fileBuffer);

The quarantine bucket should be private. Use an ACL or prefix-based policy to restrict access.
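
For reference, here is a minimal sketch of the MinIO client the snippet above assumes; the endpoint, port, and credential variables are placeholders for your own deployment. Buckets are private by default, so the main rule is simply to never attach a public read policy to `quarantine`.

import { Client } from 'minio';

// Placeholder connection settings; adjust to your deployment
const minio = new Client({
  endPoint: 'localhost',
  port: 9000,
  useSSL: false,
  accessKey: process.env.MINIO_ACCESS_KEY,
  secretKey: process.env.MINIO_SECRET_KEY,
});

// Ensure the private quarantine bucket exists (and never give it a public policy)
if (!(await minio.bucketExists('quarantine'))) {
  await minio.makeBucket('quarantine');
}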

🔁 Flow

graph LR
A[User Upload] --> B[MinIO Quarantine Bucket]
B --> C[BullMQ Queue]
C --> D[Worker: virus scan, moderation, hash check]
D -->|Clean| E[Public Bucket + DB Update]
D -->|Inappropriate| F[Reject + Notify User]
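
To make the flow concrete, here is a minimal sketch of the upload endpoint that kicks everything off, assuming Express with multer's in-memory storage. The route path, field name, and `req.user` shape are illustrative assumptions; `minio` is the client from above and `queue` is the BullMQ queue set up in Step 5.

import express from 'express';
import multer from 'multer';
import { randomUUID } from 'crypto';

const app = express();
const upload = multer({ storage: multer.memoryStorage() });

app.post('/upload', upload.single('image'), async (req, res) => {
  const tempFileName = `${randomUUID()}-${req.file.originalname}`;

  // Step 1: park the original in the private quarantine bucket
  await minio.putObject('quarantine', tempFileName, req.file.buffer);

  // Step 5: hand the object reference to the moderation queue
  const job = await queue.add('moderate-image', { userId: req.user?.id, objectName: tempFileName });

  res.status(202).json({ jobId: job.id });
});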

🛡️ Step 2: Virus Scanning with ClamAV

Before processing any media, scan it for malware using the open-source antivirus engine ClamAV.

✅ Setup (Docker)

clamav:
  image: clamav/clamav:stable
  ports:
    - "3310:3310"

✅ Code Integration

import NodeClam from 'clamscan';
import { Readable } from 'stream';

// Connect to the clamd daemon exposed by the Docker container on port 3310
const clamscan = await new NodeClam().init({ clamdscan: { host: 'localhost', port: 3310 } });
const { isInfected, viruses } = await clamscan.scanStream(Readable.from(fileBuffer));

if (isInfected) throw new Error(`Infected file: ${viruses.join(', ')}`);

🔐 This catches uploads that match known malware signatures, such as trojans or backdoors, before they go any further.


🧠 Step 3: ML-Based Content Classification

Use a deep learning model to detect inappropriate content: nudity, violence, hate, etc.

✅ Option A: Google Vision API (SafeSearch)

import vision from '@google-cloud/vision';

const client = new vision.ImageAnnotatorClient();
const [result] = await client.safeSearchDetection({ image: { content: buffer } });
const safe = result.safeSearchAnnotation;

if (safe.adult === 'VERY_LIKELY' || safe.violence === 'LIKELY') {
  throw new Error('Image flagged as unsafe');
}

Requires a GCP account + service account key file.
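
If you'd rather not rely on the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, the client can be pointed at the key file explicitly (the path below is a placeholder):

const client = new vision.ImageAnnotatorClient({
  keyFilename: '/path/to/service-account.json',
});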

✅ Option B: Self-hosted (NudeNet / DeepStack)

Run your classifier container via Docker (the image name below is a placeholder for whichever NudeNet or DeepStack build you use):

docker run -p 5000:5000 nudenet-server

Request:

// Node 18+ ships global fetch, FormData and Blob
const formData = new FormData();
formData.append('image', new Blob([buffer]), 'upload.jpg');

const res = await fetch('http://localhost:5000/classify', {
  method: 'POST',
  body: formData,
});
const data = await res.json();

// The response shape depends on the model/server you run; adjust field names accordingly
if (data.predictions.some((p) => p.prob > 0.9)) {
  throw new Error('Image contains nudity');
}

🧬 Step 4: Known-Bad-Image Fingerprinting

For known illegal content (e.g. CSAM), use image hashing to detect re-uploads of previously flagged images.

🧩 Perceptual Hashing (pHash / aHash / dHash)

These algorithms create “visual fingerprints” of images. Two visually similar images produce similar hashes—even if they’re resized, compressed, or recolored.

import { imageHash } from 'image-hash';
import { promisify } from 'util';

const hashImage = promisify(imageHash);

// image-hash expects buffers wrapped with their mime type (jpeg assumed here); 16 bits, hex output
const hash = await hashImage({ ext: 'image/jpeg', data: buffer }, 16, true);
if (denylist.has(hash)) throw new Error('Image in blocklist');
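
An exact `Set` lookup only catches byte-for-byte identical hashes. In practice you compare hashes by Hamming distance so near-duplicates still match. The sketch below assumes hex-encoded hashes, and the cut-off of 10 bits is an illustrative value you'd tune against your own data.

// Count how many bits differ between two hex-encoded hashes of equal length
function hammingDistance(hashA, hashB) {
  let distance = 0;
  for (let i = 0; i < hashA.length; i++) {
    let xor = parseInt(hashA[i], 16) ^ parseInt(hashB[i], 16);
    while (xor) {
      distance += xor & 1;
      xor >>= 1;
    }
  }
  return distance;
}

// Treat anything within 10 differing bits as a match against the denylist
const isKnownBad = [...denylist].some((bad) => hammingDistance(hash, bad) <= 10);
if (isKnownBad) throw new Error('Image matches a blocklisted hash');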

🚨 PhotoDNA

PhotoDNA is widely used to match known CSAM. It's not open source; you can apply to Microsoft for access or go through third-party moderation platforms such as:

  • Microsoft Content Moderator
  • Thorn’s CSAM API (restricted access)

🧵 Step 5: Asynchronous Moderation Pipeline

You shouldn't block the user’s HTTP request while processing everything. Instead:

1. Accept the upload → push job

// Pass a reference to the quarantined object rather than the raw buffer,
// since job payloads are JSON-serialized (queue setup is sketched after step 3)
const job = await queue.add('moderate-image', { userId, objectName: tempFileName });
return res.status(202).json({ jobId: job.id });

2. Background worker handles it

import { Worker } from 'bullmq';

const worker = new Worker('moderate-image', async (job) => {
  // 1. virus scan (ClamAV)
  // 2. NSFW / violence classification
  // 3. pHash denylist check
  // 4. sharp → webp & multi-size resize
  // 5. upload clean variants to MinIO
  // 6. update the DB record
}, { connection: { host: 'localhost', port: 6379 } }); // point at your Redis instance

3. Track job status

const job = await queue.getJob(jobId);
// getState() is async in BullMQ; progress is a plain property, not a method
return res.json({ status: await job?.getState(), progress: job?.progress });
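
All three snippets above assume a BullMQ queue backed by Redis. A minimal setup sketch (host and port are placeholders for your Redis instance):

import { Queue } from 'bullmq';

// Shared queue instance used by the upload route and the status endpoint
const queue = new Queue('moderate-image', {
  connection: { host: 'localhost', port: 6379 },
});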

🎨 Step 6: Image Transformation with Sharp

Resize + convert to WebP for fast delivery:

import sharp from 'sharp';

const sizes = [
  { name: 'thumb', w: 64, h: 64 },
  { name: 'medium', w: 256, h: 256 },
  { name: 'full', w: 1024, h: 1024 },
];

// Keep each variant's name alongside its buffer so it can be stored under a sensible key
const processed = await Promise.all(sizes.map(async ({ name, w, h }) => ({
  name,
  buffer: await sharp(originalBuffer)
    .resize(w, h)
    .webp({ quality: 80 })
    .toBuffer(),
})));
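
Once the variants are generated, the worker can publish them to a world-readable bucket. A hedged sketch, where the `images` bucket name and the `${userId}/...` key layout are illustrative assumptions:

// Publish each clean variant under its own key with the right content type
await Promise.all(processed.map(({ name, buffer }) =>
  minio.putObject('images', `${userId}/${name}.webp`, buffer, buffer.length, {
    'Content-Type': 'image/webp',
  })
));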

🧠 Moderation Policy & Thresholds

Every product has different tolerance levels.

| Label        | Threshold | Action        |
| ------------ | --------- | ------------- |
| Adult Nudity | > 90%     | Reject        |
| Violence     | > 80%     | Quarantine    |
| Self-Harm    | > 70%     | Manual Review |

Store this in a config or DB so you can update without redeploying.
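
As a starting point, that can be as simple as a keyed object; the label names, thresholds, and actions below are illustrative assumptions, not a recommended policy.

// Illustrative policy map; load this from config or the DB in a real deployment
const moderationPolicy = {
  adultNudity: { threshold: 0.9, action: 'reject' },
  violence:    { threshold: 0.8, action: 'quarantine' },
  selfHarm:    { threshold: 0.7, action: 'manual_review' },
};

function decideAction(label, score) {
  const rule = moderationPolicy[label];
  if (!rule || score < rule.threshold) return 'allow';
  return rule.action;
}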


🧼 Cleanup + Fallbacks

If a job fails:

  • Remove uploaded objects
  • Set error fields in DB
  • Optionally alert admins

worker.on('failed', async (job, err) => {
  if (!job) return;

  // Remove any objects this job already wrote to MinIO
  for (const f of job.data.uploads ?? []) {
    await minio.removeObject('avatars', f.name);
  }

  // Record the failure on the user document
  await users.updateOne(
    { _id: job.data.userId },
    { $unset: { pendingAvatarJobId: "" }, $set: { avatarJobError: err.message } }
  );
});

📌 Recap

✅ Virus Scanning
✅ ML Content Classification (Cloud & Local)
✅ Known-bad Hash Detection
✅ Background Processing with BullMQ
✅ Web-Optimized Resizing
✅ Quarantine Workflow
✅ Policy-based Moderation
✅ User Feedback + Retry Handling


🏁 Final Words

What we've built here follows the same broad pattern that platforms like Facebook, Discord, and Reddit apply at scale: quarantine, scan, classify, hash-match, then publish. You:

  • Protect your users
  • Protect your platform’s reputation
  • Stay compliant with legal standards

Want to go further?

  • Add human moderation dashboards
  • Add appeal/review workflows
  • Add watermarking or steganographic IDs
