Reddit Scraper AI Agent: ThreadLens
Agastya Khati


Publish Date: Aug 31

This is a submission for the AI Agents Challenge powered by n8n and Bright Data

What I Built

ThreadLens is an AI-driven automation that turns any public Reddit thread into a concise, actionable summary, eliminating hours of manual reading and ensuring teams never miss key points buried in long discussions.

What Problem It Solves

Researching Reddit discussions for product feedback, hardware recommendations, or emerging trends can take 30–60 minutes per thread. ThreadLens automatically scrapes every comment and reply, analyzes the conversation with a large language model, and delivers a focused summary in under five minutes. This speeds up decision-making, ensures consistency across research, and makes insights easily searchable in our Notion knowledge base.

Demo

https://youtu.be/4TgBGNGRn5o

n8n Workflow

https://gist.github.com/kris70lesgo/c03f64c8b5decb7f7f6da49aeff5e529

Technical Implementation

  • n8n Chat Trigger: Captures user input (thread URL) via Telegram or web.
  • Bright Data Verified Node (Web Scraper): Batches all thread URLs into a configured “Web Scraper” recipe, handles proxy rotation, fingerprinting, and CAPTCHAs, then monitors and downloads the snapshot content.
  • n8n SplitInBatches & IF Nodes: Control the polling loop that waits for snapshot readiness.
  • n8n Code & Aggregate Nodes: Reshape the JSON output into a single text payload (sketched after this list).
  • n8n AI Agent Node: Sends the combined comment text to GPT-3.5-turbo and retrieves the summary.
  • n8n Notion Node: Appends the final summary to a shared Notion database.
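
Here is a minimal sketch of that Code & Aggregate step, assuming the Bright Data snapshot returns `author`, `comment`, and an optional `replies` array per item; the field names and the prompt wording are assumptions, not the exact workflow code, so match them to whatever keys your Web Scraper recipe actually returns.

```typescript
// n8n Code node ("Run Once for All Items"); written without TypeScript-only
// syntax, so it can be pasted into the node as plain JavaScript.
// Field names below are assumptions about the Bright Data output schema.
const items = $input.all();

const lines = items.map((item, i) => {
  const c = item.json;
  const replies = (c.replies || [])
    .map((r) => `    > ${r.author || "unknown"}: ${r.comment || ""}`)
    .join("\n");
  return `${i + 1}. ${c.author || "unknown"}: ${c.comment || ""}` + (replies ? "\n" + replies : "");
});

// One combined payload means a single LLM call per thread instead of one per comment.
return [
  {
    json: {
      prompt:
        "Summarize the key points, recommendations, and disagreements in this Reddit thread:\n\n" +
        lines.join("\n"),
    },
  },
];
```

Returning a single item is what keeps the AI Agent node down to one request per thread, which is also where the cost savings mentioned in the Journey section come from.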

Bright Data Verified Node

The Bright Data node powers all scraping and unblocking:
  • Initiate Batch Extraction: Sends an array of Reddit comment page URLs to a pre-configured Web Scraper recipe.
  • Monitor Progress: Polls Bright Data’s API until the snapshot is “ready.”
  • Download Snapshot Content: Retrieves a structured JSON of all comments and metadata, with no HTML parsing on our end.
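
For readers curious what the Verified Node abstracts away, here is a minimal standalone sketch of those three steps against Bright Data's dataset API. The endpoint paths under `https://api.brightdata.com/datasets/v3`, the `BRIGHTDATA_API_TOKEN` environment variable, the placeholder dataset id, and the 15-second polling interval are all assumptions to check against the current docs; inside n8n the Verified Node performs these calls for you.

```typescript
// Standalone sketch of what the Bright Data Verified Node does under the hood.
// Endpoint paths and response fields are assumptions based on Bright Data's
// dataset API; verify them against the current documentation.
const API = "https://api.brightdata.com/datasets/v3";
const TOKEN = process.env.BRIGHTDATA_API_TOKEN ?? ""; // assumption: token stored in an env var
const DATASET_ID = "gd_xxxxxxxxxxxx";                 // placeholder id of the Web Scraper recipe

async function scrapeThread(urls: string[]): Promise<unknown> {
  // 1. Initiate batch extraction with the array of Reddit comment page URLs.
  const trigger = await fetch(`${API}/trigger?dataset_id=${DATASET_ID}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${TOKEN}`, "Content-Type": "application/json" },
    body: JSON.stringify(urls.map((url) => ({ url }))),
  }).then((r) => r.json());

  // 2. Monitor progress until the snapshot reports "ready".
  let status = "running";
  while (status !== "ready") {
    await new Promise((resolve) => setTimeout(resolve, 15_000)); // poll every 15 s
    const progress = await fetch(`${API}/progress/${trigger.snapshot_id}`, {
      headers: { Authorization: `Bearer ${TOKEN}` },
    }).then((r) => r.json());
    status = progress.status;
  }

  // 3. Download the structured JSON snapshot (comments plus metadata, no HTML parsing).
  return fetch(`${API}/snapshot/${trigger.snapshot_id}?format=json`, {
    headers: { Authorization: `Bearer ${TOKEN}` },
  }).then((r) => r.json());
}

// Example call with a placeholder thread URL:
// scrapeThread(["https://www.reddit.com/r/<subreddit>/comments/<id>/"]).then(console.log);
```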

Journey

Building ThreadLens taught us to handle asynchronous batch APIs and transform their outputs for AI consumption.

Challenge: Managing the polling loop, which we solved by combining IF, Wait, and SplitInBatches nodes to avoid infinite hangs.
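
A minimal sketch of that guard, assuming a small Code node that increments a counter just before the IF check; the `attempt` field and the cap of 40 polls are illustrative values, not the exact ones in the workflow.

```typescript
// n8n Code node inside the polling branch; written without TypeScript-only
// syntax so it can be pasted into the node as plain JavaScript.
const prev = $input.first().json;
const attempt = (prev.attempt || 0) + 1;  // illustrative counter name
const maxAttempts = 40;                   // roughly 10 minutes with a 15-second Wait node

if (attempt > maxAttempts) {
  // Failing loudly lets n8n route the run to an error path instead of hanging forever.
  throw new Error(`Snapshot still not ready after ${maxAttempts} polls`);
}

// Pass the counter along so the next iteration can increment it again.
return [{ json: { ...prev, attempt } }];
```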

Lesson: Aggregating arrays into a single prompt payload reduces repeated API calls and keeps costs down.

Outcome: A robust, end-to-end agent that any team member can trigger via chat, delivering vetted insights in under five minutes.

Comments (4)

  • Robert Johanson (Sep 4, 2025)

    great workflow

  • Kanishk (Sep 7, 2025)

    i found it quite useful

  • TristanLynn (Sep 7, 2025)

    ThreadLens sounds like a smart way to turn Reddit’s noisy threads into actionable insights—especially if it’s doing robust deduping, topic clustering, and quick summaries while staying within API limits and community norms. I’d love to see it layer in entity extraction for brands/products, simple sentiment over time, and a “signal vs. hype” score so you can spot patterns without chasing every spike. Used ethically, that kind of lens could be gold for campaign planning—imagine mapping real user pain points and wishlists to shape Black Friday Tarjouskey deals in near-real time. How are you handling rate limits and compliance today—official API, cached snapshots, or a hybrid pipeline?

  • Adwaith Jayasankar (Sep 11, 2025)

    Wow looks like exactly the same thing I made 2 days before this submission! Guess only I was able to figure out the clean notion formatting 😅

    dev.to/kichuman28/unstoppable-redd...
