Reddit Scraper AI Agent: ThreadLens
Agastya Khati


Publish Date: Aug 31

This is a submission for the AI Agents Challenge powered by n8n and Bright Data

What I Built

ThreadLens is an AI-driven automation that turns any public Reddit thread into a concise, actionable summary, eliminating hours of manual reading and ensuring teams never miss key points buried in long discussions.

What Problem It Solves

Researching Reddit discussions for product feedback, hardware recommendations, or emerging trends can take 30–60 minutes per thread. ThreadLens automatically scrapes every comment and reply, analyzes the conversation with a large language model, and delivers a focused summary in under five minutes. This speeds up decision-making, ensures consistency across research, and makes insights easily searchable in our Notion knowledge base.

Demo

https://youtu.be/4TgBGNGRn5o

n8n Workflow

https://gist.github.com/kris70lesgo/c03f64c8b5decb7f7f6da49aeff5e529

Technical Implementation

  • n8n Chat Trigger: Captures user input (thread URL) via Telegram or web.
  • Bright Data Verified Node (Web Scraper): Batches all thread URLs into a configured “Web Scraper” recipe, handles proxy rotation, fingerprinting, and CAPTCHAs, then monitors and downloads the snapshot content.
  • n8n SplitInBatches & IF Nodes: Control the polling loop that waits for snapshot readiness.
  • n8n Code & Aggregate Nodes: Reshape the JSON output into a single text payload (sketched after this list).
  • n8n AI Agent Node: Sends the combined comment text to GPT-3.5-turbo and retrieves the summary.
  • n8n Notion Node: Appends the final summary to a shared Notion database.
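
Here is a minimal sketch of that Code & Aggregate step, assuming the Bright Data snapshot returns `author`, `comment`, and an optional `replies` array per item; the field names and the prompt wording are assumptions, not the exact workflow code, so match them to whatever keys your Web Scraper recipe actually returns.

```typescript
// n8n Code node ("Run Once for All Items"); written without TypeScript-only
// syntax, so it can be pasted into the node as plain JavaScript.
// Field names below are assumptions about the Bright Data output schema.
const items = $input.all();

const lines = items.map((item, i) => {
  const c = item.json;
  const replies = (c.replies || [])
    .map((r) => `    > ${r.author || "unknown"}: ${r.comment || ""}`)
    .join("\n");
  return `${i + 1}. ${c.author || "unknown"}: ${c.comment || ""}` + (replies ? "\n" + replies : "");
});

// One combined payload means a single LLM call per thread instead of one per comment.
return [
  {
    json: {
      prompt:
        "Summarize the key points, recommendations, and disagreements in this Reddit thread:\n\n" +
        lines.join("\n"),
    },
  },
];
```

Returning a single item is what keeps the AI Agent node down to one request per thread, which is also where the cost savings mentioned in the Journey section come from.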

Bright Data Verified Node

The Bright Data node powers all scraping and unblocking:
  • Initiate Batch Extraction: Sends an array of Reddit comment page URLs to a pre-configured Web Scraper recipe.
  • Monitor Progress: Polls Bright Data’s API until the snapshot is “ready.”
  • Download Snapshot Content: Retrieves a structured JSON of all comments and metadata, with no HTML parsing on our end.
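
For readers curious what the Verified Node abstracts away, here is a minimal standalone sketch of those three steps against Bright Data's dataset API. The endpoint paths under `https://api.brightdata.com/datasets/v3`, the `BRIGHTDATA_API_TOKEN` environment variable, the placeholder dataset id, and the 15-second polling interval are all assumptions to check against the current docs; inside n8n the Verified Node performs these calls for you.

```typescript
// Standalone sketch of what the Bright Data Verified Node does under the hood.
// Endpoint paths and response fields are assumptions based on Bright Data's
// dataset API; verify them against the current documentation.
const API = "https://api.brightdata.com/datasets/v3";
const TOKEN = process.env.BRIGHTDATA_API_TOKEN ?? ""; // assumption: token stored in an env var
const DATASET_ID = "gd_xxxxxxxxxxxx";                 // placeholder id of the Web Scraper recipe

async function scrapeThread(urls: string[]): Promise<unknown> {
  // 1. Initiate batch extraction with the array of Reddit comment page URLs.
  const trigger = await fetch(`${API}/trigger?dataset_id=${DATASET_ID}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${TOKEN}`, "Content-Type": "application/json" },
    body: JSON.stringify(urls.map((url) => ({ url }))),
  }).then((r) => r.json());

  // 2. Monitor progress until the snapshot reports "ready".
  let status = "running";
  while (status !== "ready") {
    await new Promise((resolve) => setTimeout(resolve, 15_000)); // poll every 15 s
    const progress = await fetch(`${API}/progress/${trigger.snapshot_id}`, {
      headers: { Authorization: `Bearer ${TOKEN}` },
    }).then((r) => r.json());
    status = progress.status;
  }

  // 3. Download the structured JSON snapshot (comments plus metadata, no HTML parsing).
  return fetch(`${API}/snapshot/${trigger.snapshot_id}?format=json`, {
    headers: { Authorization: `Bearer ${TOKEN}` },
  }).then((r) => r.json());
}

// Example call with a placeholder thread URL:
// scrapeThread(["https://www.reddit.com/r/<subreddit>/comments/<id>/"]).then(console.log);
```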

Journey

Building ThreadLens taught us to handle asynchronous batch APIs and transform their outputs for AI consumption.

Challenge: Managing the polling loop, which we solved by combining IF, Wait, and SplitInBatches nodes to avoid infinite hangs.
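
A minimal sketch of that guard, assuming a small Code node that increments a counter just before the IF check; the `attempt` field and the cap of 40 polls are illustrative values, not the exact ones in the workflow.

```typescript
// n8n Code node inside the polling branch; written without TypeScript-only
// syntax so it can be pasted into the node as plain JavaScript.
const prev = $input.first().json;
const attempt = (prev.attempt || 0) + 1;  // illustrative counter name
const maxAttempts = 40;                   // roughly 10 minutes with a 15-second Wait node

if (attempt > maxAttempts) {
  // Failing loudly lets n8n route the run to an error path instead of hanging forever.
  throw new Error(`Snapshot still not ready after ${maxAttempts} polls`);
}

// Pass the counter along so the next iteration can increment it again.
return [{ json: { ...prev, attempt } }];
```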

Lesson: Aggregating arrays into a single prompt payload reduces repeated API calls and keeps costs down.

Outcome: A robust, end-to-end agent that any team member can trigger via chat, delivering vetted insights in under five minutes.

Comments (4)

  • Robert Johanson (Sep 4, 2025)

    great workflow

  • Kanishk (Sep 7, 2025)

    i found it quite useful

  • TristanLynn (Sep 7, 2025)

    ThreadLens sounds like a smart way to turn Reddit’s noisy threads into actionable insights—especially if it’s doing robust deduping, topic clustering, and quick summaries while staying within API limits and community norms. I’d love to see it layer in entity extraction for brands/products, simple sentiment over time, and a “signal vs. hype” score so you can spot patterns without chasing every spike. Used ethically, that kind of lens could be gold for campaign planning—imagine mapping real user pain points and wishlists to shape Black Friday Tarjouskey deals in near-real time. How are you handling rate limits and compliance today—official API, cached snapshots, or a hybrid pipeline?

  • Adwaith Jayasankar (Sep 11, 2025)

    Wow looks like exactly the same thing I made 2 days before this submission! Guess only I was able to figure out the clean notion formatting 😅

    dev.to/kichuman28/unstoppable-redd...
