Best of Hacker News: Structured Data Extract & Export using Google Gemini + n8n + Bright Data
Ranjan Dailata

Ranjan Dailata @ranjancse

About: A professional based in India who specializes in AI-powered automation. Contact me at ranjancse@gmail.com, LinkedIn - https://www.linkedin.com/in/ranjan-dailata/



Publish Date: Jun 22

Introduction


Have you ever wished you could automate the extraction of the best content from Hacker News and export it in structured formats like JSON for insights, newsletters, dashboards, or AI pipelines?

Well, now you can, with a powerful n8n automation template that combines:

  • Hacker News search and curated lists (Today, Yesterday, Weekly, All Time)
  • Bright Data’s Web Unlocker for dynamic content extraction
  • Google Gemini for transforming raw content into structured JSON

Prerequisites

  1. Bright Data. If you are a new user, sign up for a Bright Data account.
  2. Google Gemini. Sign up on Google AI Studio to get an API key.

Why Bright Data?

  • Reliable Access to Complex Sites
    Handles sites like LinkedIn, Amazon, and Google with built-in anti-bot evasion and CAPTCHA solving.

  • Global IP Coverage
    Offers rotating residential, datacenter, and mobile proxies from over 195 countries, which is useful for geo-specific scraping.

  • API-First Design
    Easy-to-integrate REST APIs that fit well into workflows, backend systems, or tools like n8n, Zapier, or Postman.

  • Smart Unlocking with Web Unlocker
    Automatically handles JavaScript rendering, redirects, cookies, and headers, so there is no need to reverse-engineer pages.

  • Scalable & Production-Ready
    Suitable for scraping a single profile or thousands of pages, with built-in rate limiting and retry logic.

  • Built for Stealth
    Human-like behavior, session rotation, and browser fingerprinting evasion make it hard to detect.

  • Great Logging & Dashboard
    Provides detailed logs, usage stats, and error insights to monitor scraper performance.

  • Enterprise-Level Security
    GDPR & CCPA compliant with secure proxy handling and traffic encryption.

  • Amazing Support
    Offers 24/7 live support, dedicated success engineers, and well-documented SDKs.


Why This Template?

Hacker News is a goldmine for:

  • Startup trends
  • Developer tools
  • AI & Tech debates
  • Thoughtful commentary

But extracting and making sense of HN content at scale is messy:

  • No easy API for full content
  • HTML parsing is inconsistent
  • You want clean structured JSON, not HTML blobs

This n8n template solves that with structured LLM-powered extraction using Google Gemini.


What the Workflow Does

Input Options

  • Fetch Hacker News front pages:
    • Today
    • Yesterday
    • Weekly Top
    • All-Time Top
  • Perform custom Hacker News search

LLM-Powered Data Structuring

Once links are collected:

  • Bright Data Web Unlocker extracts full page content
  • Google Gemini is prompted to convert the raw content into clean structured JSON that follows this schema:
  {
  "$schema": "http://json-schema.org/schema#",
  "title": "BestOfShowHNSearchResult",
  "type": "object",
  "properties": {
    "query": {
      "type": "string",
      "description": "The search term used (e.g., \"artificial-intelligence\")"
    },
    "totalResults": {
      "type": "integer",
      "description": "Total number of matching items"
    },
    "page": {
      "type": "integer",
      "description": "Current page number"
    },
    "perPage": {
      "type": "integer",
      "description": "Number of items per page"
    },
    "results": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": {
            "type": "string",
            "description": "Unique identifier (e.g., HN item ID)"
          },
          "title": {
            "type": "string",
            "description": "Headline or title of the post"
          },
          "url": {
            "type": ["string", "null"],
            "format": "uri",
            "description": "Original URL or null for internal HN posts"
          },
          "commentsUrl": {
            "type": "string",
            "format": "uri",
            "description": "Link to the Hacker News comments page"
          },
          "points": {
            "type": "integer",
            "description": "Number of points (votes) received"
          },
          "commentsCount": {
            "type": "integer",
            "description": "Number of comments"
          },
          "submitter": {
            "type": "string",
            "description": "Username of the person who submitted"
          },
          "submittedAt": {
            "type": "string",
            "format": "date-time",
            "description": "Timestamp of submission"
          },
          "tags": {
            "type": "array",
            "items": { "type": "string" },
            "description": "Any tags or categories applied"
          }
        },
        "required": [
          "id",
          "title",
          "commentsUrl",
          "points",
          "commentsCount",
          "submitter",
          "submittedAt"
        ]
      }
    }
  },
  "required": ["query", "totalResults", "page", "perPage", "results"]
}
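As a rough illustration of how output shaped by this schema could be checked downstream, here is a minimal Python sketch. It assumes the model reply arrives as a string that may be wrapped in a Markdown code fence, and it only verifies the `required` keys listed in the schema rather than performing full JSON Schema validation. The function names and the sample reply are hypothetical and are not part of the template.

```python
import json

# Required keys, copied from the schema above.
TOP_LEVEL_REQUIRED = ["query", "totalResults", "page", "perPage", "results"]
ITEM_REQUIRED = [
    "id", "title", "commentsUrl", "points",
    "commentsCount", "submitter", "submittedAt",
]

FENCE = "`" * 3  # Markdown code fence that models sometimes wrap around JSON


def strip_code_fence(text: str) -> str:
    """Return the text without a surrounding Markdown code fence, if any."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
    return "\n".join(lines)


def parse_search_result(raw_reply: str) -> dict:
    """Parse a model reply and check the keys the schema marks as required."""
    data = json.loads(strip_code_fence(raw_reply))
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = [key for key in TOP_LEVEL_REQUIRED if key not in data]
    if missing:
        raise ValueError(f"Missing top-level keys: {missing}")
    for index, item in enumerate(data["results"]):
        missing = [key for key in ITEM_REQUIRED if key not in item]
        if missing:
            raise ValueError(f"Result {index} is missing keys: {missing}")
    return data


# Hypothetical reply, invented purely for illustration.
SAMPLE_REPLY = FENCE + "json\n" + json.dumps({
    "query": "artificial-intelligence",
    "totalResults": 1,
    "page": 1,
    "perPage": 30,
    "results": [{
        "id": "1",
        "title": "Example post",
        "url": None,
        "commentsUrl": "https://news.ycombinator.com/item?id=1",
        "points": 42,
        "commentsCount": 7,
        "submitter": "example_user",
        "submittedAt": "2025-06-22T00:00:00Z",
    }],
}) + "\n" + FENCE

result = parse_search_result(SAMPLE_REPLY)
print(result["results"][0]["title"])
```

A check like this catches replies that drop a required field before they reach a spreadsheet or dashboard. For stricter guarantees (types, formats), a full JSON Schema validator would be the better fit.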

Perfect for use in:

  • Weekly tech reports
  • SEO blog generation
  • AI prompt databases
  • RSS/Notion pipelines

n8n Template Highlights

Flexible Hacker News Data Extraction

  • Extracts Hacker News lists for today, yesterday, the past week, or all time.

Flexible Search Filters

  • Plug in your own search queries (e.g., “AI”, “Prompt Engineering”) and let n8n do the digging.

Web Unlocking with Bright Data

  • Handles page blocks, JS-heavy content, and hidden comment threads.
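
For readers who want to see what a Web Unlocker call might look like outside n8n, here is a hedged Python sketch that only builds the HTTP request and does not send it. The endpoint `https://api.brightdata.com/request` and the body fields `zone`, `url`, and `format` are assumptions based on Bright Data's public REST examples and should be verified against the current documentation. The zone name and token are placeholders, and the template's own nodes may be configured differently.

```python
import json
import urllib.request

# Assumption: Bright Data's REST endpoint for Web Unlocker requests.
BRIGHT_DATA_ENDPOINT = "https://api.brightdata.com/request"


def build_unlocker_request(target_url: str, zone: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Bright Data to fetch target_url."""
    payload = {
        "zone": zone,        # placeholder: the Web Unlocker zone from your account
        "url": target_url,   # the page to fetch
        "format": "raw",     # assumption: return the page body as-is
    }
    return urllib.request.Request(
        BRIGHT_DATA_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


request = build_unlocker_request(
    "https://news.ycombinator.com/",
    zone="web_unlocker1",        # hypothetical zone name
    api_token="YOUR_API_TOKEN",  # placeholder, never hard-code real tokens
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```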

Export Options

  • Google Sheets
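
A spreadsheet needs flat rows, while the structured JSON nests each post under `results`. The sketch below shows one possible way to flatten it in Python, assuming the field names from the schema above. The column order, the comma-joined `tags` cell, and the function names are illustrative choices, not the template's actual Google Sheets mapping.

```python
import csv
import io

# Column order is an illustrative choice; the names come from the schema.
COLUMNS = [
    "id", "title", "url", "commentsUrl", "points",
    "commentsCount", "submitter", "submittedAt", "tags",
]


def results_to_rows(search_result: dict) -> list:
    """Flatten the 'results' array into a header row plus one row per post."""
    rows = [list(COLUMNS)]
    for item in search_result.get("results", []):
        row = []
        for column in COLUMNS:
            value = item.get(column)
            if column == "tags":
                value = ", ".join(value or [])  # a list becomes one cell
            row.append("" if value is None else str(value))
        rows.append(row)
    return rows


def rows_to_csv(rows: list) -> str:
    """Serialize rows as CSV text, which spreadsheets can import directly."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical data, invented for illustration.
example = {
    "results": [{
        "id": "1",
        "title": "Example post",
        "url": None,
        "commentsUrl": "https://news.ycombinator.com/item?id=1",
        "points": 5,
        "commentsCount": 2,
        "submitter": "example_user",
        "submittedAt": "2025-06-22T00:00:00Z",
        "tags": ["ai", "tools"],
    }]
}
rows = results_to_rows(example)
print(rows_to_csv(rows))
```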

How to Use This Template

  1. Import the template into your n8n instance
  2. Add your:
    • Bright Data credentials
    • Gemini API Key
  3. Set the action:
    • search
    • today, yesterday, week, all
  4. Hit execute

Your structured JSON will be ready to integrate wherever you need it.
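
To make the action setting in step 3 concrete, here is a small Python sketch of how the five values could be validated before a run. It assumes that `search` needs a query string and that the list actions do not. The `WorkflowInput` type and `make_input` helper are hypothetical and only mirror the options listed above.

```python
from dataclasses import dataclass
from typing import Optional

# The list-style actions named in step 3.
LIST_ACTIONS = {"today", "yesterday", "week", "all"}


@dataclass(frozen=True)
class WorkflowInput:
    """Hypothetical container for the template's input settings."""
    action: str
    query: Optional[str] = None


def make_input(action: str, query: Optional[str] = None) -> WorkflowInput:
    """Normalize the action and reject combinations that cannot work."""
    normalized = action.strip().lower()
    if normalized == "search":
        if not query or not query.strip():
            raise ValueError("The 'search' action needs a non-empty query")
        return WorkflowInput("search", query.strip())
    if normalized in LIST_ACTIONS:
        return WorkflowInput(normalized)
    raise ValueError(f"Unknown action: {action!r}")


print(make_input("search", "Prompt Engineering"))
print(make_input("week"))
```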


Use Case Ideas

  • Build a daily/weekly newsletter from top tech stories
  • Feed into a personal trend dashboard
  • Create a searchable knowledge base of HN content
  • Train your own AI agent on top community ideas

Try It Now

You can get the n8n template from my GitHub:

Best Of Hacker News Structured Data Extract & Export with Google Gemini


Final Thoughts

This template shows the true power of AI + Automation + Unlocking the Web.

  • Google Gemini adds semantic understanding.
  • Bright Data enables full access.
  • n8n makes it plug-and-play.
