Introduction
Have you ever wished you could automate the extraction of the best content from Hacker News and export it in structured formats like JSON for insights, newsletters, dashboards, or AI pipelines?
Well, now you can, with a powerful n8n automation template that combines:
- Hacker News search and curated lists (Today, Yesterday, Weekly, All Time)
- Bright Data’s Web Unlocker for dynamic content extraction.
- Google Gemini for transforming raw content into structured JSON.
Pre-requisite
- New users of Bright Data, please make sure to sign-up here - Bright Data
- Google Gemini. Please Sign up on Google AI Studio to get the API Key.
Why Bright Data?
Reliable Access to Complex Sites
Handles sites like LinkedIn, Amazon, and Google with built-in anti-bot evasion and CAPTCHA solving.Global IP Coverage
Offers rotating residential, datacenter, and mobile proxies from over 195 countries great for geo-specific scraping.API-First Design
Easy-to-integrate REST APIs that fit well into workflows, backend systems, or tools like n8n, Zapier, or Postman.Smart Unlocking with Web Unlocker
Automatically handles JavaScript rendering, redirects, cookies, and headers no need to reverse-engineer pages.Scalable & Production-Ready
Suitable for single profile scraping or scraping thousands of pages with built-in rate limiting and retry logic.Built for Stealth
Human-like behavior, session rotation, and browser fingerprinting evasion make it hard to detect.Great Logging & Dashboard
Provides detailed logs, usage stats, and error insights to monitor scraper performance.Enterprise-Level Security
GDPR & CCPA compliant with secure proxy handling and traffic encryption.Amazing Support
Offers 24/7 live support, dedicated success engineers, and well-documented SDKs.
Why This Template?
Hacker News is a goldmine for:
- Startup trends
- Developer tools
- AI & Tech debates
- Thoughtful commentaries
But extracting and making sense of HN content at scale is messy:
- No easy API for full content
- HTML parsing is inconsistent
- You want clean structured JSON, not HTML blobs
This n8n template solves that with structured LLM-powered extraction using Google Gemini.
What the Workflow Does
Input Options
- Fetch Hacker News front pages:
- Today
- Yesterday
- Weekly Top
- All-Time Top
- Perform custom Hacker News search
LLM-Powered Data Structuring
Once links are collected:
- Bright Data Web Unlocker extracts full page content
- Google Gemini is prompted to convert raw content into clean structured JSON with:
{
"$schema": "http://json-schema.org/schema#",
"title": "BestOfShowHNSearchResult",
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search term used (e.g., \"artificial-intelligence\")"
},
"totalResults": {
"type": "integer",
"description": "Total number of matching items"
},
"page": {
"type": "integer",
"description": "Current page number"
},
"perPage": {
"type": "integer",
"description": "Number of items per page"
},
"results": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "Unique identifier (e.g., HN item ID)"
},
"title": {
"type": "string",
"description": "Headline or title of the post"
},
"url": {
"type": ["string", "null"],
"format": "uri",
"description": "Original URL or null for internal HN posts"
},
"commentsUrl": {
"type": "string",
"format": "uri",
"description": "Link to the Hacker News comments page"
},
"points": {
"type": "integer",
"description": "Number of points (votes) received"
},
"commentsCount": {
"type": "integer",
"description": "Number of comments"
},
"submitter": {
"type": "string",
"description": "Username of the person who submitted"
},
"submittedAt": {
"type": "string",
"format": "date-time",
"description": "Timestamp of submission"
},
"tags": {
"type": "array",
"items": { "type": "string" },
"description": "Any tags or categories applied"
}
},
"required": [
"id",
"title",
"commentsUrl",
"points",
"commentsCount",
"submitter",
"submittedAt"
]
}
}
},
"required": ["query", "totalResults", "page", "perPage", "results"]
}
Perfect for use in:
- Weekly tech reports
- SEO blog generation
- AI prompt databases
- RSS/Notion pipelines
n8n Template Highlights
Flexible HackerNews Data Extraction
- Handles a flexible way of HackerNews data extraction by today, yesterday, weekly, all.
Flexible Search Filters
- Plug in your own search queries (e.g., “AI”, “Prompt Engineering”) and let n8n do the digging.
Web Unlocking with Bright Data
- Handles page blocks, JS-heavy content, and hidden comment threads.
Export Options
- Google Sheets
How to Use This Template
- Import the template into your n8n instance
- Add your:
- Bright Data credentials
- Gemini API Key
- Set the action:
- search
- today, yesterday, week, all
- Hit execute
Your structured JSON will be ready to integrate wherever you need it.
Use Case Ideas
- Build a daily/weekly newsletter from top tech stories
- Feed into a personal trend dashboard
- Create a searchable knowledge base of HN content
- Train your own AI agent on top community ideas
Try It Now
You can get the n8n template from my GitHub:
Best Of Hacker News Structured Data Extract & Export with Google Gemini
Final Thoughts
This template shows the true power of AI + Automation + Unlocking the Web.
- Google Gemini adds semantic understanding.
- Bright Data enables full access.
- n8n makes it plug-and-play.