This is a submission for the Bright Data AI Web Access Hackathon.
We built NewsPulse AI to explore a simple but powerful question:
"What if you could instantly see how different media outlets spin the same story?"
In a world flooded with headlines, bias, and misinformation, NewsPulse AI acts like an AI-powered research assistant. You ask a question, and it fetches, scrapes, analyzes, and visualizes fresh news articles in real-time—just like a human researcher, but supercharged.
🚀 What It Does
- 🔍 Enter any news-related query (e.g., “farmers protest” or “AI in education”)
- 🌐 NewsPulse fetches real-time articles via Bright Data’s MCP scraping infrastructure
- 🧠 LangChain & GPT-3.5 process each article for:
  - Sentiment (positive/neutral/negative)
  - Bias
  - Political lean
  - Toxicity and propaganda presence
- 📊 Get aggregate insights and transparent logs instantly
What I Built
NewsPulse AI is an intelligent real-time news analysis engine that allows users to query any news-related topic and instantly receive a stream of analyzed articles. It mimics how a human researcher might discover, navigate, extract, and interact with web content—but does it entirely autonomously.
The project solves the problem of understanding media bias, misinformation, and sentiment across different news sources in real-time. Whether a user wants to explore how different platforms cover a political event, assess the emotional tone of news about a public figure, or analyze the presence of propaganda or toxicity in media, NewsPulse AI provides deep insights in seconds.
🔧 Deep Integration with FastMCP via STDIO
🔥 One of the key differentiators of our solution is that we have directly integrated Bright Data’s official MCP source code with our Node.js backend using FastMCP over standard input/output (STDIO).
We are:
- Running the FastMCP server inside our Express (Node.js) environment
- Communicating with it through STDIO (stdin/stdout)
- Launching and managing scraping methods programmatically from Node.js
This level of integration is not just plug-and-play—it required fine-tuning STDIO communication and handling input/output streams carefully. But it makes our backend much more flexible and efficient, enabling real-time task execution without relying on HTTP or RPC overhead.
It also aligns perfectly with Bright Data’s vision for real-time AI agents interacting with the open web, giving us full control over tool orchestration, logging, and performance.
🛠️ Tech Stack & Architecture
Our project is built entirely on modern, open-source tools optimized for real-time web data extraction and analysis:
- Frontend: React + Vite
- Backend: Node.js + Express
- AI Orchestration: LangChain + OpenAI GPT-3.5 Turbo
- Scraping Infrastructure: Bright Data MCP (FastMCP Server)
- Communication: REST APIs + WebSocket (for real-time logs & results)
- Deployment: Hosted on an EC2 instance with persistent STDIO-based communication
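As a small illustration of the real-time log channel, the event shape we push over WebSocket looks roughly like this. The field names are illustrative, and the commented `ws` setup is an assumption about the transport, not our exact server code:

```javascript
// Shape of a log event pushed to the frontend while a query is being
// scraped and analyzed. Field names here are illustrative.
function formatLogEvent(stage, message) {
  return JSON.stringify({ stage, message, ts: new Date().toISOString() });
}

// Hypothetical broadcast over the `ws` package (not executed here):
// const { WebSocketServer } = require("ws");
// const wss = new WebSocketServer({ port: 4003 });
// const broadcast = (event) => wss.clients.forEach((c) => c.send(event));

const event = JSON.parse(formatLogEvent("scrape", "Fetched 12 article links"));
```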
⚡ No Database Used:
We do not persist any data. Everything — from scraping to analysis — happens live from the open web. This ensures:
- Real-time results for every query
- No stale or outdated information
- Transparent validation that scraping works on-demand
- Compliance with the hackathon’s goal of showcasing live data access
This stateless, no-DB approach proves the reliability of Bright Data’s infrastructure in powering real-time AI agents without any dependency on pre-stored content.
Demo
Video Demo
📁 GitHub Repo: https://github.com/sumankalia/news-pulse-ai
🌐 Live Site:
Frontend: http://ec2-16-170-239-65.eu-north-1.compute.amazonaws.com:5173/
Backend: http://ec2-16-170-239-65.eu-north-1.compute.amazonaws.com:4002/api/articles/ping
📸 Screenshots:
🔍 Search Input Interface – Users can enter any news-related query, such as “farmers protest” or “AI in education,” to fetch real-time news articles and analyze them instantly.
⚙️ Real-Time Query Processing – The system uses Bright Data’s MCP server to fetch and analyze fresh articles from Indian news sources. Logs show live scraping and analysis updates for transparency and debugging.
📰 Scraped Article Snapshot – A detailed preview of an individual article showing title, source, timestamp, and extracted summary. Each result is processed for bias, sentiment, and lean.
📊 Sentiment, Bias, and Political Lean Breakdown – Each article undergoes NLP-based analysis to categorize tone (positive/neutral/negative), detect media bias, and predict political inclination.
📈 Aggregate Insights Dashboard – Provides a summary of all fetched articles, highlighting sentiment distribution, bias frequency, and lean trends to help users quickly assess media coverage patterns.
How I Used Bright Data's Infrastructure
This project is powered by Bright Data’s MCP infrastructure, particularly the FastMCP server, which enables our AI agent to simulate human browsing behavior and extract structured information in real time. Here’s how the four key actions were implemented:
1. Discover
We use LangChain with a custom PromptTemplate to dynamically route user queries to one of four scraping methods:
- `scrape_as_article`: Direct article URLs
- `scrape_a_homepage`: Homepage or latest-headline queries
- `search_via_google`: Informational and broad-topic queries
- `search_via_bing`: Dynamic, JS-heavy search result pages
```javascript
switch (result.method) {
  case "scrape_as_article":
    data = await runToolCall("scrape_as_article", { url: query });
    break;
  case "scrape_a_homepage":
    data = await runToolCall("scrape_a_homepage", {
      url: result?.homepageUrl,
    });
    break;
  case "search_via_google":
    data = await runToolCall("search_via_google", { query });
    break;
  case "search_via_bing":
    processedResults = await searchViaBing({ query, userId });
    break;
}
```
LangChain decides the optimal route based on user intent, and our backend follows through using the selected method.
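To illustrate that routing contract outside the LLM call, here is a heuristic sketch of the same four-way decision. In the real system the choice comes from the LangChain PromptTemplate; these regex rules are only assumptions that show the shape of the output:

```javascript
// Heuristic sketch of the four-way routing contract. The production
// router is LLM-driven via a PromptTemplate; these rules are illustrative.
function routeQuery(query) {
  if (/^https?:\/\//i.test(query)) {
    return { method: "scrape_as_article" }; // direct article URL
  }
  if (/latest|headlines|front page/i.test(query)) {
    return { method: "scrape_a_homepage" }; // homepage-style request
  }
  if (/video|trending|social/i.test(query)) {
    return { method: "search_via_bing" }; // JS-heavy result pages
  }
  return { method: "search_via_google" }; // broad informational query
}
```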
2. Access
We leverage Bright Data’s FastMCP to access dynamic and protected web pages like news homepages or search results.
Here’s what happens:
- First, we load the target page (e.g., a news homepage or Bing/Google search results).
- We then scrape all the article links visible on that page.
- Next, we go article-by-article, scraping each one individually for full content and metadata.
- This method helps us analyze multiple perspectives from a single page, all without hitting stale data or relying on pre-saved content.
This smart multi-link scraping approach powers our real-time insights across 50+ articles per query.
```javascript
// Step 1: Navigate to the target page (homepage or search results)
await runToolCall("scraping_browser_navigate", {
  url: "https://www.newswebsite.com/",
});

// Step 2: Wait for articles to load on the page
await runToolCall("scraping_browser_wait_for", {
  selector: "article a[href]",
  timeout: 10000,
});

// Step 3: Extract all article links
const linksResult = await runToolCall("scraping_browser_links", {});

// Step 4: Iterate through each link and trigger detailed article scraping
for (const link of linksResult.links) {
  await runToolCall("scrape_as_article", { url: link });
}
```
3. Extract
We extract detailed article metadata, including:
- title, content, URL, published date, author, source, and image
Bright Data supports extraction via:
- ✅ rawHtml – full HTML of the scraped content
- ✅ markdown – clean, AI-friendly summary format
```javascript
server.addTool({
  name: "scrape_a_homepage",
  description:
    "Scrape a single webpage URL with advanced options for " +
    "content extraction and get back the results in MarkDown language. " +
    "This tool can unlock any webpage even if it uses bot detection or " +
    "CAPTCHA.",
  parameters: z.object({ url: z.string().url() }),
  execute: tool_fn("scrape_a_homepage", async ({ url }) => {
    let response = await axios({
      url: "https://api.brightdata.com/request",
      method: "POST",
      data: {
        url,
        zone: unlocker_zone,
        format: "raw",
        data_format: "markdown",
      },
      headers: api_headers(),
      responseType: "text",
    });
    return response.data;
  }),
});
```
We utilized a combination of custom and prebuilt functions to clean the raw data and extract the necessary information for analysis.
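As one example of that cleanup step, here is a sketch of pulling a title and article links out of the returned markdown. The helper name and regexes are simplified assumptions; the real pipeline does more normalization than this:

```javascript
// Extract a title and article links from a markdown document returned
// by the MCP scrape tools. Simplified sketch of our cleanup helpers.
function cleanMarkdown(markdown) {
  const titleMatch = markdown.match(/^#\s+(.+)$/m);
  const links = [];
  const linkRe = /\[([^\]]+)\]\((https?:\/\/[^)\s]+)\)/g;
  let m;
  while ((m = linkRe.exec(markdown)) !== null) {
    links.push({ text: m[1], url: m[2] });
  }
  return {
    title: titleMatch ? titleMatch[1].trim() : null,
    links,
  };
}

const sample = "# Farmers protest update\n\nRead [full story](https://example.com/a1).";
const cleaned = cleanMarkdown(sample);
```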
4. Interact
This is where Bright Data’s full capabilities come to life. In the search_via_bing method, we simulate a full browser interaction flow using MCP tools:
- Navigate to Bing.com
- Wait for the page to load
- Clear the input field
- Enter the search text
- Press Enter
- Wait ~5 seconds
- Wait for a result-related HTML selector to appear
- Scrape result links using scraping_browser_links
This closely mimics human behavior, allowing us to pull data from otherwise inaccessible or JS-heavy websites.
```javascript
const url = "https://www.bing.com/news";
const searchSelector = "input#sb_form_q";
const searchText = query;
const processedResults = [];

// Navigate to the webpage using scraping_browser_navigate
await runToolCall("scraping_browser_navigate", { url });

// Wait for the search box to appear
await runToolCall("scraping_browser_wait_for", {
  selector: searchSelector,
  timeout: 10000,
});

// Clear the search field first using scraping_browser_type
await runToolCall("scraping_browser_type", {
  selector: searchSelector,
  text: "",
  submit: false,
});

// Type the search text using scraping_browser_type
await runToolCall("scraping_browser_type", {
  selector: searchSelector,
  text: searchText,
  submit: false,
});

// Press Enter to submit the search
await runToolCall("scraping_browser_press", { key: "Enter" });

// Give the results page time to settle before querying selectors
await new Promise((resolve) => setTimeout(resolve, 3000));

// Wait for the search results container
await runToolCall("scraping_browser_wait_for", {
  selector: 'a.linkBtn[aria-label="Best match"]',
  timeout: 50000,
});

// Additional wait for search results to be fully loaded
await new Promise((resolve) => setTimeout(resolve, 2000));

// Wait for article links to be present
await runToolCall("scraping_browser_wait_for", {
  selector: "article a[href*='/articles/']",
  timeout: 5000,
});

// Get all links from the page using scraping_browser_links
const linksResult = await runToolCall("scraping_browser_links", {});
```
Performance Improvements
Before integrating Bright Data’s MCP server, our initial architecture relied on:
- Sequential proxy rotation using Bright Data’s Residential, Web Unlocker, and Mobile proxies
- Headless browser automation via Puppeteer
- Custom code for region-based rotation and JavaScript-rendered scraping
While functional, this method had significant drawbacks:
- High latency per article (6–12 seconds)
- Increased code complexity and maintenance overhead
- Failures on JS-heavy or protected pages
🚀 Transition to Bright Data MCP (FastMCP Server)
By switching to the FastMCP server, we achieved:
- ⏱ ~80% reduction in scraping latency per article
- 💡 Seamless access to protected and JavaScript-heavy sites with zero-code browser interaction
- ⚙️ Lightweight, declarative scraping powered by STDIO communication between our Node.js backend and MCP server
- 📦 Simplified architecture (less boilerplate, no Puppeteer or manual proxy handling)
- ✅ Greater reliability across countries and site types — with support for region-specific scraping
We are now scraping 50+ articles across multiple countries in real time without bottlenecks or rate-limiting issues.
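The figures above come from simple wall-clock timing around each tool call. Here is a sketch of such helpers; the `timed` wrapper and the sample numbers are illustrative, not our exact benchmarking code:

```javascript
// Wall-clock timing around any async scrape step. `runToolCall` is the
// helper used elsewhere in this post; here it appears only in a comment.
async function timed(label, fn) {
  const start = process.hrtime.bigint();
  const result = await fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(0)} ms`);
  return result;
}
// Illustrative use:
//   await timed("scrape_as_article", () => runToolCall("scrape_as_article", { url }));

// Percentage drop between the old Puppeteer path and the MCP path.
// Sample numbers are hypothetical: latencyReduction(9000, 1800) === 80.
function latencyReduction(beforeMs, afterMs) {
  return Math.round(((beforeMs - afterMs) / beforeMs) * 100);
}
```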
You can even compare our old Puppeteer-based proxy project here:
👉 Legacy Puppeteer Proxy Scraper https://inspiring-taffy-5808f5.netlify.app/
And see how Bright Data MCP gave us a 10x better development and performance experience.
Future Improvements
We built NewsPulse AI to meet hackathon goals with a real-time, stateless architecture. For the next phase, we plan to:
- Integrate a Vector DB (like Pinecone or Qdrant) to enable semantic search and avoid redundant scraping.
- Add a Scalable Job Queue for handling spikes using tools like BullMQ or Redis.
- Implement Auth & API Keys to support user-specific usage and rate-limiting.
- Secure Secrets Properly, moving all credentials to secret managers for a real deployment.
- Improve UX with features like sentiment trend visualizations, historical comparisons, and saved analyses.
These updates will make the platform more robust, scalable, and production-ready.
Final Notes
NewsPulse AI showcases how powerful AI agents become when paired with open, real-time, structured web data. We didn’t just build a tool—we built a thinking system that mimics human research patterns at internet speed.
Lovingly crafted by Suman and his wife Sarita. 💫
🙌 Shoutout
Big thanks to the team at Bright Data! Loved integrating your MCP platform.
If you're reading this and found Bright Data useful, give their repo some love:
🌟 https://github.com/luminati-io/brightdata-mcp