Next.js 14 Booking App with Live Data Scraping using Scraping Browser
In the ever-evolving landscape of web development, the ability to efficiently gather, process, and display data from external sources has become increasingly valuable. Whether for market research, competitive analysis, or customer insights, web scraping plays a crucial role in unlocking the vast potential of the internet's data.
This blog post is a comprehensive guide to building a robust Next.js application for scraping flight data from Kayak, one of the leading travel search engines, by combining Next.js with modern technologies such as BullMQ, Redis, and Puppeteer.
🚀 Next.js 14 App Directory with Tailwind CSS - Experience a sleek, modern UI powered by the latest Next.js 14 and styled with Tailwind CSS for that perfect look and feel.
🔗 API Routes & Server Actions - Dive into seamless backend integration with Next.js 14's API routes and server actions, ensuring efficient data handling and server-side logic execution.
🕷 Scraping with Puppeteer, Redis, and BullMQ - Harness the power of Puppeteer for advanced web scraping, with Redis and BullMQ managing queues and jobs for robust backend operations.
🔑 JWT Tokens for Authentication and Authorization - Secure your app with JWT tokens, providing a reliable method for authentication and authorization across your platform.
💳 Stripe for Payment Gateways - Integrate Stripe for seamless payment processing, enabling secure and easy transactions for booking trips, flights, and hotels.
✈️ Book Trips, Flights, and Hotels with the Stripe Payment Gateway - Make your travel booking experience effortless with our Stripe-powered payment system.
📊 Scrape Live Data from Multiple Websites - Stay ahead with real-time data scraping from multiple sources, keeping your app updated with the latest information.
💾 Store the Scraped Data in PostgreSQL with Prisma - Leverage PostgreSQL and Prisma for efficient storage and management of your scraped data, ensuring reliability and speed.
🔄 Zustand for State Management - Enjoy smooth and manageable state management in your app with Zustand, simplifying state logic and enhancing performance.
😈 Best Feature of the App - Scraping the Unscrapable Data with Bright Data's Scraping Browser.
Bright Data's Scraping Browser provides automatic CAPTCHA solving, which lets us scrape otherwise unscrapable data.
Step 1: Setting Up the Next.js Application
Create a Next.js App: Start by creating a new Next.js app if you haven't already. You can do this by running the following command in your terminal:
npx create-next-app@latest booking-app
Navigate to Your App Directory: Change into your newly created app directory:
cd booking-app
Step 2: Installing Required Packages
You'll need to install several packages: ioredis, BullMQ, and Puppeteer Core. Run the following command to install them:
npm install ioredis bullmq puppeteer-core
ioredis is a robust Redis client for Node.js, enabling communication with Redis.
bullmq manages job and message queues with Redis as the backend.
puppeteer-core allows you to control an external browser for scraping purposes.
Step 3: Setting Up Redis Connection
Create a file (e.g., redis.js) in a suitable directory (e.g., lib/) to configure the Redis connection:
```js
// lib/redis.js
import Redis from 'ioredis';

// Use REDIS_URL from the environment or fall back to localhost
const REDIS_URL = process.env.REDIS_URL || 'redis://localhost:6379';
const connection = new Redis(REDIS_URL);

export { connection };
```
Step 4: Configuring BullMQ Queue
Set up the BullMQ queue by creating another file (e.g., queue.js) in the same directory as your Redis configuration:
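The original snippet was not preserved here; a minimal sketch, assuming the queue name "importQueue" that the worker in Step 7 listens on, might look like this:

```js
// lib/queue.js — minimal sketch; the queue name matches the worker below
import { Queue } from 'bullmq';
import { connection } from './redis';

// Jobs added to this queue (e.g., { url }) are picked up by the worker
const importQueue = new Queue('importQueue', { connection });

export { importQueue };
```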
Step 5: Creating a Worker for Job Processing
In your application, create a file (instrumentation.js) to handle job processing. This worker will use Puppeteer for scraping tasks; the full implementation appears in Step 7 below.
Step 6: Setting Up Bright Data's Scraping Browser
Before setting up the Bright Data Scraping Browser, let's talk about what a scraping browser is.
What is Bright Data's scraping browser?
Bright Data's Scraping Browser is a cutting-edge tool for automated web scraping, designed to seamlessly integrate with Puppeteer, Playwright, and Selenium. It offers a suite of website unblocking features, including proxy rotation, CAPTCHA solving, and more, to enhance scraping efficiency. Ideal for complex web scraping requiring interactions, it allows scalability by hosting unlimited browser sessions on Bright Data’s infrastructure. For more details, visit Bright Data.
Step 1: Navigate to Bright Data's Website
Begin by heading over to Brightdata.com. This is your gateway to accessing the wealth of web scraping resources and tools offered by Bright Data.
Step 2: Create an Account
Once you're on Bright Data's website, sign up to create a new account. You'll be prompted to enter essential information to get your account up and running.
Step 3: Select Your Product
On the product selection page, look for the Proxies & Scraping Infrastructure product. This product is specifically designed to meet your web scraping needs, offering powerful tools and features for data extraction.
Step 4: Add a New Proxy
Within the Proxies & Scraping Infrastructure page, you'll find an "Add new" button. Click it to start the process of adding a new scraping browser to your toolkit.
Step 5: Choose the Scraping Browser
A dropdown list will appear, from which you should select the scraping browser option. This tells Bright Data that you intend to set up a new scraping browser environment.
Step 6: Name Your Scraping Browser
Give your new scraping browser a unique name. This helps in identifying and managing it later, especially if you plan to use multiple browsers for different scraping projects.
Step 7: Add the Browser
After naming your browser, click on the "add" button. This action finalizes the creation of your new scraping browser.
Step 8: View Your Scraping Browser Details
Upon adding your scraping browser, you will be directed to a page where you can see all the details of your newly created scraping browser. This information is crucial for integration and use.
Step 9: Access Code and Integration Examples
Look for the "check out code and integration examples" button. Clicking this will provide you with a comprehensive view of how to integrate and use your scraping browser across multiple programming languages and libraries. This resource is invaluable for developers looking to customize their scraping setup.
Step 10: Integrate Your Scraping Browser
Finally, copy the SBR_WS_ENDPOINT value. This is a critical piece of information that you will need in your source code, allowing your application to communicate with the scraping browser you've just set up.
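For example, you might keep it in your environment file (the values below are placeholders showing the general shape of the endpoint, not real credentials):

```bash
# .env.local — the WebSocket endpoint copied from the Bright Data dashboard
SBR_WS_ENDPOINT="wss://brd-customer-<CUSTOMER_ID>-zone-<ZONE_NAME>:<PASSWORD>@brd.superproxy.io:9222"
```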
By following these detailed steps, you have successfully created a scraping browser within Bright Data's platform, ready to tackle your web scraping tasks. Remember, Bright Data offers extensive documentation and support to help you maximize your scraping projects' efficiency and effectiveness. Whether you're gathering market intelligence, conducting research, or monitoring competitive landscapes, your newly set up scraping browser is a powerful tool in your data collection arsenal.
Step 7: Implementing the Scraping Logic with Puppeteer
Continuing from where we left off in setting up our Next.js application for scraping flight data, the next critical step is to implement the actual scraping logic. This process involves utilizing Puppeteer to connect to a browser instance, navigate to the target URL (in our case, Kayak), and scrape the necessary flight data. The code snippet provided outlines a sophisticated method for achieving this goal, seamlessly integrating with our previously established BullMQ worker setup. Let's break down the components of this scraping logic and understand how it fits into our application.
Establishing a Connection to the Browser
The first step in our scraping process is to establish a connection to the browser through Puppeteer. This is accomplished by utilizing the puppeteer.connect method, which connects to an existing browser instance using a WebSocket endpoint (SBR_WS_ENDPOINT). This environment variable should be set to the WebSocket URL of the scraping browser service you're using, such as Bright Data:
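Extracted from the worker code in Step 7, the connection looks like this:

```js
// Connect Puppeteer to the remote scraping browser over WebSocket,
// rather than launching a local Chromium instance
const browser = await puppeteer.connect({
  browserWSEndpoint: process.env.SBR_WS_ENDPOINT,
});
```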
Opening a New Page and Navigating to the Target URL
Once connected, we create a new page in the browser and navigate to the target URL specified in the job data. This URL is the specific Kayak search result page from which we intend to scrape flight data:
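From the same worker code:

```js
// Open a new page in the remote browser and navigate to the job's URL
const page = await browser.newPage();
await page.goto(job.data.url);
```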
Scraping the Flight Data
The core of our logic lies in scraping the flight data from the page. We achieve this by using page.evaluate, a Puppeteer method that allows us to run scripts in the context of the browser. Within this script, we wait for the necessary elements to load and then proceed to collect flight information:
Flight Selector: We target elements with the class .nrc6-wrapper, which contain flight details.
Data Extraction: For each flight element, we extract details such as the airline logo, departure and arrival times, flight duration, airline name, and price. The departure and arrival times are cleaned to remove unnecessary numeric values at the end, ensuring we capture the time accurately.
Price Processing: The price is extracted as an integer after removing all non-numeric characters, ensuring it can be used for numerical operations or comparisons.
The extracted data is structured into an array of flight objects, each containing the details mentioned above:
```js
const scrappedFlights = await page.evaluate(async () => {
  // Data extraction logic
  const flights = [];
  // Process each flight element
  // ...
  return flights;
});
```
Error Handling and Cleanup
Our scraping logic is wrapped in a try-catch block to handle any potential errors gracefully during the scraping process. Regardless of the outcome, we ensure the browser is closed properly in the finally block, maintaining resource efficiency and preventing potential memory leaks:
```js
// instrumentation.js
const SBR_WS_ENDPOINT = process.env.SBR_WS_ENDPOINT;

export const register = async () => {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    const { Worker } = await import("bullmq");
    // puppeteer-core matches the package installed in Step 2
    const puppeteer = await import("puppeteer-core");
    const { connection } = await import("./lib/redis");
    const { importQueue } = await import("./lib/queue");

    new Worker(
      "importQueue",
      async (job) => {
        const browser = await puppeteer.connect({
          browserWSEndpoint: SBR_WS_ENDPOINT,
        });
        try {
          const page = await browser.newPage();
          console.log("in flight scraping");
          console.log("Connected! Navigating to " + job.data.url);
          await page.goto(job.data.url);
          console.log("Navigated! Scraping page content...");
          const scrappedFlights = await page.evaluate(async () => {
            // Give the client-side app time to render the results
            await new Promise((resolve) => setTimeout(resolve, 5000));
            const flights = [];
            const flightSelectors = document.querySelectorAll(".nrc6-wrapper");
            flightSelectors.forEach((flightElement) => {
              const airlineLogo = flightElement.querySelector("img")?.src || "";
              const [rawDepartureTime, rawArrivalTime] = (
                flightElement.querySelector(".vmXl")?.innerText || ""
              ).split(" – ");
              // Strip trailing "+1"-style day offsets from the time string
              const extractTime = (rawTime) =>
                rawTime.replace(/[0-9+\s]+$/, "").trim();
              const departureTime = extractTime(rawDepartureTime);
              const arrivalTime = extractTime(rawArrivalTime);
              const flightDuration = (
                flightElement.querySelector(".xdW8")?.children[0]?.innerText || ""
              ).trim();
              const airlineName = (
                flightElement.querySelector(".VY2U")?.children[1]?.innerText || ""
              ).trim();
              // Extract the price as an integer, dropping currency symbols
              const price = parseInt(
                (flightElement.querySelector(".f8F1-price-text")?.innerText || "")
                  .replace(/[^\d]/g, "")
                  .trim(),
                10
              );
              flights.push({
                airlineLogo,
                departureTime,
                arrivalTime,
                flightDuration,
                airlineName,
                price,
              });
            });
            return flights;
          });
          // Return the scraped flights as the job's result (the full repo may
          // instead persist them, e.g., to PostgreSQL via Prisma)
          return scrappedFlights;
        } catch (error) {
          console.log({ error });
        } finally {
          await browser.close();
          console.log("Browser closed successfully.");
        }
      },
      {
        connection,
        concurrency: 10,
        removeOnComplete: { count: 1000 },
        removeOnFail: { count: 5000 },
      }
    );
  }
};
```
Step 8: Flight Search Feature
Building upon our flight data scraping functionality, let's integrate a comprehensive flight search feature into our Next.js application. This feature will provide users with a dynamic interface to search for flights by specifying the source, destination, and date. Leveraging the powerful Next.js framework alongside a modern UI library and state management, we create an engaging and responsive flight search experience.
Key Components of the Flight Search Feature
Dynamic City Selection: The feature includes an autocomplete functionality for source and destination inputs, powered by a pre-defined list of city-airport codes. As users type, the application filters and displays matching cities, enhancing the user experience by making it easier to find and select airports.
Date Selection: Users can select their intended flight date through a date input, providing flexibility in planning their travel.
Scraping Status Monitoring: After initiating a scraping job, the application monitors the job's status through periodic API calls. This asynchronous checking allows the app to update the UI with the status of the scraping process, ensuring users are informed of the progress and results.
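As a rough illustration of that polling loop (the endpoint name and response shape below are assumptions for the sketch, not the repo's actual API):

```js
// Hypothetical client-side polling helper; /api/job-status and the
// { status, flights } response shape are illustrative assumptions.
async function pollJobStatus(jobId, onUpdate) {
  const timer = setInterval(async () => {
    const res = await fetch(`/api/job-status?id=${jobId}`);
    const { status, flights } = await res.json();
    onUpdate(status, flights);
    // Stop polling once the scraping job finishes either way
    if (status === "completed" || status === "failed") {
      clearInterval(timer);
    }
  }, 3000);
  return () => clearInterval(timer); // allow the caller to cancel early
}
```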
After successfully scraping flight data, the next crucial step is to present these results to the users in a user-friendly manner. The Flights component in your Next.js application is designed for this purpose.
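A minimal sketch of such a component, assuming the flight object shape produced by the worker above (the props and markup here are illustrative, not the repo's actual component):

```jsx
// Minimal illustrative Flights list; the flight fields match what the
// worker scrapes, but the structure and styling are assumptions.
export default function Flights({ flights }) {
  return (
    <ul>
      {flights.map((flight, index) => (
        <li key={index}>
          <img src={flight.airlineLogo} alt={flight.airlineName} width={48} />
          <span>{flight.airlineName}</span>
          <span>
            {flight.departureTime} – {flight.arrivalTime} ({flight.flightDuration})
          </span>
          <span>${flight.price}</span>
        </li>
      ))}
    </ul>
  );
}
```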
The sections and code snippets shared above represent just a fraction of the full functionality and code necessary to build a robust flight data scraping and search application using Next.js. To grasp the entirety of this project, including advanced features, optimizations, and best practices, I invite you to dive deeper through my comprehensive resources available online.
Watch the Detailed Explanation on YouTube
For a step-by-step video guide that walks you through the development process, coding nuances, and functionality of this application, check out my YouTube video. This tutorial is designed to provide you with a deeper understanding of the concepts, allowing you to follow along at your own pace and gain valuable insights into Next.js application development.
Explore the Full Code on GitHub
If you're eager to explore the code in its entirety, head over to my GitHub repository. There, you'll find the complete codebase, including all the components, utilities, and setup instructions you need to get this application running on your own machine.
Travel Planner App with Live Web Scraping from various sources using Bright Data's Scraping Browser.
Building a comprehensive application like the flight data scraping and search tool with Next.js showcases the power and versatility of modern web development tools and frameworks. Whether you're a seasoned developer looking to refine your skills or a beginner eager to dive into web development, these resources are tailored to support your journey. Watch the detailed tutorial on YouTube, explore the full code on GitHub, and join the conversation to enhance your development expertise and contribute to the vibrant developer community.
I find that using Puppeteer or any headless browser for scraping is, in most cases, such overkill. It's good for automated end-to-end testing, but for scraping data there are simpler and much more performant approaches.
In your case, you're grabbing data from Kayak. After a quick inspection of the network tab and playing around with the website, it turns out they return all the data we need in the initial document HTML, and we can use their routing as an API:
The URL above gives us back flights between London and New York, between the two dates specified. We can also sort the data however we want.
Now a simple fetch to get the initial HTML is sufficient; this way we avoid all the other data that comes in after the initial page load (analytics, client-side fetches, CSS and JS scripts, etc.).
That initial document HTML has JavaScript code baked into it, with all the data hydrated in JSON format, which we can extract easily using any HTML parsing library.
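A sketch of the fetch-and-parse approach described here, using cheerio as one possible HTML parsing library (the URL shape and the marker string are assumptions for illustration; Kayak's actual markup and embedded JSON vary):

```js
import * as cheerio from "cheerio";

// Illustrative route-as-API URL; the real path and query params may differ
const url =
  "https://www.kayak.com/flights/LON-NYC/2024-06-01/2024-06-08?sort=bestflight_a";

const html = await (await fetch(url)).text();
const $ = cheerio.load(html);

// Scan inline <script> tags for the hydrated JSON payload
$("script").each((_, el) => {
  const text = $(el).html() || "";
  if (text.includes('"flights"')) {
    // The exact extraction depends on the script's format; here we just
    // show that the data is present in the initial document
    console.log(text.slice(0, 200));
  }
});
```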
As a bonus, I would remove Zustand and not store the data client-side like that; then the component that displays flights doesn't need to be a client component. We can get all the data with server components and make the app stateless, relying on the backend for the data.
This doesn't always work, though. For instance, you cannot do this with the Craigslist gallery, because Craigslist builds the DOM dynamically. In my project, I ended up using Puppeteer.
The swiper element looks like some jQuery thing that dynamically adds images.
All the data for the image URLs is baked into the initial HTML:
In any case, if that weren't true, you would listen for XHR calls in the network tab to see where the images are coming from on the server side and try to "hack" around it.
In 90% of cases Puppeteer is overkill; then there's the 10% of the time where it isn't.
I was talking about the gallery, which is a list of posts for a given category. Fetching used to return just this:
<noscript id="no-js"><div>
<p>We've detected that JavaScript is not enabled in your browser.</p>
<p>You must enable JavaScript to use craigslist.</p>
</div></noscript>
<div id="unsupported-browser">
<p>We've detected you are using a browser that is missing critical features.</p>
<p>Please visit craigslist from a modern browser.</p>
</div>
Looks like this has changed in the past few months, and I am now able to get the list of posts just with curl, so as you say, using Puppeteer for this is overkill (and it is slow). But a few months ago my curl request would only return the HTML above.
The HTML shown in the browser via View Page Source was the same. I found scripts that downloaded a bunch of cryptic JSON files and used them to build the DOM.
Actually, if you play with metasearch sites like Kayak, Expedia, etc. for a long time, you'll probably find their web apps very tricky with such hacks: you'd either be blocked by Cloudflare or rate-limited frequently. I'm not saying Playwright would work around this completely, but it does get through at a higher ratio.
Anyway, I found the author's solution kind of great for a homelab showcase, though it definitely needs lots of polishing for serious usage. Am I understanding right, Kishan?
This is a great in-depth article, brother! Thanks! 🙌