Practical Puppeteer: How to evaluate XPath expression
Sony AK

Sony AK @sonyarianto

Publish Date: Dec 18 '19

Today I will share how to evaluate XPath expressions in Puppeteer using the $x API, and we will also use the waitForXPath API.

Before I learned Puppeteer, I mostly used XPath in PHP through its DOMXPath class, and I found it very useful for selecting elements. I find XPath expressions easier and more comfortable than CSS selectors, but that's just my personal opinion, sorry :)

For those who don't know XPath, here is the definition according to Wikipedia:

XPath (XML Path Language) is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document. XPath was defined by the World Wide Web Consortium (W3C).

In Puppeteer there are two APIs related to XPath. The first is waitForXPath, which works like waitForSelector: it waits for an element to appear in the page, based on our XPath expression. The second is the $x method, which evaluates an XPath expression. $x returns an array of ElementHandle, and I will show you a sample later.
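
As a rough sketch of how waiting and reading fit together, the helper below waits for an XPath match and then returns its text. The name getTextByXPath is hypothetical, not a Puppeteer API. It relies on waitForXPath resolving to the ElementHandle of the matched element. The fallback branch is an assumption: to my understanding, recent Puppeteer releases removed waitForXPath and $x in favor of the xpath/ selector prefix, so check it against the version you have installed.

```javascript
// Hypothetical helper: wait for the first XPath match and return its text content.
// `page` is a Puppeteer Page; `timeout` is in milliseconds.
async function getTextByXPath(page, xpath, timeout = 30000) {
    // Older Puppeteer releases expose page.waitForXPath, which resolves to the
    // ElementHandle of the matched element once it appears in the page.
    // Assumption: newer releases expect the 'xpath/' prefix with waitForSelector.
    const handle = typeof page.waitForXPath === 'function'
        ? await page.waitForXPath(xpath, { timeout })
        : await page.waitForSelector('xpath/' + xpath, { timeout });

    // Read textContent inside the browser context.
    return page.evaluate(el => el.textContent, handle);
}
```

With a page object created by browser.newPage(), it could be called as `await getTextByXPath(page, "(//span[@class='CountTitle-number'])[1]")`.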

Enough of the boring things. Let's start with a scenario. Take a website called Lamudi in Indonesia, https://www.lamudi.co.id/newdevelopments/. I want to get/scrape the value from the element shown below.

Our target is this element. I want to get the value 160.

<span class="CountTitle-number">160</span>

Usually we would use a CSS selector like document.querySelector('span[class="CountTitle-number"]'), but this time we use the equivalent XPath expression instead: //span[@class="CountTitle-number"].
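
One detail worth knowing: both expressions above compare the whole class attribute, so they stop matching if the element ever gets a second class. The sketch below builds the exact-match expression and a token-based variant that behaves like the CSS selector span.CountTitle-number. The function names are hypothetical helpers, and the inputs are assumed to contain no double quotes.

```javascript
// Exact match on the whole class attribute, same as the expression in the article.
function xpathExactClass(tag, className) {
    return `//${tag}[@class="${className}"]`;
}

// Match one class token among possibly several classes,
// comparable to the CSS selector `tag.className`.
function xpathHasClass(tag, className) {
    return `//${tag}[contains(concat(" ", normalize-space(@class), " "), " ${className} ")]`;
}

console.log(xpathExactClass('span', 'CountTitle-number'));
// prints: //span[@class="CountTitle-number"]
console.log(xpathHasClass('span', 'CountTitle-number'));
// prints: //span[contains(concat(" ", normalize-space(@class), " "), " CountTitle-number ")]
```

Either string can be pasted into $x(...) in the Developer tools console to compare the results.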

In the Developer tools console we can test this expression easily. Try typing this in the Developer tools console of your browser.

$x('//span[@class="CountTitle-number"]');  

The console returns an array containing the matching span element.

OK nice, the XPath expression matches our element. Now let's create a script that uses Puppeteer to get this element's text content.

Preparation

npm i puppeteer

The code

The code is self-explanatory, and I hope you can adjust, expand, or improvise on it for your specific needs later.

File puppeteer_xpath.js

const puppeteer = require('puppeteer');

(async () => {
    // set some options (set headless to false so we can see 
    // this automated browsing experience)
    let launchOptions = { headless: false, args: ['--start-maximized'] };

    const browser = await puppeteer.launch(launchOptions);
    const page = await browser.newPage();

    // set viewport and user agent (just in case for nice viewing)
    await page.setViewport({width: 1366, height: 768});
    await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');

    // go to the target website
    await page.goto('https://www.lamudi.co.id/newdevelopments/');

    // wait for the element matched by the XPath expression to appear in the page
    await page.waitForXPath("(//span[@class='CountTitle-number'])[1]");

    // evaluate the XPath expression of the target element (it returns an array of ElementHandle)
    let elHandle = await page.$x("(//span[@class='CountTitle-number'])[1]");

    // get the textContent of the first matched element (use page.evaluate)
    let lamudiNewPropertyCount = await page.evaluate(el => el.textContent, elHandle[0]);

    console.log('Total Property Number is:', lamudiNewPropertyCount);

    // close the browser
    await browser.close();
})();

Run it

node puppeteer_xpath.js

If everything is OK, it will display a result like the one below.

Total Property Number is: 160
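
The script assumes the element always exists. If it might be missing, a sketch like the following can branch on that. hasXPath uses the fact that $x returns an empty array when nothing matches. getTextOrNull waits up to a timeout and treats Puppeteer's TimeoutError as "not found". Both function names are hypothetical, and checking err.name is an assumption about how your Puppeteer version labels the error.

```javascript
// Hypothetical helper: true when the XPath matches at least one element right now.
async function hasXPath(page, xpath) {
    const handles = await page.$x(xpath); // empty array when nothing matches
    return handles.length > 0;
}

// Hypothetical helper: text of the first match, or null if nothing
// appears within `timeout` milliseconds.
async function getTextOrNull(page, xpath, timeout = 5000) {
    let handle;
    try {
        handle = await page.waitForXPath(xpath, { timeout });
    } catch (err) {
        // Assumption: a timeout is reported as an error named 'TimeoutError'.
        if (err.name === 'TimeoutError') return null;
        throw err; // anything else is a real failure
    }
    return page.evaluate(el => el.textContent, handle);
}
```

In the main script this could look like `const count = await getTextOrNull(page, xpath);` followed by an if/else on `count === null`.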

Conclusion

I think Puppeteer's support for XPath is very useful for data scraping, since it is sometimes hard to write a CSS selector for a specific use case.

Thank you, and I hope you enjoyed it. See you again in the next part of the Practical Puppeteer series.

The source code of this sample is available on GitHub: https://github.com/sonyarianto/xpath-on-puppeteer.git

Comments 9 total

  • David K Gurr, Jun 2, 2021

    thanks, this was helpful

    • Sony AK, Oct 10, 2021

      you are welcome :)

  • Raphael Schweikert, Jul 26, 2021

    Thanks for this. I love XPath for these kinds of use-cases.
    Yes, CSS selectors can be simpler and well-understood, but they are also restricted on purpose to have good run-time characteristics, so they don't bog down the browser during dynamic updates.
    So there are lots of things you can do with XPath that are simply not possible with selectors (like finding text nodes or using axes to select up the tree instead of down).

    • Sony AK, Oct 10, 2021

      totally agree with this, XPath to the rescue and full flexibility :)

  • ztesterparadise2, Sep 23, 2022

    Thank you so much, good sir!
    I struggled for days to find information this well arranged and simply put.

  • Alucian Corrêa, Dec 26, 2022

    Thank you sir.

    But if the XPath does not exist, is it possible to handle that? Can you help me?

    For example: if it doesn't exist, do this; if it exists, do that.

    Thank you.

  • Tommy, Feb 26, 2023

    You actually don't need this line:

    let elHandle = await page.$x("(//span[@class='CountTitle-number'])[1]");

    More concise:

    const element = await page.waitForXPath("(//span[@class='CountTitle-number'])[1]");
    const lamudiNewPropertyCount = await page.evaluate(el => el.textContent, element);