Practical Puppeteer: Get Instagram account profile detail
Sony AK

Sony AK @sonyarianto

About: Member of Technical Staff

Location:
Jakarta
Joined:
Nov 20, 2017

Practical Puppeteer: Get Instagram account profile detail

Publish Date: Dec 13 '19
64 15

Today scraping with Puppeteer will be related to Instagram. The scenario is we go to an Instagram profile and we will get some data from there, such as:

  • Check username exists or not
  • Username
  • Verified account or not
  • Private account or not
  • Account name
  • Bio description
  • Account profile picture URL
  • Bio URL display
  • Total posts, total followers, total followings
  • Recent posts (an array that contains URL to post and its thumbnail image)

As usual, we will use Puppeteer (not using any API). Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium. Please go to https://pptr.dev for more detail.

Let's start.

Preparation

Install Puppeteer

npm i puppeteer
Enter fullscreen mode Exit fullscreen mode

The code

This code will get the detail public profile of Instagram account @cristiano, yes, it's Cristiano Ronaldo account.

File instagram_account_profile.js

const puppeteer = require('puppeteer');

(async () => {
    // set some options (set headless to false so we can see 
    // this automated browsing experience)
    let launchOptions = { headless: false, args: ['--start-maximized'] };

    const browser = await puppeteer.launch(launchOptions);
    const page = await browser.newPage();

    // set viewport and user agent (just in case for nice viewing)
    await page.setViewport({width: 1366, height: 768});
    await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');

    // go to Instagram web profile (this example use Cristiano Ronaldo profile)
    await page.goto('https://instagram.com/cristiano');

    // check username exists or not exists
    let isUsernameNotFound = await page.evaluate(() => {
        // check selector exists
        if(document.getElementsByTagName('h2')[0]) {
            // check selector text content
            if(document.getElementsByTagName('h2')[0].textContent == "Sorry, this page isn't available.") {
                return true;
            }
        }
    });

    if(isUsernameNotFound) {
        console.log('Account not exists!');

        // close browser
        await browser.close();
        return;
    }

    // get username
    let username = await page.evaluate(() => {
        return document.querySelectorAll('header > section h1')[0].textContent;
    });

    // check the account is verified or not
    let isVerifiedAccount = await page.evaluate(() => {
        // check selector exists
        if(document.getElementsByClassName('coreSpriteVerifiedBadge')[0]) {
            return true;
        } else {
            return false;
        }
    });

    // get username picture URL
    let usernamePictureUrl = await page.evaluate(() => {
        return document.querySelectorAll('header img')[0].getAttribute('src');
    });

    // get number of total posts
    let postsCount = await page.evaluate(() => {
        return document.querySelectorAll('header > section > ul > li span')[0].textContent.replace(/\,/g, '');
    });

    // get number of total followers
    let followersCount = await page.evaluate(() => {
        return document.querySelectorAll('header > section > ul > li span')[1].getAttribute('title').replace(/\,/g, '');
    });

    // get number of total followings
    let followingsCount = await page.evaluate(() => {
        return document.querySelectorAll('header > section > ul > li span')[2].textContent.replace(/\,/g, '');
    });

    // get bio name
    let name = await page.evaluate(() => {
        // check selector exists
        if(document.querySelectorAll('header > section h1')[1]) {
            return document.querySelectorAll('header > section h1')[1].textContent;
        } else {
            return '';
        }
    });

    // get bio description
    let bio = await page.evaluate(() => {
        if(document.querySelectorAll('header h1')[1].parentNode.querySelectorAll('span')[0]) {
            return document.querySelectorAll('header h1')[1].parentNode.querySelectorAll('span')[0].textContent;
        } else {
            return '';
        }
    });

    // get bio URL
    let bioUrl = await page.evaluate(() => {
        // check selector exists
        if(document.querySelectorAll('header > section div > a')[1]) {
            return document.querySelectorAll('header > section div > a')[1].getAttribute('href');
        } else {
            return '';
        }
    });

    // get bio display
    let bioUrlDisplay = await page.evaluate(() => {
        // check selector exists
        if(document.querySelectorAll('header > section div > a')[1]) {
            return document.querySelectorAll('header > section div > a')[1].textContent;
        } else {
            return '';
        }
    });

    // check if account is private or not
    let isPrivateAccount = await page.evaluate(() => {
        // check selector exists
        if(document.getElementsByTagName('h2')[0]) {
            // check selector text content
            if(document.getElementsByTagName('h2')[0].textContent == 'This Account is Private') {
                return true;
            } else {
                return false;
            }
        } else {
            return false;
        }
    });

    // get recent posts (array of url and photo)
    let recentPosts = await page.evaluate(() => {
        let results = [];

        // loop on recent posts selector
        document.querySelectorAll('div[style*="flex-direction"] div > a').forEach((el) => {
            // init the post object (for recent posts)
            let post = {};

            // fill the post object with URL and photo data
            post.url = 'https://www.instagram.com' + el.getAttribute('href');
            post.photo = el.querySelector('img').getAttribute('src');

            // add the object to results array (by push operation)
            results.push(post);
        });

        // recentPosts will contains data from results
        return results;
    });

    // display the result to console
    console.log({'username': username,
                 'is_verified_account': isVerifiedAccount,
                 'username_picture_url': usernamePictureUrl,
                 'posts_count': postsCount,
                 'followers_count': followersCount,
                 'followings_count': followingsCount,
                 'name': name,
                 'bio': bio,
                 'bio_url': bioUrl,
                 'bio_url_display': bioUrlDisplay,
                 'is_private_account': isPrivateAccount,
                 'recent_posts': recentPosts});

    // close the browser
    await browser.close();
})();
Enter fullscreen mode Exit fullscreen mode

I set the headless mode to false in Puppeteer options, so we can see the browser in action.

Run it

node instagram_account_profile.js
Enter fullscreen mode Exit fullscreen mode

If everything OK, it will show the data structure like below on the console.

{
  username: 'cristiano',
  is_verified_account: true,
  username_picture_url: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-19/s150x150/67310557_649773548849427_4130659181743046656_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=6fbc3118da5962a82e5733d14c93a93a&oe=5E70CF2D',
  posts_count: '2716',
  followers_count: '192798306',
  followings_count: '445',
  name: 'Cristiano Ronaldo',
  bio: '',
  bio_url: 'https://l.instagram.com/?u=http%3A%2F%2Fwww.cristianoronaldo.com%2F&e=ATMsBNjqh3vJtV6jZ68Jo1e8yXmGpacPHE4dfv_mSRg-PrcHYdCYZFkWxDuYLzORB-M3_aVb',
  bio_url_display: 'www.cristianoronaldo.com',
  is_private_account: false,
  recent_posts: [
    {
      url: 'https://www.instagram.com/p/B58x9BUATxb/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/c220.0.792.792a/s640x640/76876296_179193059941409_6221002990564880736_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=07ae6ecd5089fc1e5838ef86970c1f8c&oe=5E8023DF'
    },
    {
      url: 'https://www.instagram.com/p/B55gk8DAL3Z/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/e35/c0.60.480.480a/75483286_186154695857472_4950353937543838253_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=cb3f7b242096ea16c3c4cc4b6312b87d&oe=5DF5B9F3'
    },
    {
      url: 'https://www.instagram.com/p/B5zzJtBAoan/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/c207.0.827.827a/s640x640/73393228_168482760903763_8963602282249975289_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=479cb033d8882b59fd6bbb4c6e1c408a&oe=5E80081A'
    },
    {
      url: 'https://www.instagram.com/p/B5vuHHAAodt/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/c240.0.960.960a/s640x640/74676914_139591227455800_1244894556711547199_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=c40bca7880742088d19a19ae382def7f&oe=5E81AB8C'
    },
    {
      url: 'https://www.instagram.com/p/B5qW56QIFFp/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/c213.0.853.853a/s640x640/72783037_1351521851696486_1891057812314322465_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=88a45933f962a91940e49ee24d5acb09&oe=5E6E2EDE'
    },
    {
      url: 'https://www.instagram.com/p/B5qICTmg7hS/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/c227.0.910.910a/s640x640/76944874_1768777216590413_4590633889755644385_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=e69a90e499a8797b5b0bc4c9d0be8889&oe=5E77027A'
    },
    {
      url: 'https://www.instagram.com/p/B5phLcCAfWV/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/c106.0.868.868a/s640x640/74711305_126116271783000_2660929486246111795_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=dce9f4e0c396491c8b4750f946acb043&oe=5E84A9B8'
    },
    {
      url: 'https://www.instagram.com/p/B5nqI98g9jq/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/c0.180.1440.1440a/s640x640/72295503_199047947810859_4327918090297549142_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=9083fc356fee2c6780424df45ae2bda5&oe=5E82CCA1'
    },
    {
      url: 'https://www.instagram.com/p/B5lpnXXgbiT/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/c0.161.1291.1291a/s640x640/74337451_200653047633832_6084933369944989223_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=0b5ceedb25781b4924565949937edc0b&oe=5EB1C0A1'
    },
    {
      url: 'https://www.instagram.com/p/B5iI4Sag0qQ/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/c177.0.710.710a/s640x640/73420511_1023531488000332_2506917797196221103_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=1312fb525a0bc8429e9181232d1d763f&oe=5E7156EB'
    },
    {
      url: 'https://www.instagram.com/p/B5dRx0zgeSb/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/s640x640/75299394_983315452036089_6040427267837814466_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=ea373d65404f9838cbbd777852445d12&oe=5DF5FDE7'
    },
    {
      url: 'https://www.instagram.com/p/B5az6Qfg3va/',
      photo: 'https://instagram.fcgk18-1.fna.fbcdn.net/v/t51.2885-15/sh0.08/e35/s640x640/73393267_185000869337693_7735852682111206915_n.jpg?_nc_ht=instagram.fcgk18-1.fna.fbcdn.net&_nc_cat=1&oh=17030ad8ad6d0453eed64c203167f359&oe=5E902F7F'
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

Ow nice.

What we can learn from this code is using selector in page.evaluate and doing looping on page.evaluate.

This code also available at GitHub repository at https://github.com/sonyarianto/get-instagram-account-profile-detail-with-puppeteer

Updates

Dellean Santos (@tawsbob) on the comment told me that for Instagram public account profile, we can get the data from window._sharedData object. It's nice. You can get it by using Puppetter as well by using this page.evaluate.

let sharedData = await page.evaluate(() => {
  return window._sharedData.entry_data.ProfilePage[0].graphql.user;
});
Enter fullscreen mode Exit fullscreen mode

Thank you and I hope you enjoy it.

Reference

Comments 15 total

  • Frans Allen
    Frans AllenDec 13, 2019

    This is pretty cool actually, also published same stuff on my profile, feel free to check it out!

    • Sony AK
      Sony AKDec 14, 2019

      Thank you, I will check it out mas.

  • Dellean Santos
    Dellean SantosDec 13, 2019

    all data that you need you can get in _sharedData Object

    exemple

    • Sony AK
      Sony AKDec 14, 2019

      wow that's something new for me, thank you and I will and on the article later. Thank you.

  • Rohit Martires
    Rohit MartiresJul 3, 2020

    Hey when I try hosting this on heroku it doesn't work, any idea y?

  • Aman Panjwani
    Aman PanjwaniApr 5, 2021

    I built a similar Instagram-Scrapper, worked very fine both with headless:true and false on localmachine (macos) but when deployed on platforms like heroku or glitch, instagram in not rendering the needed divs.
    I tried several method, but none of them actually worked, even tried puppeteer-extra stealth plugin to bypass headless:true issues with some site.
    Any clue how to solve this issue ?

  • Sosdani
    SosdaniApr 21, 2021

    hello my friend, is possible get all posts of user?

    • Aman Panjwani
      Aman PanjwaniApr 21, 2021

      @sosdanix , I made a self-project which does the similar work.
      Check it out - github.com/apanjwani0/Scrape-Insta...

      • Sosdani
        SosdaniApr 21, 2021

        Great Code my friend, I tried to do the same something, but it is difficult to work with the instagram api

        • Aman Panjwani
          Aman PanjwaniApr 21, 2021

          Thanks :)
          Yes, that's why I decided to fetch all posts by scrapping (only public profiles can be fetched this way)
          I had an idea for private, but it will require the user to give their instagram credentials to my app, (which I think won't be appreciated) even after that user can only fetch posts of profile which are available to them.

  • ekko14
    ekko14Aug 22, 2022

    Can you add how to post phots or video on instagram using puppeteer at specified time? I tried to do it but m stuck on when instagram asks to select photos or videos.

  • anahale23
    anahale23Jan 11, 2024

    Consistently produce captivating, high-quality content that captivates and engages your audience. Prioritize visually stunning imagery, videos, and graphics, employing professional photography techniques, editing tools, and creativity to produce aesthetically pleasing content. Diversify your content mix, incorporating various formats, including photos, videos, stories, reels, IGTV, and carousel posts, to cater to diverse audience follow instagram preferences and consumption habits.

Add comment