Over 260 million contacts, 100 million company profiles, all in one place. That’s ZoomInfo — a treasure trove of B2B intelligence. But don’t be fooled. Snagging that data isn’t a walk in the park. ZoomInfo fights back hard with CAPTCHAs, fingerprinting, and brutal IP bans that kill most scrapers after just a few hits.
The good news is we’re going to show you exactly how to slice through those defenses and get clean, reliable data at scale in 2025 without losing your mind or your IP addresses.
What Can You Collect from ZoomInfo?
ZoomInfo isn’t just a list. It’s a goldmine with layers of detail:
- Company intelligence: Firmographics like revenue, employee count, SIC/NAICS codes, and parent-subsidiary hierarchies.
- Contacts: Real, verified emails, direct phones, job titles, departments, seniority levels, LinkedIn URLs — all legit and up-to-date.
- Technographics: Know their tech stack, cloud providers, and even organizational charts showing reporting lines.
- Business insights: Real-time funding rounds, executive moves, intent signals, and confidence scores to sift the gold from the noise.
This data fuels market research, lead gen, competitive analysis, CRM enrichment, and custom dashboards.
The Secret Inside ZoomInfo Pages
Most scrapers dive headfirst into messy HTML — a rookie move. ZoomInfo hides its treasure in a neat JSON blob buried inside a <script> tag.
Open your browser DevTools, check the Network > Doc tab, and peek at the first HTML response. Look for a <script id="ng-state"> tag containing a clean, rich JSON object. This blob holds the full company profile, org charts, funding details, and more.
Scraping this JSON is faster, cleaner, and far less fragile than chasing scattered DOM elements.
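As a quick sanity check before building anything bigger, you can pull that blob out of a saved HTML response with nothing but the standard library. This is a minimal sketch, assuming the tag looks like the one seen in DevTools:

```python
import json
import re

def extract_ng_state(html: str):
    """Pull the JSON payload out of the <script id="ng-state"> tag, if present."""
    match = re.search(
        r'<script id="ng-state"[^>]*>(.*?)</script>',
        html,
        re.DOTALL,
    )
    if not match:
        return None
    return json.loads(match.group(1))

# Tiny stand-in for a real ZoomInfo response:
sample = (
    '<html><script id="ng-state" type="application/json">'
    '{"pageData": {"name": "Acme"}}'
    '</script></html>'
)
print(extract_ng_state(sample)["pageData"]["name"])  # Acme
```

A proper HTML parser (shown later with BeautifulSoup) is sturdier than a regex, but this is enough to confirm the blob is there and parseable.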
Why Scraping ZoomInfo is Tough
ZoomInfo’s anti-bot arsenal is no joke:
- IP bans: Too many requests too quickly and your IP is toast; expect 429 and 403 responses.
- CAPTCHAs: They’re sneaky — think “Press & Hold” sliders designed to block bots.
- Fingerprinting: ZoomInfo watches every browser signal — headers, JavaScript quirks, Canvas/WebGL fingerprints — to spot bots instantly.
Simple scripts won’t cut it. You need stealth and smarts.
How to Overcome Anti-Bot Barriers
- Stealth headless browsers: Use Selenium with Undetected ChromeDriver, Puppeteer with Stealth Plugin, or Playwright Stealth. These tools mask automation signals, spoof Canvas fingerprints, and look like real browsers.
- CAPTCHA-solving services: Integrate 2Captcha or Anti-Captcha. They use humans and AI to crack puzzles. Yes, it adds some delay and cost — but it works.
- Rotating residential proxies: Your secret weapon. Residential IPs look like real users and aren’t blacklisted easily. Rotate them aggressively. Never send multiple requests from the same IP.
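The rotation itself can be as simple as cycling through a pool of proxy endpoints, one per request. A minimal sketch — the hostnames and credentials below are placeholders, not real gateways:

```python
import itertools

class ProxyRotator:
    """Round-robin over a pool of residential proxy endpoints."""

    def __init__(self, endpoints):
        self._pool = itertools.cycle(endpoints)

    def next_proxies(self):
        # Returns a requests-style proxies dict for the next endpoint in the pool
        proxy_url = next(self._pool)
        return {"http": proxy_url, "https": proxy_url}

rotator = ProxyRotator([
    "http://user:pass@residential-1.example.com:7000",  # placeholder endpoint
    "http://user:pass@residential-2.example.com:7000",  # placeholder endpoint
])
print(rotator.next_proxies()["https"])
```

Pass the returned dict to `requests.get(url, proxies=...)` on each call so consecutive requests leave from different IPs. Most residential proxy providers also offer session-rotation at the gateway, which achieves the same effect server-side.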
Configuring Your ZoomInfo Scraper
Ready to build? Here’s how to start with a simple scraper that:
- Fetches ZoomInfo company pages.
- Extracts the hidden JSON profile data.
- Saves it clean and ready for analysis.
1. Environment Setup
Create a clean Python environment and install essentials:
```shell
# Create the virtual environment
python -m venv zoominfo-scraper

# Activate it — Windows (CMD)
zoominfo-scraper\Scripts\activate

# Activate it — macOS/Linux
source zoominfo-scraper/bin/activate

# Install the essentials
pip install requests beautifulsoup4 urllib3
```
2. Build a Basic Company Profile Scraper
Here’s the core logic: fetch, extract, save.
```python
import json

import requests
import urllib3
from bs4 import BeautifulSoup
from urllib3.exceptions import InsecureRequestWarning

urllib3.disable_warnings(InsecureRequestWarning)


class ZoomInfoScraper:
    def __init__(self, url):
        self.url = url
        self.headers = {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Referer": url,
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
        }
        self.proxies = self._setup_proxies()

    def _setup_proxies(self):
        username = "PROXY_USERNAME"
        password = "PROXY_PASSWORD"
        proxy_host = "gate.example.com:7000"
        if not username or not password:
            print("No proxy credentials found. Running without proxy.")
            return None
        proxy_url = f"http://{username}:{password}@{proxy_host}"
        return {"http": proxy_url, "https": proxy_url}

    def fetch_html(self):
        try:
            resp = requests.get(
                self.url,
                headers=self.headers,
                proxies=self.proxies,
                verify=False,
                timeout=15,
            )
            resp.raise_for_status()
            return resp.text
        except requests.RequestException as e:
            raise Exception(f"Request failed: {e}")

    def extract_data(self, html):
        soup = BeautifulSoup(html, "html.parser")
        script_tag = soup.find("script", {"id": "ng-state", "type": "application/json"})
        if not script_tag:
            raise ValueError("Data script tag not found")
        return json.loads(script_tag.string).get("pageData")

    def run(self):
        print(f"Scraping {self.url.split('/')[-1]} ...")
        try:
            html = self.fetch_html()
            data = self.extract_data(html)
            if data:
                with open("page_data.json", "w") as f:
                    json.dump(data, f, indent=2)
                print("Success! Data saved to page_data.json")
                return data
            print("No page data found.")
        except Exception as e:
            print(f"Error: {e}")
        return None


if __name__ == "__main__":
    url = "https://www.zoominfo.com/c/anthropic-pbc/546195556"
    scraper = ZoomInfoScraper(url)
    scraper.run()
```
Run it, and you get detailed JSON like this:

```json
{
  "companyId": "546195556",
  "name": "Anthropic",
  "url": "https://www.anthropic.com",
  "numberOfEmployees": "1035",
  "address": {
    "city": "San Francisco",
    "state": "California",
    "country": "United States"
  },
  "fundings": { ... },
  "competitors": [ ... ]
}
```
This data fuels lead generation, market analysis, and more.
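To put that JSON to work in a CRM or spreadsheet, a small flattener turns the nested profile into a single record. A sketch, with field names taken from the sample output above (your own dumps may differ):

```python
def flatten_profile(page_data: dict) -> dict:
    """Flatten nested pageData into a flat, CRM-friendly record.

    Field names match the sample JSON above; adjust for your own data.
    """
    address = page_data.get("address") or {}
    return {
        "company_id": page_data.get("companyId"),
        "name": page_data.get("name"),
        "website": page_data.get("url"),
        "employees": page_data.get("numberOfEmployees"),
        "city": address.get("city"),
        "state": address.get("state"),
        "country": address.get("country"),
    }

sample = {
    "companyId": "546195556",
    "name": "Anthropic",
    "url": "https://www.anthropic.com",
    "numberOfEmployees": "1035",
    "address": {"city": "San Francisco", "state": "California", "country": "United States"},
}
print(flatten_profile(sample)["city"])  # San Francisco
```

Using `.get()` everywhere keeps the flattener from crashing on profiles that are missing a field — common on smaller companies.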
Scraping Search Results and Competitor Links
Want more than one company? Start from ZoomInfo’s search with filters like industry or location. But heads up — free search access is capped at the first five pages of results.
How to scale:
- Loop through pages 1 to 5.
- Extract company URLs from search results.
- Feed each URL to your scraper.
- Rotate user agents and proxies.
- Add retries with exponential backoff.
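The backoff pattern in that last step is simple enough to sketch by hand — each retry waits twice as long as the one before, with a little jitter so parallel scrapers don’t retry in lockstep:

```python
import random
import time

def fetch_with_backoff(fetch, retries=4, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the last error
            # Delays grow 1x, 2x, 4x ... of base_delay, plus random jitter
            time.sleep(base_delay * 2 ** attempt + random.random() * base_delay)

# Demo with a fetcher that fails twice before succeeding:
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("429 Too Many Requests")
    return "page html"

print(fetch_with_backoff(flaky_fetch, base_delay=0.01))  # page html
```

In practice you’d wrap the scraper’s `fetch_html` in this, or let a library handle the same policy declaratively.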
Install these for robustness:

```shell
pip install tenacity fake-useragent
```

- tenacity retries failed requests smartly.
- fake-useragent randomizes User-Agent headers to avoid detection.
Also, use competitor links from each profile to discover new companies — a powerful recursive crawl.
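That recursive crawl is a plain breadth-first traversal with a visited set. Here is a sketch with a stubbed-out fetcher standing in for the real scraper, and hypothetical profile paths as the graph:

```python
from collections import deque

def crawl_competitors(seed_urls, get_competitors, max_pages=100):
    """Breadth-first crawl: scrape each company, then enqueue its competitors.

    get_competitors(url) stands in for the real scraper and returns the
    competitor profile URLs found on that page.
    """
    visited, queue = set(), deque(seed_urls)
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        for link in get_competitors(url):
            if link not in visited:
                queue.append(link)
    return visited

# Stub competitor graph (hypothetical profile paths):
graph = {"/c/a": ["/c/b", "/c/c"], "/c/b": ["/c/a"], "/c/c": []}
print(sorted(crawl_competitors(["/c/a"], lambda u: graph.get(u, []))))
```

The `max_pages` cap and the visited set are what keep the crawl from looping forever — competitor links are heavily cyclic, since rivals usually list each other.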
Final Thoughts
Scraping ZoomInfo isn’t for the faint-hearted, but with rotating residential proxies, stealth browsers, CAPTCHA solvers, and smart retry logic, it’s absolutely doable. Automate effectively, keep your footprint low, and the data will flow smoothly.