Over the past few weeks I’ve been monitoring traffic from AI crawlers like OpenAI’s GPTBot and OAI-SearchBot, and Anthropic’s ClaudeBot. The data (see screenshots below) raises some interesting questions:
Questions I Wanted to Answer
- Why does GPTBot visit robots.txt so many times, sometimes multiple times per day?
- Why does GPTBot prefer robots.txt over sitemap.xml?
- Why do I see AI bot traffic but no crawling of fresh content? Just repeated hits to old resources.
(Screenshot 1: Vercel Observability Query Builder: Bot traffic)
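If you want to reproduce this kind of breakdown outside Vercel’s Observability UI, a rough sketch like the one below is enough: tally raw access-log lines by the crawler names that appear in their user-agent strings. The log format is whatever your host gives you; only the substring matching matters here.

```typescript
// Rough sketch: tally requests per AI crawler from an array of raw access-log lines.
// Assumes each line contains the request's User-Agent string somewhere in it.
const AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot"] as const;

type CrawlerName = (typeof AI_CRAWLERS)[number];

function tallyAiCrawlerHits(logLines: string[]): Record<CrawlerName, number> {
  const counts = Object.fromEntries(
    AI_CRAWLERS.map((name) => [name, 0]),
  ) as Record<CrawlerName, number>;

  for (const line of logLines) {
    for (const name of AI_CRAWLERS) {
      if (line.includes(name)) {
        counts[name] += 1;
        break; // count each request once
      }
    }
  }
  return counts;
}

// Example: tallyAiCrawlerHits(lines) -> { GPTBot: 42, "OAI-SearchBot": 3, ClaudeBot: 7 }
```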
1. robots.txt Obsession
(Screenshot 2: OpenAI GPTBot robots.txt traffic)
The charts clearly show GPTBot hammering robots.txt across multiple IPs, sometimes 7 times in 2 days from the same subnet. Unlike Googlebot, which fetches robots.txt
a few times per day and caches the rules, GPTBot seems to re-check every time it rotates IPs or restarts.
(Screenshot 3: OpenAI AI crawlers traffic pattern)
That means there’s no centralised “consent” store for the crawler. Every new instance behaves like a fresh bot, wasting its crawl budget on permission checks.
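For contrast, here’s what a centralised consent store could look like in its simplest form: a shared, TTL-based cache keyed by origin that every crawler instance consults before hitting robots.txt again. This is purely my sketch of the idea, not how GPTBot (or any real crawler) is implemented, and the six-hour TTL is an arbitrary stand-in for “a few checks per day”.

```typescript
// Sketch of a shared robots.txt cache keyed by origin (illustrative only).
type RobotsEntry = { body: string; fetchedAt: number };

const ROBOTS_TTL_MS = 6 * 60 * 60 * 1000; // roughly "a few checks per day"
const robotsCache = new Map<string, RobotsEntry>();

async function getRobotsTxt(origin: string): Promise<string> {
  const cached = robotsCache.get(origin);
  if (cached && Date.now() - cached.fetchedAt < ROBOTS_TTL_MS) {
    return cached.body; // every crawler instance reuses the same consent check
  }
  const res = await fetch(`${origin}/robots.txt`);
  const body = res.ok ? await res.text() : ""; // treat errors as "no rules cached"
  robotsCache.set(origin, { body, fetchedAt: Date.now() });
  return body;
}

// Usage: await getRobotsTxt("https://example.com") -- only the first call per
// six-hour window actually shows up as a request in the site's logs.
```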
2. sitemap.xml Inconsistencies
I’ve tracked two different projects, and the behaviour is inconsistent. On one site, GPTBot fetched the sitemap exactly once in a month. On another, it skipped the sitemap entirely but went straight for content. Meanwhile, Anthropic’s ClaudeBot actually hit the sitemap multiple times.
The missing piece here is a smart algorithm that keeps score over time for each website. Google solved this years ago: it doesn’t blindly trust every lastmod
tag, but instead builds a trust score for each domain based on history, accuracy, and freshness signals. That’s how it decides whether to treat a sitemap update seriously or to ignore it.
AI crawlers aren’t doing this yet. They either underuse sitemaps or waste fetches on them without consistency. To improve, AI labs need to adopt a similar scoring system. Or, as I strongly suspect from patterns I’ve seen, they may simply partner with Google Search and tap into its index instead of reinventing crawling from scratch.
Side note: I’ve even seen OpenAI API results that looked suspiciously close to Google Search outputs ...
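To make the scoring idea concrete, here’s a toy version of a per-domain lastmod trust score: every time a URL is re-fetched, compare what the sitemap claimed (lastmod changed or not) with what actually happened (content hash changed or not), and nudge the domain’s score accordingly. The names, weights and threshold below are invented for illustration; nothing here reflects how Google or any AI lab actually scores domains.

```typescript
// Toy per-domain lastmod trust score (all weights and thresholds are made up).
const lastmodTrust = new Map<string, number>(); // domain -> score in [0, 1]

function updateLastmodTrust(
  domain: string,
  claimedFresh: boolean, // the sitemap's lastmod said the page changed
  actuallyChanged: boolean, // the fetched content hash actually differed
): number {
  const prev = lastmodTrust.get(domain) ?? 0.5; // start neutral
  const accurate = claimedFresh === actuallyChanged;
  // Exponential moving average: accurate claims raise the score, noise lowers it.
  const next = 0.9 * prev + 0.1 * (accurate ? 1 : 0);
  lastmodTrust.set(domain, next);
  return next;
}

// A crawler could then let sitemap entries drive scheduling only for domains
// whose score stays above some threshold, say 0.7, and fall back to its own
// revisit heuristics everywhere else.
```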
(Screenshot 4: AI crawlers traffic pattern - Vercel Observability Query builder)
3. Crawling Old Content Repeatedly (and the Budget Problem)
This is where the inefficiency really shows. Bots keep returning to old content instead of discovering what’s new. Even when they’ve seen the sitemap, they often ignore it and waste their crawl budget revisiting stale pages.
There should be a smarter way to surface new material, and honestly, respecting lastmod in sitemap.xml would solve a lot of this. I really hope someone on the search teams at OpenAI and Anthropic is reading this.
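Concretely, “respecting lastmod” would look something like the sketch below: fetch sitemap.xml, keep only the URLs whose lastmod is newer than the last successful crawl of that site, and spend the budget there before revisiting anything stale. The regex parsing is deliberately crude to keep the example short; assume a real crawler would use a proper XML parser.

```typescript
// Sketch: prefer URLs whose <lastmod> is newer than the crawler's last visit.
async function freshUrlsFromSitemap(
  sitemapUrl: string,
  lastCrawl: Date,
): Promise<string[]> {
  const xml = await (await fetch(sitemapUrl)).text();
  const fresh: string[] = [];

  for (const entry of xml.match(/<url>[\s\S]*?<\/url>/g) ?? []) {
    const loc = entry.match(/<loc>(.*?)<\/loc>/)?.[1];
    const lastmod = entry.match(/<lastmod>(.*?)<\/lastmod>/)?.[1];
    if (!loc) continue;
    // Missing lastmod, or lastmod newer than our last visit -> worth the fetch.
    if (!lastmod || new Date(lastmod) > lastCrawl) fresh.push(loc);
  }
  return fresh;
}

// Usage: freshUrlsFromSitemap("https://example.com/sitemap.xml", lastSuccessfulCrawl)
// hands the scheduler only the pages that claim to have changed.
```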
From what I see:
- Crawling budgets are tiny. Sometimes a bot “spends” its limited fetches just on robots.txt and pages it has already crawled.
- No centralised rule cache. Each IP acts independently, re-checking permissions and burning requests on duplicate work.
- Unstable sessions. The pattern of repeated restarts suggests crawler instances spin up and down often, leading to wasted quota.
And that’s why your fresh blog post doesn’t get fetched, while your robots.txt enjoys multiple visits per day.
4. The Static Assets Surprise (a.k.a. Bots Running Headless Browsers)
Now here’s the real surprise: OpenAI’s crawler sometimes downloads static assets. Next.js chunks, CSS, polyfills. That almost certainly means it’s firing up a headless browser and actually rendering the page. Rendering at scale is expensive, so seeing this in the logs is like catching the bot red-handed burning VC money on your webpack bundles.
Developers, let’s be honest: we shouldn’t force AI labs to reinvent Google Search’s rendering farm from scratch. The sane thing is still to serve content via SSR/ISR so crawlers don’t have to play Chromium roulette just to see your page. Otherwise you’re basically making Sam Altman pay more to crawl your vibe-coded portfolio site.
The funny bit? This is great news for vibe coders. All those sites built with pure CSR, the “AI slop” nobody thought would ever be indexable, might now actually get pulled into GPTBot’s memory. Your prayers have been heard... at least until the crawl budget runs out.
Fun fact: some vibe-coding tools default to CSR, which is basically SEO-assisted suicide. If you care about visibility, whether in Google or in AI engines, please stop.
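If you’re on Next.js (which is exactly what the static-asset requests in my logs point at), the fix is usually a line or two: pre-render the page and let ISR keep it fresh, so any crawler gets complete HTML without spinning up Chromium. A minimal app-router sketch, where getPost and the one-hour revalidation window are placeholders for your own setup:

```tsx
// app/blog/[slug]/page.tsx -- minimal ISR sketch (the revalidation window is arbitrary).
// The page is rendered to HTML on the server and regenerated in the background,
// so bots get full content without running a headless browser.
export const revalidate = 3600;

// Placeholder for however you load content (CMS, MDX files, database...).
async function getPost(slug: string): Promise<{ title: string; html: string }> {
  const res = await fetch(`https://example.com/api/posts/${slug}`);
  return res.json();
}

export default async function BlogPostPage({
  params,
}: {
  params: { slug: string };
}) {
  const post = await getPost(params.slug);
  return (
    <article>
      <h1>{post.title}</h1>
      <div dangerouslySetInnerHTML={{ __html: post.html }} />
    </article>
  );
}
```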
(Screenshot 5: GPTBot downloading static assets, including some hallucinated 404s)
5. What This Means for AI SEO
The good news:
OpenAI and Anthropic at least play by the rules. They ask permission before scraping, unlike the swarm of shady scrapers hitting your site daily.
The bad news:
- Crawl budgets are tiny and often wasted.
- Fresh content gets ignored.
- Sitemaps and lastmod aren’t respected.
- JS rendering happens only occasionally, so CSR-only sites are still at risk of being invisible.
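On the site-owner side, the one item in that list you can influence today is the sitemap: ship one, and make lastmod reflect real content edits rather than the deploy timestamp, so any crawler that does learn to trust lastmod isn’t misled. In Next.js that’s an app/sitemap.ts along these lines (getAllPosts is a placeholder for your own content source):

```typescript
// app/sitemap.ts -- emit lastModified from real content edit dates, not build time.
import type { MetadataRoute } from "next";

// Placeholder for however you load posts (CMS, filesystem, database...).
async function getAllPosts(): Promise<{ slug: string; updatedAt: string }[]> {
  return [{ slug: "ai-crawler-observations", updatedAt: "2025-09-01T00:00:00Z" }];
}

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const posts = await getAllPosts();
  return posts.map((post) => ({
    url: `https://example.com/blog/${post.slug}`,
    lastModified: new Date(post.updatedAt),
  }));
}
```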
Closing Thought
Google has a 25-year head start in crawling, indexing, and ranking. AI crawlers are still in year one of that journey. They’re not true search engines yet, but the scaffolding is going up fast.
If anyone from OpenAI, Anthropic, or xAI is reading this: please, implement smarter crawl budgets and start respecting sitemap freshness. Otherwise, all we’ll get is bots lovingly revisiting robots.txt while the real content sits untouched.
Godspeed
https://x.com/dom_sipowicz
https://www.linkedin.com/in/dominiksipowicz/