Blocking bots in Nginx
Horacio Degiorgi


Publish Date: Feb 26

At bibliotecas.uncuyo.edu.ar we run multiple services behind an nginx-based reverse proxy.
For several days every system had been slowing down. Analyzing the access logs, we found a massive increase in "visits" from AI bots.
How do we block them?
By adding rules to the proxy_hosts definitions:

if ($http_user_agent ~* "amazonbot|Claudebot|claudebot|DataForSeoBot|dataforseobot|Amazonbot|SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|MojeekBot|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot") { return 403; }
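
A quick way to verify the rule is to send a request with one of the blocked User-Agent strings (for example with curl -A "ClaudeBot") and check that nginx answers 403.

The same check can also be written with a map block, which some find easier to maintain than one long if condition. This is only a sketch: map must live in the http context (not in a per-host section), and the pattern below names just a few of the bots from the full list above.

# http context: set $blocked_bot to 1 when the User-Agent matches (case-insensitive)
map $http_user_agent $blocked_bot {
    default 0;
    "~*(amazonbot|claudebot|semrushbot|ahrefsbot|mj12bot|yandexbot|ccbot)" 1;
}

# Then, inside the server or location block of each proxy host:
if ($blocked_bot) {
    return 403;
}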

In our case, since we use proxymanager to manage the different domains, this configuration goes in the Advanced section of each proxy host:

[Screenshot: advanced configuration in proxymanager]
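
To avoid pasting the long rule into every proxy host, one option is to keep it in a single file on the server and include it from each Advanced section. This is a sketch: the path /data/nginx/custom/block-bots.conf is a hypothetical example (not something proxymanager creates for you), and the bot pattern is shortened.

# Contents of /data/nginx/custom/block-bots.conf (hypothetical path):
if ($http_user_agent ~* "amazonbot|claudebot|semrushbot|ahrefsbot|mj12bot|yandexbot|ccbot") {
    return 403;
}

# In each proxy host's Advanced configuration:
include /data/nginx/custom/block-bots.conf;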

Comments (1 total)

  • MUHAMMED YAZEEN AN · Mar 2, 2025

    Great article! Blocking bots using User-Agent strings is a good starting point, and you've explained it really well.

    I just wanted to add that User-Agent blocking can sometimes be bypassed since the User-Agent header can be easily spoofed. To make bot blocking more robust, we could combine it with other techniques like:

    • Rate limiting: Restrict the number of requests a client can make in a short time (see the nginx sketch after this list).
    • IP blocking: Block known malicious IPs or ranges.
    • Behavior-based detection: Identify bots by analyzing unusual patterns like high request rates, skipping resources, or accessing non-existent pages.
    • JavaScript challenges: Verify if the client can execute JavaScript, as most bots cannot.
    • CAPTCHAs: Add a CAPTCHA to sensitive areas like login pages or forms.
    • Advanced abilities: Services like Cloudflare or AWS WAF can provide more comprehensive bot protection.

    Combining these techniques can help create a stronger defense against bots. Thanks again for sharing this, it's a great resource for anyone looking to get started!
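
    A minimal nginx sketch of the first two ideas above (rate limiting and IP blocking); the zone name, rate, burst, and IP range are illustrative values, not recommendations:

    # In the http context: track request rates per client IP
    # (zone name "perip", 10 MB of shared memory, 10 requests/second are examples).
    limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

    server {
        # IP blocking: reject a known abusive range (example documentation range).
        deny 203.0.113.0/24;

        location / {
            # Rate limiting: allow short bursts, answer the rest with 429.
            limit_req zone=perip burst=20 nodelay;
            limit_req_status 429;
            # existing proxy_pass and other directives go here
        }
    }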