How to block AI bots with Nginx

After I migrated my code off GitHub, OpenAI's web crawler apparently followed my links to the new location and started crawling it. I caught it early and quickly blocked it by IP, but I realized that I needed to do more.

While most people here are probably reading over the Gemini protocol, there are some who, like me, dual-host or proxy their content over HTTP. For those of you using Nginx, here is a simple way to block AI bots. It has been reported that these bots ignore the `robots.txt` file, so I've moved to blocking them based on user agent. They can lie about this too, but for the moment, short of direct IP blocking or adding some sort of crawl detection, this is the easiest and most effective way to block them.

Nginx Configuration

First I created a snippet that contains a map of all the user agents I would like to block:

~$ cat /etc/nginx/snippets/bad-bot.conf 

map $http_user_agent $block_ai_bot {
    default 0;
    ~*Amazonbot 1;
    ~*Applebot-Extended 1;
    ~*Bytespider 1;
    ~*CCBot 1;
    ~*ChatGPT 1;
    ~*ChatGPT-User 1;
    ~*Claude-Web 1;
    ~*ClaudeBot 1;
    ~*Diffbot 1;
    ~*FacebookExternalHit 1;
    ~*GPTBot 1;
    ~*Google-CloudVertexBot 1;
    ~*Google-Extended 1;
    ~*ImagesiftBot 1;
    ~*OAI-SearchBot 1;
    ~*OpenAI 1;
    ~*Perplexity-User 1;
    ~*PerplexityBot 1;
    ~*PetalBot 1;
    ~*Scrapy 1;
    ~*TurnitinBot 1;
    ~*Twitterbot 1;
    ~*YandexAdditional 1;
    ~*YandexAdditionalBot 1;
    ~*anthropic-ai 1;
    ~*cohere-ai 1;
    ~*magpie-crawler 1;
    ~*meta-externalagent 1;
    ~*omgili 1;
    ~*omgilibot 1;
}

Anything that matches gets a `1` and everything else gets a `0`.
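To illustrate how the matching works, here is a condensed sketch of the same map. It is abbreviated for illustration and is not meant to be loaded alongside the full snippet. Each `~*` entry is a case-insensitive regular expression that can match anywhere in the User-Agent header, so a shorter pattern also covers longer names that contain it.

```nginx
# Condensed sketch of the same map; a replacement for the snippet, not an addition.
map $http_user_agent $block_ai_bot {
    default      0;   # no pattern matched, so the request is allowed
    ~*GPTBot     1;   # "~*" starts a case-insensitive regular expression
    ~*ChatGPT    1;   # unanchored, so this also matches "ChatGPT-User"
    ~*omgili     1;   # likewise covers "omgilibot"
}
```

Keeping both the short and long names in the full list is harmless; the extra entries simply never get a chance to be the first match.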

Next, open your Nginx configuration and add the following line within your `http` block:

include /etc/nginx/snippets/bad-bot.conf;

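This includes the map we just created. The `map` directive is only valid in the `http` context, so the include must sit outside any `server` block. If you have multiple virtual hosts, including it once at the `http` level makes `$block_ai_bot` available to all of them.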
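As a rough sketch of the placement, assuming a Debian-style layout where the main file is `/etc/nginx/nginx.conf` and per-site files live in `sites-enabled` (both are assumptions; adjust to your setup), the relevant fragment might look like this:

```nginx
# Fragment of /etc/nginx/nginx.conf (path assumed; other directives omitted)
http {
    # Load the user-agent map at the http level so that
    # $block_ai_bot can be used in the server blocks below.
    include /etc/nginx/snippets/bad-bot.conf;

    # Per-site configuration files (Debian-style layout, an assumption).
    include /etc/nginx/sites-enabled/*;
}
```

Running `nginx -t` before reloading is a quick way to confirm the include is in a valid context.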

Finally, add the following within each `server` block where you want to block bots:

if ($block_ai_bot) {
  return 444;
}

This returns `444`, a non-standard code that Nginx uses to close the connection without sending a response. From the bot's standpoint, it appears as if the server isn't responding, and hopefully it will get the picture that there is nothing to see here.
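For context, here is a minimal sketch of a complete `server` block with the check in place. The hostname and document root are hypothetical placeholders, not taken from my setup.

```nginx
server {
    listen 80;
    server_name example.com;        # hypothetical hostname

    # Close the connection without a response for any flagged user agent.
    # Swap in "return 403;" if you would rather send a visible refusal.
    if ($block_ai_bot) {
        return 444;
    }

    root /var/www/example;          # hypothetical document root

    location / {
        try_files $uri $uri/ =404;
    }
}
```

Placing the `if` directly under `server` means it applies to every location in that virtual host.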

This works for any proxy or CGI configuration you have as well. Just make sure the `server` section includes the `if` block and it will keep the listed bots out.
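A proxied setup might look roughly like the sketch below. The hostname and the backend address `127.0.0.1:8080` are hypothetical; substitute whatever your proxy actually listens on.

```nginx
server {
    listen 80;
    server_name capsule.example.com;    # hypothetical hostname

    # Server-level check: runs before a location is chosen,
    # so flagged requests never reach the backend.
    if ($block_ai_bot) {
        return 444;
    }

    location / {
        # Hypothetical local backend; replace with your own proxy target.
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }
}
```

Because the check happens at the `server` level, blocked requests are dropped before they are handed to the backend.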

$ published: 2025-07-13 21:44 $

$ tags: nginx, ai, web $
