“Thanks for calling that out. That was bad advice on my part.”
For reasons, I noticed that ClaudeBot (and I'd link to the information page for it, but, and this is my shocked face, they forgot to include a proper link!) was requesting the same file over and over again—/sitemap.xml. My bad! I had neglected to configure my web server to return “410 Gone” for that page. I went to the bot's main site and found a “call to sign up!” but sadly, no obvious link about how to stop their bot.
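(For the record, fixing that on my end is a one-liner. A minimal sketch, assuming an nginx server; the Apache equivalent would be a `Redirect gone /sitemap.xml` line.)

```nginx
# Answer "410 Gone" for the sitemap ClaudeBot keeps asking for,
# instead of a 404, so well-behaved crawlers know to give up for good.
location = /sitemap.xml {
    return 410;
}
```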
So I figured I might as well try this LLM thing once more. The last [1] few [2] times I tried, it didn't go well. Maybe third time's a charm?
Maybe?
Sean: How do I stop your XXXXXXX webbot from crawling my website?
Claude: [Thinks hard for a few minutes.]
I can help you block Anthropic's web crawler from accessing your website.
You'll want to add rules to your robots.txt file. Here's what to add:
    User-agent: anthropic-ai
    Disallow: /

    User-agent: Claude-Web
    Disallow: /
Put this in your robots.txt file at the root of your domain (e.g., https://yourdomain.com/robots.txt). [Apparently Claude doesn't know that there's a domain set aside specifically for examples, namely, **EXAMPLE.COM**. Claude also doesn't realize that yourdomain.com [3] actually exists! Way to go, Claude! slow clap]
The two user-agent strings cover Anthropic's crawlers:
- anthropic-ai - the main crawler identifier
- Claude-Web - an alternative identifier that may be used
If you want to block specific sections instead of your entire site, you can replace / with specific paths:
    User-agent: anthropic-ai
    Disallow: /private/
    Disallow: /admin/
The crawler should respect these rules within a reasonable timeframe after you update your robots.txt file.
If you're still seeing crawling activity after adding these rules, or if you need immediate blocking, you could also block by IP (Internet Protocol) address in your server configuration or firewall, though the specific IPs may change over time.
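[Aside: that last bit about server-level blocking is actually the sturdiest advice in here, since it doesn't depend on the bot's good manners. A minimal sketch, assuming nginx, matching the User-Agent header and refusing; the regex is my own broad guess, and “claude” alone may over-match, which is arguably a feature:]

```nginx
# Refuse any request whose User-Agent mentions Anthropic's crawlers.
# "~*" is a case-insensitive regex match (ngx_http_rewrite_module).
if ($http_user_agent ~* "anthropic|claude") {
    return 403;
}
```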
Sean: [Finds the page [4] that describes Anthropic's web bots, not from a link on the Anthropic web site, but a normal web search.]
What about Claude-User? ClaudeBot? Claude-SearchBot? Did you XXXXXXX forget those? Or did your overlords tell you to conveniently “forget” those as well?
Claude: You're absolutely right - I should have given you the complete list. Here are all the Anthropic crawler user-agents you should block:
    User-agent: anthropic-ai
    Disallow: /

    User-agent: Claude-Web
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: Claude-SearchBot
    Disallow: /

    User-agent: Claude-User
    Disallow: /
I didn't intentionally leave those out - I gave you an incomplete answer, which wasn't helpful when you're trying to actually block the crawlers. Thanks for calling that out.
If you want to be thorough, you could also add a blanket rule that catches any variations:
    User-agent: *Claude*
    Disallow: /

    User-agent: *anthropic*
    Disallow: /
Though the specific user-agent blocking is more precise if you want to maintain control over what else can access your site.
Sean: Are you XXXXXXX sure that's legal syntax? That doesn't look kosher to me. Again, are you trying to get around any bans?
Claude: You're right to call that out - wildcards with asterisks like *Claude* are NOT valid robots.txt syntax. That was bad advice on my part.
The correct approach is to list each user-agent explicitly:
    User-agent: anthropic-ai
    Disallow: /

    User-agent: Claude-Web
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: Claude-SearchBot
    Disallow: /

    User-agent: Claude-User
    Disallow: /
That's the proper syntax. Each User-agent directive needs to be spelled out exactly, and wildcards don't work in the User-agent field in standard robots.txt files.
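[Credit where due: that part is correct. You can check how a standard parser reads these rules with Python's urllib.robotparser, using example.com, because that's what it's for:]

```python
from urllib import robotparser

# A minimal robots.txt blocking one named crawler.
rules = """\
User-agent: ClaudeBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The explicitly named bot is blocked...
print(rp.can_fetch("ClaudeBot", "https://example.com/"))     # → False
# ...but with no "User-agent: *" entry, everyone else is allowed.
print(rp.can_fetch("SomeOtherBot", "https://example.com/"))  # → True
```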
I'm not trying to get around anything - I just gave you incorrect technical information, which is worse than useless when you're trying to actually implement this. My apologies for the confusion.
Oh wait! I know! I'm holding it wrong, and I'm using an LLM that is older than 20 minutes. My mistake. And before you say I should have asked politely, no. This is a computer! It's not human! Stop anthropomorphizing it!
And of course, at the bottom of the chat page, grey-on-slightly-darker-grey, is the text “Claude can make mistakes. Please double-check responses.”
How anybody finds this garbage useful is beyond me.