robots.txt for AI Agents — How to Allow GPTBot, ClaudeBot & PerplexityBot

Your robots.txt controls which bots can crawl your site. For decades, that meant search engines. Now it means AI agents too. Here's how to configure it so ChatGPT, Claude, and Perplexity can discover — and recommend — your business.

The AI Bots You Need to Know

Each major AI company has its own crawler. Here are the main ones:

| User-Agent | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Crawls for ChatGPT browsing + model training |
| ChatGPT-User | OpenAI | Real-time browsing when users ask ChatGPT to look something up |
| ClaudeBot | Anthropic | Crawls for Claude AI knowledge + web access |
| PerplexityBot | Perplexity | Powers Perplexity's AI search engine |
| Google-Extended | Google | AI/Gemini training (separate from Googlebot Search) |
| Bytespider | ByteDance | TikTok AI training data |
| CCBot | Common Crawl | Open dataset used by many AI models |
| cohere-ai | Cohere | Enterprise AI model training |
💡 Key distinction: GPTBot crawls for training + knowledge. ChatGPT-User is the real-time browser — when a user says "look up X" in ChatGPT. You want both allowed if you want maximum AI visibility.
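If you want to see which of these crawlers are already visiting your site, you can match your access-log user-agent strings against the list above. A minimal sketch in Python — the `identify_ai_crawler` helper is illustrative, and the sample user-agent string is only an approximation of what these bots send:

```python
# Map of AI crawler tokens (as they appear in User-Agent headers) to companies.
AI_CRAWLERS = {
    "GPTBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Google-Extended": "Google",
    "Bytespider": "ByteDance",
    "CCBot": "Common Crawl",
    "cohere-ai": "Cohere",
}

def identify_ai_crawler(user_agent: str):
    """Return (token, company) if the User-Agent contains a known AI crawler token."""
    ua = user_agent.lower()
    for token, company in AI_CRAWLERS.items():
        if token.lower() in ua:
            return token, company
    return None

# Example: a GPTBot-style user-agent from an access log
print(identify_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"))
# → ('GPTBot', 'OpenAI')
```

Run over a day of logs, this tells you whether AI bots are reaching you at all before you start tuning permissions.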

The Copy-Paste Template

Want maximum AI visibility? Use this robots.txt:

# Standard search engines
User-agent: *
Allow: /

# AI Agents — Explicitly allowed
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: cohere-ai
Allow: /

# Sitemap
Sitemap: https://yourdomain.com/sitemap.xml

Why explicit Allow matters: Some robots.txt parsers treat "not mentioned" differently from "explicitly allowed." Being explicit removes all ambiguity — you're telling AI bots: "Yes, we want you here."
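You can sanity-check a policy like this before deploying it with Python's standard-library `urllib.robotparser`. This is a quick local sketch — real crawlers may implement their own parsers with different edge-case behavior, which is exactly why the explicit rules help:

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the template above
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Explicit groups and the wildcard fallback both permit crawling.
print(rp.can_fetch("GPTBot", "https://yourdomain.com/pricing"))  # True
print(rp.can_fetch("SomeOtherBot", "https://yourdomain.com/"))   # True
```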

Selective Access: Allow Some, Block Others

Maybe you want ChatGPT and Claude to reference your content, but you don't want every company training models on your data:

# Allow AI search/browsing agents
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Block training-only crawlers
User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

# Standard search engines
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
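The same `urllib.robotparser` approach verifies that this split policy does what you intend — browsing agents allowed, training-only crawlers blocked (a local sketch; crawler parsers may vary):

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the selective policy above
SELECTIVE = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(SELECTIVE.splitlines())

for bot in ("GPTBot", "ClaudeBot", "CCBot", "Bytespider"):
    status = "allowed" if rp.can_fetch(bot, "https://yourdomain.com/") else "blocked"
    print(f"{bot}: {status}")
```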

Common Mistakes

Mistake #1: Blocking everything with a wildcard

# ❌ This blocks ALL bots including AI agents
User-agent: *
Disallow: /

If you have a wildcard Disallow: /, AI bots without a specific rule will be blocked. Always add explicit rules for the AI bots you want to allow.

Mistake #2: Confusing Google-Extended with Googlebot

Googlebot handles Google Search indexing — blocking it kills your SEO. Google-Extended is specifically for Gemini/AI training. They're independent. Blocking Google-Extended does NOT affect your search ranking.

⚠️ Never block Googlebot unless you intentionally want to disappear from Google Search. Only block Google-Extended if you don't want Google using your content for Gemini.

Mistake #3: Not testing after changes

After updating robots.txt, always verify:

  1. Visit yourdomain.com/robots.txt in your browser to confirm the content is correct
  2. Run your site through AEO Checker — the robots.txt check will show which bots are allowed/blocked
  3. Use Google Search Console's robots.txt tester if available for your property
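The first two checks can be approximated with a small script. The `audit_robots` helper below is a sketch (not part of any tool named in this guide) that takes the text of a robots.txt file and reports each AI bot's status using only the standard library:

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
           "Google-Extended", "Bytespider", "CCBot", "cohere-ai"]

def audit_robots(robots_text: str, site: str = "https://yourdomain.com/") -> dict:
    """Return {bot_name: allowed?} for each AI crawler against a robots.txt body."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    return {bot: rp.can_fetch(bot, site) for bot in AI_BOTS}

# Usage sketch — fetch your live file first, e.g.:
#   import urllib.request
#   text = urllib.request.urlopen("https://yourdomain.com/robots.txt").read().decode()
#   print(audit_robots(text))
```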

Mistake #4: Wrong bot name spelling

Bot names are case-sensitive in some parsers. Use the exact spellings: GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, CCBot, and cohere-ai.

The Strategic Question: Allow or Block?

This depends on your business model:

| Business Type | Recommendation | Why |
|---|---|---|
| SaaS / Product | Allow all | You want AI agents recommending your product |
| Service business | Allow all | AI referrals = free qualified leads |
| Content publisher | Selective | Allow browsing bots, consider blocking training bots |
| Paywalled content | Block training bots | Prevent free access to paid content |
| Proprietary data | Block all AI bots | Protect intellectual property |

For most businesses, the math is simple: AI agents are the new search engines. Being invisible to them is like blocking Googlebot in 2005. You might have your reasons, but you're choosing to be unfindable by a growing share of internet users.

Beyond robots.txt: The Full AEO Stack

robots.txt is just one of six signals AI agents use to evaluate your site. The complete AEO (AI Engine Optimization) stack includes:

  1. Structured Data — JSON-LD schemas that AI agents can parse
  2. robots.txt — Explicit AI bot permissions (this guide)
  3. llms.txt — Structured context file for AI comprehension
  4. Content Structure — Clean H1-H3 hierarchy, FAQ schemas
  5. API Discoverability — OpenAPI specs, ai-plugin.json
  6. Performance — Response time under 2 seconds

Check all 6 signals at once

The free AEO Checker scans your site across all six AI discoverability signals and gives you a score out of 100.

Free AEO Scan →

FAQ

What is GPTBot and should I allow it?

GPTBot is OpenAI's web crawler. It collects content to improve OpenAI's models and keep ChatGPT's knowledge current (real-time lookups use the separate ChatGPT-User agent). Allowing it means ChatGPT can recommend your site. If you want AI visibility, allow it.

What is ClaudeBot?

ClaudeBot is Anthropic's crawler for Claude AI. It works like GPTBot — letting Claude understand and reference your content. Allowing it means Claude can recommend your products.

Should I block or allow AI bots?

If you want AI agents to recommend your site: allow them. If you have proprietary content: selectively block training bots while allowing browsing bots. Most businesses benefit from maximum AI visibility.

What is Google-Extended?

Google's user-agent for Gemini AI training. It's separate from Googlebot (Search indexing). Blocking it won't affect your Google ranking but prevents Gemini from training on your content.

Does blocking AI bots affect my SEO?

No. AI bots are separate from search engine crawlers. Blocking GPTBot or ClaudeBot has zero impact on your Google/Bing rankings. However, you become invisible in AI-powered answer engines — which is where users increasingly search.

Related: What is AEO? · How to Create llms.txt · AEO Optimization Guide · AEO vs SEO