robots.txt for AI Agents — How to Allow GPTBot, ClaudeBot & PerplexityBot

Your robots.txt controls which bots can crawl your site. For decades, that meant search engines. Now it means AI agents too. Here's how to configure it so ChatGPT, Claude, and Perplexity can discover — and recommend — your business.

The AI Bots You Need to Know

Each major AI company has its own crawler. Here are the main ones:

| User-Agent | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Crawls for ChatGPT browsing + model training |
| ChatGPT-User | OpenAI | Real-time browsing when users ask ChatGPT to look something up |
| ClaudeBot | Anthropic | Crawls for Claude AI knowledge + web access |
| PerplexityBot | Perplexity | Powers Perplexity's AI search engine |
| Google-Extended | Google | AI/Gemini training (separate from Googlebot Search) |
| Bytespider | ByteDance | TikTok AI training data |
| CCBot | Common Crawl | Open dataset used by many AI models |
| cohere-ai | Cohere | Enterprise AI model training |
💡 Key distinction: GPTBot crawls for training + knowledge. ChatGPT-User is the real-time browser — when a user says "look up X" in ChatGPT. You want both allowed if you want maximum AI visibility.
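If you want to see which of these crawlers are already visiting your site, you can match your access-log user-agent strings against the list above. A minimal sketch in Python — the `identify_ai_crawler` helper is illustrative, and the sample user-agent string is only an approximation of what these bots send:

```python
# Map of AI crawler tokens (as they appear in User-Agent headers) to companies.
AI_CRAWLERS = {
    "GPTBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Google-Extended": "Google",
    "Bytespider": "ByteDance",
    "CCBot": "Common Crawl",
    "cohere-ai": "Cohere",
}

def identify_ai_crawler(user_agent: str):
    """Return (token, company) if the User-Agent contains a known AI crawler token."""
    ua = user_agent.lower()
    for token, company in AI_CRAWLERS.items():
        if token.lower() in ua:
            return token, company
    return None

# Example: a GPTBot-style user-agent from an access log
print(identify_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"))
# → ('GPTBot', 'OpenAI')
```

Run over a day of logs, this tells you whether AI bots are reaching you at all before you start tuning permissions.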

The Copy-Paste Template

Want maximum AI visibility? Use this robots.txt:

# Standard search engines
User-agent: *
Allow: /

# AI Agents — Explicitly allowed
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: cohere-ai
Allow: /

# Sitemap
Sitemap: https://yourdomain.com/sitemap.xml

Why explicit Allow matters: Some robots.txt parsers treat "not mentioned" differently from "explicitly allowed." Being explicit removes all ambiguity — you're telling AI bots: "Yes, we want you here."
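You can sanity-check a policy like this before deploying it with Python's standard-library `urllib.robotparser`. This is a quick local sketch — real crawlers may implement their own parsers with different edge-case behavior, which is exactly why the explicit rules help:

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the template above
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Explicit groups and the wildcard fallback both permit crawling.
print(rp.can_fetch("GPTBot", "https://yourdomain.com/pricing"))  # True
print(rp.can_fetch("SomeOtherBot", "https://yourdomain.com/"))   # True
```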

Selective Access: Allow Some, Block Others

Maybe you want ChatGPT and Claude to reference your content, but you don't want every company training models on your data:

# Allow AI search/browsing agents
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Block training-only crawlers
User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

# Standard search engines
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
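The same `urllib.robotparser` approach verifies that this split policy does what you intend — browsing agents allowed, training-only crawlers blocked (a local sketch; crawler parsers may vary):

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the selective policy above
SELECTIVE = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(SELECTIVE.splitlines())

for bot in ("GPTBot", "ClaudeBot", "CCBot", "Bytespider"):
    status = "allowed" if rp.can_fetch(bot, "https://yourdomain.com/") else "blocked"
    print(f"{bot}: {status}")
```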

Common Mistakes

Mistake #1: Blocking everything with a wildcard

# ❌ This blocks ALL bots including AI agents
User-agent: *
Disallow: /

If you have a wildcard Disallow: /, AI bots without a specific rule will be blocked. Always add explicit rules for the AI bots you want to allow.

Mistake #2: Confusing Google-Extended with Googlebot

Googlebot handles Google Search indexing — blocking it kills your SEO. Google-Extended is specifically for Gemini/AI training. They're independent. Blocking Google-Extended does NOT affect your search ranking.

⚠️ Never block Googlebot unless you intentionally want to disappear from Google Search. Only block Google-Extended if you don't want Google using your content for Gemini.

Mistake #3: Not testing after changes

After updating robots.txt, always verify:

  1. Visit yourdomain.com/robots.txt in your browser to confirm the content is correct
  2. Run your site through AEO Checker — the robots.txt check will show which bots are allowed/blocked
  3. Use Google Search Console's robots.txt tester if available for your property
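The first two checks can be approximated with a small script. The `audit_robots` helper below is a sketch (not part of any tool named in this guide) that takes the text of a robots.txt file and reports each AI bot's status using only the standard library:

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
           "Google-Extended", "Bytespider", "CCBot", "cohere-ai"]

def audit_robots(robots_text: str, site: str = "https://yourdomain.com/") -> dict:
    """Return {bot_name: allowed?} for each AI crawler against a robots.txt body."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    return {bot: rp.can_fetch(bot, site) for bot in AI_BOTS}

# Usage sketch — fetch your live file first, e.g.:
#   import urllib.request
#   text = urllib.request.urlopen("https://yourdomain.com/robots.txt").read().decode()
#   print(audit_robots(text))
```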

Mistake #4: Wrong bot name spelling

Bot names are case-sensitive in some parsers. Use the exact spellings: GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, CCBot, and cohere-ai.

The Strategic Question: Allow or Block?

This depends on your business model:

| Business Type | Recommendation | Why |
|---|---|---|
| SaaS / Product | Allow all | You want AI agents recommending your product |
| Service business | Allow all | AI referrals = free qualified leads |
| Content publisher | Selective | Allow browsing bots, consider blocking training bots |
| Paywalled content | Block training bots | Prevent free access to paid content |
| Proprietary data | Block all AI bots | Protect intellectual property |

For most businesses, the math is simple: AI agents are the new search engines. Being invisible to them is like blocking Googlebot in 2005. You might have your reasons, but you're choosing to be unfindable by a growing share of internet users.

Beyond robots.txt: The Full AEO Stack

robots.txt is just one of six signals AI agents use to evaluate your site. The complete AEO (AI Engine Optimization) stack includes:

  1. Structured Data — JSON-LD schemas that AI agents can parse
  2. robots.txt — Explicit AI bot permissions (this guide)
  3. llms.txt — Structured context file for AI comprehension
  4. Content Structure — Clean H1-H3 hierarchy, FAQ schemas
  5. API Discoverability — OpenAPI specs, ai-plugin.json
  6. Performance — Response time under 2 seconds

Check all 6 signals at once

The free AEO Checker scans your site across all six AI discoverability signals and gives you a score out of 100.

Free AEO Scan →

FAQ

What is GPTBot and should I allow it?

GPTBot is OpenAI's web crawler. It collects content to improve OpenAI's models and keep ChatGPT's knowledge current (real-time lookups use the separate ChatGPT-User agent). Allowing it means ChatGPT can recommend your site. If you want AI visibility, allow it.

What is ClaudeBot?

ClaudeBot is Anthropic's crawler for Claude AI. It works like GPTBot — letting Claude understand and reference your content. Allowing it means Claude can recommend your products.

Should I block or allow AI bots?

If you want AI agents to recommend your site: allow them. If you have proprietary content: selectively block training bots while allowing browsing bots. Most businesses benefit from maximum AI visibility.

What is Google-Extended?

Google's user-agent for Gemini AI training. It's separate from Googlebot (Search indexing). Blocking it won't affect your Google ranking but prevents Gemini from training on your content.

Does blocking AI bots affect my SEO?

No. AI bots are separate from search engine crawlers. Blocking GPTBot or ClaudeBot has zero impact on your Google/Bing rankings. However, you become invisible in AI-powered answer engines — which is where users increasingly search.

Related: What is AEO? · How to Create llms.txt · AEO Optimization Guide · AEO vs SEO