Crawler source registry

This internal registry tracks where crawler definitions came from, how much trust to assign to each source, when snapshots changed, and which definitions need review. It does not change runtime classifier behavior yet.

Runtime directory

Source-backed definitions are preferred when they are active, rated high or medium trust, and not marked as failed. Static crawler definitions remain the fallback.

Active entries: 23
Database preferred entries: 23
Static fallback entries: 0
Ignored database definitions: 0
Conflicts: 38
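
The preference rule above can be sketched as a small selection function. This is a minimal illustration, not the registry's actual code: the `Definition` fields, the `failed` flag, and the `resolve` helper are hypothetical names chosen for the example, and it assumes that "failed" is a flag carried on each source-backed definition.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Trust levels that make a source-backed definition eligible (from the rule above).
PREFERRED_TRUST = {"high", "medium"}


@dataclass(frozen=True)
class Definition:
    """Hypothetical shape of one crawler definition."""
    crawler_id: str
    trust_level: str = "high"
    active: bool = True
    failed: bool = False  # assumption: set when the backing source failed


def is_preferred(defn: Definition) -> bool:
    """Active, high or medium trust, and not failed."""
    return defn.active and defn.trust_level in PREFERRED_TRUST and not defn.failed


def resolve(
    crawler_id: str,
    database: Dict[str, Definition],
    static: Dict[str, Definition],
) -> Tuple[str, Optional[Definition]]:
    """Return (origin, definition), where origin is 'database', 'static', or 'none'."""
    db_defn = database.get(crawler_id)
    if db_defn is not None and is_preferred(db_defn):
        return "database", db_defn
    # Fall back to the static definition when the database entry is missing or ineligible.
    static_defn = static.get(crawler_id)
    if static_defn is not None:
        return "static", static_defn
    return "none", None
```

Tallying the origin returned for every crawler ID would give counts of the same kind as the database preferred and static fallback figures listed above.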

Source health

Crawler sources are tracked separately from runtime classification so definitions are auditable.

| Vendor | Source name | Source type | Trust level | Status | Last checked | Last changed | Fetch method | Docs/source URL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anthropic | Anthropic crawler documentation | official | high | active | 2026-06-19 15:24:16 | 2026-06-19 15:24:12 | static_seed | https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler |
| Apple | Applebot documentation | official | high | active | 2026-06-19 15:24:17 | 2026-06-19 15:24:14 | static_seed | https://support.apple.com/en-us/119829 |
| Community | Community and manual crawler seeds | community | medium | active | 2026-06-19 15:24:17 | 2026-06-19 15:24:14 | static_seed | - |
| Google | Google crawler documentation | official | high | active | 2026-06-19 15:24:16 | 2026-06-19 15:24:13 | static_seed | https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers |
| Meta | Meta crawler documentation | official | high | active | 2026-06-19 15:24:17 | 2026-06-19 15:24:14 | static_seed | https://developers.facebook.com/docs/sharing/webmasters/web-crawlers |
| Microsoft | Bing crawler documentation | official | high | active | 2026-06-19 15:24:17 | 2026-06-19 15:24:13 | static_seed | https://www.bing.com/webmasters/help/which-crawlers-does-bing-use-8c184ec0 |
| OpenAI | OpenAI crawler documentation | official | high | active | 2026-06-19 15:24:15 | 2026-06-19 15:24:11 | static_seed | https://platform.openai.com/docs/bots |
| Perplexity | PerplexityBot documentation | official | high | active | 2026-06-19 15:24:17 | 2026-06-19 15:24:13 | static_seed | https://www.perplexity.ai/perplexitybot |

Crawler definitions

| Vendor | Crawler ID | Purpose | Category | Robots token | User-agent patterns | Expected DNS | IP range URLs | Trust level | Active | Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Amazon | Amazonbot | search | search | Amazonbot | Amazonbot | - | - | medium | Yes | Community and manual crawler seeds |
| Anthropic | Claude-SearchBot | search | ai | Claude-SearchBot | Claude-SearchBot | anthropic.com | - | high | Yes | Anthropic crawler documentation |
| Anthropic | Claude-User | user_action | ai | Claude-User | Claude-User | anthropic.com | - | high | Yes | Anthropic crawler documentation |
| Anthropic | ClaudeBot | training | ai | ClaudeBot | ClaudeBot, Mozilla/5.0 ClaudeBot/* | anthropic.com | - | high | Yes | Anthropic crawler documentation |
| Apple | Applebot | search | search | Applebot | Applebot | applebot.apple.com, apple.com | - | high | Yes | Applebot documentation |
| Baidu | Baiduspider | search | search | Baiduspider | Baiduspider | - | - | medium | Yes | Community and manual crawler seeds |
| ByteDance | Bytespider | training | ai | Bytespider | Bytespider | - | - | medium | Yes | Community and manual crawler seeds |
| Common Crawl | CCBot | unknown | archive | CCBot | CCBot, commoncrawl | - | - | medium | Yes | Community and manual crawler seeds |
| DuckDuckGo | DuckDuckBot | search | search | DuckDuckBot | DuckDuckBot, DuckDuckGo | - | - | medium | Yes | Community and manual crawler seeds |
| Google | Google-Extended | training_control_signal | ai | Google-Extended | Google-Extended | googlebot.com, google.com, googleusercontent.com | - | high | Yes | Google crawler documentation |
| Google | Google-UserTriggeredFetcher | user_action | search | Google-InspectionTool | Google-InspectionTool, Google-Site-Verification | googlebot.com, google.com, googleusercontent.com | - | high | Yes | Google crawler documentation |
| Google | GoogleOther | unknown | search | GoogleOther | GoogleOther | googlebot.com, google.com, googleusercontent.com | - | high | Yes | Google crawler documentation |
| Google | Googlebot | search | search | Googlebot | Googlebot, Googlebot-Image, Googlebot-News | googlebot.com, google.com, googleusercontent.com | - | high | Yes | Google crawler documentation |
| LinkedIn | LinkedInBot | unknown | social | LinkedInBot | LinkedInBot | - | - | medium | Yes | Community and manual crawler seeds |
| Meta | FacebookBot | unknown | social | FacebookBot | FacebookBot, facebookexternalhit | fbsv.net, facebook.com, meta.com | - | high | Yes | Meta crawler documentation |
| Meta | Meta-ExternalAgent | training | ai | Meta-ExternalAgent | Meta-ExternalAgent | fbsv.net, facebook.com, meta.com | - | high | Yes | Meta crawler documentation |
| Microsoft | Bingbot | search | search | Bingbot | Bingbot, bingpreview | search.msn.com | - | high | Yes | Bing crawler documentation |
| OpenAI | ChatGPT-User | user_action | ai | ChatGPT-User | ChatGPT-User, Mozilla/5.0; compatible; ChatGPT-User/* | openai.com | https://openai.com/chatgpt-user.json | high | Yes | OpenAI crawler documentation |
| OpenAI | GPTBot | training | ai | GPTBot | GPTBot, Mozilla/5.0; compatible; GPTBot/* | openai.com | https://openai.com/gptbot.json | high | Yes | OpenAI crawler documentation |
| OpenAI | OAI-AdsBot | unknown | ai | OAI-AdsBot | OAI-AdsBot | openai.com | - | high | Yes | OpenAI crawler documentation |
| OpenAI | OAI-SearchBot | search | ai | OAI-SearchBot | OAI-SearchBot, Mozilla/5.0; compatible; OAI-SearchBot/* | openai.com | https://openai.com/searchbot.json | high | Yes | OpenAI crawler documentation |
| Perplexity | PerplexityBot | search | ai | PerplexityBot | PerplexityBot | perplexity.ai | - | high | Yes | PerplexityBot documentation |
| Yandex | YandexBot | search | search | YandexBot | YandexBot | - | - | medium | Yes | Community and manual crawler seeds |
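
As an illustration of how the User-agent patterns and Expected DNS columns might be applied, the sketch below checks a user-agent string against a definition's patterns and a reverse-DNS hostname against its expected domains. The matching semantics are assumptions, not documented registry behavior: patterns are treated as case-insensitive substrings, `*` is treated as a glob wildcard via Python's `fnmatch`, and a hostname passes when it equals an expected domain or is a subdomain of one. The function names `ua_matches` and `host_matches` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def ua_matches(user_agent: str, patterns: List[str]) -> bool:
    """Return True if any pattern matches the user-agent string.

    Assumed semantics: case-insensitive; plain patterns match as substrings,
    and "*" matches any run of characters. Note that fnmatch also gives
    special meaning to "?" and "[", which the listed patterns do not use.
    """
    ua = user_agent.lower()
    for pattern in patterns:
        pat = pattern.lower()
        if "*" in pat:
            # Wrap in wildcards so the glob can match anywhere in the string.
            if fnmatchcase(ua, f"*{pat}*"):
                return True
        elif pat in ua:
            return True
    return False


def host_matches(hostname: str, expected_dns: List[str]) -> bool:
    """Return True if hostname equals, or is a subdomain of, an expected domain."""
    host = hostname.lower().rstrip(".")
    # Requiring a leading dot avoids accepting look-alikes such as "fakegooglebot.com".
    return any(host == d or host.endswith("." + d) for d in expected_dns)
```

A caller would obtain the hostname from a reverse DNS lookup of the request IP and, ideally, confirm it with a forward lookup; that network step is left out so the sketch stays self-contained.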

Recent changes

| Time | Vendor | Crawler | Change type | Field | Old value | New value | Source URL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-06-19 15:24:15 | Yandex | YandexBot | added | - | - | YandexBot | - |
| 2026-06-19 15:24:15 | LinkedIn | LinkedInBot | added | - | - | LinkedInBot | - |
| 2026-06-19 15:24:15 | DuckDuckGo | DuckDuckBot | added | - | - | DuckDuckBot | - |
| 2026-06-19 15:24:15 | Common Crawl | CCBot | added | - | - | CCBot | - |
| 2026-06-19 15:24:15 | ByteDance | Bytespider | added | - | - | Bytespider | - |
| 2026-06-19 15:24:15 | Baidu | Baiduspider | added | - | - | Baiduspider | - |
| 2026-06-19 15:24:15 | Amazon | Amazonbot | added | - | - | Amazonbot | - |
| 2026-06-19 15:24:14 | Meta | Meta-ExternalAgent | added | - | - | Meta-ExternalAgent | https://developers.facebook.com/docs/sharing/webmasters/web-crawlers |
| 2026-06-19 15:24:14 | Meta | FacebookBot | added | - | - | FacebookBot | https://developers.facebook.com/docs/sharing/webmasters/web-crawlers |
| 2026-06-19 15:24:14 | Apple | Applebot | added | - | - | Applebot | https://support.apple.com/en-us/119829 |
| 2026-06-19 15:24:14 | Microsoft | Bingbot | added | - | - | Bingbot | https://www.bing.com/webmasters/help/which-crawlers-does-bing-use-8c184ec0 |
| 2026-06-19 15:24:13 | Perplexity | PerplexityBot | added | - | - | PerplexityBot | https://www.perplexity.ai/perplexitybot |
| 2026-06-19 15:24:13 | Google | Googlebot | added | - | - | Googlebot | https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers |
| 2026-06-19 15:24:13 | Google | GoogleOther | added | - | - | GoogleOther | https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers |
| 2026-06-19 15:24:13 | Google | Google-UserTriggeredFetcher | added | - | - | Google-UserTriggeredFetcher | https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers |
| 2026-06-19 15:24:13 | Google | Google-Extended | added | - | - | Google-Extended | https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers |
| 2026-06-19 15:24:13 | Anthropic | ClaudeBot | added | - | - | ClaudeBot | https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler |
| 2026-06-19 15:24:13 | Anthropic | Claude-User | added | - | - | Claude-User | https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler |
| 2026-06-19 15:24:13 | Anthropic | Claude-SearchBot | added | - | - | Claude-SearchBot | https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler |
| 2026-06-19 15:24:12 | OpenAI | OAI-SearchBot | added | - | - | OAI-SearchBot | https://platform.openai.com/docs/bots |
| 2026-06-19 15:24:12 | OpenAI | OAI-AdsBot | added | - | - | OAI-AdsBot | https://platform.openai.com/docs/bots |
| 2026-06-19 15:24:12 | OpenAI | GPTBot | added | - | - | GPTBot | https://platform.openai.com/docs/bots |
| 2026-06-19 15:24:12 | OpenAI | ChatGPT-User | added | - | - | ChatGPT-User | https://platform.openai.com/docs/bots |