Crawler source registry
This internal registry tracks where crawler definitions came from, how much trust to assign to each source, when snapshots changed, and which definitions need review. It does not change runtime classifier behavior yet.
Runtime directory
Source-backed definitions are preferred when they are active, high or medium trust, and not failed. Static crawler definitions remain the fallback.
- Active entries: 23
- Database preferred entries: 23
- Static fallback entries: 0
- Ignored database definitions: 0
- Conflicts: 38
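The preference rule described above can be sketched as follows. This is a minimal illustration, not the registry's actual code: the `SourceDefinition` type, its field names, the `"failed"` status value, and the `resolve` helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

# Assumption: only these two trust levels qualify for preference.
PREFERRED_TRUST = {"high", "medium"}


@dataclass(frozen=True)
class SourceDefinition:
    """Hypothetical shape of a source-backed (database) crawler definition."""
    crawler_id: str
    trust_level: str  # e.g. "high", "medium", "low"
    active: bool
    status: str       # assumption: a failed source is marked "failed"


def is_preferred(defn: SourceDefinition) -> bool:
    """True when the definition is active, high or medium trust, and not failed."""
    return (
        defn.active
        and defn.trust_level in PREFERRED_TRUST
        and defn.status != "failed"
    )


def resolve(
    crawler_id: str,
    source_defs: Dict[str, SourceDefinition],
    static_ids: Set[str],
) -> Optional[Tuple[str, str]]:
    """Return (origin, crawler_id), preferring the source-backed definition."""
    defn = source_defs.get(crawler_id)
    if defn is not None and is_preferred(defn):
        return ("database", crawler_id)
    if crawler_id in static_ids:
        # Static definitions remain the fallback.
        return ("static", crawler_id)
    return None


source_defs = {
    "GPTBot": SourceDefinition("GPTBot", "high", True, "active"),
    "CCBot": SourceDefinition("CCBot", "medium", True, "failed"),
}
static_ids = {"GPTBot", "CCBot"}

gptbot_origin = resolve("GPTBot", source_defs, static_ids)
ccbot_origin = resolve("CCBot", source_defs, static_ids)
```

In this sketch `GPTBot` resolves to the database definition, while `CCBot` falls back to the static definition because its hypothetical source is marked failed. A crawler known to neither side resolves to `None`.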
Source health
Crawler sources are tracked separately from runtime classification so definitions are auditable.
| Vendor | Source name | Source type | Trust level | Status | Last checked | Last changed | Fetch method | Docs/source URL |
|---|---|---|---|---|---|---|---|---|
| Anthropic | Anthropic crawler documentation | official | high | active | 2026-06-19 15:24:16 | 2026-06-19 15:24:12 | static_seed | https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler |
| Apple | Applebot documentation | official | high | active | 2026-06-19 15:24:17 | 2026-06-19 15:24:14 | static_seed | https://support.apple.com/en-us/119829 |
| Community | Community and manual crawler seeds | community | medium | active | 2026-06-19 15:24:17 | 2026-06-19 15:24:14 | static_seed | - |
| Google | Google crawler documentation | official | high | active | 2026-06-19 15:24:16 | 2026-06-19 15:24:13 | static_seed | https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers |
| Meta | Meta crawler documentation | official | high | active | 2026-06-19 15:24:17 | 2026-06-19 15:24:14 | static_seed | https://developers.facebook.com/docs/sharing/webmasters/web-crawlers |
| Microsoft | Bing crawler documentation | official | high | active | 2026-06-19 15:24:17 | 2026-06-19 15:24:13 | static_seed | https://www.bing.com/webmasters/help/which-crawlers-does-bing-use-8c184ec0 |
| OpenAI | OpenAI crawler documentation | official | high | active | 2026-06-19 15:24:15 | 2026-06-19 15:24:11 | static_seed | https://platform.openai.com/docs/bots |
| Perplexity | PerplexityBot documentation | official | high | active | 2026-06-19 15:24:17 | 2026-06-19 15:24:13 | static_seed | https://www.perplexity.ai/perplexitybot |
Crawler definitions
| Vendor | Crawler ID | Purpose | Category | Robots token | User-agent patterns | Expected DNS | IP range URLs | Trust level | Active | Source |
|---|---|---|---|---|---|---|---|---|---|---|
| Amazon | Amazonbot | search | search | Amazonbot | Amazonbot | - | - | medium | Yes | Community and manual crawler seeds |
| Anthropic | Claude-SearchBot | search | ai | Claude-SearchBot | Claude-SearchBot | anthropic.com | - | high | Yes | Anthropic crawler documentation |
| Anthropic | Claude-User | user_action | ai | Claude-User | Claude-User | anthropic.com | - | high | Yes | Anthropic crawler documentation |
| Anthropic | ClaudeBot | training | ai | ClaudeBot | ClaudeBot, Mozilla/5.0 ClaudeBot/* | anthropic.com | - | high | Yes | Anthropic crawler documentation |
| Apple | Applebot | search | search | Applebot | Applebot | applebot.apple.com, apple.com | - | high | Yes | Applebot documentation |
| Baidu | Baiduspider | search | search | Baiduspider | Baiduspider | - | - | medium | Yes | Community and manual crawler seeds |
| ByteDance | Bytespider | training | ai | Bytespider | Bytespider | - | - | medium | Yes | Community and manual crawler seeds |
| Common Crawl | CCBot | unknown | archive | CCBot | CCBot, commoncrawl | - | - | medium | Yes | Community and manual crawler seeds |
| DuckDuckGo | DuckDuckBot | search | search | DuckDuckBot | DuckDuckBot, DuckDuckGo | - | - | medium | Yes | Community and manual crawler seeds |
| Google | Google-Extended | training_control_signal | ai | Google-Extended | Google-Extended | googlebot.com, google.com, googleusercontent.com | - | high | Yes | Google crawler documentation |
| Google | Google-UserTriggeredFetcher | user_action | search | Google-InspectionTool | Google-InspectionTool, Google-Site-Verification | googlebot.com, google.com, googleusercontent.com | - | high | Yes | Google crawler documentation |
| Google | GoogleOther | unknown | search | GoogleOther | GoogleOther | googlebot.com, google.com, googleusercontent.com | - | high | Yes | Google crawler documentation |
| Google | Googlebot | search | search | Googlebot | Googlebot, Googlebot-Image, Googlebot-News | googlebot.com, google.com, googleusercontent.com | - | high | Yes | Google crawler documentation |
| LinkedIn | LinkedInBot | unknown | social | LinkedInBot | LinkedInBot | - | - | medium | Yes | Community and manual crawler seeds |
| Meta | FacebookBot | unknown | social | FacebookBot | FacebookBot, facebookexternalhit | fbsv.net, facebook.com, meta.com | - | high | Yes | Meta crawler documentation |
| Meta | Meta-ExternalAgent | training | ai | Meta-ExternalAgent | Meta-ExternalAgent | fbsv.net, facebook.com, meta.com | - | high | Yes | Meta crawler documentation |
| Microsoft | Bingbot | search | search | Bingbot | Bingbot, bingpreview | search.msn.com | - | high | Yes | Bing crawler documentation |
| OpenAI | ChatGPT-User | user_action | ai | ChatGPT-User | ChatGPT-User, Mozilla/5.0; compatible; ChatGPT-User/* | openai.com | https://openai.com/chatgpt-user.json | high | Yes | OpenAI crawler documentation |
| OpenAI | GPTBot | training | ai | GPTBot | GPTBot, Mozilla/5.0; compatible; GPTBot/* | openai.com | https://openai.com/gptbot.json | high | Yes | OpenAI crawler documentation |
| OpenAI | OAI-AdsBot | unknown | ai | OAI-AdsBot | OAI-AdsBot | openai.com | - | high | Yes | OpenAI crawler documentation |
| OpenAI | OAI-SearchBot | search | ai | OAI-SearchBot | OAI-SearchBot, Mozilla/5.0; compatible; OAI-SearchBot/* | openai.com | https://openai.com/searchbot.json | high | Yes | OpenAI crawler documentation |
| Perplexity | PerplexityBot | search | ai | PerplexityBot | PerplexityBot | perplexity.ai | - | high | Yes | PerplexityBot documentation |
| Yandex | YandexBot | search | search | YandexBot | YandexBot | - | - | medium | Yes | Community and manual crawler seeds |
Recent changes
| Time | Vendor | Crawler | Change type | Field | Old value | New value | Source URL |
|---|---|---|---|---|---|---|---|
| 2026-06-19 15:24:15 | Yandex | YandexBot | added | - | - | YandexBot | - |
| 2026-06-19 15:24:15 | LinkedIn | LinkedInBot | added | - | - | LinkedInBot | - |
| 2026-06-19 15:24:15 | DuckDuckGo | DuckDuckBot | added | - | - | DuckDuckBot | - |
| 2026-06-19 15:24:15 | Common Crawl | CCBot | added | - | - | CCBot | - |
| 2026-06-19 15:24:15 | ByteDance | Bytespider | added | - | - | Bytespider | - |
| 2026-06-19 15:24:15 | Baidu | Baiduspider | added | - | - | Baiduspider | - |
| 2026-06-19 15:24:15 | Amazon | Amazonbot | added | - | - | Amazonbot | - |
| 2026-06-19 15:24:14 | Meta | Meta-ExternalAgent | added | - | - | Meta-ExternalAgent | https://developers.facebook.com/docs/sharing/webmasters/web-crawlers |
| 2026-06-19 15:24:14 | Meta | FacebookBot | added | - | - | FacebookBot | https://developers.facebook.com/docs/sharing/webmasters/web-crawlers |
| 2026-06-19 15:24:14 | Apple | Applebot | added | - | - | Applebot | https://support.apple.com/en-us/119829 |
| 2026-06-19 15:24:14 | Microsoft | Bingbot | added | - | - | Bingbot | https://www.bing.com/webmasters/help/which-crawlers-does-bing-use-8c184ec0 |
| 2026-06-19 15:24:13 | Perplexity | PerplexityBot | added | - | - | PerplexityBot | https://www.perplexity.ai/perplexitybot |
| 2026-06-19 15:24:13 | Google | Googlebot | added | - | - | Googlebot | https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers |
| 2026-06-19 15:24:13 | Google | GoogleOther | added | - | - | GoogleOther | https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers |
| 2026-06-19 15:24:13 | Google | Google-UserTriggeredFetcher | added | - | - | Google-UserTriggeredFetcher | https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers |
| 2026-06-19 15:24:13 | Google | Google-Extended | added | - | - | Google-Extended | https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers |
| 2026-06-19 15:24:13 | Anthropic | ClaudeBot | added | - | - | ClaudeBot | https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler |
| 2026-06-19 15:24:13 | Anthropic | Claude-User | added | - | - | Claude-User | https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler |
| 2026-06-19 15:24:13 | Anthropic | Claude-SearchBot | added | - | - | Claude-SearchBot | https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler |
| 2026-06-19 15:24:12 | OpenAI | OAI-SearchBot | added | - | - | OAI-SearchBot | https://platform.openai.com/docs/bots |
| 2026-06-19 15:24:12 | OpenAI | OAI-AdsBot | added | - | - | OAI-AdsBot | https://platform.openai.com/docs/bots |
| 2026-06-19 15:24:12 | OpenAI | GPTBot | added | - | - | GPTBot | https://platform.openai.com/docs/bots |
| 2026-06-19 15:24:12 | OpenAI | ChatGPT-User | added | - | - | ChatGPT-User | https://platform.openai.com/docs/bots |
