Imports

Publishers upload up to three CSV files, each from a different system: your CMS or URL catalog, your CDN or server logs, and, optionally, your analytics referral export. Upload them in order: content catalog first, then traffic logs.

Logs show access behavior; they are not proof of model training or copyright infringement.

1. Content catalog (required)

A list of your URLs with editorial metadata, not traffic logs.

Where publishers usually get this

  • CMS article export or URL inventory
  • Editorial spreadsheet with path, title, section, and paywall flag
  • Sitemap enriched with content class and section

Do not upload CDN or server access logs here (that is step 2).
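
If a sitemap is your starting point, the following is a minimal sketch of turning it into catalog rows. It assumes a standard sitemaps.org `urlset` document; the function name `sitemap_to_catalog_csv` and the sample XML are illustrative, not part of the product. It only fills `url` and `path`, so editorial columns such as `section` or `contentClass` still need to be added from your own records.

```python
import csv
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_to_catalog_csv(sitemap_xml: str) -> str:
    """Turn sitemap <loc> entries into CSV text with url and path columns."""
    root = ET.fromstring(sitemap_xml)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "path"])
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if not url:
            continue  # skip empty <loc> elements
        path = urlsplit(url).path or "/"
        writer.writerow([url, path])
    return out.getvalue()


# Illustrative input only.
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://publisher.com/premium/banking-analysis</loc></url>"
    "</urlset>"
)

if __name__ == "__main__":
    print(sitemap_to_catalog_csv(SAMPLE_SITEMAP))
```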

Required CSV columns

path

Optional: url, title, section, author, publishedAt, contentClass, isPaywalled, externalId

Accepted contentClass values

public_news, premium_analysis, archive, docs, market_data, other

Example CSV

url,path,title,section,author,publishedAt,contentClass,isPaywalled,externalId
https://publisher.com/premium/banking-analysis,/premium/banking-analysis,Banking Analysis,Markets,Jane Smith,2026-01-15,premium_analysis,true,art_001
  • If contentClass is blank, it is inferred from the path prefix where possible.
  • Upload this before traffic logs so requests can be mapped to business context.
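
If you would rather set contentClass yourself than rely on inference, a pre-upload pass along these lines may help. This is a sketch only: the prefix table `PREFIX_TO_CLASS` is hypothetical, and the importer's actual inference rules are not documented here. Rows with no matching prefix are left blank so the importer can still apply its own logic.

```python
ACCEPTED_CLASSES = {
    "public_news", "premium_analysis", "archive",
    "docs", "market_data", "other",
}

# Hypothetical mapping for illustration; adjust to your own URL structure.
PREFIX_TO_CLASS = {
    "/premium/": "premium_analysis",
    "/archive/": "archive",
    "/docs/": "docs",
    "/news/": "public_news",
}


def fill_content_class(row: dict) -> dict:
    """Return a copy of a catalog row with contentClass validated or filled in."""
    row = dict(row)
    value = (row.get("contentClass") or "").strip()
    if value:
        if value not in ACCEPTED_CLASSES:
            raise ValueError(f"unsupported contentClass: {value!r}")
        row["contentClass"] = value
        return row
    path = row.get("path", "")
    for prefix, content_class in PREFIX_TO_CLASS.items():
        if path.startswith(prefix):
            row["contentClass"] = content_class
            return row
    row["contentClass"] = ""  # no match: leave blank
    return row


if __name__ == "__main__":
    print(fill_content_class({"path": "/premium/banking-analysis", "contentClass": ""}))
```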

2. Traffic logs (required)

30 days of HTTP request logs from your CDN or origin, one row per request.

Where publishers usually get this

  • Cloudflare HTTP request log export (transform to the columns below)
  • Akamai, Fastly, or origin/nginx/apache access log export
  • Any CDN or server log you can map to the generic CSV format

Do not upload your article catalog here (that is step 1).

Required CSV columns

timestamp, userAgent, method, host, path, statusCode

Optional: ip, asn, reverseDns, bytesSent, referer

Example CSV

timestamp,ip,asn,reverseDns,userAgent,method,host,path,statusCode,bytesSent,referer
2026-06-16T10:04:22Z,40.83.2.71,8075,crawler.openai.com,"Mozilla/5.0; compatible; GPTBot/1.3; +https://openai.com/gptbot",GET,publisher.com,/premium/banking-analysis,200,182001,
  • Raw IP addresses are hashed during import and are not stored.
  • Cloudflare-native CSV is not supported yet; transform it to this format first.
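
As an illustration of the kind of transform involved, here is a minimal sketch that maps one nginx/Apache "combined" access log line to the generic columns. It makes several assumptions: the log uses the default combined format, the host is supplied by the caller (that format does not record it), and query strings are dropped from `path`. The function name `combined_line_to_row` is hypothetical. User agents containing escaped quotes are not handled.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Default "combined" layout:
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def combined_line_to_row(line: str, host: str) -> Optional[dict]:
    """Map one combined-format log line to the generic CSV columns."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None  # line does not follow the combined format
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    ts_utc = ts.astimezone(timezone.utc)
    return {
        "timestamp": ts_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "ip": m["ip"],
        "userAgent": m["ua"],
        "method": m["method"],
        "host": host,
        # Assumption: the path column excludes the query string.
        "path": m["target"].split("?", 1)[0],
        "statusCode": m["status"],
        "bytesSent": "" if m["bytes"] == "-" else m["bytes"],
        "referer": "" if m["referer"] == "-" else m["referer"],
    }


# Illustrative line mirroring the example row above.
SAMPLE_LINE = (
    '40.83.2.71 - - [16/Jun/2026:10:04:22 +0000] '
    '"GET /premium/banking-analysis HTTP/1.1" 200 182001 "-" '
    '"Mozilla/5.0; compatible; GPTBot/1.3; +https://openai.com/gptbot"'
)

if __name__ == "__main__":
    print(combined_line_to_row(SAMPLE_LINE, host="publisher.com"))
```

The resulting dictionaries can be written out with `csv.DictWriter`, using the header from the example CSV as the field names.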

3. Referral data (optional)

An analytics export showing traffic returned from AI and search platforms.

Where publishers usually get this

  • Web analytics referral or landing-page export
  • UTM / session export from your analytics stack
  • Partner referral report with landing path and referer URL

This is not the same as crawler access logs: it measures traffic returned to your site.

Required CSV columns

timestamp, landingPath

Optional: referer, sessionId, utmSource, conversion, conversionValue

Example CSV

timestamp,referer,landingPath,sessionId,utmSource,conversion,conversionValue
2026-06-16T10:04:22Z,https://chatgpt.com/c/abc,/markets/nvidia-earnings,sess_001,chatgpt,true,25.00
  • Platforms are classified from referer and UTM source (ChatGPT, Perplexity, Google, etc.).
  • Referral data lets you compare crawl volume against returned traffic; it is not proof of model training.
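
To make the classification idea concrete, here is a rough sketch of how a platform could be derived from the referer and UTM source. The lookup tables, the `classify_platform` name, the "Unknown" fallback, and the choice to check the referer before the UTM source are all assumptions for illustration; the importer's real rules may differ.

```python
from urllib.parse import urlsplit

# Hypothetical lookup tables; not the importer's actual lists.
REFERER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "google.com": "Google",
}
UTM_SOURCES = {
    "chatgpt": "ChatGPT",
    "perplexity": "Perplexity",
    "google": "Google",
}


def classify_platform(referer: str = "", utm_source: str = "") -> str:
    """Guess the referring platform from a referer URL, then a UTM source."""
    host = (urlsplit(referer).hostname or "").lower()
    for known, platform in REFERER_HOSTS.items():
        # Match the domain itself or any subdomain such as www.
        if host == known or host.endswith("." + known):
            return platform
    return UTM_SOURCES.get(utm_source.strip().lower(), "Unknown")


if __name__ == "__main__":
    print(classify_platform("https://chatgpt.com/c/abc", "chatgpt"))
```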
