26% of the top 100 websites are now blocking GPTBot
Hacker News
SEPTEMBER 27, 2023
Common Crawl’s web crawler is still blocked less – by just 130 websites. (That’s why I wrote “at least” in the opening sentence.) As a reminder, Common Crawl provides part of the training data used by OpenAI, Google and others. 109 of the top 1,000 websites block both GPTBot and CCBot.
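As a rough illustration of what "blocking" could mean in practice, the sketch below assumes a site blocks a crawler by disallowing its user-agent token in robots.txt (the excerpt does not say how blocking was measured). The sample robots.txt, the `blocked_agents` helper, and the example.com URL are all hypothetical; only Python's standard `urllib.robotparser` is used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows both crawlers and allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents=("GPTBot", "CCBot"), url="https://example.com/"):
    """Return the agents that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_agents(SAMPLE_ROBOTS_TXT))
```

A site would count toward the "blocks both" figure under this assumption when the returned list contains both names. To check a live site, the text would first have to be fetched from that site's /robots.txt, which this sketch deliberately leaves out.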