Header Text - Could AI Crawlers Be Hurting Your Site’s Performance and Traffic?

Artificial Intelligence (AI) is changing how we use the internet, but this is coming at a cost for some website owners. AI crawlers are flooding sites with huge numbers of requests to scrape as much data as possible for model training. The problem comes down to the sheer volume of requests and “traffic” they can generate, which can seriously strain your server resources and bandwidth allocation and hurt your website’s performance as it struggles to keep up. In this guide, we show you how AI bots work and how they differ from the ones used by search engines. We also show how you can prevent them from slowing down your site, including tools, core file edits, and how Web Hosting helps keep your site up and running.

KEY TAKEAWAYS

  • AI crawlers are a type of bot that collects data from the web to train AI models.
  • Search engine crawlers are designed to drive website traffic. AI crawlers scrape content to feed LLMs, negatively impacting site traffic and SEO.
  • AI crawlers can flood websites with high-volume traffic to gather data, consuming resources, slowing down performance or causing crashes.
  • Preventing AI crawler slowdowns includes using robots.txt and llms.txt files, setting firewall rules, using a CDN, or third-party bot management tools.
  • The Hosted.com® CloudLinux integration isolates websites, ensuring performance is not affected by other sites on the same server.

What are AI Crawlers?

Before we discuss how AI crawlers can slow your site down, it’s a good idea to understand AI bots and what we use them for. In simple terms, these bots are automated software applications and scripts that perform various tasks across the Internet.

They are designed to do repetitive tasks faster and more efficiently than humans. There are good ones, like search engine crawlers, and bad ones, like those used for spam and DDoS (Distributed Denial of Service) attacks.

AI crawlers are bots designed to collect data from across the web to create massive datasets for training their respective AI models.

Compared to traditional search engine bots, AI crawlers are more sophisticated, as they can understand and interpret content in a human-like manner (this is important to keep in mind). They can extract information from text, images, and videos to form relationships among data points and use this as context in their answers. Some can even process dynamic content rendered with JavaScript.

In the context of this guide, AI crawlers can be considered in the bad category. This is because, unlike search engine bots, which crawl and index content by making repeated small visits to look for updates and changes, AI bots can generate massive volumes of traffic, repeatedly accessing pages and making requests.

This can happen thousands of times a second, causing sudden spikes that chew through bandwidth, strain server resources, slow site performance, and even render the site inaccessible to human visitors. This can lead to lost traffic, lower search engine rankings, and ultimately fewer sales and conversions.

Strip Banner Text - AI models need bots to gather website content for training data.

Search Engine Crawlers vs AI Bots

Search engines, by nature, drive traffic to websites through the links listed in results pages after you type in a query. Their bots crawl websites to index your content. This means they scan, process, and store your site’s content to understand it and show the link in Search Engine Results Pages (SERPs).

They also have limits on the number of requests that can be made at a time to avoid overwhelming servers and websites.

The AI crawlers used by Large Language Models (LLMs) like ChatGPT and Perplexity, as we discussed earlier, scrape massive amounts of website content and data to feed into their respective LLMs for training. These training crawlers now account for just under 78% of AI bot activity on the internet.

Some of the most active ones are:

  • GPTBot from OpenAI.
  • ClaudeBot from Anthropic.
  • PerplexityBot from Perplexity AI.
  • Google-Extended, for Google’s Gemini.

What This Could Mean for Your Website

Let’s start with your search visibility. Instead of using Google and other search engines to find information, many people now ask AI models directly. When we consider ChatGPT vs Google SEO (Search Engine Optimization), even if you do everything correctly and your content appears front and center in AI-generated answers or overviews, you may still notice you get fewer visitors.

According to studies from the Pew Research Center, AI overviews on search engines like Google are contributing to a steady decline in overall website traffic, because people get the answer they want immediately rather than having to visit a website.

This is because AI chat results give people a “zero-click” answer, meaning they get the information they want without having to click through to the source page, even if a link is provided.

The next problem is that you could see an increase in traffic, which on the surface seems like a good thing. The downside is that it is coming from bots. Remember, they request pages just like legitimate visitors do, yet fewer real people actually visit your site. In fact, 30% of global web traffic now comes from AI bots.

This bot traffic has the knock-on effect of distorting important metrics like page views, engagement, and bounce rates. Even though it seems as if you are getting a lot of traffic, it isn’t coming from humans, making it harder to optimize your content and pages.

According to Cloudflare CEO Matthew Prince, speaking at an Axios event in Cannes, June 2025: “Traditionally, for every six times Google crawled a website, one person might visit… Today, Google’s crawl-to-visitor rate has declined to 18 to one, OpenAI’s has worsened to 1,500 to one, and Anthropic’s is approximately 60,000 to one.”

The biggest problem is that, as we said, the AI crawlers used for training are far more aggressive than their SEO-based cousins. These AI bots are designed to scrape large batches of web pages and content in short, high-volume bursts, often making thousands of requests a minute. At such a high rate, the sudden traffic spikes they cause can resemble DDoS attacks.

As we discussed, because they are designed to read a page like a human would, they extract full page HTML text, metadata, and media (images, videos, etc.). Newer versions can even attempt to follow links, execute JavaScript code, and render pages the same way a human visitor would.

This can max out server resources (CPU, RAM, database connections), causing slow performance, making a site unavailable to legitimate traffic, or crashing it completely. A common symptom is visitors seeing the 500 Internal Server Error in their browsers, which is becoming an increasingly frequent issue.

How to Stop AI Crawlers from Accessing Your Website

If you have noticed that AI crawlers are using a lot of bandwidth and resources, resulting in traffic spikes, slow page speeds, and high bounce rates, there are some things you can do to get them to stop.
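
One quick way to confirm this is to search your server’s raw access logs for known AI user agents. Here is a hedged example command (assuming a standard access.log; your host’s log path and filename may differ):

grep -Eic "GPTBot|ClaudeBot|PerplexityBot|Google-Extended" access.log

A high count here, combined with flat human analytics, is a strong sign that AI crawlers are behind the extra load.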

It’s important to note that you can’t stop all bots, but the good news is that the following four methods can help reduce their impact and keep them off your site as much as possible.

Robots.txt & LLMS.txt Files

The robots.txt file tells bots which parts of your site can and cannot be crawled. It is a standard, simple text file that most bots, including those from search engines and AI companies such as OpenAI and Google, will respect.

To disallow all AI training bots from crawling your site, you can add a rule to your robots.txt file in your site’s root directory. In this example, we will be using GPTBot and ClaudeBot. If there is a specific one you wish to block, you can find a list of user agents (names) for the various AI bots online:

User-agent: GPTBot 
Disallow: /

User-agent: ClaudeBot 
Disallow: /

However, despite claims from AI companies, some training bots have been known to ignore robots.txt files. A newer file, llms.txt, works alongside robots.txt. The difference is that it is aimed specifically at AI systems, covering AI behavior, data use, and what content may be used for LLM training.
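
As an illustration, here is a minimal llms.txt sketch. The format is still an emerging proposal, so treat the structure and the example site content as hypothetical. Like robots.txt, it sits in your site’s root directory:

# Example Store
> A small ecommerce site selling handmade goods. Please do not use this content for LLM training.

## Key pages
- [About](https://example.com/about): Company background
- [FAQ](https://example.com/faq): Common customer questions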

Using both can provide more control over how much access AI bots have to your site.

The next option is to add a crawl-delay directive to your robots.txt file. This allows crawlers to access your site but adds a “waiting period” between requests.

Strip Banner Text - 1000s of bot requests consume resources, slowing down or crashing sites

Once again, some bots ignore the Crawl-delay directive, but most legitimate ones respect it (Google’s crawlers are a notable exception, as Googlebot does not support crawl-delay). Here’s an example of the robots.txt code:

User-agent: *
Crawl-delay: 10

In the above example, User-agent: * applies to all bot types, including most search engine crawlers, and adds a 10-second delay.

Set Firewall Rules

Aggressive crawlers and undeclared bots are notorious for ignoring robots.txt rules and directives; the ongoing Reddit vs Perplexity case is a good example. Bots like these can instead be blocked at the server level.

One way to do this is to create Web Application Firewall (WAF) rules that identify and block requests from known AI crawlers or suspicious IP addresses, or to completely stop them from accessing your site.
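
For example, on a site running Apache with mod_rewrite enabled (an assumption; your server software may differ), a minimal .htaccess sketch that refuses known AI crawler user agents could look like this:

# Return 403 Forbidden to requests from known AI crawler user agents
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|PerplexityBot|Google-Extended) [NC]
RewriteRule .* - [F,L]

Keep in mind that user-agent strings are easy to spoof, so treat this as one layer of defense rather than a guarantee.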

Another method is rate limiting, which caps the number of requests a single IP address can make in a given period. For example, you can set a rule to block any IP address that makes over 50 requests in a minute, which is often a sign of aggressive bots. This can be risky, because any misconfigured rules can block legitimate traffic.
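
As an illustration, here is a hedged Nginx sketch implementing that 50-requests-per-minute example (assuming you can edit the server configuration, which, as noted below, shared hosting plans typically do not allow):

# Track request rates per client IP in a shared 10 MB zone
limit_req_zone $binary_remote_addr zone=botlimit:10m rate=50r/m;

server {
    location / {
        # Allow short bursts of up to 20 extra requests, then reject with 503
        limit_req zone=botlimit burst=20 nodelay;
    }
}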

If you are on a shared hosting plan, you won’t have access to the required server settings, and most providers don’t allow customers to access them.

Use a Content Delivery Network

If you would rather not configure firewall rules and robots.txt files yourself, there are tools available for managing bot traffic that are especially suited to beginners.

A Content Delivery Network (CDN) works by caching (storing) static content (like images, CSS, and JavaScript files) on a network of servers located around the world.

When an AI bot (or human visitor) requests cached content, the CDN serves it from the closest edge server, reducing server strain and bandwidth consumption, making it a great way to absorb and mitigate the heavy traffic generated by AI crawlers.

Many CDNs also offer bot management, analyzing traffic behavior and request patterns to differentiate between good crawlers and those that could potentially cause issues.

CDN provider Cloudflare, for example, includes a Bot Fight Mode in its bot management features. It can identify AI crawlers and lets you add rules that grant good bots access to your site while blocking bad ones. It also reports on the types of traffic flooding your site, where it’s coming from, and how many resources are being consumed.
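
If you want finer control, Cloudflare also lets you write custom WAF rules as expressions over request fields. A hedged sketch that matches the crawlers named earlier (check current user-agent lists before deploying anything like this) might look like:

(http.user_agent contains "GPTBot") or (http.user_agent contains "ClaudeBot") or (http.user_agent contains "PerplexityBot")

You would pair an expression like this with a Block or Managed Challenge action in the Cloudflare dashboard.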

However, this isn’t foolproof. New bots are created on new networks, and existing ones change IP addresses, which means the CDN will have to analyze and handle thousands of IPs, so there is a chance of some getting through.

Preventing Slowdowns with Hosted.com®

In web hosting, shared plans mean various websites on a single server share the same CPU, RAM, and bandwidth. This can be a problem if one site gets hit with a flood of aggressive AI crawler bot traffic; the high resource usage can slow down the others on the server.

This is where the CloudLinux server software used across the entire Hosted.com® hosting infrastructure makes all the difference in maintaining your website’s speed and stability.

The CloudLinux system isolates each website on the server into its own Lightweight Virtual Environment (LVE). The LVE technology prevents a single site from consuming all the available CPU, RAM, and bandwidth.

It also implements fair resource allocation, ensuring that every website gets its fair share of resources and helping prevent your site from slowing down, even if AI bots are hitting another site.

This means your site remains responsive, performs as it should, and is backed by our 99.9% uptime guarantee.

With Hosted.com®, you get the dedicated resources your site needs for maximum performance and uptime, so you can focus on content creation and growing your business.

Strip Banner Text - Lightning-fast load speeds & 99.9% uptime with Hosted.com [Learn More]

VIDEO: How to Choose the Best Web Hosting Plan for Your Site

FAQS

What are AI crawlers?

AI crawlers are bots that collect data from websites to train AI models. Unlike traditional search engine crawlers, which index content to provide results in SERPs, AI crawlers extract the content itself for use in LLM answers.

How will I know if an AI crawler is slowing down my site?

If your website’s bandwidth usage is increasing but your human traffic isn’t, AI crawlers may be the cause. Use monitoring tools to check for high volumes of requests from bots such as GPTBot or ClaudeBot.

How are AI crawlers different from search engine crawlers?

Traditional search engine crawlers are designed to find and index content to help people discover websites. AI crawlers are designed to gather large amounts of data to train AI models, and they often do not drive traffic back to sites.

How do I block AI crawlers from my website?

The most common way is to use a robots.txt file in your website’s root directory to block crawlers from accessing your website. Additional methods include using a CDN or a firewall.

Will blocking AI bots hurt my SEO rankings?

No, in fact, it could help SEO rankings by preventing your site from being flooded with fake traffic that interferes with legitimate crawling, indexing, and performance.

Other Blogs of Interest

Exploring AI Domains – The Future of Web Addresses

Why Are .ai Domains So Expensive – The Truth Revealed

Best AI Website Builder – Create a Site in Minutes with AI

AI Website Builders – Sacrificing Creativity For Speed

Top 12 AI Tools For Small Business And Startups