Hostinger has carried out an extensive analysis of bot activity across the web, examining around 66.7 billion automated requests made to more than five million hosted websites. The findings reveal a clear split in how different types of AI crawlers are treated: training bots are increasingly shut out, while search and assistant bots continue to gain access.

According to Hostinger’s anonymised server logs, websites are becoming far more selective about which AI systems they allow to crawl their content. The data was gathered across three separate six-day periods and categorised using the AI.txt project framework, offering a detailed snapshot of how bot access is changing.

One of the most striking trends is the sharp decline in access for large language model training bots. These crawlers, which gather data to train AI models, are being blocked by a growing number of site owners. OpenAI’s GPTBot is the clearest example, with its coverage reportedly falling from 84% of sites to just 12% during the study period.

Meta’s ExternalAgent generated the highest number of requests among training-focused bots, yet this category experienced the steepest overall drop in site access. Hostinger suggests this reflects a deliberate move by publishers to restrict crawlers that consume resources without directly driving traffic back to their sites.

This trend mirrors findings from other industry research. Previous studies have shown that a large majority of major news publishers now block at least one AI training bot, while infrastructure providers have reported high levels of full disallow rules aimed at crawlers such as GPTBot and ClaudeBot. Hostinger’s data reinforces the idea that resistance to training bots is becoming widespread, even if request volumes remain high.

In contrast, AI assistant and search-related bots tell a very different story. These crawlers, which retrieve information to answer live user queries, are steadily expanding their reach. OpenAI’s OAI-SearchBot, used to support ChatGPT’s search features, achieved an average coverage rate of 55.67% across Hostinger’s dataset.

Other assistant bots also showed growth. TikTok’s crawler reached 25.67% coverage while generating approximately 1.4 billion requests, and Apple’s bot accessed just over 24% of the websites analysed. Hostinger believes these bots face less resistance because their activity is more targeted and typically triggered by user searches rather than large-scale data collection.

Traditional search engine crawlers, meanwhile, remained largely unchanged throughout the study. Googlebot continued to access around 72% of websites on average, while Bingbot held steady at just under 58%. This consistency highlights the unique position of established search engines, where blocking access can have immediate consequences for a site’s visibility in search results.

The report also notes a decline in access for SEO and marketing crawlers. While Ahrefs maintained the largest presence in this category with around 60% coverage, overall access dropped. Hostinger attributes this to site owners becoming more cautious about bandwidth usage and increasingly restricting bots that place heavy demands on server resources.

These concerns are not theoretical. Previous data has shown that some AI crawlers can generate hundreds of millions of requests in a single month, creating real cost pressures for publishers. As a result, many site owners are rethinking which bots genuinely add value.

Taken together, the findings suggest that website operators are no longer taking an all-or-nothing approach to AI crawlers. Instead, they are distinguishing between bots that support content discovery and those that exist purely for model training.

Hostinger argues that this selective strategy may offer the best balance. By blocking training bots while allowing assistant and search crawlers, publishers can maintain visibility within AI-powered search tools without contributing content to large-scale training datasets.

OpenAI itself supports this distinction. The company advises site owners to allow OAI-SearchBot if they want their content to appear in ChatGPT search results, even if they choose to block GPTBot. OpenAI’s documentation also clarifies that different user agents serve different purposes and interact with site controls in distinct ways.
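
As an illustration of this selective approach, a robots.txt file along the lines below would block GPTBot while leaving OAI-SearchBot and the traditional search crawlers untouched. The exact rules a site needs will depend on its own goals, and robots.txt is ultimately advisory, so it relies on crawlers choosing to honour it:

```
# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Allow OpenAI's search crawler so pages can surface in ChatGPT search
User-agent: OAI-SearchBot
Allow: /

# Keep traditional search engines unaffected
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
```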

Looking ahead, Hostinger recommends that website owners regularly review their server logs to understand exactly which bots are accessing their sites. From there, decisions can be tailored to individual goals, whether that means reducing server load, protecting content, or increasing exposure through AI search platforms.
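
On the log-review point, a quick way to see which of these crawlers are actually hitting a site is to tally user agents from the access log. The sketch below assumes a standard Apache/Nginx "combined" log format, where the user agent is the last double-quoted field on each line; the bot list and the `access.log` file name are placeholders to adapt to your own setup:

```python
import re
from collections import Counter

# User-agent substrings for the crawlers discussed above; adjust to taste.
BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "Googlebot", "Bingbot", "AhrefsBot"]

def count_bot_requests(log_path: str) -> Counter:
    """Tally requests per known bot from a combined-format access log.

    Assumes the user agent is the final double-quoted field on each line,
    as in the default Apache/Nginx "combined" log format.
    """
    counts = Counter()
    quoted = re.compile(r'"([^"]*)"')
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            fields = quoted.findall(line)
            if not fields:
                continue
            user_agent = fields[-1]
            for bot in BOTS:
                if bot.lower() in user_agent.lower():
                    counts[bot] += 1
                    break
    return counts

if __name__ == "__main__":
    for bot, hits in count_bot_requests("access.log").most_common():
        print(f"{bot}: {hits} requests")
```

A tally like this makes it easier to judge which bots are worth the bandwidth they consume before deciding what to allow or block.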

As AI search continues to evolve, the way websites manage crawler access is likely to play a growing role in shaping who benefits from the next generation of online discovery.

More Digital Marketing BLOGS here: 

Local SEO 2024 – How To Get More Local Business Calls

3 Strategies To Grow Your Business

Is Google Effective for Lead Generation?

What is SEO and How It Works?

How To Get More Customers On Facebook Without Spending Money

How Do I Get Clients Fast On Facebook?

How Do I Retarget Customers?

How Do You Use Retargeting In Marketing?

How To Get Clients From Facebook Groups

What Is The Best Way To Generate Leads On Facebook?

How Do I Get Leads From A Facebook Group?
