Crawl budget is a concept that often confuses website owners and SEO professionals, yet understanding it is essential for optimising a website’s search engine visibility. In simple terms, crawl budget is the number of pages a search engine crawler (such as Googlebot) will crawl on your site within a given timeframe.

Several factors influence crawl budget, including the server capacity, the website’s structure, and the efficiency of the internal linking system. By managing these elements effectively, you can ensure that your site is crawled and indexed more efficiently, leading to better visibility in search results.

In this comprehensive guide, we’ll explore the various aspects of crawl budget. You’ll learn how server performance, technical issues, and crawl prioritisation affect the crawling process. Additionally, we’ll dive into the steps you can take to resolve common crawling problems and enhance your site’s indexing for better SEO results.

Understanding crawl budget is becoming even more crucial in 2025, as search engines continue to refine their crawling methods to accommodate the growing number of websites on the internet. Knowing how to optimise your crawl budget can give you a significant advantage when it comes to achieving higher rankings and driving organic traffic to your site.

So, if you’re concerned about crawl budget and its role in your SEO strategy, this guide will provide the insights and tips you need to navigate it effectively. By addressing potential crawl issues and improving server efficiency, you can ensure that your website gets the attention it deserves from search engines.

 

Why would search bots limit crawling?

Google’s Gary Illyes recently offered a clear explanation of crawl budget, highlighting how Googlebot aims to be a “good citizen of the web.” This concept is essential for understanding why crawl budget exists and how it impacts your website’s SEO.

To put it simply, imagine the chaos when tickets to a popular concert go on sale. As soon as the tickets are released, a surge of users floods the website, overwhelming the server and causing it to crash. This leads to a frustrating experience for users, preventing them from securing their tickets.

A similar issue can occur with bots. Think of a time when you forgot to adjust the crawl speed or the number of simultaneous connections in your site crawler and, as a result, brought down the website you were crawling. The same thing can happen with Googlebot. If it accesses a website too frequently or opens too many parallel connections, it can overwhelm the server and cause disruptions.

To avoid this, Googlebot follows the principle of being a “good citizen” by adjusting its crawling behaviour. It is programmed to respect each server’s limits and sets a “crawl capacity limit” for every site based on its ability to handle requests. If the site’s server responds well to the crawl, Googlebot will maintain its pace and may even increase the number of simultaneous connections. However, if the server struggles with the traffic, Googlebot will reduce its crawling speed and limit the number of connections to prevent overloading the site.
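To make that behaviour concrete, here is a deliberately simplified Python sketch of how an adaptive crawler might raise or lower its number of parallel connections based on how the server responds. This is an illustration only, not Google’s actual logic, and the status codes, timing thresholds and connection caps are invented.

```python
# Illustrative only: a toy "good citizen" crawl controller. The thresholds and
# caps below are assumptions for demonstration, not Google's real values.
def adjust_parallel_connections(current: int, status_code: int, response_time: float) -> int:
    if status_code in (429, 500, 503) or response_time > 2.0:
        return max(1, current // 2)       # server is struggling: back off sharply
    if status_code == 200 and response_time < 0.5:
        return min(current + 1, 10)       # healthy, fast responses: ramp up cautiously
    return current                        # otherwise hold the current pace

connections = 4
connections = adjust_parallel_connections(connections, status_code=503, response_time=3.2)
print(connections)  # 2 -- the crawler slows down instead of overwhelming the site
```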

 

The cost of crawling

Crawling, parsing, and rendering all consume resources, and there are significant financial considerations involved in the process.

This is one of the reasons why Google and other search engines may adjust their crawling strategy, often with the aim of benefiting the site. By optimising the crawling process, they can ensure that resources are used effectively, without overwhelming the server or wasting computational power.

However, it’s likely that financial cost calculations play a role in determining how often a particular URL should be crawled. The more frequently a page is crawled, the more resources are used, and search engines must weigh these costs against the potential benefits of indexing the page.

 

What is crawl budget?

Crawl budget refers to the amount of time and resources that Googlebot allocates to crawling a website. This is an important aspect of technical SEO, as it directly impacts how frequently search engines index a website’s pages.

Crawl budget is determined by two key factors: the crawl capacity limit and crawl demand. The crawl capacity limit reflects how much crawling a site can handle without experiencing performance issues. Essentially, it’s the maximum amount of crawling that the server can manage without being overwhelmed.

On the other hand, crawl demand is based on Googlebot’s assessment of the website’s content. This includes evaluating individual URLs and the need to update its understanding of those pages. More popular pages, for instance, are crawled more frequently to ensure that the index remains up-to-date and reflects any recent changes.

Google calculates this budget to strike a balance between the resources it can afford to spend on crawling and the need to protect both the website and its own infrastructure. By managing this process carefully, Googlebot ensures that websites are crawled efficiently without causing strain on the server or compromising the quality of its search results.
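As a rough mental model only, the budget can be thought of as whichever of the two factors runs out first. The Python sketch below is a deliberate simplification of that balance, and the figures are invented purely for illustration.

```python
# Simplified sketch: pages actually crawled are bounded by both the crawl
# capacity limit and crawl demand. The numbers are invented examples.
def effective_crawl(capacity_limit: int, crawl_demand: int) -> int:
    """Approximate pages crawled per day, capped by whichever factor is lower."""
    return min(capacity_limit, crawl_demand)

print(effective_crawl(capacity_limit=5_000, crawl_demand=12_000))  # 5000: the server is the bottleneck
print(effective_crawl(capacity_limit=5_000, crawl_demand=800))     # 800: low demand is the bottleneck
```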

 

What causes issues with crawl budget?

Not all websites will notice the effects of their crawl budget. For many smaller websites, this is simply not an issue that requires much attention. However, Google clearly states that there are three types of websites that need to manage their crawl budget actively. These include:

  • Large sites, typically those with over 1 million unique pages.
  • Medium or large sites with frequently updating content.
  • Sites with a high volume of “Discovered – currently not indexed” URLs, as detailed in Google Search Console’s Page Indexing report.

That said, it is essential to exercise caution before dismissing your website as falling outside these categories. It’s always a good idea to crawl your site and assess its actual structure.

For instance, you might believe that your small e-commerce store, with just a couple of thousand SKUs and a few informational pages, wouldn’t be affected. However, with elements like faceted navigation and pagination, your website might contain ten times as many URLs as you originally anticipated.

Furthermore, if your site targets multiple languages or locations, you could be dealing with multiple versions of each page. To get a clearer view of your website’s true scale, set your crawling tool to crawl as Googlebot or Bingbot and allow it to crawl all the pages these search bots would typically access. This will give you a more accurate picture of your website’s size as the bots perceive it.
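Most desktop crawlers offer a “crawl as Googlebot” setting; under the hood it largely comes down to sending Googlebot’s user-agent string with each request. A minimal Python sketch of that idea is shown below, with an assumed example URL.

```python
# Fetch a page while identifying as Googlebot's desktop crawler.
# The URL is a placeholder; some servers vary their response by user agent.
import requests

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

response = requests.get(
    "https://www.example-pet-store.com/cats/toys",   # assumed example URL
    headers={"User-Agent": GOOGLEBOT_UA},
    timeout=10,
)
print(response.status_code, len(response.text))
```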

 

Why crawl budget is important

Why is Google recommending that the three types of websites mentioned above consider their crawl budget? Why is it crucial to monitor and manage it?

The answer lies in the impact of insufficient crawl budgets. If your crawl budget is too low to allow search bots to discover all the new URLs you’ve added or revisit URLs that have been updated, the bots may never become aware of the content on those pages.

As a result, those pages might not get indexed by search engines. Even if they are indexed, they might not rank as highly as they could if the bots had the opportunity to crawl them more frequently. Monitoring and managing your crawl budget ensures that Googlebot can effectively access and index all the valuable content on your site, improving its chances of ranking well in search results.

 

How crawl budget issues happen

There are three main factors that can cause crawl budget issues for your website: the quality of URLs, the volume of URLs, and their accessibility. Let’s break these down.

Quality

Googlebot considers other pages on your website when deciding whether to crawl new pages. If Googlebot deems a page’s content not valuable enough to index, it may choose not to crawl it. Several factors can influence this decision:

  • High volumes of on-site duplicate content.
  • Hacked pages with poor-quality content.
  • Internally created low-quality and spam content.

Such poor-quality pages could be intentionally created, either internally or by external actors, or they may be a side effect of poor website design and copy. If Googlebot detects these issues, it may decide not to spend resources on crawling these pages.

Volume

Many websites have more URLs than they realise, often due to technical issues like faceted navigation and infinite URL creation. These can easily cause a site to have a vast number of URLs, which could strain your crawl budget.

Faceted Navigation

Faceted navigation is commonly found on ecommerce websites. For example, if you have a category page like www.example-pet-store.com/cats/toys, a user can filter the page by specific attributes like “contains catnip”. This would create a new URL, such as www.example-pet-store.com/cats/toys?contains=catnip. Users might then further narrow their search, resulting in another URL, like www.example-pet-store.com/cats/toys?contains=catnip&design=feathers.

As users continue filtering, the number of URLs multiplies. If Googlebot can access these pages through internal or external links, or if they are included in the XML sitemap, it may crawl all of them. Over time, instead of crawling just the 200 category and product pages, Googlebot might end up crawling thousands of variant URLs, unnecessarily eating into your crawl budget.
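The arithmetic behind that multiplication is easy to demonstrate. The facet names and values in the sketch below are invented, but they show how quickly a single category page turns into dozens of crawlable variants.

```python
# Illustration of faceted-navigation URL growth. The facet values are assumed.
from itertools import product

base = "https://www.example-pet-store.com/cats/toys"
facets = {
    "contains": ["catnip", "none"],
    "design": ["feathers", "mice", "balls"],
    "colour": ["red", "blue", "green", "natural"],
}

combinations = list(product(*facets.values()))
print(len(combinations))  # 2 x 3 x 4 = 24 variants of a single category page

query = "&".join(f"{key}={value}" for key, value in zip(facets, combinations[0]))
print(f"{base}?{query}")  # e.g. .../cats/toys?contains=catnip&design=feathers&colour=red
```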

Infinite URL Creation

Certain website features, such as event calendars, can also lead to infinite URL creation. For example, consider a calendar page like /events-calendar/january-2025. If the website allows users to click through to the next month’s events, the URL changes to /events-calendar/february-2025, and so on. If there is no limit to how far into the future users can click, the search bots may follow these links endlessly, creating URLs like /events-calendar/december-2086.

This becomes a problem when these pages are empty and contain no useful content. Instead of wasting resources crawling these empty pages, Googlebot could be indexing new, relevant content on the website, such as newly added products.

Accessibility

Search bots often reduce the frequency of crawling a URL if it returns a server response code other than 200 (which indicates a successful response). For instance, if a page returns a 4XX code, such as 404 (Page Not Found), it signals that the page either cannot be found or should not be accessed. This leads to the page being crawled less frequently.

Similarly, if multiple URLs on a website return errors such as 429 (Too Many Requests) or 500 (Internal Server Error), search bots may reduce their crawling of those pages. If the issue persists, these pages could eventually be dropped from the index entirely.

Redirects, though generally less impactful, can also influence crawling. Excessive use of redirects, especially long chains that point from one URL to another in succession, can have a cumulative effect over time, slowing down or complicating the crawling process. It’s important to ensure redirects are used sparingly and effectively to avoid negatively affecting the crawl budget.
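A quick way to spot these accessibility issues is to check response codes and redirect-chain length for a sample of URLs. The sketch below is a minimal example; the URLs are placeholders, and a real audit would work from your crawler’s export or log data.

```python
# Check status codes and redirect chains for a handful of (assumed) URLs.
import requests

urls = [
    "https://www.example-pet-store.com/cats/toys",
    "https://www.example-pet-store.com/events-calendar/january-2025",
]

for url in urls:
    response = requests.get(url, allow_redirects=True, timeout=10)
    hops = len(response.history)  # number of redirects followed to reach the final URL
    print(f"{url} -> {response.status_code} after {hops} redirect(s)")
    if response.status_code >= 400 or hops > 1:
        print("  worth investigating: errors and long chains discourage crawling")
```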

 

How to identify crawl budget problems

It’s impossible to determine if your site is suffering from crawl budget issues just by looking at it directly. Instead, you’ll need to gather insights from search engine reports and logs to pinpoint any potential problems.

See What the Search Engines Are Reporting

The first step in identifying crawl issues is to check the webmaster tools provided by the search engines. For example, in Google Search Console, the “Crawl Stats” report can help you determine if there have been any changes in how Googlebot is crawling your site. This report will show you if Googlebot has increased or decreased its crawling frequency due to any issues on your website.

Additionally, the “Page Indexing” report is invaluable. It provides a ratio between your site’s indexed and unindexed pages. If you notice any pages marked as “Discovered – currently not indexed,” this can be a sign that certain pages on your site are not meeting Google’s crawling criteria, which could be linked to crawl budget limitations.

Dig Deeper: Decoding Googlebot Crawl Stats Data

To gain a deeper understanding of potential crawl issues, you can dig into Googlebot’s crawl stats data in Search Console. This detailed information can give you a clearer picture of how Googlebot interacts with your website.

Log Files

Another useful method to identify crawl budget problems is by analysing your website’s log files. These files track all human and bot visits to your site. By reviewing them, you can pinpoint which pages haven’t been crawled by search bots for an extended period.

If these uncrawled pages are new or have been updated regularly, it could indicate that your site is facing crawl budget issues. This suggests that Googlebot may not be able to access and index all the important content on your site, which could affect its search visibility.
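As a starting point, a short script can pull Googlebot requests out of an access log and show when each URL path was last crawled. The log path and format below are assumptions (a standard combined-format log), and user-agent strings can be spoofed, so treat the output as indicative.

```python
# Minimal log-file sketch: last recorded Googlebot visit per URL path.
# Assumes a combined-format access log at the path below; adjust for your server.
import re

LOG_PATH = "/var/log/nginx/access.log"  # assumed location
request_re = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*"')
date_re = re.compile(r"\[(?P<date>[^\]]+)\]")

last_seen = {}
with open(LOG_PATH) as log:
    for line in log:
        if "Googlebot" not in line:  # note: user agents can be spoofed
            continue
        request = request_re.search(line)
        date = date_re.search(line)
        if request and date:
            last_seen[request.group("path")] = date.group("date")  # later lines overwrite earlier ones

for path, date in sorted(last_seen.items()):
    print(f"{path}  last crawled {date}")
# Important pages missing from this output were not crawled at all in the log window.
```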

 

How to fix crawl budget problems

Before Fixing a Crawl Budget Issue, Confirm You Have One

Before attempting to resolve a crawl budget issue, ensure that one actually exists. Some of the suggestions I’m about to make are useful practices to help search bots focus on the pages you want them to crawl. However, others are more substantial and could negatively impact your site’s crawling if applied incorrectly.

Another Word of Warning

It’s crucial to differentiate between crawling and indexing issues before implementing any changes. I’ve encountered many situations where pages were already indexed, and someone wanted to remove them, so they blocked the crawling of those pages. However, blocking crawling doesn’t remove pages from the index – at least not quickly. To make matters worse, some people add a noindex meta tag to these pages, which they’ve already blocked in the robots.txt file.

The problem here is that if crawling is blocked, search bots can’t access the page to see the noindex tag, rendering the effort ineffective. To avoid such confusion, it’s essential not to mix crawling and indexing solutions. Identify whether the problem lies with crawling or indexing, and address it directly.
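A simple audit can flag this exact conflict: pages that are disallowed in robots.txt but also carry a noindex tag that Googlebot will never get to see. The sketch below assumes a hypothetical site and URL list, and the noindex check is a crude string match rather than a full HTML parse.

```python
# Flag URLs that are blocked in robots.txt AND carry a noindex meta tag.
# The site and URL list are hypothetical; adapt them to your own crawl export.
import urllib.robotparser
import requests

SITE = "https://www.example-pet-store.com"
urls_to_check = [f"{SITE}/events-calendar/january-2025"]

parser = urllib.robotparser.RobotFileParser(f"{SITE}/robots.txt")
parser.read()

for url in urls_to_check:
    blocked = not parser.can_fetch("Googlebot", url)
    html = requests.get(url, timeout=10).text.lower()
    has_noindex = "noindex" in html and 'name="robots"' in html  # crude check
    if blocked and has_noindex:
        print(f"Conflict: {url} is disallowed, so Googlebot can never see its noindex tag")
```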

Fixing Crawl Budget Issues Through Robots.txt

The robots.txt file is a powerful tool for guiding search bots on which pages they should and shouldn’t crawl. By using the “disallow” command, you can prevent bots from crawling URLs that match specific instructions. However, note that bad bots, such as competitors scraping your site, may ignore the disallow command. In this case, you may need to block them using other methods.

Before making any changes to your robots.txt file, check if it is blocking URLs that you want search bots to crawl. Using tools like the robots.txt tester from Dentsu can help you ensure that your file is configured properly.

Improving the Quality and Load Speed of Pages

If search bots are struggling to navigate your site, improving page load speed can help. Load speed is important for crawling, as it affects both the time it takes for the server to respond to a bot’s request and how long it takes for a page to render. If some of your pages aren’t being crawled regularly, test the templates on those URLs and see if they load slowly.

Another factor contributing to a lack of crawling might be page quality. Audit the pages that aren’t being crawled, especially those that share the same sub-folder as crawled pages. Make sure that the content isn’t thin, duplicated, or spammy.

Control Crawling Through Robots.txt

You can stop search bots from crawling specific pages or entire folders by modifying the robots.txt file. Using the “disallow” command gives you the flexibility to control which parts of your website search bots visit. For instance, you may not want bots to waste crawl budget on filtered category pages with sorting or filtering parameters in the URL, such as “?sort=” or “?content=”.
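For example, a couple of wildcard disallow rules like the sketch below (Google documents support for the * wildcard in robots.txt) would keep bots away from those parameterised URLs. The rules are illustrative only; test them against your own URL patterns before deploying, because a mistake here can block pages you do want crawled.

```
# Illustrative robots.txt rules only - verify against your own URLs first
User-agent: *
Disallow: /*?sort=
Disallow: /*?content=
```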

Consider Nofollow Links on Internal Links

Another strategy to control crawling is to add the “nofollow” attribute to internal links. For example, in the earlier events calendar example, each “View next month’s events” link could be given the “nofollow” attribute. This would allow human visitors to click the link while preventing search bots from following it.

Make sure to add the “nofollow” attribute to all relevant links across your site. If you miss some, or if another site links directly to a deeper page in the events calendar, bots might still crawl those pages.

 

Navigating crawl budget for SEO success in 2025

Most websites won’t need to worry about their crawl budget or whether bots can access all the pages within the allocated time and resources. However, this doesn’t mean you should ignore how bots are crawling your site.

Even if you’re not running out of crawl budget, there could still be issues that prevent search bots from crawling certain pages. Alternatively, you may be inadvertently allowing bots to crawl pages that you don’t want them to.

It’s essential to monitor the crawling of your site as part of its overall technical health. By doing so, if any issues arise that might hinder bots from crawling your content, you can address them promptly and avoid potential problems.

 
