Google Search Advocate John Mueller has provided guidance on how to diagnose crawling issues, emphasising the importance of examining shared infrastructure when multiple domains are affected.

Key Points:

  • When Google halts crawling across several domains, shared infrastructure, such as Content Delivery Networks (CDNs), is often the underlying issue.
  • Data from Search Console can be valuable in identifying whether the problem stems from DNS issues or failed requests.
  • Minor crawling disruptions that last only a few hours typically do not influence search rankings.

 

John Mueller, Google’s Search Advocate, recently offered practical advice on diagnosing widespread crawling issues that affect multiple domains. The question arose when Adrian Schmidt reported on LinkedIn that Googlebot had stopped accessing several of his domains at the same time, prompting concern about the underlying cause.

When Schmidt assessed the disruption, live tests run through Google Search Console continued to complete without any error messages. This was an encouraging sign: although routine crawling had stopped, the sites still responded normally when requested directly, suggesting the problem was unlikely to lie with their overall health or content.

To delve deeper into the problem, Schmidt conducted a thorough investigation, which revealed no significant increase in 5xx errors. These errors typically indicate server-related issues that can prevent a site from being properly accessed by crawlers. Additionally, there were no complications with robots.txt requests, which guide crawlers on how to interact with different parts of a website. These findings are crucial, as they help narrow down potential causes of the crawling disruption.
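As a rough illustration of that kind of check, the short Python sketch below scans a raw access log for requests from Googlebot, tallies the response codes it received, and breaks out robots.txt fetches separately. It assumes the common combined log format and a hypothetical file named access.log; field positions vary by server, so treat it as a starting point rather than a definitive audit.

    import re
    from collections import Counter

    # Minimal sketch: tally response codes served to Googlebot, assuming the
    # combined log format and a hypothetical file named "access.log".
    LOG_LINE = re.compile(r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$')

    status_counts = Counter()   # status codes on all Googlebot requests
    robots_counts = Counter()   # status codes on /robots.txt fetches only

    with open("access.log") as log:
        for line in log:
            match = LOG_LINE.search(line)
            if not match or "Googlebot" not in match.group("agent"):
                continue
            status_counts[match.group("status")] += 1
            if match.group("path").startswith("/robots.txt"):
                robots_counts[match.group("status")] += 1

    print("Googlebot responses by status:", dict(status_counts))
    print("robots.txt responses by status:", dict(robots_counts))
    print("5xx served to Googlebot:", sum(n for code, n in status_counts.items() if code.startswith("5")))

A clean result here mirrors what Schmidt found: no spike in 5xx responses and no trouble serving robots.txt.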

Mueller’s advice underscores the importance of examining a range of factors when diagnosing crawling issues. A crawling problem does not necessarily point to a fault in a site’s own configuration or performance, and by assessing the situation systematically, webmasters can better pinpoint the root cause of a disruption.

Overall, the dialogue between Schmidt and Mueller highlights the collaborative effort between webmasters and Google to address crawling challenges. It serves as a reminder that while crawling issues can be frustrating, thorough investigation and communication can lead to a clearer understanding of the factors at play and ultimately help in resolving these problems more effectively.

 

What could the problem be?

 

Mueller’s Response

In response to the situation, Mueller identified shared infrastructure as a likely culprit. He stated, “If it’s shared across multiple domains and affects crawling, it’s probably an issue with a shared piece of infrastructure. If things are starting to recover, then it’s not urgent anymore, and you have some time to investigate recent changes or look through infrastructure logs.”

 

Infrastructure Investigation

All the affected sites were using Cloudflare as their Content Delivery Network (CDN), which raised some questions among observers.

When it came to debugging the issue, Mueller advised checking the Search Console data to identify whether the problem stemmed from DNS issues or failed requests. He mentioned, “The crawl stats in Search Console will provide additional information and can help distinguish between DNS problems and failing requests.”

Mueller also highlighted that timing could be an important factor in diagnosing the issue. He noted, “If everything stopped at the same time, it’s unlikely to be related to robots.txt, and probably not DNS either.”
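One way to separate the two failure modes Mueller describes is to test name resolution and the HTTP request independently. The Python sketch below is a minimal illustration using only the standard library; the hostname is a placeholder, and a clean result only shows the site is reachable from your own network, not necessarily from Googlebot’s.

    import socket
    import urllib.request

    HOST = "www.example.com"   # placeholder: substitute one of the affected domains

    # Step 1: DNS — does the hostname resolve at all?
    try:
        addresses = {info[4][0] for info in socket.getaddrinfo(HOST, 443)}
        print(f"DNS OK: {HOST} resolves to {sorted(addresses)}")
    except socket.gaierror as err:
        print(f"DNS failure for {HOST}: {err}")
        raise SystemExit(1)

    # Step 2: HTTP — does a request succeed once the name resolves?
    try:
        with urllib.request.urlopen(f"https://{HOST}/robots.txt", timeout=10) as resp:
            print(f"Request OK: /robots.txt returned HTTP {resp.status}")
    except Exception as err:
        print(f"Request failed even though DNS resolved: {err}")

If resolution fails, the problem sits with DNS; if resolution succeeds but the request fails, attention shifts to the server or the CDN in front of it.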

 

Impact on Search Results

Addressing concerns about search visibility, Mueller assured that this kind of disruption would not lead to any significant problems. He stated, “If this happened today and lasted only a few hours, I wouldn’t expect to see any visible issues in search results.”

 

Why This Matters

When Googlebot suddenly stops crawling multiple sites at the same time, it can be difficult to pinpoint the cause.

Although temporary pauses in crawling may not have an immediate effect on search rankings, they can hinder Google’s ability to find and index new content.

This incident highlights a vulnerability that organisations, particularly those that depend on shared infrastructure, may not even realise they face.

 

How This Can Help You

If Googlebot stops crawling your sites, consider the following steps:

  1. Check if the issue affects multiple sites at the same time.
  2. Investigate your shared infrastructure first.
  3. Use Search Console data to help identify the cause.
  4. Don’t dismiss DNS issues just because regular traffic appears normal.
  5. Monitor your logs closely.

For those managing multiple sites behind a CDN, ensure you:

  • Have proper logging in place.
  • Keep track of your crawl rates (see the sketch after this list).
  • Know whom to contact if issues arise.
  • Stay informed about your infrastructure provider.
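For the crawl-rate point above, even a simple daily tally of Googlebot requests taken from your access logs will make a sudden stop obvious. The sketch below is a minimal example assuming the combined log format and a hypothetical file named access.log; in practice you would feed the same counts into whatever monitoring or alerting you already run.

    import re
    from collections import Counter
    from datetime import datetime

    # Minimal sketch: count Googlebot requests per day, assuming the combined
    # log format and a hypothetical file named "access.log".
    TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4})")   # e.g. [12/Mar/2025:10:15:03 +0000]

    hits_per_day = Counter()
    with open("access.log") as log:
        for line in log:
            if "Googlebot" not in line:     # crude user-agent filter
                continue
            match = TIMESTAMP.search(line)
            if match:
                day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
                hits_per_day[day] += 1

    # A sharp drop between consecutive days is the signal worth investigating.
    for day, hits in sorted(hits_per_day.items()):
        print(f"{day}: {hits} Googlebot requests")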

 
