Google’s John Mueller recently shared insights on managing duplicate URLs with query parameters in Google’s index, offering a different approach to a common SEO issue.

During a site audit, an SEO professional raised concerns about the use of the rel=canonical tag for controlling indexed pages. The SEO suggested that the best course of action would be to apply a noindex directive to remove the pages from Google’s index and then block each specific URL using the robots.txt file.

While this might seem like a logical approach to prevent unwanted URLs from appearing in search results, John Mueller advised against it. Instead, he recommended a different strategy that aligns more closely with Google’s indexing and crawling best practices. His advice highlights the importance of understanding how Google processes duplicate content and the most effective ways to manage it without causing unintended issues with search visibility.

Site Audit Reveals Indexed Add To Cart URLs

An SEO audit revealed that more than half of the client’s 1,430 indexed pages consisted of paginated URLs and “add to shopping cart” URLs—those with query parameters at the end. Despite the presence of rel=canonical link attributes, Google still indexed these pages. This situation served as a clear reminder that rel=canonical is treated as a hint rather than a directive.

In this case, “paginated” referred to dynamically generated URLs created when a visitor sorted products by brand, size, or other filters. This process, commonly known as faceted navigation, often results in multiple URLs leading to similar content.
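Such faceted URLs typically look something like the following (hypothetical examples for illustration):

example.com/products/?brand=acme
example.com/products/?size=large&page=2

Every filter combination produces its own distinct URL, which is how a modest product catalogue can generate far more indexable URLs than it has actual content.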

The “add to shopping cart” URLs followed a structure like this:

example.com/product/page-5/?add-to-cart=example

To address the issue, the client had already implemented a rel=canonical tag, signalling to Google which URL should be considered the preferred version. In practice, this means each parameterised page carries a link element in its <head> pointing at the clean URL, along these lines (a hypothetical example, assuming the parameter-free URL above is the preferred version):
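<link rel="canonical" href="https://example.com/product/page-5/">

However, rel=canonical is only a hint: Google weighs it against other signals and can index the parameterised URL anyway. Since that is exactly what kept happening, the SEO proposed a more direct approach.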

Their suggested solution involved applying a noindex directive to the affected pages first. Once Google had removed these URLs from its index, they would then be blocked via robots.txt to prevent further crawling. The sequence matters: blocking the URLs in robots.txt straight away would stop Googlebot from recrawling the pages, so it would never see the noindex directive and the URLs could linger in the index. Done in the right order, the method aimed to ensure that Google no longer indexed unnecessary URLs while the site retained control over how search engines crawled its content.
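For completeness, the noindex step is typically implemented as a robots meta tag in the <head> of each affected page, or as an equivalent X-Robots-Tag HTTP response header:

<meta name="robots" content="noindex">

Unlike rel=canonical, this is a directive: once Googlebot recrawls a page carrying it, the page is dropped from the index.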


SEO Decisions Depend On Details

One of the most overused jokes in the SEO world is the phrase, “It depends.” However, when followed by an explanation of what exactly something depends on, it becomes a valid and insightful response. This was precisely the case in a LinkedIn discussion where John Mueller weighed in, adding crucial details to an ongoing debate that had already attracted 83 responses.

The discussion was started by an SEO professional who had just completed an audit and was addressing the technical challenges of managing what gets crawled and indexed by Google. A key point raised was the misconception that rel=canonical is a foolproof solution for controlling indexation. In reality, it is merely a suggestion rather than a strict directive.

In SEO terms, a directive is an instruction that search engines are required to follow, such as a meta noindex tag that prevents a page from being indexed. In contrast, a rel=canonical tag is treated as a hint, meaning Google can choose to ignore it based on various factors.

The primary issue highlighted in the original post was the difficulty of managing a large number of dynamically generated pages that were unintentionally making their way into Google’s index. This challenge is common for websites that generate unique URLs based on user filters, search queries, or product variations, leading to index bloat and potential SEO inefficiencies.


John Mueller On Dealing With Unwanted Indexed URLs

John Mueller addressed the issue by emphasising the need to analyse URL patterns to identify why unwanted pages were being indexed. Rather than applying a generic fix, he recommended a more tailored approach suited to the specific website.

Offering his insights, Mueller explained:

“You seem to have a lot of comments here already, so my 2 cents are more as a random bystander…

  • I’d review the URLs for patterns and look at specifics, rather than treat this as a random list of URLs that you want canonicalised. These are not random; using a generic solution won’t be optimal for any site – ideally, you’d do something specific for this particular situation. Aka ‘it depends.’
  • In particular, you seem to have a lot of ‘add to cart’ URLs – you can just block these using the URL pattern via robots.txt. You don’t need to canonicalise them, as they shouldn’t be crawled in a normal crawl. Crawling these pages can also distort your site’s metrics.
  • There’s some amount of pagination and filtering happening through URL parameters as well – check out our documentation for different ways to handle this.
  • For more technical deep dives, check out https://search-off-the-record.libsyn.com/handling-dupes-same-same-or-different.”

Mueller’s response highlighted the importance of looking at the root cause of indexation issues rather than applying a one-size-fits-all approach. His advice reinforced the need for site owners to be strategic in their technical SEO decisions, particularly when dealing with duplicate or unnecessary URLs.
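As a rough sketch of the pattern-based blocking Mueller describes, a rule along the following lines in robots.txt would stop compliant crawlers from fetching any URL containing the add-to-cart parameter (the exact pattern depends on the site’s URL structure and should be tested before going live):

# Block crawling of any URL containing the add-to-cart parameter
User-agent: *
Disallow: /*add-to-cart=

It is worth remembering that robots.txt controls crawling rather than indexing, so URLs that are already indexed will not necessarily vanish overnight. Mueller’s underlying point is that add-to-cart URLs should never be fetched in a normal crawl, so blocking the pattern tackles the problem at its source.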


Why Was Google Indexing URLs With Query Parameters?

In the LinkedIn discussion, several participants brought up the recurring issue of Google indexing “add to shopping cart” URLs. This has long been a concern for website owners, as these URLs serve no real purpose in search results and can clutter a site’s index.

Despite the discussion highlighting this problem, no definitive answer emerged as to why these URLs were being crawled and indexed in the first place. The root cause could be linked to the specific shopping cart platform in use, meaning that addressing the issue may require platform-specific adjustments. As mentioned earlier, possible measures include blocking these URLs via robots.txt or applying other technical fixes to keep them out of the index.
