Blocked URLs Can Still Be Indexed

Google's John Mueller has responded to a common concern about Google Search Console, where large numbers of URLs can appear as “Indexed, though blocked by robots.txt”. His answer clarifies what the status means and whether it requires action.

Large volumes of “indexed but blocked” URLs

The discussion began with a WooCommerce site owner who noticed over 51,000 URLs being flagged in Search Console under this status. These URLs were mainly generated through shopping cart parameters such as ?add-to-cart= and similar variations.

The site owner explained that they had implemented robots.txt rules to block these parameter-based URLs, but the issue appeared unexpectedly in recent weeks. This led to concerns that the robots.txt configuration might be causing unintended indexing behaviour or harming SEO performance.

They also questioned whether removing the robots.txt rules or adding noindex directives would be a better solution, particularly given the size of the site and concerns about crawl budget and potential side effects.

Are add-to-cart URLs meant to be indexed?

John Mueller clarified that URLs such as add-to-cart parameters do not need to be indexed at all. In his view, blocking them via robots.txt is an acceptable and common approach for handling this type of content.

He also explained that even if Search Console reports these URLs as indexed, they are not typically shown in normal search results. The reason is that these pages are not useful for search users and are generally not intended to be surfaced in Google Search.

Mueller’s response was as follows:

“You don’t need the add-to-cart URLs indexed. Blocking them with robots.txt is fine. Even if they get ‘indexed’ since they’re blocked by robots.txt, it’s unlikely that they’ll be shown in search (unless you do specific queries for those URLs, which users don’t do).”

How robots.txt affects crawling and indexing

A key part of the discussion is understanding what robots.txt actually does. It prevents Googlebot from crawling certain URLs, but it does not directly prevent Google from indexing a URL if it is discovered through other means, such as internal links.

This is why Search Console can still show URLs as indexed even when they are blocked from crawling. Google may know the URL exists, but it cannot access the content behind it.

This distinction often leads to confusion, as many assume blocking a URL in robots.txt automatically removes it from Google’s index, which is not technically the case.
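
To make the crawl-versus-index distinction concrete, here is a minimal Python sketch of how a crawler might decide whether a URL is blocked. The robots.txt rule shown is a hypothetical example (the article does not quote the site's actual file), and the matcher is a simplified stand-in for wildcard matching, not Google's implementation. Note that it only answers "may this URL be fetched?"; it says nothing about whether the URL is already known from links.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rules for illustration only; not the site's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*add-to-cart=
"""

def disallow_patterns(robots_txt):
    """Collect Disallow values (User-agent grouping is ignored for brevity)."""
    patterns = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.lower().startswith("disallow:"):
            value = line.split(":", 1)[1].strip()
            if value:
                patterns.append(value)
    return patterns

def pattern_to_regex(pattern):
    """Treat '*' as 'any characters' and a trailing '$' as 'end of URL'."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(url, robots_txt=ROBOTS_TXT):
    """Return True if the URL's path plus query matches any Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(pattern_to_regex(p).match(target)
               for p in disallow_patterns(robots_txt))
```

With these assumed rules, is_blocked("https://example.com/shop/?add-to-cart=123") is true while a plain product URL is not. A blocked URL can still end up in the index if Google finds a link to it, because the rule stops the fetch, not the discovery.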

Does “noindex” solve the problem?

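One suggestion raised in the original discussion was to use a noindex meta tag for the parameterised URLs instead of relying on robots.txt alone. For Google to see a noindex tag it must be able to crawl the page, so the tag only takes effect if the robots.txt block on those URLs is lifted.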

However, this approach is not always practical in platforms like WooCommerce. These parameter URLs are often generated dynamically and may share the same templates as standard product pages. As a result, applying a noindex directive selectively can be difficult without affecting the main pages that should remain indexed.

In many cases, there is also no separate page content to manage, as these URLs do not represent unique landing pages in the traditional sense.

Why Google is discovering these URLs

Another likely explanation for the Search Console reports is internal linking. If add-to-cart parameters are included in links anywhere across the site—whether in product listings, buttons, or templates—Google may discover and record them even if they are blocked from crawling.

Once discovered, these URLs can be reported in Search Console under the “Indexed, though blocked by robots.txt” status.

Because of this, some SEO professionals recommend auditing internal links using crawling tools to identify where such URLs are being generated. In some cases, adjusting or removing these links, or adding a rel="nofollow" attribute, can help reduce unnecessary discovery.
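
As a rough illustration of such an audit, the Python sketch below scans a page's HTML for links whose query string contains an add-to-cart parameter and notes whether each already carries rel="nofollow". The sample markup, the class name CartLinkFinder, and the helper find_cart_links are all hypothetical; a real audit would run over pages fetched by a crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs

class CartLinkFinder(HTMLParser):
    """Collects <a href> values whose query string contains a given parameter."""

    def __init__(self, param="add-to-cart"):
        super().__init__()
        self.param = param
        self.found = []  # list of (href, has_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href") or ""
        query = parse_qs(urlsplit(href).query, keep_blank_values=True)
        if self.param in query:
            rel_tokens = (attr_map.get("rel") or "").lower().split()
            self.found.append((href, "nofollow" in rel_tokens))

def find_cart_links(html, param="add-to-cart"):
    """Return (href, has_nofollow) for every link carrying the parameter."""
    finder = CartLinkFinder(param)
    finder.feed(html)
    return finder.found

# Made-up markup standing in for a product listing template.
SAMPLE_HTML = """
<a href="/product/blue-shirt/">Blue shirt</a>
<a href="/shop/?add-to-cart=123">Add to cart</a>
<a href="/shop/?add-to-cart=456" rel="nofollow">Add to cart</a>
"""

for href, has_nofollow in find_cart_links(SAMPLE_HTML):
    print(href, "nofollow" if has_nofollow else "followed")
```

Running this over each template type (listing pages, product pages, widgets) should show which ones expose the parameter links and which of those are still followed.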

Should site owners be worried?

A key takeaway from Mueller’s explanation is that Search Console reports do not always reflect a real SEO problem. Instead, they often highlight technical conditions that look alarming but have little or no impact on search visibility.

In this scenario, blocking add-to-cart URLs with robots.txt is generally considered an appropriate setup. These URLs are not meant for indexing, and blocking them is a standard way of preventing crawl waste.

However, it is still useful to understand how they are being discovered, especially on large e-commerce sites where parameter-based URLs can multiply quickly.
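
One simple way to see which parameters are multiplying is to tally query-parameter names across a list of affected URLs. The Python sketch below assumes the URLs have already been exported into a plain list (for example from the Search Console report); the function name count_query_params and the sample URLs are illustrative only.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def count_query_params(urls):
    """Count how many URLs use each query-parameter name."""
    counts = Counter()
    for url in urls:
        pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
        # A set ensures each parameter is counted once per URL.
        counts.update({name for name, _ in pairs})
    return counts

# Made-up sample; a real list would come from an export of the affected URLs.
SAMPLE_URLS = [
    "https://example.com/shop/?add-to-cart=1",
    "https://example.com/shop/?add-to-cart=2&quantity=3",
    "https://example.com/product/x/?orderby=price",
]

for name, total in count_query_params(SAMPLE_URLS).most_common():
    print(name, total)
```

The parameters at the top of the tally are the ones worth tracing back to the templates or plugins that generate them.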

When additional steps may help

While robots.txt blocking is usually sufficient, site owners who want to tighten control may also review internal linking structures. Ensuring that unnecessary parameter links are not widely exposed can help reduce how often Google encounters them.

Where appropriate, adding a rel="nofollow" attribute to links that generate these URLs can provide an additional signal to search engines, although it is important to remember that this is a hint rather than a strict directive.

Final thoughts

Mueller’s comments reinforce an important point in technical SEO: not every Search Console warning indicates a problem that needs fixing. Some reports simply reflect how Google discovers and categorises URLs, rather than issues affecting rankings or traffic.

In the case of add-to-cart URLs, blocking them via robots.txt remains a valid and widely used approach. The key is understanding what robots.txt does—and does not do—and focusing on whether those URLs are actually impacting users or search performance in a meaningful way.