AI Training Is Fair Use, Says Google

Google has set out a clear defence of its approach to training AI models, arguing that the use of publicly available web content should continue to be protected as fair use. The company also highlights opt-out tools and potential commercial partnerships as part of how it intends to balance the interests of publishers, regulators, and AI developers.

The position arrives at a time when pressure is increasing from multiple sides. Regulators, publishers, and industry bodies are all pushing for stronger rules around how AI systems access and use online content. Many are calling for clearer attribution, permission-based scraping, and in some cases financial compensation for material used in training datasets.

In response, Google has published a policy document titled “A Pragmatic Approach to AI Governance in America”, which sets out its interpretation of current copyright rules in the age of artificial intelligence. In the paper, the company argues that training AI systems on publicly available web data should be considered a “transformative, non-expressive use”. On that basis, Google contends that such training should remain protected under US fair use law.

This position essentially reinforces what Google has previously suggested, but in a more structured policy format. It brings together several ongoing arguments the company has made about how AI development should be governed, particularly in relation to content found freely on the open web.

A key part of Google’s argument is that AI training does not reproduce original content, or act as a substitute for it, in the traditional sense. Instead, Google frames training as a process of learning patterns from large volumes of data. To illustrate this, it compares AI training to an art student walking through a gallery and absorbing inspiration from many works, rather than copying any single piece outright.

Alongside this, Google places significant emphasis on opt-out mechanisms. It points to existing machine-readable controls in robots.txt files, including the Google-Extended user-agent token, which lets website owners signal that they do not want their content used to train Google’s AI models. Blocking Google-Extended does not remove a site from Google Search or affect its ranking there. According to Google, these controls give publishers a practical way to manage how their content is used.
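To make the opt-out mechanism concrete, the sketch below evaluates a robots.txt file that disallows the Google-Extended token while leaving every other agent, including Googlebot, unrestricted. It is a minimal illustration under stated assumptions, not Google’s implementation: Google-Extended is a control token rather than a separate crawler (Googlebot fetches the page and Google consults the Google-Extended rules when deciding on AI training use), and Python’s standard `urllib.robotparser` only approximates Google’s matching, since it applies rules in file order and does not support wildcard paths. The example site, the `EXAMPLE_ROBOTS_TXT` string and the `allowed_for` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: the Google-Extended group
# opts the whole site out of use for Google's AI training, while the
# catch-all (*) group leaves ordinary crawlers such as Googlebot unrestricted.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed_for(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether the robots.txt rules permit `user_agent` for `url`.

    Hypothetical helper: it relies on the standard library parser, which
    matches rules in file order rather than by Google's longest-match
    precedence, so it is only an approximation of Google's behaviour.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/ai-policy"
    for agent in ("Googlebot", "Google-Extended"):
        print(f"{agent}: {allowed_for(EXAMPLE_ROBOTS_TXT, agent, page)}")
```

Run as a script, this prints `Googlebot: True` followed by `Google-Extended: False`, reflecting the split Google describes: the page stays available for search while the site signals that it should not be used for AI training.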

The paper also argues that when rights holders object to their material appearing in AI outputs, the matter should be handled through established copyright enforcement processes. Rather than introducing new automated systems to judge whether AI outputs are too similar to original works, Google points to traditional notice-and-takedown procedures as the preferred route.

At the same time, Google acknowledges that the AI ecosystem may evolve towards more structured value-sharing arrangements. The company notes that it is exploring partnerships with publishers whose content helps improve the accuracy, reliability, and freshness of AI-generated responses. It also references the possibility of paying for access to specialised or non-public datasets. However, no specific programmes, commercial terms, or timelines have been outlined, leaving these ideas largely theoretical for now.

The timing of the paper is significant, as it comes during a period of heightened scrutiny from regulators and policymakers. In the UK, for example, the Competition and Markets Authority (CMA) has introduced new requirements that allow publishers to opt out of certain AI search features and that call for clearer attribution. These changes are intended to strengthen publishers’ bargaining position as AI systems become more deeply integrated into search.

Google has already begun testing opt-out mechanisms in response, although publisher feedback suggests that the current reporting tools do not yet provide enough detail to assess the impact of opting out. In particular, publishers are concerned that key performance data, such as click-through rates, is not being made available.

In the United States, opposition to opt-out-based systems is also growing. Digital Content Next, a major publishing industry group, recently sent a cease-and-desist letter to the Common Crawl Foundation. The group argued that copyright law should not function on an opt-out basis and that permission should be required before content is scraped or used.

This reflects a broader philosophical divide between tech companies and publishers. Google’s position relies heavily on existing legal frameworks and opt-out tools, while publishers and regulators increasingly advocate for permission-first systems and greater transparency over how content is used and monetised in AI training.

Critics of Google’s approach argue that opt-out mechanisms alone may not provide sufficient control, particularly for smaller publishers who lack the resources or technical understanding to implement them effectively. Others suggest that the current model may still favour large platforms and established publishers, rather than creating a level playing field.

Supporters of Google’s stance, however, argue that overly restrictive rules could slow down innovation and limit the development of useful AI systems that depend on large-scale data access. They also point out that fair use has historically played an important role in enabling technological progress in areas such as search and indexing.

Ultimately, the governance paper represents Google’s current policy direction rather than a final or legally binding framework. While it hints at future collaboration models and revenue-sharing possibilities, the details remain open-ended.

As regulators continue to shape new rules and publishers push for greater control, the debate over AI training and fair use is unlikely to settle any time soon. For now, Google’s position is clear: publicly available web data should remain available for AI training under fair use, supported by opt-out tools and evolving partnership models, rather than strict permission-based restrictions.