Perplexity Responds to Reddit Lawsuit Over Alleged Data Access Breach
Reddit has filed a lawsuit against Perplexity AI and several data-scraping companies, accusing them of unlawfully obtaining Reddit's content. The suit, filed in federal court in New York, alleges that Perplexity and its partners accessed Reddit's material at scale by circumventing access controls and scraping Reddit data from Google search results.
The dispute centres on how Perplexity uses Reddit content to power its AI tools. According to Reddit, the companies involved worked together to collect and use Reddit’s data without permission. Reddit’s complaint specifically names Oxylabs UAB, AWMProxy, and SerpApi as third parties that helped Perplexity access protected information.
Perplexity, however, has publicly denied the accusations. In a statement, the company said that it only summarises discussions from Reddit and includes clear citations to the original posts. The company also emphasised that it does not use Reddit content to train its AI models, distancing itself from claims of large-scale data scraping.
This defence is consistent with Perplexity's earlier explanations of how its system gathers information from public web sources. It remains unclear, however, whether it fully answers the detailed allegations in Reddit's complaint. Critics point out that although summarising and training are separate processes, both can depend on the same access to the underlying data.
The complaint offers examples of alleged unauthorised access. In one, Reddit created a "test post" that was visible only to Google's search engine and inaccessible anywhere else online. Within hours, the post appeared in Perplexity's search results, suggesting that Perplexity's system, or one of its partners, had found a way to retrieve the restricted content.
Reddit also claims that after it issued a cease-and-desist notice, Perplexity's citations of Reddit threads did not decline but rose sharply, reportedly by nearly forty times. The surge raises the question of whether the company adjusted its scraping methods rather than halting them.
In addition to this lawsuit, other publishers have accused Perplexity of similar behaviour. Forbes previously alleged that Perplexity republished one of its exclusive reports without permission and threatened legal action. Around the same time, Wired published an investigation claiming that Perplexity had used hidden IP addresses and disguised user-agent strings to bypass website protections designed to stop automated data collection.
Cloudflare lent further support to these claims, saying its tests showed Perplexity using undeclared web crawlers that ignored "no-crawl" instructions. These findings have intensified scrutiny of the company's technical practices and raised broader concerns about how AI firms access and use data across the web.
Perplexity has responded to similar controversies in the past by suggesting that these problems stemmed from technical limitations in its early-stage products. The company promised to introduce better systems for transparency and proper content attribution, saying that it aims to respect publishers while helping users access useful summaries of information online.
At the same time, Perplexity has accused some media companies of trying to control what it calls “publicly reported facts.” The company argues that much of the information it references is already in the public domain and that AI tools should be allowed to use it fairly for research and summarisation purposes.
In its most recent statement on Reddit's lawsuit, Perplexity framed the issue as a broader dispute over data access and training rights. "We summarise Reddit discussions… We won't be extorted, and we won't help Reddit extort Google," the company said. The strong wording suggests that Perplexity sees the lawsuit as part of ongoing negotiations over how AI models can legally use online content.
The case has wider implications for the technology and media industries. It raises the question of whether summarising online posts counts as fair use or infringes the original creators' rights. If the court upholds Reddit's anti-circumvention claim, the ruling could set a precedent limiting how AI platforms gather and cite online discussions.
Conversely, a ruling in Perplexity's favour could reinforce the view that summarisation and citation are acceptable under existing law, potentially giving AI developers more latitude to draw on forums, news outlets, and other online sources. Such an outcome could shape how future AI tools are trained and how they interact with publicly accessible content.
Beyond the legal details, the case raises ethical and commercial concerns. Platforms like Reddit host millions of user-generated posts that capture genuine human conversation, which makes them valuable for AI development. Balancing innovation against data privacy and intellectual property rights remains a major challenge for the tech industry.
At present, it remains unclear which specific firm provided Reddit’s data to Perplexity. The public complaint does not detail the financial transactions or identify which company within the group—Oxylabs, AWMProxy, or SerpApi—was responsible for the data transfer. Until more evidence surfaces, the case leaves several key questions unanswered about the scale and method of Perplexity’s alleged data collection.
