
Scraping: Reddit vs Anthropic

Reddit has filed a lawsuit against Anthropic, the developer of the AI assistant Claude, over the unauthorized use of user-generated content to train its AI models.

#IP and content protection
May 7, 2025
7 min read

This is one of the first legal cases where a digital platform owner formally challenges the commercial use of user-posted data without licensing or compensation. The lawsuit raises fundamental questions about who owns online content and under what conditions it may be used for machine learning.

It’s important to understand that Reddit’s lawsuit against Anthropic doesn’t primarily concern unauthorized access to personal data, but rather the unlawful use of public user content. The scraping at issue here targeted publicly posted materials (texts, comments, and so on) created by Reddit users, and did not necessarily involve the processing of personal data.

Contact our team for expert help navigating AI data use, licensing, or digital compliance.


What is Scraping?

Scraping is the automated process of extracting data from websites using specialized software tools. This method typically involves large-scale automated queries, which can place a significant load on server infrastructure.
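For illustration only, here is a minimal sketch of what such automated collection typically looks like. The target URL, HTML selectors, and pacing are hypothetical placeholders; the code does not reflect Anthropic’s actual tooling, which is not public.

    # Hypothetical scraping sketch: bulk-fetch public pages and extract their text.
    # The URL pattern and the "post-body" selector are invented for illustration.
    import time

    import requests
    from bs4 import BeautifulSoup

    BASE_URL = "https://example.com/forum/page/{page}"  # placeholder target, not a real site

    def scrape_pages(last_page: int) -> list[str]:
        collected = []
        for page in range(1, last_page + 1):
            response = requests.get(BASE_URL.format(page=page), timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.text, "html.parser")
            # Collect the text of every post-like element on the page.
            for node in soup.find_all("div", class_="post-body"):
                collected.append(node.get_text(strip=True))
            time.sleep(1)  # polite pacing; large-scale crawlers often skip this, which is what drives up server load
        return collected

Run across millions of pages, loops like this generate the request volume that platforms describe as a burden on their infrastructure.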

In this case, Anthropic used scraping to gather a large volume of data from Reddit to train its language model, Claude. The core issue is that the data was not used for mere research purposes but to create a commercially successful product that has already attracted interest from investors and tech companies.

Legal and Regulatory Risks

Reddit presents five main legal arguments against Anthropic:

  • Violation of the User Agreement – Reddit’s terms of service explicitly prohibit the automated collection and commercial use of data without special permission. Anthropic used the platform's content to train its AI system, which was then offered to clients commercially.
  • Unjust Enrichment – Anthropic derived significant benefit by using data Reddit monetizes through licensing agreements with other AI companies, without paying for it.
  • Trespass to Chattels – The scraping bots created excessive load on Reddit’s servers, potentially degrading service quality for regular users.
  • Interference with Contractual Relationships – Despite being aware of Reddit’s confidentiality obligations to its users, Anthropic knowingly interfered with them by continuing data collection after receiving official notices.
  • Unfair Competition – Although Anthropic claimed it ceased scraping in response to Reddit's demands, the company continued unauthorized data collection, harming Reddit’s reputation.

Ethical and Privacy Issues

Scraping raises serious concerns over user control of content: posts deleted on the platform may still survive in AI training datasets. Unlike Reddit’s licensed partnerships with OpenAI and Google, whose APIs allow content removals to be synchronized, unauthorized scraping deprives users of this control. It also raises questions of digital content rights and of personal data processing, if personal information was part of the scraped material. Ethical behavior by AI developers is becoming increasingly important: respecting content usage terms, honoring user choices, and maintaining transparency about data collection practices.

Historical Precedent: HiQ vs LinkedIn

The case of HiQ Labs vs LinkedIn (2017–2022) set important legal standards for data scraping. HiQ, which analyzed workforce trends, routinely collected public LinkedIn profiles for its analytics services.

The legal positions that emerged from that case affirmed that scraping is permissible in the absence of explicit prohibitions and significant harm to the platform. They also established that the CFAA (Computer Fraud and Abuse Act) cannot be used as a catch-all tool against scraping of publicly accessible data.

The Uniqueness of This Case

This lawsuit represents a fundamentally new approach to regulating scraping in the context of training AI models. Unlike past lawsuits based on the CFAA, Reddit is building its case on the violation of licensing agreements and the unlawful commercial use of data. Reddit emphasizes that Anthropic knowingly ignored technical restrictions (like robots.txt) and continued unauthorized data collection even after official warnings.
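For context on that point: robots.txt is a plain-text file served at a site’s root that tells crawlers which paths the owner does not want fetched. It is a voluntary convention with no technical enforcement, which is part of why Reddit frames non-compliance as a contractual and economic matter rather than a hacking claim. Below is a minimal sketch of how a compliant crawler consults the file, using Python’s standard library; the bot name and target path are hypothetical.

    # Sketch of a crawler honoring robots.txt before fetching a page.
    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser()
    parser.set_url("https://www.reddit.com/robots.txt")
    parser.read()  # download and parse the live file

    url = "https://www.reddit.com/r/example/"  # hypothetical target path
    if parser.can_fetch("ExampleBot", url):    # "ExampleBot" is an invented user agent
        print("allowed:", url)
    else:
        print("disallowed by robots.txt:", url)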

The absence of CFAA references highlights an evolution in legal tactics: instead of focusing on unauthorized access to systems (as in HiQ vs LinkedIn), Reddit centers its case on protecting economic interests and enforcing service terms and confidentiality. This approach could establish a new legal precedent governing relationships between data-owning platforms and AI developers.

Conclusion

The Reddit vs Anthropic lawsuit contributes to the emerging legal framework protecting digital content in the age of AI. There are already precedents where courts ruled against the unauthorized use of data for training AI models. Licensing agreements are increasingly becoming a mandatory aspect of working with training data, as illustrated by Reddit’s deals with OpenAI and Google.

The outcome of Reddit’s case against Anthropic could become a pivotal precedent in AI regulation, helping to establish a new balance between innovation in machine learning and the rights of content owners.

This isn’t just a dispute over scraping—it’s a debate over the future of AI governance and digital rights in an era where data is the core fuel of technological progress.
