As AI companies race to build ever more powerful language models, tensions are rising between established online platforms and newer AI startups. The latest clash involves Reddit and Anthropic, an AI company backed by Amazon and Google.
Reddit Sues Anthropic Over User Data Scraping
Reddit has filed a lawsuit accusing Anthropic of scraping its users' data without permission. According to Reddit, Anthropic used this data to train its AI language model, Claude.
The complaint takes aim at Anthropic's image of itself as a "white knight" of the AI industry, calling that portrayal misleading. Reddit also alleges that Anthropic kept aggressively scraping its servers, making more than 100,000 requests even after being asked to stop.
Anthropic rejected Reddit's claims and said it would vigorously defend itself.
Anthropic Faces Other Copyright Lawsuits
Anthropic is also involved in other legal disputes. Music publishers, including Universal Music Group, ABKCO, and Concord, have sued the company, alleging that it violated their copyrights by using song lyrics from artists such as Beyoncé and the Rolling Stones to train its AI model, Claude.
The Growing Legal Battle Over AI Training Data
The Reddit-Anthropic lawsuit is part of a larger wave of cases in which content owners are trying to protect their material from AI companies. The central question is whether AI firms can train models on copyrighted material without permission.
Courts have not yet settled this question. In February, however, a federal court in Delaware sided with Thomson Reuters against Ross Intelligence, ruling that Ross unlawfully used Thomson Reuters' data to train its AI. The court rejected Ross's fair use defense; fair use permits limited use of copyrighted material for purposes such as teaching, research, or commentary.
Fair Use Debate at the Center of AI Lawsuits
OpenAI, creator of ChatGPT, is a major player in these disputes. The company, backed by Microsoft, has been sued by comedian Sarah Silverman and the parenting site Mumsnet. Both claim their content was used without consent to train AI models.
The New York Times also filed a lawsuit in 2023, accusing OpenAI and Microsoft of illegally using millions of its articles to train ChatGPT. The newspaper says the AI sometimes reproduces content closely resembling its original stories.
OpenAI has called the lawsuit baseless and is appealing a judge's order requiring it to preserve ChatGPT's output data.
OpenAI and Microsoft argue that their use of publicly available content, including New York Times articles, falls under fair use and is therefore legal.
Similar Copyright Battles in AI Image Generation
Getty Images has filed lawsuits against AI image startup Stability AI in the US and the UK. The cases turn on whether Stability's training of its model, Stable Diffusion, on Getty's copyrighted images is permitted under fair use (in the US) or "fair dealing" (in the UK).
Google’s Long History with Fair Use in Copyright Disputes
Google has defended itself for years against copyright claims related to its search engine. In 2005, the Authors Guild sued Google for scanning millions of books and showing snippets in search results without payment.
The courts ruled that Google's actions were "transformative" and therefore protected as fair use.
In 2016, Getty Images filed a complaint against Google with the European Commission over its display of high-resolution images in search results. Getty argued this encouraged piracy and hurt its business.
The two companies later reached a settlement. Google agreed to display copyright information more prominently, sign a licensing deal with Getty, and remove a "View Image" button that made downloading images easier.
Google’s AI Model Faces New Antitrust Scrutiny
Google may face a new data scraping conflict as part of an antitrust case won by the U.S. Justice Department. The government has warned that Google could reinforce its search monopoly by using its vast index of the internet to train its AI model, Gemini.