Apple Faces Second Copyright Lawsuit Over AI Training Data
Apple is facing a second proposed class action lawsuit alleging copyright infringement related to the data used to train its artificial intelligence (AI) models. The lawsuit, filed by neuroscience professors Susana Martinez-Conde and Stephen Macknik of SUNY Downstate Health Sciences University, claims Apple used their copyrighted works without permission. It follows a similar suit filed earlier this year and reflects a growing trend of legal challenges against tech companies that build large language models (LLMs).
Martinez-Conde and Macknik allege that Apple trained its AI using data obtained from “shadow libraries,” which are known to host pirated copyrighted books, and through web-crawling software. Specifically, they claim two of their own books were used without authorization in the training process. https://news.bloomberglaw.com/privacy-and-data-security/apple-accused-of-ai-copyright-infringement-by-suny-professors
This lawsuit mirrors a previous class action filed against Apple, in which authors alleged similar copyright violations in the training of Apple Intelligence models. The core issue is whether using copyrighted material to train an AI constitutes copyright infringement, even if the AI doesn’t directly reproduce the copyrighted work in its output.
The Broader Context: AI and Copyright Litigation
Apple isn’t alone in facing these legal challenges. OpenAI, the creator of ChatGPT, is being sued by The New York Times over similar accusations of copyright infringement. https://www.nytimes.com/2023/12/27/business/new-york-times-openai-microsoft-lawsuit.html These cases highlight the complex legal questions arising from the rapid advancement of AI and its reliance on massive datasets.
The debate centers on the concept of “fair use,” a legal doctrine that permits limited use of copyrighted material without permission from the copyright holder. AI developers argue that training AI models falls under fair use because it transforms the original material into something new. Copyright holders, however, contend that this use is commercial and harms the market for their work.
Recent Precedent: Anthropic’s Settlement
A significant development in this area occurred in 2025, when Anthropic, another AI company, agreed to settle a class action lawsuit for $1.5 billion. The suit, brought by authors on behalf of a class covering roughly 500,000 books, also centered on copyright claims related to AI training data. https://www.reuters.com/legal/anthropic-agrees-15-bln-settlement-authors-copyright-case-2024-02-29/ Because it is a settlement rather than a court ruling, it creates no binding legal precedent, but it signals the scale of financial liability AI companies may face for using copyrighted material without proper licensing.
Understanding AI Training and Copyright
AI models, particularly LLMs, learn by analyzing vast amounts of text and code. This process, known as “training,” involves identifying patterns and relationships within the data. The more data an AI model is trained on, the better it generally performs. However, much of the data available on the internet is copyrighted.
The legal question is whether simply accessing and analyzing copyrighted material for training purposes constitutes infringement. Some legal scholars argue that it does not, comparing it to a researcher reading books for background knowledge. Others argue that it does, because the AI is essentially creating a derivative work based on the copyrighted material.
What’s Next?
These lawsuits are likely to shape the future of AI development and copyright law. The outcomes will determine the extent to which AI companies can utilize copyrighted material for training purposes, and whether copyright holders will be able to effectively protect their work in the age of AI. The legal landscape is rapidly evolving, and further rulings and settlements are expected as more cases make their way through the courts. The resolution of these disputes will be crucial for fostering innovation while respecting intellectual property rights.