So What If My AI Bot Wrote This Paper!? [Full Article]
Copyright Protections Under Attack from Generative AI.
[AI-generated image: Bob Ross was famous for saying “anyone could paint,” but he didn’t say anyone could own the copyright of their painting.]
As generative AI (GAI) continues to advance and blur the lines between human and machine creativity, we are forced to confront these fundamental questions: is expression unique to humans, and are there limits to the protections of human expression? As human expression becomes increasingly intertwined with technology, these questions will be stress-tested like never before, challenging our understanding of what it means to be creative, who deserves protection under the law, and whether the AI output of human expression will trip up copyright laws.
In this article, I’ll dive into copyrights, the exceptions, and the lawsuits AI companies face for alleged copyright infringement.
I. How Does Copyright Work?
Copyright is a legal right, grounded in the US Constitution and federal statute, that protects original creative works once they are fixed in a tangible form, that is, physically created or documented. This protection applies to both published and unpublished works. It covers a variety of creative expressions, including literature, drama, music, art, films, software, and architecture. However, copyright does not cover basic facts, ideas, or methods; it focuses on how these elements are expressed creatively.
Copyright protects only human authorship. The US Copyright Office (USCO) has issued guidance that material generated by AI is not copyrightable; only the human-authored parts of a work qualify -- to create is to be human. Using copyrighted works without the copyright owner's permission violates the owner's exclusive rights to reproduce and distribute these works.
[AI-generated image: sadly, robots can’t hold copyrights.]
II. What Happens If Someone Violates Your Copyright?
You can defend your copyright by suing the violator in federal court; infringement could lead to penalties. The penalty can be up to $30,000 for each copyrighted work misused. However, if it's proven that the infringement was done on purpose, this amount could go up to $150,000 for each work involved. It could even lead to a criminal investigation by the US Attorney's office if someone intentionally uses your copyrighted work for their own profit.
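To make the arithmetic concrete, here is a minimal sketch of the worst-case exposure using only the two per-work ceilings quoted above. The function name and the simple multiplication are my own illustrative assumptions; a real award is set by the court and can be far lower than the ceiling.

```python
# Per-work ceilings as quoted in the paragraph above (illustrative only, not legal advice).
ORDINARY_CAP_PER_WORK = 30_000
WILLFUL_CAP_PER_WORK = 150_000


def max_statutory_exposure(works_infringed: int, willful: bool = False) -> int:
    """Return the upper bound on damages for a number of infringed works.

    Hypothetical helper: it multiplies the per-work ceiling by the number
    of works and says nothing about what a court would actually award.
    """
    if works_infringed < 0:
        raise ValueError("works_infringed must be non-negative")
    cap = WILLFUL_CAP_PER_WORK if willful else ORDINARY_CAP_PER_WORK
    return works_infringed * cap


if __name__ == "__main__":
    for works in (1, 10, 100):
        ordinary = max_statutory_exposure(works)
        willful = max_statutory_exposure(works, willful=True)
        print(f"{works} work(s): up to ${ordinary:,} (ordinary), up to ${willful:,} (willful)")
```

The point of the sketch is scale: because the ceiling applies per work, exposure grows linearly with the number of works at issue, which is why suits involving large training datasets carry such large theoretical numbers.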
III. Are There Exceptions So I Don’t Need to Ask for Permission?
While the right to copyright is robust, it is not absolute. There are exceptions to this right. Educational and library uses, public performances, the first sale doctrine, accessibility provisions for disabilities, face-to-face teaching, and fair use are examples of copyright exceptions.
Among these exceptions, fair use is worth exploring as applied to AI-generated content. An affirmative defense against copyright infringement, fair use supports freedom of expression by allowing certain uses of protected works without permission in specific contexts. Defined under Section 107 of the Copyright Act, fair use covers activities like criticism, teaching, comment, news reporting, scholarship, and research.
Fair use is determined on a case-by-case basis and involves a court considering four specific factors: (1) the purpose and character of the use, such as whether it's for commercial or educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the entire copyrighted work; and (4) the effect of the use on the potential market value of the copyrighted work.
As a generalization, if (i) you borrow only a small portion of the original copyrighted work and transform it, (ii) that original work is more factual than creative, (iii) you use it for noncommercial purposes, and (iv) using that work doesn't really hurt the copyright owner's original work or market, you likely can rely on fair use. "Transformative uses" mean adding something fresh with a different purpose from the original content, bringing a new expression, meaning, or message to the work.
IV. Actual Examples of Fair Use Defense Failing or Winning
Fail: Shepard Fairey’s Obama Hope Poster Defense Never Won in Court
[Left: photograph taken by Mannie Garcia. Right: Shepard Fairey “Hope” poster. Images pulled from the New York Times article linked here]
The street artist Shepard Fairey created his artwork using a photograph of Barack Obama taken by Mannie Garcia for the Associated Press, which he did not initially credit. When the photograph's origin became known, Fairey claimed his work was protected under fair use. No court ever endorsed that claim: the dispute settled in 2011 before any ruling on fair use, after Fairey admitted he had misrepresented which photograph he used. The lack of original attribution to Garcia also weakened any argument that the poster was meant as commentary or critique of the original photo.
Win: Leslie Nielsen Spoofing a Vanity Fair Photo of Demi Moore
In 1991, Annie Leibovitz photographed a pregnant Demi Moore for a Vanity Fair cover, creating a widely recognized image. Two years later, for the promotion of "Naked Gun 33 1/3: The Final Insult," Paramount Pictures parodied this photo by superimposing Leslie Nielsen's face onto the body of a pregnant woman, mimicking Moore's pose, lighting, and background, but with exaggerated features like garish lighting and a flashy ring, accompanied by a humorous caption. When Leibovitz sued for copyright infringement, the U.S. Court of Appeals for the Second Circuit ruled in favor of Paramount, declaring the ad fair use. The court recognized the parody, noting it did not harm Leibovitz's market, and dismissed concerns about deterring celebrity participation in such shoots as not a legitimate harm. This case has become a notable example in copyright discussions, illustrating the permissible boundaries of parody under the fair use doctrine.
[Note: I provide these snippets of original images from news articles purely for factual purposes, for my commentary and teaching about copyright lawsuits, to illustrate how an image could “transform.” I make no money off this. I do this for the love of the game.]
V. AI Companies Don't Like to Ask for Permission
If Homer Simpson has a drinking problem, GAI companies have a data problem -- they need more and more data and can’t help themselves. Why? In AI, data equals power. Notably, datasets can provide more power to AI companies than the algorithm, for it is the data from which the algorithms learn to make decisions, function, interpret, and respond to new inputs. The more diverse and extensive the data, the more intelligent and capable the AI becomes. AI companies need vast amounts of data from varied sources—books, articles, online reviews—to teach their AI systems how to understand and interact with human language effectively.
[AI-generated image: once you feed the monster you nurture, it won’t let you stop.]
It seems AI companies will go to extremes to get hold of that data. The New York Times reported that AI labs have exhausted nearly every source of reputable English-language text available online to train their AI systems. High-quality data, particularly from professionally written and edited sources like books and articles, is deemed most valuable for training AI to produce human-like outputs. While big tech companies like Google and Microsoft may have a user base generating vast amounts of user data, privacy laws and corporate policies restrict its use for AI training. By 2026, the demand for high-quality data is projected to exceed its availability. The New York Times highlighted the lengths AI companies are going to for data: (i) Meta considered buying Simon & Schuster and using copyrighted internet data, (ii) Google amended its service terms to utilize what its users have written in Google Docs and Google Maps (reviews), and (iii) OpenAI developed a tool to transcribe YouTube videos despite potential policy violations.
VI. Data Addiction Leads to Lawsuits for Copyright Infringement
The lengths AI companies take to ingest data have prompted content creators to sue GAI companies for using copyrighted works without permission to train AI systems. Many cases have cropped up against AI companies regarding their nonconsensual data scraping practices. These lawsuits raise a bunch of questions as to whether:
(i) training a model with copyrighted data requires a license,
(ii) the AI output infringes on the copyright of the materials used to train the models, and
(iii) the use is direct copyright infringement or can be considered fair use.
Let’s look at Tremblay v. OpenAI, Inc. as an exemplar of this trend. A group of authors, including Sarah Silverman, sued OpenAI for using their books to train AI language models without permission. The authors (plaintiffs) argue that AI companies should not be able to shield themselves by labeling AI outputs as "new" when those outputs are unauthorized derivative works of the authors’ books. Conversely, AI companies (defendants) will likely assert a fair use argument: that they are transforming the original work into something distinct with a different purpose. However, the “black box” nature of AI adds complexity to this argument. The lack of transparency, where even the developers may not precisely know how their AI systems make decisions, especially in deep learning models, will make it difficult to prove fair use.
The judge partially granted the defendants' motion to dismiss, throwing out four of the six claims, but the authors can revise and resubmit. The following claims will move forward: (i) the direct copyright infringement claim (that OpenAI directly copied the authors' work and used it for commercial gain without permission), and (ii) the unfair competition claim (that OpenAI unfairly competed with the authors by using their work without permission). At this stage the court accepts the plaintiffs' allegations as true, and using the plaintiffs' copyrighted works without permission for profit could amount to both infringement and an unfair practice. The court dismissed claims for (i) vicarious copyright infringement, (ii) negligence, (iii) unjust enrichment, and (iv) violations of the Digital Millennium Copyright Act.
[AI-generated image: If only Sarah Silverman and Sam Altman could resolve their differences via a thumb war.]
Vicarious copyright infringement is the idea that a company is responsible for the infringing activities carried out with its technology, even if it did not directly engage in the infringing activities itself. Here, the judge reasoned that the plaintiffs did not allege the “ChatGPT outputs contain[ed] direct copies of the copyrighted books,” and therefore, the authors needed to demonstrate “substantial similarity” between the source materials and ChatGPT's outputs. AI-generated content is not automatically a copyright infringement.
This decision follows a trend in similar cases, where courts are willing to consider claims of direct copyright infringement and unfair competition but are unsure about how to view AI outputs. So far, courts haven't viewed using an author’s work without permission to train language models as clearly an automatic copyright violation.
VII. Bottom Line
[Civil litigation can take years from filing the initial complaint to trying the case.]
Given that the courts have been flexible in allowing plaintiffs to resubmit their complaints and point out the weaknesses in the cases, it’s too early to know how this will all ultimately play out. For copyright holders to win a lawsuit against the AI companies, they need to show that the outputs from the models are substantially similar to the copyrighted content or contain parts of it. Is the use of copyrighted materials to train a model and then generate new content transformative enough to fall under fair use?
We shall see – but we're likely years away from a resolution. Meanwhile, AI companies, realizing their vulnerabilities to lawsuits, are increasingly entering into licensing agreements with content holders. This development signals an acknowledgment that obtaining consent prior to using someone’s work is necessary. In any case, litigation is expensive for companies so we may continue to see some CYA measures enacted by AI companies.
[AI-generated image: On the left we have the AI companies trying to break into the Elysian Fields of human creativity on the right. OpenAI looks to be trying a click-wrap type strategy where you have to opt-out of data scraping.]
Lawsuit Tracker:
Paul Tremblay, et al., v. OpenAI, Inc. (NDCA)
See above
Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence Inc. (DE)
Background: Thomson Reuters filed a lawsuit against Ross Intelligence, alleging that the AI company illegally used Westlaw’s short summaries of points of law (headnotes) for training purposes, infringing on Thomson Reuters’ copyright.
Status: In September 2023, the court ruled that the case could not be resolved on summary judgment due to disputed facts, and it is expected to go to trial in August 2024, with key issues including fair use and the transformative nature of Ross Intelligence’s AI training process.
The New York Times Co. v. Microsoft Corp. (SDNY)
Background: The New York Times has initiated a copyright infringement lawsuit against OpenAI and Microsoft, arguing that their chatbots, trained on Times articles, hurt its business in two ways. First, they divert revenue by providing information that readers would otherwise have accessed through Times websites, like Wirecutter. Second, chatbots have emerged as a competing source of trustworthy information, threatening traditional news outlets like The Times and potentially damaging its reputation and revenue.
Status: In March 2024, OpenAI and Microsoft each filed motions to dismiss parts of the lawsuit, arguing that training their large language models (LLMs) didn’t disrupt the market for original news content.
Andersen et al. v. Stability AI Ltd. et al (NDCA)
Background: The plaintiffs, a group of artists and content creators, allege that Stable Diffusion, Midjourney, and DreamUp were trained using copyrighted images without permission, resulting in copyright infringement. The claims include direct copyright infringement, creation of unauthorized derivative works, vicarious copyright infringement (based on user-generated content), violation of copyright management laws, unfair competition, violation of the right of publicity (for generating work "in the style of" specific artists) and breach of contract (against DeviantArt).
Status: The defendants filed motions to dismiss, which the court granted in part in October 2023. Only one plaintiff's direct copyright infringement claim against Stability AI survived. The court allowed the plaintiffs to amend their claims, which they have done in an updated complaint.
Authors Guild v. OpenAI Inc. (SDNY)
Background: The Authors Guild and a group of individual authors (including Jonathan Franzen, John Grisham, George R.R. Martin, and Jodi Picoult) are suing OpenAI for copyright infringement, claiming OpenAI used their books to train AI models without permission.
Status: Still pending.
J.L. et al. v. Google LLC (NDCA)
Background: Filed by eight individuals who seek to represent millions of internet users and copyright holders, the lawsuit charges that Google's practice of scraping data from websites violates privacy and property rights. The plaintiffs contend that Google lacks the right to use their creative works, personal expressions, or any content shared online, simply because it is available on the internet.
Status: In February 2024, Google filed a motion to dismiss the lawsuit, arguing that its generative AI models only use publicly available or lawfully obtained information for training. The company claims that the plaintiffs' privacy and security concerns are unfounded, as they have no property or privacy rights over information they shared publicly on the internet. Google asserts that, outside of copyright law and fair use provisions, there is no legal basis for controlling the use of publicly available information.
Basbanes et al. v. Microsoft Corporation et al. (SDNY)
Background: Journalists Nicholas Gage and Nicholas Basbanes sued OpenAI and Microsoft, accusing them of using copyrighted books and articles to train their AI model, ChatGPT. The complaint further points out that ChatGPT once had the capability to produce exact quotes from copyrighted texts but now offers summaries instead. According to the lawsuit, these summaries are derivative works that rely on the original materials that were copied.
Status: Pending (filed January 2024).
Kadrey v. Meta Platforms (now consolidated with Chabon v. Meta Platforms) (NDCA)
Background: This case challenges Meta’s use of copyrighted books to train its LLaMA language models.
Status: The court dismissed most claims, including the claim that every output of the model is an infringing derivative work. The only remaining claim is for direct copyright infringement based on copies made during training.
Makkai et al. v. Databricks, Inc. et al. (NDCA)
Background: Four prominent authors (Andre Dubus III, Susan Orlean, Rebecca Makkai, and Jason Reynolds) allege that Nvidia's NeMo Megatron models and Databricks' MosaicML models were trained on copyrighted works without permission. Nvidia has responded by asserting that its model creation complies with copyright laws, claiming fair use given the broad input sources and minimal influence of each work on the final model output.
Status: Pending (just filed May 2024).
Resources:
Image of Artist Robot – generated on Firefly and Dall-E
Image of Rejected Robot – generated on Pixlr and DreamStudio
Image of Sam Altman with Data Plant – generated on Firefly and Pixlr
Image of Thumb War – generated by DreamStudio and Firefly
Image of Litigation Timeline - created with Visme
Image of AI Companies Trying to Break Into the Elysian Fields of Human Expression – generated on Dall-E and Firefly
What is machine learning (ML)?, IBM, https://www.ibm.com/topics/machine-learning (last visited Apr. 20, 2024).
Randy Kennedy, “Artist Sues the A.P. Over Obama Image,” The New York Times, February 9, 2009, https://www.nytimes.com/2009/02/10/arts/design/10fair.html.
Cade Metz, Cecilia Kang, Sheera Frenkel, Stuart A. Thompson and Nico Grant, "How Tech Giants Cut Corners to Harvest Data for A.I.," The New York Times, April 6, 2024, https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html.
Cade Metz and Karen Weise, "Microsoft Seeks to Dismiss Parts of Suit Filed by The New York Times," The New York Times, March 4, 2024, https://www.nytimes.com/2024/03/04/technology/microsoft-ai-copyright-lawsuit.html.
Michael M. Grynbaum and Ryan Mac, "The Times Sues Open AI and Microsoft Over A.I. Use of Copyrighted Work," The New York Times, December 27, 2023, https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html.
Emilia David, "Sarah Silverman’s lawsuit against OpenAI partially dismissed," The Verge, February 13, 2024, https://www.theverge.com/2024/2/13/24072131/sarah-silverman-paul-tremblay-openai-chatgpt-copyright-lawsuit.
Baker & Hostetler LLP, "Case Tracker: Artificial Intelligence Copyrights and Class Actions," BakerLaw, accessed May 10, 2024, https://www.bakerlaw.com/services/artificial-intelligence-ai/case-tracker-artificial-intelligence-copyrights-and-class-actions/.
"A New Generation of Legal Issues Part 2: First Lawsuits Arrive Addressing Generative AI," Perkins Coie LLP, April 20, 2023, https://www.perkinscoie.com/en/news-insights/first-lawsuits-arrive-addressing-generative-ai.html.
"Recent Rulings in AI Copyright Lawsuits Shed Some Light, but Leave Many Questions," Perkins Coie LLP, December 14, 2023, https://www.perkinscoie.com/en/news-insights/recent-rulings-in-ai-copyright-lawsuits-shed-some-light-but-leave-many-questions.html.
Suzanne V. Wilson, "Artificial Intelligence and Copyright," United States Copyright Office, August 24, 2023, https://www.copyright.gov/ai/docs/Federal-Register-Document-Artificial-Intelligence-and-Copyright-NOI.pdf.
Emilia David, "George R.R. Martin and other authors sue OpenAI for copyright infringement," The Verge, September 20, 2023, https://www.theverge.com/2023/9/20/23882140/george-r-r-martin-lawsuit-openai-copyright-infringement.
Blake Brittain, "Google Hit With Class Action Lawsuit Over AI Data Scraping," Reuters, July 11, 2023, https://www.reuters.com/legal/litigation/google-hit-with-class-action-lawsuit-over-ai-data-scraping-2023-07-11/.
Hailey Konnath, "Google Says AI Data Scraping Suit Still Doesn't Hold Up," Law360, February 12, 2024, https://www.law360.com/articles/1796997/google-says-ai-data-scraping-suit-still-doesn-t-hold-up.
Stacey Chuvaieva, "Federal Judge Dismissive of AI Complaint: Anderson v. Stability AI," Mitchell Silberberg & Knupp LLP, October 31, 2023, https://www.msk.com/newsroom-alerts-Federal-Judge-Dismissive-of-AI-Complaint.
David Rabinwitz, "Using Copyrighted Works in AI Training Data May Infringe Even If the AI Output Doesn’t," Moses & Singer LLP, January 16, 2024, https://www.mosessinger.com/publications/using-copyrighted-works-in-ai-training-data-may-infringe-even-if-the-ai-output-doesnt.
Isaiah Poritz, "OpenAI Hit With Another Copyright Suit From Pair of Journalists," Bloomberg Law, January 5, 2024, https://news.bloomberglaw.com/ip-law/openai-hit-with-another-copyright-suit-from-pair-of-journalists.
Kyle Jahner, "Nvidia, Databricks Sued in Latest AI Copyright Class Actions," Bloomberg Law, May 3, 2024, https://news.bloomberglaw.com/ip-law/nvidia-databricks-sued-in-latest-ai-copyright-class-actions.
Nancy Rubin and Steward McKelvey, "Canada: The AGNS, Annie Leibovitz and the Naked Gun Lesson in Permissible Parody," Mondaq, September 4, 2013, https://www.mondaq.com/canada/advertising-marketing--branding/253026/the-agns-annie-leibovitz-and-the-naked-gun-lesson-in-permissible-parody.
"Use of Excerpts from and Linking to Article Found to Be Fair Use," Winston & Strawn LLP, October 26, 2010, https://www.winston.com/en/insights-news/use-of-excerpts-from-and-linking-to-article-found-to-be-fair-use.
"Exceptions for Libraries & Archives," Copyright Alliance, https://copyrightalliance.org/education/copyright-law-explained/limitations-on-a-copyright-owners-rights/copyright-exceptions-libraries-archives/.
"Exceptions for Performances and Displays," Copyright Alliance, https://copyrightalliance.org/education/copyright-law-explained/limitations-on-a-copyright-owners-rights/copyright-exceptions-performances-displays/.
"First Sale Exceptions Copyright," Copyright Alliance, https://copyrightalliance.org/education/copyright-law-explained/limitations-on-a-copyright-owners-rights/first-sale-exceptions-copyright/.
"Exceptions for the Blind," Copyright Alliance, https://copyrightalliance.org/education/copyright-law-explained/limitations-on-a-copyright-owners-rights/copyright-exceptions-for-the-blind/.
"Exceptions for Educational Institutions," Copyright Alliance, https://copyrightalliance.org/education/copyright-law-explained/limitations-on-a-copyright-owners-rights/copyright-exceptions-educational-institutions/.
Disclaimer: This post is for general information purposes only. It does not constitute legal advice. This post reflects the current opinions of the author(s). The opinions reflected herein are subject to change without being updated.
This post originally appeared as a guest post on May 16, 2024 on @aisupremacy.
Thanks to Michael Spencer (@aisupremacy) for reaching out and for the opportunity. Check out his Substack for the latest and greatest on AI.