Weekly Column: Shadow Libraries, Licensing Gold Rush

In this week’s column, California Sports Lawyer® CEO, Founder, and Managing Attorney Jeremy M. Evans writes about the future of business models and public use for generative artificial intelligence (AI) platforms like ChatGPT if they are forced to pay licensing fees for data and information.

Developers and legislators need to be careful, however, to allow for public access and the natural growth and progression of technology and human life.

You can read the full column below. (Past columns can be found here.)

~

OpenAI has lost a major battle in court, one that calls to account the inputs used to create its large language model (LLM), which collected its knowledge for artificial intelligence (AI) from information on the internet and from books. Authors of books and other copyright owners may now have a clearer pathway to file suit against the company and its generative AI platform, ChatGPT. In some ways, paying to license copyrighted works was inevitable: the law requires it, and the people who own those works want to assert their ownership and collect licensing payments.

The essential issue is the deleted books that OpenAI allegedly used to train the LLM behind the ChatGPT platform. The common term for such collections of books is shadow libraries. The problem is twofold: chain of title is normally required when licensing, and it is uncertain what was included in the library. What was likely used as a shortcut to information is now creating potential exposure for the company.

Having clear and clean data has become a foundational piece of a successful and lawful AI business. The old adage of “show your work” is proving useful, at least potentially, for copyright owners. This is the result of copyright and privacy laws in California and elsewhere. In other words, where information comes from has always mattered, both for whether it can be used and for whether it remains useful long term.

If data becomes unusable until it is licensed properly, the result is a licensing gold mine and gold rush for copyright owners. Data that is copyrighted would be required to be licensed. This means copyright owners would be able to monetize their works on AI platforms, much as they would license a movie, music, or a book to a studio or streamer. The chain of title on how an AI model is trained would create the opportunity to license.

Entertainment, media, and sports talent and content would become immediately more valuable because they come with existing libraries and substantial archives that are distinct and popular. That content includes broadcasts, but also the name, image, and likeness of talent. OpenAI and other AI platforms would immediately become smarter because they could now include information and data that was previously unavailable, since it was obviously copyrighted and therefore unusable without a license. For example, notice how generative AI platforms sometimes have difficulty creating images, graphics, and videos or pulling information from websites because of copyright and privacy protections, even for basic tasks.

The issue would become whether AI businesses can afford the licenses. How much would the licenses cost? How long would a license last? Would people be able to afford access to the platform if the platform passes the licensing costs on to them? Could the platform focus on an advertising model like social media, where it learns from its users and sells data to offset the licensing fees?

The way to pull away from the shadow-library era toward a more sophisticated AI model and world is for technology, legislation, or both to guide licensing and fair use. One idea is a nationwide LLM-training exemption that is conditioned on opt-outs, transparency, no-substitution, and preservation. Another idea is rights societies, licensing businesses, and dataset marketplaces that help set fees while encouraging dealmaking and reducing litigation friction. Blockchain technology could also privatize dealmaking for licensing, where chain of title would be guaranteed, as would payment. Maybe all of the aforementioned solutions work together.

Using unlicensed and uncleared data is daring, and possibly unsustainable without a parachute. However, access to knowledge and learning models is important to humans and their growth. In the future, shadow libraries will be replaced by licensing as a business model, which means higher prices for access to AI platforms. Clean and cleared inputs will decide who gets to build the future, and who pays for it. Developers and legislators need to be careful, however, to allow for public access and the natural growth and progression of technology and human life.

~      

About Jeremy M. Evans:

Jeremy M. Evans is the Chief Entrepreneur Officer, Founder & Managing Attorney at California Sports Lawyer®, representing entertainment, media, and sports clients in contractual, intellectual property, and dealmaking matters. Evans is an award-winning attorney and industry leader based in Los Angeles and Newport Beach, California. He can be reached at Jeremy@CSLlegal.com. www.CSLlegal.com.  

Copyright © 2025.  California Sports Lawyer®.  All Rights Reserved.