OpenAI has suffered a significant legal blow in its ongoing copyright dispute with authors and publishers, as a federal judge ruled the company must turn over internal communications about deleting two massive collections of pirated books. The decision could expose the AI giant to substantially higher damages and undermine its defense strategy.
The controversy centers on datasets called “books 1” and “books 2,” which OpenAI erased in 2022. Authors and publishers suing the company have already obtained Slack messages between OpenAI employees discussing the deletion, but the real fight has been over whether additional communications are protected by attorney-client privilege.
Judge Ona Wang of the U.S. District Court has now decided they’re not, ordering OpenAI to hand over documents that could reveal the company’s true motivations.
The stakes couldn’t be higher. These internal communications might prove “willful” copyright infringement, which carries penalties of up to $150,000 per work potentially adding up to billions of dollars in damages.
Even worse for OpenAI, if the court determines the company destroyed evidence while anticipating litigation, judges in future trials could instruct juries to assume that evidence would have been damaging to OpenAI’s case.
Judge Rules OpenAI Waived Privilege Over Dataset Deletion in Copyright Battle
OpenAI has been changing its story throughout the discovery process, creating what Judge Wang called a “moving target” of privilege assertions. Initially, the company claimed attorney-client privilege over all communications about the deletion.
Then it offered to turn over some information. Later, OpenAI’s lawyer stated the datasets weren’t being used for training and were deleted simply due to “non-use.” But when authors’ lawyers challenged this explanation, OpenAI tried to withdraw that statement and claimed all evidence about the erasure was privileged after all.

Judge Wang wasn’t buying it. She ruled that by disclosing a reason for deleting the datasets even claiming they weren’t being used OpenAI had already waived privilege protection. The company can’t offer an explanation and then later declare that same explanation off-limits when facing tough questions, the judge determined.
The ruling means plaintiffs will get access to Slack messages from channels with telling names like “project-clear” and “excise-libgen,” where employees discussed removing the datasets. OpenAI’s in-house legal team will also face depositions about their involvement.
This development strengthens what’s becoming an increasingly successful legal theory against AI companies: that downloading pirated content from shadow libraries itself constitutes copyright infringement, regardless of whether that content was actually used to train AI models.
This argument has evolved since these lawsuits began, when lawyers initially tried to connect the piracy directly to model training as a single violation.
The approach has already scored a victory. Writer Andrea Bartz sued Anthropic, Claude’s creator, over similar allegations of illegally downloading millions of books.
After Anthropic’s $1.5B Settlement, OpenAI Faces Willful Infringement Challenge
While most of Anthropic’s case went well, Judge William Alsup allowed the downloading theory to proceed to trial, writing that “Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft.” Anthropic ultimately settled that lawsuit for $1.5 billion.
OpenAI now faces a challenging path forward. To avoid findings of willful infringement, the company needs to demonstrate it had a good faith belief its actions were legal.
But Judge Wang noted a “fundamental conflict” when defendants try to block discovery into their state of mind by claiming attorney-client privilege essentially refusing to provide the very evidence that might prove their innocence.
OpenAI filed an appeal on Wednesday and moved to pause enforcement of the discovery requirements while that appeal proceeds. The company continues maintaining it didn’t willfully infringe any copyrights.
The broader implications extend beyond this single case. How courts handle the question of downloading pirated materials could shape the legal landscape for the entire AI industry, affecting how companies source training data and potentially adding billions in liability for past practices. For OpenAI, these internal communications might prove to be the smoking gun that authors and publishers have been seeking all along.




