Microsoft is facing legal action from a group of well-known writers who argue that the tech giant used their books without permission to train its artificial intelligence systems.
The suit, filed Tuesday in federal court in New York, is the latest skirmish in a long-running conflict between content producers and technology companies over the development of AI systems.
The plaintiffs include some well-known names: Pulitzer-winning biographer Kai Bird; New Yorker writer and award-winning essayist Jia Tolentino; and Daniel Okrent, a former public editor of The New York Times. These writers aren’t merely upset that their work is being used; they allege that Microsoft used pirated digital copies of their books specifically to train its Megatron AI model to answer human questions and prompts.
Microsoft and the Copyright Conundrum: Training AI with Books
Microsoft used a library of nearly 200,000 pirated books to train Megatron, a text-generating computer program, the authors claim. They contend the method enabled Microsoft to construct “a computer model not only consisting of the work of a thousand authors and writers, but constructed to produce a broad range of expression that approximates the syntax, voice, and themes of the copyrighted work upon which it was trained.”
The case arrives at a pivotal moment. Just a day earlier, a judge in a California federal court delivered a landmark ruling in a comparable case against AI firm Anthropic, finding that Anthropic’s use of copyrighted works to train its AI models qualified as “fair use” under copyright law, even though the company could still be sued for pirating the books in the first place.

That was the first significant U.S. court ruling on whether using copyrighted works without permission to train AI is legal.
Microsoft is not the only technology giant confronting such lawsuits. Authors and other creators have been filing cases against some of the leading players in the AI space, including Meta Platforms, Anthropic, and OpenAI, the last of which is heavily backed by Microsoft’s investments.
The common thread running through all these cases is the same fundamental question: Are technology giants allowed to use copyrighted material to train their AI models without asking permission or paying compensation?
Microsoft Sued: Authors Challenge AI’s “Fair Use” of Copyrighted Works
Technology firms have long defended the practice by arguing that it constitutes “fair use” of copyrighted content. They maintain that they are not copying what already exists but producing something new and transformative. They also warn that forcing them to pay copyright owners for training data would severely hamper the young AI industry.
But the authors do not view it that way. To them, it is a cut-and-dried case of theft: their intellectual work is being used to build profitable AI systems without their compensation or consent. The sums involved are also significant.
In their lawsuit against Microsoft, the authors are asking the court to bar Microsoft from further infringing their copyrights. They are also seeking statutory damages of up to $150,000 per work Microsoft is alleged to have misused.
Microsoft has yet to respond publicly to the lawsuit. Company spokespeople did not immediately provide a statement when contacted for comment, and the attorneys representing the authors likewise did not respond to requests for comment on the case.
The case reflects a far larger issue confronting the AI field. As AI systems grow more powerful and more valuable, questions about how they are trained are coming to a head. Decisions in cases such as this one are likely to set major precedents for how AI firms can acquire and use training data going forward.
For their part, the authors hope the courts will rule in their favor and find that using pirated books to train AI systems crosses a legal boundary. Microsoft, on the other hand, will likely argue that its use of the material falls within the bounds of fair use and that prohibiting such use would stifle innovation in AI technology.