Authors sue Microsoft over use of pirated books in AI training

After Meta and Anthropic, it is now Microsoft’s turn to face an AI-centric lawsuit. This time, the tech titan is facing the music from a group of prominent authors who allege that the company used unauthorized, pirated digital copies of their books to train its Megatron AI model. The complaint was filed in New York federal court on Tuesday, June 24.

The plaintiffs, including Pulitzer Prize winner Kai Bird, New Yorker writer Jia Tolentino, and author Daniel Okrent, contend that Microsoft used a collection of nearly 200,000 pirated books to teach its AI system to generate text responses. They argue that Megatron, a model designed to produce human-like text from user prompts, was built from their copyrighted works without permission or compensation and can generate output that mimics the syntax, voice, and themes of those works.

At its core, the complaint alleges copyright infringement. The authors claim that Microsoft’s use of this “shadow dataset” of pirated literature was a deliberate move to avoid paying licensing fees and negotiating agreements with creators and publishers. They argue that the model’s ability to replicate the writing styles and narrative patterns of these works is a clear violation of their intellectual property rights, and that their creative work was unfairly exploited to build a commercial product that, in some ways, competes with it.

The lawsuit seeks statutory damages of up to $150,000 for each work that Microsoft allegedly misused, as well as a court order barring the company from further unauthorized use of the copyrighted material. Given the number of works cited in the complaint, total damages could be substantial if the authors prevail. Microsoft has not yet commented publicly on the lawsuit.

This legal action follows closely on the heels of other high-profile lawsuits filed by authors, news organizations, and other copyright holders against leading tech companies, including Meta Platforms, Anthropic, and Microsoft-backed OpenAI. Together, these cases are expected to help define the boundaries of “fair use” under copyright law as it applies to training generative AI.

Just one day before the Microsoft lawsuit was filed, a California federal judge issued a ruling in a separate case involving AI firm Anthropic. The judge found that Anthropic’s use of authors’ books to train its AI systems qualified as fair use under US copyright law, but that the company could still be held liable for copying and storing pirated versions of those books. This was the first US judicial decision to directly address whether using copyrighted material without permission to train generative AI is legal. A ruling in a case involving Meta’s AI models also addressed fair use; there, the judge sided with Meta, noting that the plaintiffs had not presented sufficient evidence that the company’s AI would dilute the market for their work. These decisions are early pieces of an evolving legal framework that courts are building around AI development and intellectual property rights.