AI training under fire as Britannica sues OpenAI

By Craig Nash
Tech writer at All Things Geek. Covers artificial intelligence, semiconductors, and computing hardware.

AI training copyright lawsuits just entered a new phase. Encyclopedia Britannica and Merriam-Webster filed suit against OpenAI on March 13, 2026, in Manhattan federal court, claiming ChatGPT was trained on nearly 100,000 of their articles without permission or license. The complaint alleges that OpenAI copied encyclopedia entries, dictionary definitions, and reference content to build large language models like GPT-4 and subsequent versions.

TL;DR: Britannica alleges OpenAI trained ChatGPT on nearly 100,000 unlicensed articles, producing verbatim reproductions and paraphrases that divert traffic from Britannica’s sites. The lawsuit joins similar cases by The New York Times, Ziff Davis, and multiple newspapers, marking a turning point in AI training disputes.

What Britannica claims OpenAI did

The core allegation is straightforward: OpenAI scraped Britannica’s content at scale without permission. According to the complaint, ChatGPT now produces responses that reproduce Britannica articles verbatim in full or in part, paraphrase or summarize them, or mimic the selection and curation of content found in the original works. Britannica argues that these responses substitute for its own content, competing with it directly and diverting users who would otherwise visit Britannica’s websites.

The complaint also alleges that OpenAI uses Britannica content in retrieval-augmented generation (RAG) workflows, in which ChatGPT scans the web and databases for up-to-date information to feed into its responses. Beyond copyright infringement, Britannica accuses OpenAI of trademark violations under the Lanham Act, claiming the company falsely attributes hallucinations and made-up content to Britannica and implies that it has permission to reproduce material that was never licensed. The lawsuit seeks unspecified monetary damages, a court order blocking further infringement, and compensation for the harm caused and the illicit profits OpenAI has allegedly reaped.

Why AI training copyright lawsuits are accelerating

Britannica is not alone. The New York Times, Ziff Davis (which owns Mashable, CNET, IGN, and PCMag), a dozen U.S. and Canadian newspapers including the Chicago Tribune and the Denver Post, and the Canadian Broadcasting Corporation have all sued OpenAI over similar allegations. Britannica itself is already fighting Perplexity AI in a parallel case with comparable claims.

The pattern is clear: content creators argue that AI companies have built trillion-dollar products on the back of unlicensed work. Publishers claim ChatGPT starves them of revenue by generating responses that substitute for, and compete directly with, their content. Beyond economics, Britannica’s complaint raises a broader concern: that ChatGPT’s use of scraped content jeopardizes public access to high-quality, trustworthy information. The reasoning is that when AI systems cannibalize publisher traffic, less money flows to the editorial teams that produce original research and fact-checking.

OpenAI has not responded to requests for comment on the Britannica suit. Its standard defense across these cases is fair use: the argument that training a machine learning model on copyrighted text is transformative, turning the material into something new rather than a substitute for the original, and is therefore lawful. No appellate court has yet decided whether training large language models on copyrighted text qualifies as fair use, making this litigation a critical test of how copyright law applies to AI.

What comes next in AI training disputes

Britannica has demanded a jury trial and seeks to hold OpenAI responsible for the substantial harm and illicit profits it attributes to the alleged copyright and trademark violations. The lawsuit will likely take years to resolve, but its filing signals that content creators are no longer waiting for regulatory clarity; they are fighting in court.

The outcome will reshape how AI companies source training data. If courts rule against OpenAI, licensing content from publishers could become mandatory, fundamentally altering the economics of large language model development. If courts side with OpenAI’s fair use argument, publishers will have lost their primary legal lever and may turn to regulatory or legislative solutions.

Did OpenAI actually use Britannica content?

The lawsuit claims OpenAI copied nearly 100,000 Britannica articles, but the exact mechanisms and extent of the copying are known only to OpenAI. Britannica alleges that ChatGPT’s outputs show verbatim and paraphrased reproductions, but no independent verification of the precise scale or of OpenAI’s training methodology has been published.

What is the difference between this lawsuit and others against OpenAI?

Britannica’s case targets both copyright and trademark infringement, whereas most other suits focus primarily on copyright. The trademark claim, that OpenAI falsely attributes generated content to Britannica’s name, is a less common legal angle, though some other publishers have pursued it as well.

Could OpenAI lose this case?

OpenAI faces multiple lawsuits with similar allegations, but no appellate court has yet decided whether training AI models on copyrighted text without a license infringes copyright or qualifies as fair use. The outcome depends on how judges apply fair use doctrine to machine learning, a question without settled precedent.

Britannica’s lawsuit marks a threshold moment. When publishers stop hoping for negotiated licenses and start demanding court orders, the AI industry’s training practices face genuine legal jeopardy. Whether OpenAI wins or loses, the days of scraping content without permission are numbered: the practice will end either by court order or out of the need to avoid future litigation.

Edited by the All Things Geek team.

Source: Tom's Guide
