Shocking testimony in an ongoing antitrust case has revealed that Google continues to train its search AI on publisher content even when those publishers have explicitly opted out of AI training.
The revelation came during courtroom proceedings in Washington, where a Google executive admitted to practices that many publishers had been unaware of until now.
During questioning by Department of Justice attorney Diana Aguilar, Eli Collins, Vice President at Google DeepMind, confirmed what many publishers had feared: opting out of AI training doesn’t fully protect their content.
Publishers’ AI Opt-Outs Circumvented by Google Search Practices
Collins testified that while publishers can prevent DeepMind (the division behind Google’s Gemini AI models) from using their content, this protection doesn’t extend to other Google departments.
Specifically, once Gemini is integrated into Google Search, the search teams can continue training their AI on the same publisher content that DeepMind was blocked from using, as long as they’re developing “search-specific” AI features like “AI Overviews.”
“They treat it all as one big search product,” explained Matt Rogerson of The Guardian, capturing the frustration felt across the publishing industry.
The testimony revealed a startling statistic: publisher opt-outs had cut DeepMind’s training dataset in half, from 160 billion to 80 billion tokens. However, the same publishers who thought they were protecting their content through technical measures like robots.txt files and specific user-agent blocks (such as Google-Extended) are now learning that those barriers were less effective than they believed.
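To make the mechanism concrete, here is a minimal sketch of what a user-agent block in robots.txt does, using Python’s standard-library parser. The robots.txt contents and the publisher.example URL are hypothetical, and the parser only approximates how a real crawler reads the rules. It shows that a rule aimed at Google-Extended says nothing about the regular Googlebot search crawler; it does not model how Google handles the content internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher (an assumption for
# illustration): block the Google-Extended token, allow everything else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical article URL on a placeholder domain.
    url = "https://publisher.example/news/story.html"
    for agent in ("Google-Extended", "Googlebot"):
        print(f"{agent}: allowed={can_fetch(agent, url)}")
```

Under these assumed rules, the check returns False for Google-Extended and True for Googlebot: the opt-out covers one named token only, so content that stays open to search crawling is still reachable.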

This practice has serious implications for online publishers, especially as AI-generated search answers become more common. Features like AI Overviews provide direct answers to users’ questions without requiring clicks through to original websites, potentially devastating publisher traffic and revenue.
Many content creators now feel misled about the effectiveness of opt-out mechanisms they’ve implemented. What they thought was a comprehensive shield against their content being used to train AI systems turns out to have significant gaps.
The Department of Justice has incorporated these revelations into its broader antitrust case against Google. Attorneys presented internal documents showing the scale of data removed from DeepMind’s training pool while emphasizing that this same data remains available to other Google teams.
Google’s AI Under Antitrust Scrutiny
The DOJ is seeking aggressive remedies that would recast Google’s business model, such as compelling the tech behemoth to divest its Chrome browser and prohibiting it from paying for default search placement on devices. These moves would also affect Google’s AI products, which are based on the company’s enormous data pool.
DOJ attorneys argue that the data practices Google uses to train its AI models lock in its market power. By keeping publisher content available even after opt-outs, Google can keep building its AI models with high-quality data that is unavailable to competitors.
This case exemplifies the increasingly entangled relationship among tech firms, content creators, and the data that drives contemporary AI models. As artificial intelligence becomes more deeply integrated into search and other internet services, questions about data rights and fair pay for creators are growing more pressing.
Google argues that its actions comply with its policies and are necessary to provide high-quality search results. Publishers and regulators, however, increasingly question whether the present system fairly balances everyone’s interests.
The decision in this antitrust lawsuit could set significant precedents for how technology companies must handle content permissions in the age of artificial intelligence. It could also lead to new regulations that give publishers more meaningful control over how their work is used.
Publishers Fight for IP Control in the AI Era
For the moment, publishers are left questioning the real worth of existing opt-out provisions and calling for more transparency and control over their intellectual property. As one industry watcher commented, “This isn’t just about search anymore; it’s about who controls the future of information online.”
As AI-generated responses gain traction in search results, the stakes also rise for publishers whose livelihoods depend on site traffic. With loosely defined rules and few robust opt-outs, many worry the relationship between content creators and tech platforms will become increasingly one-sided.
The case rumbles on in Washington, with both the publishing world and the tech community watching closely for its potential to remake the digital world.