When OpenAI unleashed ChatGPT on the world in November 2022, it sparked more than just a tech revolution. Researchers now argue that this moment created a form of digital contamination that could threaten the future of artificial intelligence development itself.
The comparison might sound dramatic, but academics are drawing parallels to nuclear weapons testing. Just as atomic bomb tests contaminated metals manufactured after 1945 with radioactive particles, ChatGPT’s debut marked the beginning of what experts call “AI data pollution.”
ChatGPT and the Synthetic Data Problem
Here’s the issue: AI models are increasingly being trained on data created by other AI models. This creates a feedback loop that researchers worry could lead to “AI model collapse,” where each generation of AI becomes less reliable than the last.
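To see why that feedback loop worries researchers, consider a deliberately simplified sketch (our own illustration, not an experiment run by the researchers quoted here): fit a trivial statistical model to some data, train the next “generation” only on that model’s output, and repeat. The sample sizes and parameters below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Generation 0: "human" data, drawn from the true distribution we care about.
true_mean, true_spread = 0.0, 1.0
data = rng.normal(true_mean, true_spread, size=50)

for generation in range(1, 51):
    # Fit a deliberately trivial "model": just the sample mean and spread.
    mu, sigma = data.mean(), data.std()
    # Train the next generation only on the previous model's own output,
    # with no fresh human data mixed back in.
    data = rng.normal(mu, sigma, size=50)
    if generation % 10 == 0:
        print(f"generation {generation:2d}: fitted spread = {sigma:.3f}")
```

Each generation compounds the previous one’s sampling error, so the fitted spread tends to drift away from the original value and, on average, to shrink; mixing fresh human-created data back in at each step is what keeps that drift in check.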
Maurice Chiodo, a research associate at the University of Cambridge’s Centre for the Study of Existential Risk, explains: “Everyone participating in generative AI is polluting the data supply for everyone.”

The concern isn’t just theoretical. Since ChatGPT was launched, millions of AI-generated articles, images, and conversations have flooded the internet. Future AI systems trained on this mixed data might struggle to distinguish between authentic human content and synthetic material.
The Low-Background Steel Connection
The nuclear comparison isn’t just metaphorical – it’s historically grounded. After atomic testing began, scientists needed “low-background steel” for sensitive medical equipment because regular steel was contaminated with radioactive particles. Ironically, one major source was the German naval fleet scuttled in 1919, decades before nuclear testing.
John Graham-Cumming, former CTO of Cloudflare, saw the parallel early on. He registered the domain lowbackgroundsteel.ai in March 2023 and began cataloging data sources from before the AI explosion, like GitHub’s Arctic Code Vault from 2020.
“I liked the idea of a repository of known human-created stuff,” Graham-Cumming told The Register.
Real Crisis or Overblown Concern?
Not everyone agrees this is a crisis. Some AI practitioners argue that model collapse can be prevented through careful data curation and training techniques. The debate intensified recently when Apple researchers published findings about reasoning models, only to have their conclusions challenged by other experts.
But Chiodo and his research team believe the stakes are higher than just model performance. They worry that access to “clean” pre-2022 data will create unfair competitive advantages for established AI companies, potentially locking out newer competitors.
“You can build a very usable model that lies,” Chiodo notes. “You can build quite a useless model that tells the truth.”
Style Matters, Not Just Accuracy
Rupprecht Podszun, a law professor at Heinrich Heine University, emphasizes that it’s not just about accuracy. Pre-2022 human communication data captures authentic writing styles and creative thinking patterns that are valuable for training AI systems.
“Email data or human communication data – which pre-2022 is really data which was typed in by human beings – that’s much more useful than getting what a chatbot communicated after 2022,” Podszun explains.
Cleaning Up the Digital Environment
So what can be done? The researchers suggest several approaches, though none are simple:
Mandatory labeling of AI-generated content could help, but watermarks are easily removed, and divergent rules across countries complicate global enforcement. Federated learning, in which data owners allow models to be trained on their data without handing over the data itself, might preserve access while maintaining privacy.
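For readers unfamiliar with the term, here is a minimal sketch of the federated-learning idea as described above: each data owner improves a shared model on its own machine and sends back only updated parameters, never the underlying data. Everything in the snippet (the toy datasets, the single-weight model, the local_update helper) is a hypothetical illustration, not any particular framework’s API.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: three data owners each hold private samples of the same
# relationship (y is roughly 2x plus noise). Raw data never leaves a client.
def make_client():
    x = rng.uniform(-1.0, 1.0, size=50)
    y = 2.0 * x + rng.normal(0.0, 0.1, size=50)
    return x, y

clients = [make_client() for _ in range(3)]

def local_update(weight, x, y, lr=0.1, steps=20):
    # A few gradient steps on one client's private data (mean squared error).
    for _ in range(steps):
        grad = 2.0 * np.mean((weight * x - y) * x)
        weight -= lr * grad
    return weight

global_weight = 0.0
for round_num in range(1, 6):
    # Each client sends back only its updated weight, never the data itself;
    # the coordinator averages the weights into the new shared model.
    local_weights = [local_update(global_weight, x, y) for x, y in clients]
    global_weight = float(np.mean(local_weights))
    print(f"round {round_num}: shared weight = {global_weight:.3f}")
```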
Some propose government-maintained repositories of clean data, though this raises concerns about political control and security risks.
Time Is Running Out
The urgency comes from what researchers call the “irreversibility” of the problem. Once datasets become thoroughly contaminated with AI-generated content, cleaning them becomes “prohibitively expensive, probably impossible,” according to Chiodo.
While regulators in the US and UK favor light-touch approaches to avoid stifling innovation, Europe’s AI Act suggests a more proactive stance. The lesson from social media’s dominance, Podszun argues, is not to wait until it’s too late.
The question facing policymakers and tech companies is whether they’ll act before digital pollution becomes as permanent as radioactive contamination – or if we’re already past the point of no return.