Interest in artificial intelligence (AI) is at an all-time high, with Google searches for the term sitting at 92% of their all-time peak. Yet new research suggests that AI’s own success may threaten its future. Researchers from the Universities of Cambridge and Oxford have identified a troubling phenomenon: AI-generated content could trigger what’s known as “model collapse,” in which the quality of online information deteriorates significantly.
What Is Model Collapse?
Model collapse occurs when AI systems, which generate content by learning from extensive datasets, start producing increasingly faulty and incoherent results after repeated cycles of training on their own output. Essentially, AI tools that learn from AI-generated material rather than human-created data begin to lose accuracy and produce less meaningful information over time.
The research team conducted experiments to explore this issue. They found that after just a few cycles of AI-generated content being used as input for further AI queries, the quality of responses began to decline. By the fifth cycle, the information was noticeably less coherent, and by the ninth it had become entirely nonsensical. “It’s surprising how quickly model collapse can occur and how elusive it is,” says Ilia Shumailov, the study’s lead author. He notes that the early signs can seem minor, affecting only poorly represented data, but the consequences grow severe as the process continues, leaving less diversity and more errors.
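One way to build intuition for this feedback loop is a toy simulation (a sketch for illustration, not the researchers’ actual experiment): repeatedly fit a simple Gaussian model to a dataset, then replace the dataset with samples drawn from the fitted model, as if each new model trained only on its predecessor’s output. All parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "human" data drawn from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=100)

for gen in range(1, 31):
    # Fit a Gaussian to the previous generation's data...
    mu, sigma = data.mean(), data.std()
    # ...then replace the dataset entirely with samples from that fit,
    # mimicking a model trained only on the previous model's output.
    data = rng.normal(loc=mu, scale=sigma, size=100)
    if gen % 5 == 0:
        print(f"generation {gen:2d}: mean={mu:+.3f}, std={sigma:.3f}")
```

Because each fit is made from a finite sample, the estimated spread tends to drift downward generation after generation: the tails of the distribution get sampled less and less, and the model’s picture of the data narrows. Rare cases disappear first, which matches Shumailov’s observation that under-represented data is the earliest casualty.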
The Prevalence of AI-Generated Content
This issue is compounded by the vast amount of AI-touched content online. A study by researchers at Amazon Web Services (AWS) estimated that roughly 57% of the sentences on the web are translations, most of them likely produced by machine translation, meaning a majority of internet text has already passed through an AI algorithm at least once. This high proportion means that AI systems are increasingly trained on machine-processed material rather than fresh, human-generated data.
As AI relies more on its own content, it risks “feeding on itself” and losing touch with the original, human-produced information it was meant to learn from. This self-referential cycle could make AI systems steadily less reliable and accurate over time.
Real-World Implications
The impact of model collapse can be seen in concrete examples. In one case, an AI model trained on diverse images initially produced accurate pictures of animals. However, as it was repeatedly trained on its own outputs, it began to focus only on well-known breeds like golden retrievers, neglecting rarer ones. This shift not only reduced the model’s accuracy but also diminished its ability to represent the full diversity of animals.
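The mechanism behind this narrowing can be demonstrated with another small sketch: treat the model as a probability distribution over breeds and resample it repeatedly. The breed names and frequencies below are invented for illustration; the point is that any category that happens to draw zero samples in one generation is gone from every generation after it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical starting distribution: three common breeds plus a long
# tail of rare ones (names and frequencies are illustrative, not real).
breeds = ["golden_retriever", "labrador", "beagle"] + [
    f"rare_breed_{i}" for i in range(20)
]
probs = np.array([0.30, 0.25, 0.15] + [0.015] * 20)

for gen in range(10):
    print(f"generation {gen}: {(probs > 0).sum()} of {len(breeds)} breeds survive")
    # Each new model "trains" on a finite sample of the previous model's
    # output; any breed that draws zero samples disappears permanently.
    sample = rng.choice(len(breeds), size=200, p=probs)
    counts = np.bincount(sample, minlength=len(breeds))
    probs = counts / counts.sum()
```

With these illustrative numbers, each rare breed has roughly a 5% chance of vanishing per generation, so the tail of the distribution erodes steadily while the common breeds persist.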
Similarly, in text-based models, an AI-generated article about 14th-century church steeples deteriorated into a nonsensical piece about jackrabbits after several iterations. These examples highlight how quickly and dramatically AI systems can lose their grip on reality when they rely too heavily on their own synthetic content.
Addressing the Challenge
To combat model collapse, researchers suggest preserving a repository of human-generated content. Ensuring that AI systems have access to original, human-created information could help maintain their accuracy and relevance. Shumailov and his team also emphasize the need for coordinated efforts among developers and researchers to address these challenges. Sharing information about training data and working together to manage model collapse could help preserve the integrity of AI systems.
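A minimal sketch of that suggestion, building on the Gaussian toy model above (again an illustration under assumed parameters, not a production technique): keep a fixed reserve of the original data and mix it into every generation’s training set.

```python
import numpy as np

rng = np.random.default_rng(2)

# A preserved reserve of original "human" data (standard normal here).
human_reserve = rng.normal(loc=0.0, scale=1.0, size=100)
data = human_reserve.copy()

for gen in range(1, 31):
    mu, sigma = data.mean(), data.std()
    synthetic = rng.normal(mu, sigma, size=100)
    # Anchor every generation's training set with the human reserve
    # instead of training on synthetic output alone.
    data = np.concatenate([synthetic, human_reserve])
    if gen % 10 == 0:
        print(f"generation {gen}: std={data.std():.3f}")  # stays near 1.0
```

In this toy setting the reserve anchors the fitted spread near its original value, because half of every training set still comes from untouched human data; the recursive sampling can no longer walk the distribution down to a point.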
However, achieving this coordination is challenging given the rapid growth of AI and the sheer volume of AI-generated content. Shumailov points out that filtering out AI-generated material at scale is increasingly difficult, and no clear solution has emerged yet.