As the race to advance generative AI technology gains momentum, one critical question looms large for the technology industry: how can we distinguish between AI-generated text and human-written text? With the emergence of powerful AI language models like ChatGPT, GPT-4, Google Bard, and others, the ability to create convincing written content at scale has become both a boon and a challenge. On the one hand, it accelerates and simplifies tasks such as writing software code, leading to increased efficiency. On the other, it raises concerns about the proliferation of factual inaccuracies and misinformation, as AI-generated content may inadvertently propagate errors and falsehoods.
Recognizing the significance of this issue, OpenAI, the creator of ChatGPT and GPT-4, took a proactive step by introducing a solution in January: a “classifier to distinguish between text written by a human and text written by AIs from a variety of providers.”
This classifier serves as a foundational tool to help identify and differentiate between content created by human authors and that which originates from AI language models. By implementing such measures, the technology industry aims to mitigate potential challenges posed by AI-generated text and ensure that readers can make informed judgments about the authenticity and reliability of the information they encounter.
The Ongoing Battle for AI-generated Content Detection and Provenance Techniques
As AI advances and permeates various aspects of our lives, transparency and accountability in AI-generated content will be crucial to fostering trust and confidence in this rapidly evolving field. Striking the right balance between leveraging AI’s capabilities for positive advancement and guarding against its risks remains a vital objective for the technology community.
OpenAI warned about the difficulty of reliably detecting all AI-written text, but emphasized the importance of good classifiers in addressing various problematic situations. These include countering false claims that AI-generated text was authored by a human, preventing automated misinformation campaigns, and curbing the use of AI tools for academic cheating.
Unfortunately, less than seven months later, the project was terminated. In a recent blog post on July 20, 2023, OpenAI stated, “The AI classifier is no longer available due to its low accuracy rate. We are actively incorporating feedback and currently researching more effective provenance techniques for text.”
While AI advancements have been remarkable, reliably detecting AI-generated content remains an open problem. Startups like GPTZero are working on it, yet even OpenAI, backed by Microsoft and renowned for its expertise in the AI domain, has not cracked it.
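One common family of detection heuristics scores text by how statistically predictable it looks to a language model, on the theory that machine-generated text tends to be more predictable (lower perplexity) than human writing. The sketch below is a deliberately toy version of that idea using a character-bigram model; the corpus and all names are illustrative assumptions, not the actual method of GPTZero or OpenAI's classifier:

```python
import math
from collections import Counter

def bigram_scorer(corpus):
    """Fit a character-bigram model with add-one smoothing to `corpus`.
    Returns a function mapping text -> average negative log-likelihood
    per character transition (lower = more predictable to the model)."""
    pairs = Counter(zip(corpus, corpus[1:]))   # counts of (char, next_char)
    contexts = Counter(corpus[:-1])            # counts of left-context chars
    vocab_size = len(set(corpus))

    def avg_nll(text):
        total = 0.0
        for a, b in zip(text, text[1:]):
            # Add-one (Laplace) smoothing so unseen bigrams get p > 0.
            p = (pairs[(a, b)] + 1) / (contexts[a] + vocab_size)
            total += -math.log(p)
        return total / max(1, len(text) - 1)

    return avg_nll

score = bigram_scorer("the quick brown fox jumps over the lazy dog " * 50)
predictable = score("the quick brown fox")   # resembles the training data
surprising = score("zqxj vwkq pzzt")         # improbable character sequences
print(f"predictable: {predictable:.2f}, surprising: {surprising:.2f}")
```

Real detectors use large neural language models and additional signals (such as variation in predictability across sentences), and even then produce the false positives that contributed to OpenAI retiring its classifier.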
The online information landscape becomes more problematic as the lines between AI and human writing blur. Some spammy websites are using new AI models to churn out automated content, spreading misinformation and even outright fabrications, such as a false claim attributed to Bloomberg regarding Biden’s status.
AI Model Collapse: Potential Consequences and Mitigation Strategies
Beyond the journalistic concerns, there’s a more troubling possibility for the AI industry known as “AI Model Collapse.” This scenario arises when tech companies inadvertently use AI-produced data to train new models. Researchers fear that such models may deteriorate as they become reliant on their own automated content, causing a cycle of degradation.
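This cycle of degradation can be illustrated with a deliberately simple simulation (every name and parameter below is illustrative): each generation’s “model” is just a word-frequency table fitted to samples drawn from the previous generation’s model. Any word that happens not to be sampled gets zero probability and can never return, so the distribution’s long tail erodes generation after generation:

```python
import random
from collections import Counter

def collapse_demo(generations=30, sample_size=200, seed=42):
    """Toy model-collapse loop: generation N+1 is trained only on
    samples produced by generation N. Returns the vocabulary size
    after each generation, which can only shrink."""
    rng = random.Random(seed)
    # Generation 0: "human" data with a Zipf-like long tail of rare words.
    vocab = [f"w{i}" for i in range(100)]
    weights = [1.0 / (i + 1) for i in range(100)]
    sizes = [len(vocab)]
    for _ in range(generations):
        samples = rng.choices(vocab, weights=weights, k=sample_size)
        counts = Counter(samples)
        vocab = list(counts)                    # unsampled words vanish forever
        weights = [counts[w] for w in vocab]    # next model = raw frequencies
        sizes.append(len(vocab))
    return sizes

sizes = collapse_demo()
print(f"vocabulary size: {sizes[0]} -> {sizes[-1]}")
```

The toy model loses rare words in roughly the way the researchers describe real models losing the tails of their training distribution, which is why continued access to genuine human-generated data matters.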
A group of AI researchers from universities including Oxford, Cambridge, and Toronto has been studying what happens when text generated by GPT-style models, such as GPT-4, dominates the training dataset for subsequent models.
“It has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web,” they wrote. “Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the Internet.”
Addressing the challenge of distinguishing between human-generated and machine-generated content on the internet is of utmost importance. OpenAI was approached via email to inquire about their AI text classifier and the implications of Model Collapse. In response, a spokesperson concisely stated: “We have nothing to add outside of the update outlined in our blog post.”
When asked to confirm whether they were human, the spokesperson offered a light-hearted reply: “Hahaha, yes, I am very much a human, appreciate you for checking in though.” Thankfully, the friendly response confirmed their humanity.