ChatGPT, the groundbreaking language model developed by OpenAI, has taken the tech world by storm with its impressive ability to generate convincing-sounding text. However, a recent study by researchers at Brigham and Women’s Hospital, a teaching affiliate of Harvard Medical School, has exposed a significant flaw in the AI’s capabilities. The study, published in the journal JAMA Oncology and reported by Bloomberg, found that cancer treatment plans generated by ChatGPT are riddled with errors.
**The Study’s Findings**
To assess ChatGPT’s aptitude for creating treatment plans across a range of cancer cases, the researchers posed 104 queries to the model and found a concerning pattern: roughly one-third of the responses contained incorrect information. The study also found that ChatGPT often blended accurate and inaccurate information together, making it difficult to discern what was reliable. Strikingly, nearly 98% of the responses contained at least one treatment recommendation aligned with National Comprehensive Cancer Network (NCCN) guidelines, which is precisely what makes the errors so easy to overlook.
The authors were particularly alarmed by the way ChatGPT intertwined factual and erroneous information, confounding even experts in the field. Coauthor Dr. Danielle Bitterman pointed out, “Large language models are trained to provide responses that sound very convincing, but they are not designed to provide accurate medical advice.” The study therefore frames the substantial error rate and the inconsistency of responses as serious safety concerns in the clinical domain.
**ChatGPT’s Rise and Limitations**
ChatGPT’s launch in November 2022 swiftly catapulted it into the limelight, amassing a staggering 100 million active users within two months. Its rapid success triggered a wave of investment in AI companies and sparked fervent debates about the long-term repercussions of artificial intelligence. Goldman Sachs research even predicted that AI could impact up to 300 million jobs worldwide.
Nonetheless, as ChatGPT’s popularity surged, it became apparent that generative AI models like it are prone to “hallucinations”: instances where the model confidently presents information that is simply wrong. The consequences can be costly. When Google’s rival chatbot, Bard, answered a question about the James Webb Space Telescope incorrectly in a promotional demo, the error reportedly wiped roughly $120 billion off the company’s market value.
The medical field has also been exploring ways to use AI to streamline clinical and administrative tasks. A recent study indicated that AI-based breast cancer screening was safe and could significantly reduce radiologists’ workloads. And a computer scientist at Harvard reported that GPT-4, the latest iteration of the model, performed exceedingly well on US medical licensing exam questions, even surpassing some doctors in clinical judgment.
**Generative Models and Medical Accuracy**
Despite these advances, the JAMA Oncology study underscores the accuracy limitations of generative models like ChatGPT and casts doubt on their readiness to stand in for medical professionals. Approximately 12.5% of ChatGPT’s responses were classified as “hallucinated,” meaning they contained recommendations that were not part of any guideline-recommended treatment. The model was most likely to produce such inaccurate responses when queried about localized treatment for advanced disease or about immunotherapy.
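For readers who prefer counts to rates, a minimal back-of-the-envelope sketch in Python (assuming, as the reporting implies but does not state outright, that all 104 queries share a single denominator) translates the study’s headline percentages into approximate response counts:

```python
# Rough counts behind the study's headline percentages.
# Assumption: a single denominator of 104 scored queries,
# as the reporting implies but does not state outright.
TOTAL_QUERIES = 104

reported_rates = {
    "at least one NCCN-aligned recommendation": 0.98,   # "nearly 98%"
    "contained incorrect information":          1 / 3,  # "around one-third"
    "classified as hallucinated":               0.125,  # "approximately 12.5%"
}

for finding, rate in reported_rates.items():
    approx = round(rate * TOTAL_QUERIES)
    print(f"{finding}: ~{approx} of {TOTAL_QUERIES} responses")
```

That works out to roughly 102, 35, and 13 responses respectively, and the categories necessarily overlap: a response could cite a guideline-concordant treatment and still mix in an incorrect or fabricated one, which is exactly the blending problem the authors flag.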
OpenAI itself has acknowledged that ChatGPT is unreliable in critical medical contexts. The company’s terms of use explicitly caution that its models are not designed to provide medical information and should not be used to diagnose or treat serious medical conditions.
**Conclusion**
The Brigham and Women’s Hospital study serves as a pointed reminder that while AI models like ChatGPT have shown remarkable capabilities, they are far from infallible, especially in complex, high-stakes domains such as cancer treatment planning. The way accurate and inaccurate information fuse within a single, fluent response underscores the need for rigorous refinement and validation before such systems can be trusted in clinical decision-making. As the debate over AI’s role in healthcare continues, this study makes clear that accuracy issues must be resolved before such models can be considered for practical clinical use.