OpenAI’s Whisper, a widely used AI transcription tool, is raising concerns across the tech and healthcare industries after reports of inaccuracies and “hallucinations” in its transcriptions. Although Whisper is promoted as having near-human accuracy, an Associated Press investigation found that it adds inaccurate or entirely fabricated content to transcripts.
Reports indicate that Whisper’s “hallucinations” can include inappropriate racial commentary, invented medical information, and violent phrases that the original speakers never said. The issue is drawing particular attention as hospitals and businesses adopt Whisper for critical transcription work.
Whisper’s use in medical contexts is especially concerning. Health systems are beginning to rely on Whisper-based tools to transcribe patient-doctor interactions, despite OpenAI’s explicit warning that the tool should not be used in high-stakes decision-making. Because some of these tools, such as Nabla’s, do not retain the original recordings, healthcare providers cannot cross-check transcriptions for accuracy.
Furthermore, Whisper’s inaccuracies pose risks to the Deaf and hard-of-hearing communities, who depend on transcription services for accessible communication. These users may unknowingly encounter fabricated details that could affect their understanding of critical content.
Inaccurate Transcriptions: A Systemic Issue
Software engineers, developers, and academic researchers have documented Whisper’s tendency to invent content during transcription. Researchers from the University of Michigan found hallucinations in 80% of the Whisper transcriptions they examined in a study of public meetings. A machine learning engineer reported inaccuracies in more than half of the over 100 hours of Whisper transcriptions he reviewed, and a third developer encountered hallucinations in nearly every one of the 26,000 transcriptions he analyzed.
The issue persists even in short, well-recorded audio: a separate study by computer scientists found 187 hallucinations in a set of 13,000 clips. Given how broadly Whisper is integrated into services ranging from consumer technology to professional settings, these inaccuracies could affect millions of users worldwide and lead to serious misinterpretations.
Risky Use in Healthcare Despite Warnings
Despite OpenAI’s recommendation against using Whisper in “high-stakes decision-making contexts,” healthcare providers are increasingly using it to document patient visits. Over 30,000 clinicians and 40 health systems, including Minnesota’s Mankato Clinic and Children’s Hospital Los Angeles, use Whisper-based tools. Nabla, a company based in the U.S. and France, built a Whisper-based tool that summarizes patient-doctor interactions, but verification remains a challenge: the tool erases the original audio for data-safety reasons, making it difficult to confirm that transcriptions are accurate.
Privacy experts have also raised concerns about Whisper’s use in healthcare settings. California Assemblymember Rebecca Bauer-Kahan refused to sign a form authorizing the sharing of medical consultation audio with outside vendors, including Microsoft Azure, voicing concern about giving for-profit companies access to private health information.
Potential Impact on the Deaf and Hard of Hearing
The Deaf and hard-of-hearing communities, who often rely on Whisper-based closed captioning for access, are particularly affected, because hallucinations can embed unverified information directly in the captions. Christian Vogler, director of the Technology Access Program at Gallaudet University, noted that fabrications buried within caption text are hard to spot. Without the ability to check captions against the audio, these users risk accepting incorrect information as fact.
The high rate of hallucinations in Whisper’s output has prompted calls for stricter regulation. Former OpenAI engineer William Saunders expressed concern about Whisper’s reliability, saying improvements are needed to prevent overconfidence in the technology. Researchers and advocates are urging regulatory oversight and further development to ensure accuracy and safety in critical applications.
In response to these findings, OpenAI stated that it is continually working to improve Whisper’s accuracy. A spokesperson thanked researchers for sharing their findings and emphasized the company’s commitment to reducing hallucinations in future model updates.