TechStory

OpenAI’s New AI Models Face Troubling Increase in Hallucinations

by Sneha Singh
April 20, 2025
in Tech

Tech giant OpenAI has hit an unexpected roadblock with its latest artificial intelligence models. The company’s new reasoning models, o3 and o4-mini, are showing a concerning spike in hallucination rates – essentially making up information that isn’t true – compared to their predecessors.

This development has stunned the company’s own engineers and industry watchers alike, as it reverses years of steady improvement in AI reliability. While each previous generation of OpenAI’s large language models had gradually improved at avoiding hallucinations, the new models perform worse.

A Step Backward in AI Reliability

According to OpenAI’s internal testing, the new o3 model hallucinated in 33% of cases on the company’s PersonQA benchmark. That’s roughly double the rate of previous models like o1 (16%) and o3-mini (14.8%). Even more troubling, the o4-mini model performed worse still, hallucinating in nearly half of all test cases – a staggering 48%.
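
To make the comparison concrete, the short sketch below divides each new model's reported PersonQA hallucination rate by each older model's rate. The percentages are the ones quoted above; the dictionary and the `ratio` helper are illustrative names, not part of any OpenAI tooling.

```python
# Hallucination rates on PersonQA as reported in the article (percent).
rates = {"o1": 16.0, "o3-mini": 14.8, "o3": 33.0, "o4-mini": 48.0}


def ratio(new: str, old: str) -> float:
    """Return how many times higher the newer model's rate is than the older one's."""
    return rates[new] / rates[old]


# Compare each new reasoning model against each earlier baseline.
for new in ("o3", "o4-mini"):
    for old in ("o1", "o3-mini"):
        print(f"{new} vs {old}: {ratio(new, old):.2f}x")
```

By this arithmetic, o3's rate is a little over twice that of either baseline, while o4-mini's is roughly three times as high.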

This setback has raised serious concerns throughout the AI research community. When AI systems confidently present false information as fact, it undermines user trust and limits how these technologies can be safely used in important applications.

Image: OpenAI Takes Aim at 'Hallucinations' as More Businesses Integrate AI (Credits: PYMNTS.com)

“What we’re seeing is unusual for a company that has built its reputation on steady, measurable progress in AI safety,” said tech analyst Sarah Chen. “These hallucination rates could potentially undermine years of work building public trust in AI systems.”

Mystery Behind the Decline

Perhaps most concerning is that OpenAI itself doesn’t fully understand why this is happening. In its technical documentation, the company openly admits that “more research is needed” to figure out why scaling up these reasoning models is leading to more frequent hallucinations.

Neil Chowdhury, a researcher at nonprofit AI lab Transluce and former OpenAI employee, suggests the reinforcement learning methods used in developing these models might be amplifying problems that older techniques managed to avoid. His team found that o3 sometimes fabricates not just facts but also actions it claims to have taken, such as claiming to have run code on hardware that doesn’t exist.

“It’s as if the models are becoming more confident but not necessarily more accurate,” Chowdhury explained. “They’re generating more claims overall, which means both more correct answers and more incorrect ones.”
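
The point about claim volume can be illustrated with a toy calculation. Every number below is hypothetical and chosen only for illustration; none comes from OpenAI or Transluce. The sketch holds per-claim accuracy fixed and varies only how often a model commits to an answer, showing that both correct and incorrect claims rise together.

```python
def claim_counts(total_questions: int, attempt_pct: int, accuracy_pct: int) -> tuple:
    """Return (correct, incorrect) claim counts for a hypothetical model.

    attempt_pct:  share of questions the model answers rather than declines (0-100).
    accuracy_pct: share of attempted answers that are correct (0-100).
    """
    attempted = total_questions * attempt_pct // 100
    correct = attempted * accuracy_pct // 100
    return correct, attempted - correct


# Hypothetical "cautious" model: answers half the questions.
cautious = claim_counts(1000, attempt_pct=50, accuracy_pct=80)

# Hypothetical "assertive" model: same per-claim accuracy, answers far more often.
assertive = claim_counts(1000, attempt_pct=90, accuracy_pct=80)

print("cautious  (correct, incorrect):", cautious)
print("assertive (correct, incorrect):", assertive)
```

In this toy setup the assertive model produces 720 correct and 180 incorrect claims against the cautious model's 400 and 100, so the number of false statements grows even though accuracy per claim is unchanged.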

Despite these issues, the new models do excel in certain areas. The o3 model scored 69.1% on the SWE-bench Verified coding benchmark, with o4-mini close behind at 68.1%. The results point to significant improvements in coding and mathematical capabilities.

However, the practical problems are already evident. Kian Katanforoosh, Stanford adjunct professor and CEO of startup Workera, noted that while o3 performs exceptionally well on coding tasks compared to competitors, it frequently generates broken website links – URLs that simply don’t exist.

“For businesses relying on these models, such hallucinations can be more than just annoying – they can actively harm productivity and decision-making,” Katanforoosh said. “Imagine building a product roadmap based on AI research that includes references to non-existent studies or tools.”

Industry Impact and Future Challenges

This spike in hallucination rates comes at a crucial moment for OpenAI, which faces intense competition from rivals like Google, Meta, xAI, Anthropic, and DeepSeek. The company had been counting on these new reasoning models to set a new industry standard, but the unexplained rise in hallucinations could damage user trust.

AI ethics researcher Maya Johnson points out the fundamental challenge: “While some creative ‘hallucination’ can be useful for brainstorming or generating novel ideas, these rates are simply too high for enterprise or scientific applications where accuracy is non-negotiable.”

OpenAI has acknowledged the seriousness of the issue and is dedicating resources to understanding and addressing the root causes. The company has also called on the broader AI research community to help investigate this phenomenon.

As the race for more capable AI continues, this development is a sobering reminder that models can grow more sophisticated in some ways while still struggling with basic reliability. For now, users of these advanced models may need to exercise extra caution and verify outputs before relying on them.

Tags: Anthropic, ChatGPT, DeepSeek, Google, Meta, OpenAI
Sneha Singh

Sneha is a skilled writer with a passion for uncovering the latest stories and breaking news. She has written for a variety of publications, covering topics ranging from politics and business to entertainment and sports.
