AI or humans: who is behind the errors?

There have been earlier discussions about human flaws being superimposed on AI systems in the bid to make AI imitate humans. This also raises the possibility of human errors being reflected in AI output. Recently, a few wrongly labelled images surfaced on the internet: a baby labelled as a nipple, a pizza labelled as dough, and a swimsuit identified as a bra. At first glance this is something to laugh off, but a closer look reveals the underlying problem of mislabeling.

A team of MIT researchers recently found that more than 3% of the data in widely used machine learning datasets is wrongly labelled. After inspecting ten major datasets, the researchers estimate that an average of about 3.4% of the examples in their test sets carry an incorrect label.

The errors range from Amazon and IMDB reviews that are actually negative being labelled as positive, to image tags that misidentify the subject, to errors in video-based data.

According to the researchers,

“We identify label errors in the test sets of 10 of the most commonly used computer vision, natural language, and audio datasets, and subsequently study the potential for these errors to affect benchmark results. Errors in the test sets are numerous and widespread: we estimate an average of 3.4% errors across 10 datasets, where for example 2916 label errors comprise 6% of the ImageNet validation set.”
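The ImageNet figure in the quote can be checked with simple arithmetic. The sketch below is only an illustration: the count of 2916 errors comes from the quote, while the validation set size of 50,000 images is an assumption brought in from outside this article, and the function name is hypothetical.

```python
def label_error_rate(num_errors: int, num_examples: int) -> float:
    """Return the percentage of examples whose label is wrong."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return 100.0 * num_errors / num_examples


# 2916 is taken from the researchers' quote.
# 50,000 is an assumed size for the ImageNet validation set (not stated in the article).
IMAGENET_VAL_SIZE = 50_000
imagenet_rate = label_error_rate(2916, IMAGENET_VAL_SIZE)

print(f"ImageNet validation label error rate: {imagenet_rate:.2f}%")
```

Under that assumption the rate comes to a little under 6%, which is consistent with the rounded figure the researchers quote.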

The research paper is titled ‘Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks’.

The mislabeling will have far-reaching implications if it is not addressed effectively, and it can even erode trust in AI systems. Incorrectly labelled datasets lead an AI system to learn the wrong identifications, which in turn makes it harder for the system to deliver accurate results. For datasets with a high proportion of wrongly labelled data, the researchers suggest that lower-capacity models may be more useful in practice than higher-capacity models.
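To see why wrong labels in a test set matter for benchmarks, consider a toy sketch. It is not the researchers' method, and every label and prediction in it is made up for illustration. It scores two hypothetical models against the labels as given and against corrected labels.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical 10-item test set. All values below are invented for illustration.
true_labels = ["cat", "dog", "cat", "dog", "cat", "dog", "cat", "dog", "cat", "dog"]
# The same test set as published, with two label errors (positions 2 and 7).
given_labels = ["cat", "dog", "dog", "dog", "cat", "dog", "cat", "cat", "cat", "dog"]

# Model A is right about the true class on all but one item (position 0).
model_a = ["dog", "dog", "cat", "dog", "cat", "dog", "cat", "dog", "cat", "dog"]
# Model B reproduces the two label errors and also misses one item (position 4).
model_b = ["cat", "dog", "dog", "dog", "dog", "dog", "cat", "cat", "cat", "dog"]

for name, preds in [("A", model_a), ("B", model_b)]:
    print(
        f"Model {name}: "
        f"accuracy on given labels = {accuracy(preds, given_labels):.1f}, "
        f"accuracy on corrected labels = {accuracy(preds, true_labels):.1f}"
    )
```

In this made-up case, model B looks better when scored against the given labels, while model A is better once the labels are corrected, so the ranking of the two models flips.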

Tags: Artificial Intelligence Future Tech

Sandra Theres Dony
