TechStory

Investigation Reveals Companies Including Apple Used YouTube Content Without Consent To Train AI Models

by Harikrishnan A
July 17, 2024
in AI, Business, Entertainment, Markets, News, Tech, Trending, World

A recent investigation by Proof News and Wired found that some of the world’s largest AI companies have trained their AI models on data taken from thousands of YouTube videos, despite YouTube’s explicit prohibition on unauthorized data extraction.

Extensive Data Collection

The investigation revealed that subtitles from 173,536 YouTube videos, sourced from over 48,000 channels, were utilized by prominent Silicon Valley entities such as Anthropic, Nvidia, Apple, and Salesforce. This dataset, named YouTube Subtitles, encompasses transcripts from educational channels like Khan Academy, MIT, Harvard, and major media outlets including The Wall Street Journal, NPR, and the BBC. Even entertainment programs such as “The Late Show With Stephen Colbert,” “Last Week Tonight With John Oliver,” and “Jimmy Kimmel Live” contributed to this extensive dataset.

Participation of Influential YouTubers

High-profile YouTube personalities also unwittingly contributed to these AI training efforts. Notably, videos from MrBeast (289 million subscribers), Marques Brownlee (19 million subscribers), Jacksepticeye (nearly 31 million subscribers), and PewDiePie (111 million subscribers) were incorporated into the training dataset. Some videos in the dataset also promoted conspiracy theories such as the “flat-Earth theory.”

Proof News developed a specialized tool enabling content creators to identify if their videos were included in the AI training datasets derived from YouTube. Companies involved in AI development often utilized “the Pile,” a compilation curated by EleutherAI, initially intended to democratize AI training resources but subsequently leveraged by major tech corporations.

Creators Respond to Unauthorized Usage

David Pakman, host of “The David Pakman Show,” expressed dismay upon discovering that nearly 160 of his videos were utilized without consent. Pakman emphasized the need for AI companies to compensate creators whose content underpins their technological advancements, underscoring the significant investments of time, effort, and financial resources involved in content creation.

“This is my livelihood, and I invest considerable resources in producing this content,” Pakman said. “There’s no shortage of work that goes into it.”

Dave Wiskus, CEO of Nebula, voiced strong objections, condemning the unauthorized use of creators’ content as “theft” and highlighting concerns over AI potentially displacing artists and their livelihoods.

“Will this exploit and harm artists? Absolutely,” Wiskus asserted.

Julia Walsh, CEO of Complexly, a company producing educational content like SciShow, echoed frustrations over the exploitation of meticulously crafted materials without consent.

Legal and Ethical Implications

The practice of using YouTube content for AI training raises ethical and legal concerns, particularly because YouTube’s terms of service prohibit automated data extraction. Sid Black, a co-founder of EleutherAI, acknowledged using a script to download captions via YouTube’s API, likening the process to the way an ordinary web browser retrieves them.

Anthropic defended its practices, asserting compliance with terms of service and downplaying the significance of YouTube Subtitles within the broader Pile dataset. Google, for its part, declined to comment in detail on specific cases, citing its ongoing efforts to prevent unauthorized data scraping.

Industry Reflections and Responses

In a recent interview, Google CEO Sundar Pichai said that using YouTube videos to train AI models such as OpenAI’s Sora could violate YouTube’s terms of service, though that question concerns video content directly rather than subtitle datasets such as YouTube Subtitles.

EleutherAI, the organization behind the Pile dataset, did not respond to requests for comment; its stated mission is to democratize access to cutting-edge AI technologies. The controversy surrounding AI data acquisition highlights evolving issues in data usage ethics and legality.

Marques Brownlee acknowledged the complexities involved, noting Apple’s indirect sourcing of AI data from companies that had scraped YouTube content, including his own.

“Apple sourced data for their AI from companies that scraped extensive data from YouTube, including mine,” Brownlee observed. “This presents an ongoing challenge.”

As AI development progresses, the industry faces continuing dilemmas regarding data acquisition, consent, and fair compensation for content creators.

Tags: AI, Anthropic, Apple, MKBHD, Nvidia, YouTube

Harikrishnan A

Aspiring writer. Enjoys gaming, fried chicken and iced tea, preferably all together.
