Apple Trained AI Models on YouTube Content Without Consent, Sparking Controversy

A recent report reveals that major tech companies, including Apple, trained AI models on YouTube videos without obtaining creators’ consent, raising serious copyright issues for those creators. The companies reportedly used subtitle files that a third party had downloaded from more than 170,000 videos. Notable creators affected include Marques Brownlee (MKBHD), MrBeast, PewDiePie, Stephen Colbert, John Oliver, and Jimmy Kimmel.

According to Wired, an investigation by Proof News found that some of the world’s wealthiest AI companies used material from thousands of YouTube videos to train their AI models. They did so despite YouTube’s rules against extracting material from its platform without permission.

Proof News discovered that subtitle files from 173,536 YouTube videos, sourced from more than 48,000 channels, were used by prominent tech firms like Anthropic, Nvidia, Apple, and Salesforce. The data was reportedly downloaded by EleutherAI, a non-profit organization that supports AI model development.

The Pile Dataset

EleutherAI’s research paper mentions that the subtitle files were included in a dataset called the Pile, which the organization released. This dataset is accessible to anyone with sufficient storage and computing power. While it was intended to assist small developers and academics, it was also utilized by major tech companies, including Apple.

Research papers and posts from companies such as Apple, Nvidia, and Salesforce reveal their use of the Pile to train AI models. Apple specifically used the Pile to train OpenELM, a high-profile model released in April 2024. This was just weeks before Apple announced new AI capabilities for iPhones and MacBooks.

Legal and Ethical Concerns

Apple’s use of YouTube content to train AI models without creators’ consent has drawn criticism and raises significant legal and ethical concerns. While Apple and other companies likely used the publicly available dataset in good faith, the case highlights the legal complexities of web scraping for AI training.

One major issue is the use of copyrighted content without permission. YouTube creators like Marques Brownlee earn income from ads on their videos. Using their content without consent and compensation is akin to a copyright violation. The existing international copyright framework, rooted in the Berne Convention (first adopted in 1886 and last substantially revised in 1971), is outdated for addressing modern technology like AI.

Copyright laws traditionally cover derivative works, such as movies based on novels. However, AI training on vast amounts of text presents a more distant and complex connection. There is ongoing debate about whether existing copyright protections should extend to AI training data.

This incident underscores the legal and ethical challenges in using web-scraped content for AI training. The use of datasets compiled by third parties without proper permissions creates a potential minefield for companies, highlighting the need for clearer regulations in the digital age.

Analysis of AI Training with YouTube Content

Recent reports show that tech giants like Apple used YouTube content to train AI models without creators’ consent. This practice has stirred controversy, highlighting critical copyright and ethical issues.

Using subtitle files from over 170,000 YouTube videos raises serious copyright concerns. Creators like Marques Brownlee are among those whose videos were included. Using their work without permission is like stealing, as it takes away their potential earnings and violates their intellectual property rights.

The current international copyright framework, last substantially revised in 1971, is outdated. It was made for a time before the internet and AI. Today, AI training involves huge amounts of data, making the connection between the original content and the AI’s output unclear. This raises the question of whether current copyright protections should cover AI training data.

There is also an ethical issue. YouTube’s rules forbid extracting content without permission. EleutherAI, a non-profit organization, downloaded the subtitles, seemingly violating these rules. Although their aim was to help small developers and academics, tech giants ended up using this data.

Reshab Agarwal
