TechStory

Reddit Blocks Internet Archive’s Wayback Machine Over AI Data Scraping Concerns

Move limits archiving of posts, comments, and profiles amid growing tensions with AI companies

by Harikrishnan A
August 12, 2025
in Business, Markets, News, Tech, Trending, World

Reddit has moved to block the Internet Archive’s Wayback Machine from archiving most of its platform after finding that some artificial intelligence companies have been scraping archived Reddit content in ways that violate its policies.

Under the new rules, the Wayback Machine will no longer be able to capture post pages, user comments, or profile information. Instead, its access will be limited to the Reddit.com homepage—showing only which headlines and posts were popular on any given day—effectively removing the ability to view detailed historical conversations or user activity.


Privacy and Policy at the Center of the Decision

Reddit has stated that while it values the Internet Archive’s role in preserving the open web, it has concerns about archived Reddit content being exploited without consent. The company argues that some archived material, such as deleted posts or user information, should not remain accessible if it has been removed from the platform.

The decision also comes with a privacy angle—Reddit wants stronger measures to ensure archived data respects platform rules and user protections. The company has signaled that the Internet Archive would need to adopt safeguards, such as ensuring removed content is not stored or displayed, before access could be restored.


Phased Rollout of Restrictions

Reddit began implementing the changes immediately, saying it notified the Internet Archive in advance. The company also acknowledged that it has previously raised concerns about the scraping of archived content from the Wayback Machine, particularly as AI companies have increasingly sought out large volumes of conversational data to train their models.

This latest move fits into Reddit’s wider strategy of controlling who can use its data and under what terms, especially as demand from AI developers accelerates.


Part of a Broader Anti-Scraping Push

In recent years, Reddit has taken multiple steps to limit free and unregulated scraping of its platform. In early 2024, it struck a licensing deal with Google that gave the tech giant access to both search index data and information for AI training. Soon afterward, Reddit began blocking major search engines from crawling its content unless they agreed to pay for the privilege.

The company also made headlines with its controversial API policy changes in 2023, which dramatically increased fees for third-party developers. Those changes led to the shutdown of several popular Reddit apps and sparked widespread protests from users and moderators. At the time, Reddit cited misuse of its APIs by AI companies as a key reason for the overhaul.


Deals and Disputes With AI Companies

While Reddit has entered paid licensing agreements with certain AI firms, including OpenAI, it has also taken a more combative approach with others. In June 2025, the company filed a lawsuit against Anthropic, claiming the AI firm continued to scrape Reddit content even after stating it had stopped.

These actions make clear Reddit’s position that AI companies must secure paid agreements rather than relying on indirect sources like public archives.


The Internet Archive’s Role

The Internet Archive is a nonprofit dedicated to preserving digital history, including websites, books, music, and other cultural materials. Its Wayback Machine tool allows users to view past versions of web pages, offering value to researchers, journalists, and the general public.

However, Reddit sees a risk in how this archive can unintentionally bypass its privacy protections. Content removed from Reddit—either by users or moderators—may still appear in the Wayback Machine, creating a loophole for anyone seeking to gather data that Reddit no longer wants available.


Ongoing Discussions but No Resolution Yet

The Internet Archive has acknowledged its long-standing relationship with Reddit and confirmed that discussions about the matter are ongoing. No specific agreement has been reached, and it remains unclear whether the Archive can adopt the privacy measures Reddit is demanding.

Even if changes are made, Reddit’s growing emphasis on monetizing its data could mean that any renewed access for archiving will come with tighter controls or licensing conditions.


AI’s Growing Demand for Online Content

This dispute reflects a broader trend across the internet: AI companies are increasingly running into resistance from platforms that host large volumes of user-generated content. Forums, social networks, and community-driven platforms are rich sources of human language and knowledge—key ingredients for training advanced AI models.

But unrestricted scraping raises complex legal and ethical issues. Many platforms now treat their data as a valuable asset, and more are demanding compensation for its use. This has led to a rise in licensing deals, alongside lawsuits targeting companies accused of scraping without authorization.

For most Reddit users, the immediate change may go unnoticed. However, journalists, historians, and digital archivists who depend on the Wayback Machine to review past discussions could find their work significantly affected.

Supporters of Reddit’s move argue that it is necessary to protect user privacy and ensure responsible use of online data. Critics, however, warn that limiting access to public archives could undermine transparency and weaken the preservation of digital history.

Tags: AI scraping, Anthropic lawsuit, data privacy, digital preservation, Generative AI, Internet Archive, online content control, Reddit, Wayback Machine, web archiving
