TechStory

Meta’s New Web Crawlers Raise Concerns Over Data Collection

by Harikrishnan A
August 21, 2024
in Business, Markets, News, Tech, Trending, World

Meta, formerly Facebook, has recently launched a pair of web crawlers, the Meta External Agent and Meta External Fetcher, that are stirring controversy among website owners and industry experts. These bots are designed to gather data from across the internet to enhance Meta’s AI models and other products, but their sophisticated data collection methods have raised serious privacy concerns.

New Bots with Advanced Capabilities

The Meta External Agent, introduced in July 2024, is programmed to harvest publicly available information from a wide range of online sources, including news articles, online forums, and other public content. The data it collects is used to train AI models, helping Meta refine its products and services.

Alongside this, Meta has deployed the Meta External Fetcher, which focuses on collecting web links to support the company’s AI assistant tools. Together, these bots are integral to Meta’s strategy for advancing its AI technology.

Comparing Meta’s Bots to Industry Peers

Meta’s new bots are reminiscent of those used by other tech giants like OpenAI, whose GPTBot also scrapes the web for AI training data. According to Dark Visitors, a company that tracks web scrapers, Meta’s bots function similarly to OpenAI’s tools. Both are designed to gather extensive online data, crucial for developing effective AI systems.

However, Meta’s bots are equipped with advanced features that make them harder for website owners to block. This has led to increased unease among content creators who are concerned about their data being harvested without their permission.

The Challenge of Blocking Web Scrapers

For decades, website owners have used the `robots.txt` file to restrict automated bots from accessing their content. This protocol has been a standard method for managing web scraping activities. Yet, the increasing demand for high-quality data has led some companies to ignore or bypass these rules.
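
As a rough illustration of how the protocol works, the sketch below parses a hypothetical `robots.txt` with Python's standard `urllib.robotparser` and asks whether a given crawler would be permitted to fetch a page. The user-agent tokens `meta-externalagent` and `GPTBot`, the `example.com` URL, and the helper name `allowed` are assumptions made for illustration; the exact tokens should be confirmed in each crawler operator's documentation. The result only describes what a compliant crawler would do, since the file itself enforces nothing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. The user-agent tokens are
# assumptions; check each crawler's documentation for the exact strings.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    for agent in ("meta-externalagent", "GPTBot", "SomeOtherBot"):
        print(agent, allowed(agent, page))
```

Rules are matched by user-agent token, so a site can turn away one named crawler while leaving every other visitor unaffected.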

In the months before publication, reports indicated that OpenAI and Anthropic had circumvented `robots.txt` restrictions, highlighting weaknesses in a system that depends on voluntary compliance. Meta’s new bots also challenge this protocol. The Meta External Fetcher, in particular, reportedly may bypass `robots.txt` rules, complicating efforts by website owners to prevent unwanted data collection.

Moreover, the Meta External Agent combines data collection and content indexing into one bot, making it more difficult for website administrators to block specific functions without impacting others.
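
Because `robots.txt` relies on crawlers choosing to comply, some site operators also filter requests on the server by inspecting the `User-Agent` header. The following is a minimal sketch of that idea as WSGI middleware, not a description of any method named in this article. The token list, `should_block`, `block_crawlers`, and `demo_app` are hypothetical names chosen for illustration, and the tokens themselves are assumptions. A crawler that sends a different or spoofed header would pass through unnoticed.

```python
from typing import Optional

# Assumed user-agent tokens; verify them against the operator's documentation.
BLOCKED_TOKENS = ("meta-externalagent", "meta-externalfetcher")


def should_block(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a blocked token."""
    if not user_agent:
        return False
    lowered = user_agent.lower()
    return any(token in lowered for token in BLOCKED_TOKENS)


def block_crawlers(app):
    """Wrap a WSGI app so that matching requests receive 403 Forbidden."""
    def wrapper(environ, start_response):
        if should_block(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper


def demo_app(environ, start_response):
    """Stand-in for the real site: always answers 200 OK."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_crawlers(demo_app)
```

Like a `robots.txt` rule, this filter keys on the crawler's name rather than its purpose, so a bot that combines several functions under one name can only be blocked as a whole.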

Industry Reactions and Concerns

The rollout of Meta’s new bots has sparked a debate about the ethics of large-scale data scraping for AI training. Jon Gillham, CEO of Originality.ai, a firm that identifies AI-generated content, voiced concerns about the limited options available for website owners. He stressed the need for companies like Meta to offer ways for content creators to control how their data is used while still allowing their sites to be visible to users.

Current data shows that only a small fraction of top websites have successfully blocked Meta’s new bots. Approximately 1.5% have blocked the Meta External Agent, and less than 1% have blocked the Meta External Fetcher. In contrast, Meta’s older crawler, FacebookBot, has been blocked by about 10% of major websites, indicating that the new bots are more adept at avoiding detection.

Meta’s Response to Criticisms

In response to these concerns, Meta has stated its commitment to giving website owners more control over their data. A Meta spokesperson said the company is working to make it easier for publishers to manage how their content is used for AI training, including by letting web administrators choose which bots to block.

Despite these assurances, the rapid advancement of AI web crawlers continues to raise questions about data privacy and content ownership. As Meta and other tech giants, including Google and Anthropic, advance their AI technologies, there is an urgent need for clearer guidelines and protections for website owners.

Tags: #llama, ChatGPT, facebook, Instagram, Meta, Web Crawling

Harikrishnan A

Aspiring writer. Enjoys gaming, fried chicken and iced tea, preferably all together.
