Reddit CEO Steve Huffman is pushing major tech firms such as Microsoft to pay for access to and use of Reddit’s data. Having already struck agreements with Google and OpenAI, Huffman wants to ensure that Reddit’s data is used in ways that benefit the platform and its users.
Need for Control and Compensation
In a conversation with The Verge, Huffman underscored the necessity of having formal agreements to control how Reddit’s data is displayed and used. “Without these agreements, we don’t have any say or knowledge of how our data is displayed and what it’s used for,” he explained. Huffman criticized Microsoft, Anthropic, and Perplexity for not negotiating and said dealing with these companies has been “a real pain.”
Reddit has hardened its stance on data protection: in early July it updated its robots.txt file to block web crawlers from firms that lack agreements. As a result, Reddit content now appears only in Google search results, for which Reddit is compensated, while Bing and other search engines no longer show Reddit data.
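To illustrate how a robots.txt file can admit one crawler while turning away the rest, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The rules, the crawler names `PartnerBot` and `OtherBot`, and the `example.com` URL are hypothetical and chosen only for illustration; they are not Reddit’s actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Reddit's actual robots.txt.
# One named crawler is allowed everywhere; every other crawler is disallowed.
ROBOTS_TXT = """\
User-agent: PartnerBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/some-community/"
    for agent in ("PartnerBot", "OtherBot"):
        print(agent, is_allowed(agent, url))
```

A check like this runs on the crawler’s side. The file only states the site’s preferences, so it has an effect only when a crawler chooses to honor it.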
Allegations Against Microsoft
Huffman has accused Microsoft of using Reddit data without authorization to train its AI and summarize content in Bing search results. He also alleged that Reddit’s data was sold through Bing’s API to other search engines. Huffman pointed to a recent statement by Microsoft AI CEO Mustafa Suleyman, who called public internet data “freeware,” to criticize Microsoft’s approach.
“We’ve had Microsoft, Anthropic, and Perplexity act as though all of the content on the internet is free for them to use,” Huffman said, reflecting his frustration with their stance.
In response to the removal of Reddit results from Bing, Microsoft’s head of search, Jordi Ribas, tweeted that Reddit had blocked Bing from crawling its site, which he suggested was a move to favor another search engine and impact competition. Microsoft spokesperson Caitlin Roulston added that Microsoft respects the preferences of websites that do not want their content used for its AI models.
Licensing as a Solution
Huffman pointed to OpenAI’s recent introduction of SearchGPT—capable of displaying Reddit results thanks to a licensing deal—as a model for future agreements. Reddit’s spokesperson, Tim Rathschmidt, confirmed that none of Reddit’s existing content licensing deals include exclusive use cases for the data.
By advocating for such licensing agreements, Reddit is aligning itself with other media publishers seeking compensation for the use of their content in AI. Huffman noted, “The traditional value exchange from search engines has changed. Search and summarization and training are merging, and the value exchange of crawling in exchange for traffic back is becoming muddied.”
Industry Reactions
Responding to Huffman’s criticism, Anthropic spokesperson Jennifer Martinez said that Reddit has been on the company’s block list for web crawling since mid-May and that Anthropic has adhered to Reddit’s robots.txt file. “We respect robots.txt, the industry-accepted signal for blocking web crawling,” Martinez said.
Reddit’s crackdown on unauthorized scraping began in June, when it announced an update to its robots.txt file, the file that implements the Robots Exclusion Protocol. The update blocked Bing from accessing its data, a fact confirmed by Microsoft’s Jordi Ribas.
Reddit’s recent actions reflect a growing trend among content providers to protect and monetize their data. As AI training and content summarization become more prevalent, the traditional model of exchanging web traffic for data access is shifting. Huffman’s efforts highlight a broader movement toward establishing clear, profitable agreements for data usage.