Amazon.com Inc.’s efforts to build and refine its artificial intelligence systems have revealed a troubling side effect: a dramatic increase in suspected child sexual abuse material uncovered within data used for AI training. According to child protection organizations, the volume of material flagged by Amazon last year far exceeded that of its peers, raising serious concerns about how AI training data is sourced and monitored across the tech industry.
While Amazon says it removed the material before it could be used to train its models, child safety experts argue that the company’s limited ability to identify where the content originated has reduced the value of those reports for law enforcement and victim protection.
Massive Amounts of AI Training Data Under Scrutiny
As part of its AI development process, Amazon scanned large volumes of data collected to improve its machine learning systems. During that review, the company detected hundreds of thousands of items it believed matched known child sexual abuse material. Each instance was reported to the National Center for Missing and Exploited Children (NCMEC), the U.S.-based clearinghouse responsible for handling such tips and forwarding actionable cases to law enforcement agencies.
NCMEC recently began separating reports connected specifically to artificial intelligence development from its broader data set. That effort revealed a striking pattern. In 2025, AI-related reports rose at least fifteenfold from the previous year, with Amazon accounting for the vast majority of submissions. The scale of the increase had not been publicly disclosed before.
Gaps in Reporting Limit Investigative Action
Although Amazon says it acted out of caution by reporting potential violations, officials at NCMEC say many of the submissions lacked crucial information. Reports typically did not include details about where the content was found, who originally uploaded it, or whether it remained accessible online. Without that context, the organization says it is unable to trace the material back to its source or help authorities intervene.
Amazon has stated that the data came from third-party sources and that it does not have visibility into the original locations of the material. The company noted that using data drawn from publicly available web sources is a common practice in AI development. However, NCMEC officials say Amazon stands apart from other companies in both the volume of reports and the lack of actionable detail.
Other technology firms have also scanned their AI training data and reported suspicious material. According to NCMEC, those companies collectively submitted only a small number of reports and generally included enough information for follow-up investigations.
AI Development Accelerates, Safety Struggles to Keep Pace
The spike in reports comes amid an intense push by technology companies to rapidly advance AI capabilities. The race to release more powerful models has driven firms to gather enormous datasets consisting of text, images, and video from a wide range of sources.
Child safety advocates and regulators warn that this rapid expansion has created blind spots. Safeguards designed for traditional consumer platforms have not always translated well to AI development pipelines, making it harder to detect and address abuse at scale.
In total, NCMEC received more than one million AI-related reports in 2025, a sharp increase from about 67,000 in 2024 and fewer than 5,000 in 2023. These reports can involve AI-generated content, explicit interactions with chatbots, or real images of abuse that were unintentionally collected during data gathering.
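For readers who want to check the scale of the increase, the year-over-year growth can be worked out from the figures quoted above. This is a rough sketch only: the article gives "more than one million" for 2025 and "fewer than 5,000" for 2023, so the values used for those years are bounds rather than exact counts, and both resulting ratios should be read as lower bounds. The variable and function names are illustrative.

```python
# Report counts as quoted in the article. The 2023 figure is an upper bound
# ("fewer than 5,000") and the 2025 figure is a lower bound ("more than one
# million"), so both growth factors below are minimum values.
reports = {2023: 5_000, 2024: 67_000, 2025: 1_000_000}


def growth_factor(earlier: int, later: int) -> float:
    """Return how many times larger `later` is than `earlier`."""
    return later / earlier


factor_2023_to_2024 = growth_factor(reports[2023], reports[2024])
factor_2024_to_2025 = growth_factor(reports[2024], reports[2025])

print(f"2023 to 2024: at least {factor_2023_to_2024:.1f}x")
print(f"2024 to 2025: at least {factor_2024_to_2025:.1f}x")
```

On these figures the 2024 to 2025 ratio comes out just under fifteen, and it crosses fifteen once the true 2025 total exceeds roughly 1,005,000 reports.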
Risks of Reusing Exploitative Material in AI Systems
Experts warn that including illegal or exploitative material in AI training data poses unique dangers. Even if such content is removed before models are deployed, exposure during training could influence how systems process and generate images or text. There are concerns that models trained on harmful data could become more effective at producing sexualized content involving children or manipulating real images.
There is also the risk of perpetuating harm to victims if images used in training continue to circulate within datasets. Reuse of such material can prolong trauma for those affected.
Amazon has said it is not aware of any instances in which its AI models have generated child sexual abuse material. The company also stated that the flagged content was identified through automated tools that compare files against databases of confirmed abuse images involving real victims. Nearly all of the detections came from non-proprietary data, according to the company.
Amazon acknowledged that its scanning process uses an intentionally broad threshold, which can produce a high number of false positives. The company said this approach was chosen to minimize the risk of overlooking harmful content.
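The trade-off behind a broad threshold can be illustrated with a toy example. The article does not say which matching technique Amazon uses, so the sketch below assumes one common general approach: each file is reduced to a short fingerprint (hash), and a file is flagged when its fingerprint differs from a known fingerprint by no more than a set number of bits. The 16-bit values, the names, and the thresholds are all hypothetical and chosen only to show how loosening the threshold catches altered copies while also flagging unrelated files.

```python
# Toy illustration of threshold-based fingerprint matching.
# Assumption: fingerprints are compared by Hamming distance (number of
# differing bits). All values here are made up for demonstration.

def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer fingerprints."""
    return bin(a ^ b).count("1")


def flagged(candidates: dict[str, int], known: list[int], max_distance: int) -> list[str]:
    """Return the names of candidates within max_distance bits of any known fingerprint."""
    return sorted(
        name
        for name, fingerprint in candidates.items()
        if any(hamming_distance(fingerprint, k) <= max_distance for k in known)
    )


known_fingerprints = [0b1111000011110000]
candidates = {
    "exact_copy": 0b1111000011110000,  # 0 bits differ
    "near_copy": 0b1111000011110011,   # 2 bits differ (e.g. a re-encoded file)
    "unrelated": 0b0000111100001100,   # 14 bits differ
}

strict = flagged(candidates, known_fingerprints, max_distance=0)
moderate = flagged(candidates, known_fingerprints, max_distance=2)
broad = flagged(candidates, known_fingerprints, max_distance=14)

print("strict:", strict)
print("moderate:", moderate)
print("broad:", broad)
```

In this sketch the strict setting misses the slightly altered copy, while the broadest setting also flags the unrelated file, which is the kind of false positive the company describes accepting in order to avoid overlooking harmful content.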
Sharp Increase Raises Red Flags for Experts
The sheer volume of Amazon’s reports surprised child safety specialists. In 2024, Amazon and its subsidiaries submitted roughly 64,000 CSAM reports across all operations. The jump to hundreds of thousands tied specifically to AI workflows marked a significant departure from prior years.
NCMEC officials said the scale of the findings raised questions about how training data is gathered and what safeguards are in place before it enters AI systems. They also expressed concern that the lack of detail in reports limits their usefulness for identifying offenders or locating children who may still be at risk.
While companies are not legally required to provide extensive background information, NCMEC said the absence of actionable data prevents further steps from being taken once a report is received.
Growing Calls for Transparency in AI Development
Amazon has defended its reporting approach, stating that the way its data is sourced prevents it from offering more detailed information. The company says it remains committed to responsible AI development and child safety across its platforms.
However, researchers and former technology officials argue that greater transparency is essential. They say companies must be clearer about how training data is collected, filtered, and evaluated, particularly as AI systems become more advanced and influential.
AI datasets can include licensed material, purchased collections, web-scraped content, or synthetic data generated by other models. Critics warn that prioritizing speed over thorough safety analysis increases the likelihood that harmful material will be absorbed into training pipelines.




