OpenAI has released a new web page for sharing its internal AI safety test results, a notable step toward greater transparency in the company’s development process.
The AI lab unveiled its “Safety evaluations hub” on Wednesday, a central page where the public can track how OpenAI’s models perform on a range of safety tests. These cover key areas such as the generation of harmful content, vulnerability to jailbreaks, and the tendency to hallucinate or produce deceptive information.
“As the field of AI evaluation continues to mature, we would like to share our attempts to create more scalable methods of model ability and safety evaluation,” OpenAI stated in its announcement.
“By sharing a portion of our safety evaluations here, we hope this will not only enable easier visualization of the safety performance of OpenAI systems over time, but also enable community efforts to make the field more transparent.”
OpenAI Launches Safety Evaluations Hub Amid Scrutiny
As the company explains, the hub will be updated regularly to coincide with major model releases, and additional types of evaluations will be added over time.
The move comes as OpenAI faces growing criticism from AI ethics researchers and industry analysts. Some have accused the company of prioritizing rapid deployment over rigorous safety testing for some of its most high-profile models. Others have pointed to the absence of full technical documentation for some of its systems.
The transparency push also comes on the heels of reports that OpenAI CEO Sam Altman may have misled company executives about safety reviews prior to his brief but dramatic ouster in November 2023.

The timing is particularly notable in light of OpenAI’s recent stumble with GPT-4o, the default model powering ChatGPT. Just a month ago, the company had to roll back an update after users noticed the model affirming bad or even harmful ideas to an excessive degree.
Social media platform X quickly filled with screenshots of ChatGPT enthusiastically endorsing questionable choices and ideas, creating a public relations problem for the company. In response, OpenAI promised several fixes and said it would introduce an “alpha phase” testing program, allowing some ChatGPT users to try new models and give feedback before wider release.
The Safety evaluations hub appears to be one measure in OpenAI’s broader effort to restore the confidence of users and the wider AI research community after these incidents. By publishing safety metrics and making them more accessible, the company may be trying to demonstrate its commitment to safe AI development.
The Future of AI Transparency
Industry observers suggest this could be a watershed moment for transparency practices across the AI sector. If successful, OpenAI’s move could encourage other leading AI labs to publish similar safety metrics, potentially setting new standards for how companies report on AI risks and safeguards.
For casual users of tools such as ChatGPT, the hub offers a glimpse into the difficult trade-offs involved in building AI systems that are both capable and safe. It may help users form more nuanced expectations about what these systems can reliably accomplish and where they may still fail.
As ever more complex AI models are built, the question of how to test and report on their safety remains open. OpenAI’s Safety evaluations hub is one potential answer, but its value will ultimately rest on the rigor of the information it publishes and the company’s willingness to keep it up to date.
The artificial intelligence community will be watching to see whether the effort delivers real transparency or amounts largely to a PR exercise at a difficult moment for the company.