Geneva, Switzerland – OpenAI, a major player in the AI industry with billions of dollars in funding, faces a significant hurdle: it does not actually understand how its own AI technology operates. During the International Telecommunication Union AI for Good Global Summit last week, OpenAI CEO Sam Altman admitted the company’s struggle with AI interpretability. When asked by Observer about the inner workings of OpenAI’s large language models (LLMs), Altman responded, “We certainly have not solved interpretability,” highlighting the ongoing challenge in tracing AI outputs back to their origins.

The Atlantic CEO Nicholas Thompson further pressed Altman on whether the lack of understanding should prevent the release of more powerful models. Altman’s answer was less than reassuring: he said the company’s AI systems are “generally considered safe and robust.”

This exchange underscores a critical issue in AI development: researchers find it challenging to explain the unpredictable “thinking” behind AI responses. While AI systems can provide answers effortlessly, tracking the exact data and decisions leading to those responses remains difficult.

Limited Transparency

Despite its name, OpenAI has been notably secretive about the data used to train its models. A UK government-commissioned report from 75 experts recently stated that AI developers “understand little about how their systems operate,” and that current scientific knowledge is “very limited.”

Other AI companies are also attempting to demystify their models. For example, OpenAI’s competitor, Anthropic, has started mapping the artificial neurons in its LLMs, beginning with a model named Claude Sonnet. In a blog post, Anthropic emphasized its commitment to interpretability research, stating, “Understanding models deeply will help us make them safer.”

However, Anthropic acknowledges that this is just the beginning. The company notes that its current techniques can identify only a small subset of the concepts learned by the model and that understanding the full scope would be too costly. It also still needs to link these features to the circuits they are involved in and to show that safety-relevant features can actually be used to improve model safety.

The Stakes of AI Safety

Recently, Altman dissolved OpenAI’s “Superalignment” team, which focused on controlling superintelligent AI, and replaced it with a “safety and security committee” that he now leads. This move, alongside his recent comments, suggests that the company is still far from managing superintelligent AI effectively.

Challenges in Understanding AI

The difficulty lies in tracing AI decisions back to the data they were trained on. AI systems can provide responses that seem almost magical, but understanding the exact data and logic behind those responses is often impossible. This is problematic because it makes it hard to ensure that AI systems are making decisions for the right reasons, which is crucial for safety and reliability.

The issue of AI interpretability isn’t just a technical challenge; it has profound implications for AI safety and the responsible development of more advanced models. During the summit, Altman was questioned about whether this lack of understanding should delay the release of more powerful models. His response, suggesting that their systems are “generally considered safe and robust,” was not entirely convincing. It highlights a fundamental tension between advancing.

