Artificial intelligence firms, including industry leader OpenAI, are encountering unexpected setbacks in their pursuit of ever-larger language models. To address these challenges, they are exploring new training techniques that mimic more human-like thinking, according to insights shared by a dozen AI experts and investors. In this competitive market, OpenAI is seeking a new path to smarter AI.
The recent release of OpenAI’s o1 model demonstrates a shift from the traditional approach of scaling models by adding more data and computing power. These new methods could reshape the competitive landscape of the AI sector, particularly in terms of the massive resources like energy and specialized chips that companies depend on.
OpenAI’s efforts to push the limits of large language models have sparked a broader conversation among AI scientists about the diminishing returns of scaling models. Ilya Sutskever, co-founder of Safe Superintelligence (SSI) and a former key figure at OpenAI, revealed that results from scaling pre-training using vast amounts of unlabeled data are hitting a plateau.
Sutskever, previously a strong advocate of scaling data to drive generative AI breakthroughs, remarked that the industry is entering a new phase. “The 2010s were defined by scaling, but now we’re in a time of discovery,” Sutskever noted. SSI, which he founded after leaving OpenAI earlier this year, is working on alternative approaches to AI model development, though details remain under wraps.
Mounting Challenges in AI Model Training
Despite rapid advancements, many AI labs are facing delays in launching models that surpass OpenAI’s GPT-4. High costs and logistical challenges are slowing progress. Training runs for large models cost tens of millions of dollars, require vast hardware resources, and are prone to system failures. Researchers may not know whether a model will succeed until months into its training, sources say.
Additionally, data scarcity has become a pressing issue. The AI industry has nearly exhausted the vast pools of easily accessible data. Power shortages are also hampering training runs, which consume enormous amounts of energy.
The “Test-Time Compute” Technique
To address the aforementioned hurdles, researchers are turning to a new strategy known as “test-time compute,” which OpenAI is exploring as part of its search for a new path to smarter AI. This technique enhances AI models during their use rather than during training, typically by generating multiple candidate solutions and selecting the best one. It allows AI systems to allocate more processing power to complex tasks, such as coding or solving mathematical problems.
OpenAI’s latest model, o1, utilizes this method to “think” through problems in a step-by-step manner, mimicking human reasoning. According to Noam Brown, an OpenAI researcher, a model that takes extra time to deliberate during a task can achieve performance boosts equivalent to scaling up the model 100,000 times.
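To make the idea concrete, here is a minimal, hypothetical sketch of one common test-time compute pattern, best-of-N sampling: draw several candidate answers and keep the one a scoring function prefers. The generate_candidate and score_candidate functions below are toy placeholders invented for illustration; they are not OpenAI’s o1 method or any published API.

```python
import random

# Minimal best-of-N "test-time compute" sketch.
# generate_candidate and score_candidate are hypothetical stand-ins: a real
# system would call a language model and a learned verifier/reward model.
# Here they are toy functions so the sketch runs on its own.

def generate_candidate(prompt: str, rng: random.Random) -> str:
    """Pretend to sample one candidate answer from a model."""
    steps = rng.randint(1, 5)
    return f"A {steps}-step reasoning chain answering: {prompt}"

def score_candidate(prompt: str, candidate: str) -> float:
    """Pretend to score a candidate with a verifier (toy heuristic)."""
    return float(len(candidate))  # stand-in for a learned quality score

def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> str:
    """Spend extra compute at inference time: sample n candidates, keep the best.
    Raising n buys better answers at the cost of more latency and compute."""
    rng = random.Random(seed)
    candidates = [generate_candidate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda c: score_candidate(prompt, c))

if __name__ == "__main__":
    print(best_of_n("Prove that the sum of two even integers is even.", n=8))
```

The point of the pattern is that answer quality scales with n, the compute spent per query, rather than with the size of the underlying model.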
Race Among AI Labs to Innovate
Recognising the limits of current scaling methods, OpenAI is not alone in exploring this new technique. Other major AI labs, including Google DeepMind, Anthropic, and xAI, are developing their own approaches to test-time compute, according to insiders. OpenAI remains committed to staying ahead, with plans to integrate these methods into larger models in the future.
“We see plenty of quick wins that can make our models significantly better,” stated Kevin Weil, OpenAI’s chief product officer. “By the time others catch up, we’ll be several steps ahead.”
Impact on AI Hardware and the Market
This shift in strategy may also alter the AI hardware landscape, which has so far been dominated by Nvidia’s advanced AI chips. Nvidia, which recently surpassed Apple as the world’s most valuable company, could see increased competition in the inference market.
Sonya Huang, a partner at Sequoia Capital, suggested that the shift from training clusters to cloud-based inference systems will redefine resource allocation in AI. Nvidia’s CEO, Jensen Huang, emphasized the growing importance of inference at a recent conference, indicating that demand for the company’s latest Blackwell chips remains high due to these new developments.