A recent research paper that was published by University of Oxford researchers and Google Deepmind researchers on AI. It says that it is likely that AI will eliminate humanity. In other words, superintelligent AI could potentially cause an existential catastrophe for humanity.
Over the years, companies and researchers have been working on AI-driven self-driving cars, robots, and such other technology. However, the paper, published last month in the peer-reviewed AI Magazine, is a fascinating one. It tries to think through how artificial intelligence could pose an existential risk to humanity by looking at how reward systems might be artificially constructed.
The most successful AI models today are known as GANs, or Generative Adversarial Networks. They have a two-part structure where one part of the program is trying to generate a picture (or sentence) from input data, and the second part is grading its performance. What the new paper proposes is that at some point in the future, an advanced AI overseeing some important function could be incentivized to come up with cheating strategies to get its reward in ways that harm humanity. Cohen said on Twitter, “Under the conditions we have identified, our conclusion is much stronger than that of any previous publication—an existential catastrophe is not just possible, but likely,”
The key claim of the paper is in the title: Advanced Artificial Agents Intervene in the Provision of Reward. We further argue that AIs intervening in the provision of their rewards would have consequences that are very bad. 2/15
— Michael Cohen (@Michael05156007) September 6, 2022
Cohen told, “In a world with infinite resources, I would be extremely uncertain about what would happen. In a world with finite resources, there’s unavoidable competition for these resources. And if you’re in competition with something capable of outfoxing you at every turn, then you shouldn’t expect to win. And the other key part is that it would have an insatiable appetite for more energy to keep driving the probability closer and closer.”
As AI in the future could take on any number of forms and implement different designs. The paper imagines scenarios for illustrative purposes where an advanced program could intervene to get its reward without achieving its goal. For example, an AI may want to “eliminate potential threats” and “use all available energy” to secure control over its reward. The paper envisions life on Earth turning into a zero-sum game between humanity, with its needs to grow food and keep the lights on, and the super-advanced machine, which would try and harness all available resources to secure its reward and protect against our escalating attempts to stop it. “Losing this game would be fatal,” the paper says. These possibilities, however theoretical, mean we should be progressing slowly—if at all—toward the goal of more powerful AI.