Artificial intelligence has conquered chess, mastered protein folding, and can write poetry that brings tears to your eyes. But ask Google’s most advanced chatbot to play a simple Pokémon game, and apparently, it falls apart faster than a house of cards.
Google DeepMind recently published a fascinating report about their flagship model, Gemini 2.5 Pro, and its attempts to play Pokémon Blue—the classic 1990s game that countless kids have beaten with their eyes closed. The results were both hilarious and eye-opening, revealing that even our smartest machines can crack under pressure.
The Great Pokémon Experiment
The whole thing started on a Twitch channel called Gemini_Plays_Pokemon, where independent engineer Joel Zhang decided to put Google’s AI through its paces. What was supposed to be a demonstration of advanced artificial intelligence quickly turned into something resembling a digital nervous breakdown.
According to DeepMind’s own researchers, Gemini began showing signs of what they clinically termed “Agent Panic.” Picture this: whenever the AI’s Pokémon team was running low on health or power points, the model would essentially lose its cool. Its internal thoughts would spiral into repetitive loops, frantically obsessing over the need to heal its party or escape whatever dungeon it was trapped in.

The behavior was so pronounced that regular Twitch viewers started recognizing the telltale signs of AI anxiety. “This behaviour has occurred in enough separate instances that the members of the Twitch chat have actively noticed when it is occurring,” the DeepMind team noted in their report.
Numbers That Tell a Story
Here’s where things get really interesting. Gemini’s first complete playthrough of Pokémon Blue took a whopping 813 hours. That’s nearly 34 full days of gameplay for a game that most players can finish in 20-30 hours. Even after Zhang made adjustments to help the AI perform better, the second attempt still clocked in at 406.5 hours.
To put that in perspective, your average 10-year-old could probably beat this game multiple times in the span it took Gemini to finish it once. The AI that can help you write code, solve complex math problems, and have philosophical discussions was being outperformed by elementary school kids.
The Internet Reacts
Social media had a field day with these revelations. One viewer observed, “If you read its thoughts when reasoning it seems to panic just about any time you word something slightly off.” Another user coined the term “LLANXIETY,” a clever play on Large Language Model anxiety.
Perhaps the most thoughtful response came from someone who suggested that Pokémon might serve as an unexpected benchmark for artificial intelligence: “I’m starting to think the ‘Pokémon index’ might be one of our best indicators of AGI. Our best AIs still struggling with a child’s game is one of the best indicators we have of how far we still have yet to go.”
What This Really Means
Although Gemini does not feel emotions the way humans do, its irrational decisions under stress are strikingly human. People act on impulse or fall into loops of negative thinking when they are under pressure, just as the AI did.
The timing of this finding is telling. Just a few weeks earlier, Apple published research indicating that most AI reasoning models do not actually reason. Instead, they rely heavily on pattern recognition and break down when situations become more complex or demanding.
The Pokémon test is a modest reminder that, for all the fuss over artificial intelligence, we are nowhere close to building machines that can handle the twists and turns of the real world as gracefully as a child playing a favorite video game.
Sometimes the most telling tests are not the sophisticated benchmarks designed by experts, but the simple games that have entertained children for decades.