Gemini 2.5 Pro, Google’s latest artificial intelligence model released just last month, has completed the classic 1996 video game Pokémon Blue. The accomplishment lends substantial weight to Google’s bold claim that Gemini 2.5 Pro stands as “the most intelligent AI model” currently available.
The victory came during a livestream hosted by Joel Z, a 30-year-old software engineer with no official affiliation to Google. The achievement prompted Google CEO Sundar Pichai to celebrate on X (formerly Twitter), exclaiming, “What a finish! Gemini 2.5 Pro just completed Pokémon Blue!”
But why does beating a nearly three-decade-old video game matter in the world of cutting-edge AI development?
Why Pokémon Blue Presents a Unique AI Challenge
Pokémon Blue isn’t just any video game. Completing it demands strategic thinking, long-term planning, and visual navigation, all crucial building blocks for general artificial intelligence.
To succeed in the game, an AI must:
- Navigate an open world with limited information
- Make strategic decisions in combat
- Manage inventory and resources
- Maintain progress toward long-term goals
- Process visual information effectively
These challenges go far beyond simple pattern recognition, demanding capabilities that closely resemble human cognitive functions. By conquering Pokémon Blue, Gemini 2.5 Pro has demonstrated proficiency in these core competencies.
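What does an agent harness for this kind of task actually look like? The livestream’s framework hasn’t been published, so the following is only a minimal sketch: the emulator hooks (`capture_screenshot`, `press_button`) are hypothetical stubs, and the loop simply feeds each frame plus recent action history to the model and executes the single button it picks.

```python
# Minimal sketch of a perceive-reason-act loop for a game-playing agent.
# The real "Gemini Plays Pokémon" harness is not public; the emulator
# hooks below are stubs standing in for whatever interface it exposes.
import google.generativeai as genai

model = genai.GenerativeModel("gemini-2.5-pro")

GOAL = "Defeat the Elite Four and become the Pokémon League Champion."
BUTTONS = {"UP", "DOWN", "LEFT", "RIGHT", "A", "B", "START"}

def capture_screenshot():
    """Stub: return the current game frame (e.g. a PIL image) from an emulator."""
    raise NotImplementedError("wire this to your emulator")

def press_button(button: str) -> None:
    """Stub: send a button press to the emulator."""
    raise NotImplementedError("wire this to your emulator")

def step(history: list[str]) -> str:
    """One cycle: show the model the frame and recent actions, execute its choice."""
    frame = capture_screenshot()
    prompt = (
        f"Long-term goal: {GOAL}\n"
        f"Your last actions: {history[-20:]}\n"
        f"Reply with exactly one button from {sorted(BUTTONS)}."
    )
    reply = model.generate_content([prompt, frame]).text.strip().upper()
    action = reply if reply in BUTTONS else "A"  # fall back on malformed output
    press_button(action)
    return action
```

Even in this toy form, the hard parts are visible: the model must keep its long-term goal in context while reasoning over nothing but pixels and its own recent history.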
Google’s Claims vs. Reality
During the recent launch of Gemini 2.5 Pro, Google positioned the model as superior to competitors including OpenAI’s o3 models, DeepSeek R1, and Claude from Anthropic. Google’s internal benchmarks appeared to support these claims, but independent verification remained necessary.

The Pokémon Blue victory offers tangible, real-world evidence of Gemini’s capabilities. Google has highlighted significant improvements in the model’s coding abilities, describing them as “a big leap over 2.0” with “more improvements to come.” According to Google, “2.5 Pro excels at creating visually compelling web apps and agentic code applications, along with code transformation and editing.”
This isn’t just marketing talk—on SWE-Bench Verified, an industry benchmark for agentic coding, Gemini 2.5 Pro achieved an impressive 63.8 percent score using a custom agent setup.
The Competition: Claude’s Ongoing Battle with Pokémon Red
Anthropic’s Claude AI has been engaged in a similar challenge, attempting to complete Pokémon Red. Despite leveraging “extended thinking and agent training” that provided “a major boost” for tackling “more unexpected” tasks, Claude has yet to finish the game.
This direct comparison offers an interesting measure of relative capabilities between two leading AI systems. While benchmark scores can sometimes feel abstract, the ability to complete a complex game provides a more intuitive understanding of an AI’s practical capabilities.
Despite the impressive achievement, it’s worth noting that Gemini didn’t complete Pokémon Blue entirely on its own. Joel Z occasionally intervened to fix bugs or restrict certain actions, such as the overuse of escape items. He maintains, however, that no direct walkthroughs or step-by-step guidance were provided, with a single exception involving a known glitch.
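Joel Z’s framework isn’t public either, but the kind of guardrail he describes, vetoing wasteful actions without dictating moves, is easy to illustrate. The item names and budget below are purely hypothetical.

```python
# Hypothetical guardrail in the spirit of the interventions described
# above: it blocks overuse of escape items but never suggests what the
# model should do instead. Names and thresholds are illustrative only.
from collections import Counter

ESCAPE_ITEMS = {"ESCAPE ROPE", "POKE DOLL"}  # actions prone to overuse
MAX_ESCAPES_PER_AREA = 2

escape_uses: Counter = Counter()

def allow_action(action: str, area: str) -> bool:
    """Return False when the proposed action would exceed the escape budget."""
    if action in ESCAPE_ITEMS:
        if escape_uses[area] >= MAX_ESCAPES_PER_AREA:
            return False  # vetoed; the agent is re-prompted, not guided
        escape_uses[area] += 1
    return True
```

The point of a veto like this is that the model must still find its own way forward, which is consistent with Joel Z’s claim that no step-by-step guidance was given.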
This human assistance highlights an important reality: while today’s AI models have made remarkable progress, they still benefit from human oversight when tackling complex, open-ended challenges. The question remains whether Gemini could manage the same feat entirely independently.
What This Means for AI Development
Gemini 2.5 Pro’s victory over Pokémon Blue represents more than just a gaming milestone. It demonstrates how large language models, when properly deployed within structured environments, can tackle complex tasks requiring planning, strategy, and adaptation.
While this achievement doesn’t yet signal true general intelligence, it does indicate significant progress toward AI systems that can manage extended, multi-step challenges with minimal human intervention. The ability to maintain context and work toward long-term goals—even in a gaming environment—suggests applications far beyond entertainment.
As AI models continue to evolve, these capabilities will likely translate to more practical applications in fields ranging from software development to scientific research, where long-term planning and strategic thinking are essential.
For now, Google can rightfully celebrate this milestone as evidence that Gemini 2.5 Pro represents a meaningful step forward in AI capability—even if the journey toward truly general artificial intelligence remains ongoing.