GTO Wizard AI Outperforms GPT-5 and Grok 4 in New Benchmark

Will Shillibier
Managing Editor
4 min read

In the rapidly evolving world of artificial intelligence, a common question in the poker industry has emerged: when will AI be good enough to consistently beat its human counterparts?

Humans were first pitted against AI back in 2019, when Pluribus bested a team of human players, becoming the first AI model to do so. Then, just last year, nine AI models battled it out over almost 4,000 hands to find out which was best. While Meta's LLAMA 4 went broke, OpenAI's o3 emerged victorious.

However, the frontier of poker and artificial intelligence has a new top model: GTO Wizard AI.

What is GTO Wizard AI?

GTO Wizard AI is a state-of-the-art poker agent that powers all of the site's custom solutions. Rather than being built on a general-purpose model, it was originally developed as Ruse AI by Canadian programmers Marc-Antoine Provost and Philippe Beardsell; GTO Wizard acquired the technology in 2023.

Unlike earlier bots such as Slumbot, the 2018 Annual Computer Poker Competition (ACPC) champion, which relied on massive pre-computed strategies, GTO Wizard AI does not store a complete poker strategy before play. Instead, it was trained against itself over hundreds of millions of hands, gradually learning which plays led to the highest expected value.

"Through deep reinforcement learning," says GTO Wizard, "GTO Wizard AI considers each particular situation as it arises during play and solves it in real-time, in a matter of seconds."

This approach was vindicated when GTO Wizard AI took on Slumbot in a controlled 150,000-hand match, recording a win rate of 19.4 bb/100. For context, a world-class human professional typically aims for a win rate of around 5 bb/100. At $50/$100 stakes, with 200 hands of heads-up play per hour, GTO Wizard AI would have won $19.40 per hand, an hourly win rate of $3,880.
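The conversion from a bb/100 win rate to dollar terms can be checked with a few lines of arithmetic, using the figures from the article ($100 big blind, 200 hands per hour):

```python
# Convert the 19.4 bb/100 result into dollar terms at $50/$100
# stakes with 200 hands per hour (figures from the article).
BIG_BLIND = 100.0        # $100 big blind
WIN_RATE_BB_100 = 19.4   # big blinds won per 100 hands
HANDS_PER_HOUR = 200

per_hand = WIN_RATE_BB_100 / 100 * BIG_BLIND   # dollars won per hand
hourly = per_hand * HANDS_PER_HOUR             # dollars won per hour
print(f"${per_hand:.2f} per hand, ${hourly:,.0f} per hour")
# prints "$19.40 per hand, $3,880 per hour"
```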


New AI Poker Benchmark

But this isn't the only model that GTO Wizard AI has taken on and beaten.

New benchmark results provide the first standardized comparison between "frontier" Large Language Models (LLMs) and specialized poker agents. The data reveals that, while general AI has made massive leaps in reasoning, it still lacks the specific strategic depth required to beat the world’s leading poker solver.

GTO Wizard AI Benchmark Leaderboard

Rank | Model | Organization | Luck-Adjusted Win Rate (bb/100) | Standard Deviation | Hands
1 | GPT-5.3 (XHigh Reasoning) | OpenAI | -16 | 3 | 5,000
2 | Marvel | MIT | -14 | 4.7 | 5,090
3 | GPT-5.4 (XHigh Reasoning) | OpenAI | -17.8 | 3.7 | 5,000
4 | GPT-5.3 (High Reasoning) | OpenAI | -18.2 | 3.9 | 5,000
5 | Claude Opus 4.6 | Anthropic | -20.4 | 4.4 | 5,000

Note: Correct as of April 10, 2026

OpenAI's GPT-5.3 is the current leader among general models, but it still trails the specialized poker agent by 16.0 bb/100. Claude Opus 4.6 (-20.4 bb/100) and Gemini 3.1 Pro (-30.8 bb/100) show that even high-level general reasoning models struggle with No-Limit Hold'em, while Elon Musk's xAI model Grok 4 currently sits significantly lower on the leaderboard with a luck-adjusted win rate of -60 bb/100.

Solving the "Luck" Factor with AIVAT

How does GTO Wizard know these rankings are accurate and not just a run of hot cards? The benchmark utilizes AIVAT, a sophisticated variance-reduction technology. Because poker is naturally high-variance, it usually takes hundreds of thousands of hands to reach a statistically significant conclusion. AIVAT reduces this requirement by 10x, enabling researchers to assess an agent's "luck-adjusted" performance much more efficiently.
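The intuition behind this kind of luck adjustment can be shown with a toy simulation (this is an illustration of the general control-variate idea, not the actual AIVAT algorithm, and all the numbers in it are invented): each hand's raw result is a small skill edge buried in large card-luck noise, and subtracting a baseline estimate of the luck term leaves an unbiased but far less noisy estimate of the edge.

```python
import random
import statistics

random.seed(7)

# Toy model: per-hand result = skill edge + card luck. A baseline
# evaluator that recovers ~70% of the luck term cuts the variance of
# the estimate by roughly 10x (all figures here are hypothetical).
TRUE_EDGE_BB = 0.194   # hypothetical per-hand edge (19.4 bb/100)
LUCK_STD_BB = 10.0     # per-hand card-luck noise, dwarfing the edge

def simulate_hand():
    luck = random.gauss(0.0, LUCK_STD_BB)
    raw = TRUE_EDGE_BB + luck
    luck_estimate = 0.7 * luck   # baseline recovers ~70% of the luck
    return raw, raw - luck_estimate

results = [simulate_hand() for _ in range(50_000)]
raw = [r for r, _ in results]
adjusted = [a for _, a in results]

print(f"raw mean:       {statistics.mean(raw):+.3f} bb/hand")
print(f"adjusted mean:  {statistics.mean(adjusted):+.3f} bb/hand")
print(f"variance ratio: {statistics.pvariance(raw) / statistics.pvariance(adjusted):.0f}x")
```

With the variance cut by about an order of magnitude, the adjusted estimate reaches a given confidence interval in roughly a tenth of the hands, which is the practical benefit the benchmark relies on.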

Challenge the Wizard: API Access Now Live

GTO Wizard is now providing API access to allow independent developers and researchers to submit their own models for evaluation. This move aims to foster more transparent competition in the AI space. Developers can integrate their agents directly into the evaluation platform to compete in real-time. The API allows for hand simulation and result retrieval without exposing the solver’s internal capabilities.

To take on GTO Wizard AI, submitted models must play a minimum of 2,500 hands of Heads-Up No-Limit Hold'em, with 200bb stacks that reset every hand. The API limits usage to 100,000 hands per month.
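The constraints above are simple enough to encode as client-side bookkeeping. The minimum hand count, monthly cap, and stack depth come from the article; the function names and the bookkeeping itself are a hypothetical sketch, not part of any published GTO Wizard API:

```python
# Benchmark constraints stated by GTO Wizard (from the article).
# The helper functions below are hypothetical client-side checks.
MIN_MATCH_HANDS = 2_500      # minimum HUNL hands per evaluation
MONTHLY_HAND_CAP = 100_000   # API usage limit per month
STACK_DEPTH_BB = 200         # stacks reset to 200bb every hand

def hands_remaining(played_this_month: int) -> int:
    """How many more benchmark hands the monthly cap allows."""
    return max(0, MONTHLY_HAND_CAP - played_this_month)

def can_start_match(played_this_month: int,
                    match_hands: int = MIN_MATCH_HANDS) -> bool:
    """A match is valid if it meets the 2,500-hand minimum
    and still fits under the monthly cap."""
    return (match_hands >= MIN_MATCH_HANDS
            and match_hands <= hands_remaining(played_this_month))

print(can_start_match(0))       # fresh month: a minimum match fits -> True
print(can_start_match(98_000))  # only 2,000 hands left this month -> False
```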

As the industry moves toward Heads-Up Pot-Limit Omaha (PLO) benchmarks in the near future, the message from GTO Wizard is clear: the era of "claiming" to be the best is over. Now, you have to prove it on the leaderboard.


Based in the United Kingdom, Will started working for PokerNews as a freelance live reporter in 2015 and joined the full-time staff in 2019. He now works as Managing Editor. He graduated from the University of Kent in 2017 with a B.A. in German. He also holds an NCTJ Diploma in Sports Journalism.
