Artificial Intelligence and Hold’em, Part 3: No-Limit Hold’em, The Next Frontier
Poker pro and software developer Nikolai Yakovenko concludes his three-part series examining how far researchers have gotten in their efforts to build a hold’em playing AI system.
In the first two parts of our consideration of the role of counter-factual regret minimization (or CFR) in the advancement of poker-related artificial intelligence, we explained how CFR works and how its implementation has helped researchers come close to “solving” heads-up limit hold’em.
To conclude the discussion, let’s delve a little more deeply into recent efforts to discover solutions for no-limit hold’em and talk about CFR’s important role in that endeavor as well.
NLHE: Five Minutes to Learn, How Long to Create an AI?
It goes without saying that while limit hold’em provides plenty of challenges to researchers, no-limit hold’em is a whole different ball game.
It’s still hold’em, so the boards are the same, and it’s still possible to visit every canonical board many times over. However, these similarities ignore the betting. And as anyone who started out with limit hold’em and then moved over to no-limit well knows, the betting is hardly something that you can ignore in NLHE.
Of course, we never really ignored the betting in the limit hold’em implementation of CFR. It’s just that for each state, let’s say on the flop, there were two or three possible betting actions, and no more than 20 possible pot sizes, into which we could enter the specific game situation. Furthermore, if we use enough buckets for previous states, the buckets are grouped in a logical way, and we have decent approximate solutions for those other buckets, it is possible to solve each limit hold’em state, largely ignoring how we got here and just looking at the cards and a small number of possible betting contexts.
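To make the idea of looking up a state by its cards and a small betting context concrete, here is a minimal sketch in Python. The field names, the ten-bucket split, and the use of a single hand-strength number are all assumptions for illustration, not the abstraction any particular CFR implementation uses.

```python
from typing import NamedTuple

class AbstractState(NamedTuple):
    street: str         # "preflop", "flop", "turn", or "river"
    hand_bucket: int    # which bucket our cards plus the board fall into
    pot_size_bets: int  # pot size, counted in small bets
    bets_to_call: int   # how many bets we are currently facing

def abstract_state(street, hand_strength, pot_size_bets, bets_to_call, num_buckets=10):
    """Map a concrete situation to a coarse lookup key.

    hand_strength is assumed to be an equity-like number in [0, 1].
    """
    bucket = min(int(hand_strength * num_buckets), num_buckets - 1)
    return AbstractState(street, bucket, pot_size_bets, bets_to_call)

# Two similar hands in the same betting context share one key,
# and therefore share one stored strategy.
key_a = abstract_state("flop", 0.71, 6, 1)
key_b = abstract_state("flop", 0.79, 6, 1)
```

Note that nothing in the key records the order of earlier bets, which is exactly the simplification that becomes questionable in no-limit.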
Once we try applying CFR to no-limit hold’em, it’s less clear that this approach will work. Can we really ignore the order of previous bets and just look at the cards we know and the pot size? And what happens when both we and our opponent can make many differently sized bets, not just the two or three actions of limit poker?
We started with a game that was too big to solve, but figured out that at least you can visit every possible state with a few tricks. You can solve the simplified game, where some of the simplifications are exact and others are very close to exact. Now, if you’re playing NLHE with stacks 100 big blinds deep, you can’t even consider every possible opponent response for a single one of your actions.
In an interview during the man-vs.-machine no-limit hold’em match from earlier this year, Doug Polk talked about the Claudico AI taking 20 seconds to move on the river, even in small-pot, no-action situations. It was not so simple for Claudico to look up the position’s bucket, and to apply an instant strategy.
Even so, counter-factual regret minimization is a great way to start building a no-limit hold’em AI. Suppose you are playing heads-up no-limit, but only 10 big blinds deep. Can the CFR solve for that? Sure it can. What if you are playing a bit deeper, but limit your bets to a min-raise, 2x a min-raise, half-pot, and all-in? That’s still a much more complicated game than limit hold’em. But while the variance will be higher and you’ll be folding a lot more often, you will get a solution.
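As a rough illustration of that restricted menu, the sketch below lists the bet sizes such a simplified game would allow in a given spot. The function name and the rules for dropping or capping sizes are my own assumptions, chosen to keep the example short.

```python
def abstract_bet_sizes(pot, min_raise, stack):
    """Return the bet sizes (in chips) allowed in the simplified game:
    min-raise, 2x min-raise, half-pot, and all-in."""
    if stack <= min_raise:
        return [stack]  # too short to do anything except shove
    candidates = [min_raise, 2 * min_raise, pot / 2, stack]
    # Drop sizes below a legal min-raise, cap the rest at our stack,
    # and use a set so duplicate sizes collapse into one.
    legal = {min(size, stack) for size in candidates if size >= min_raise}
    return sorted(legal)

menu = abstract_bet_sizes(pot=10, min_raise=2, stack=100)
```

With a 10-chip pot, a 2-chip min-raise, and 100 chips behind, the menu has just four entries, compared with the 99 whole-chip bet sizes real no-limit would allow.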
Would this CFR play within 1% of a perfect no-limit player (as it can do with LHE)? Not even close. It’s easy to see cases where, if you’re not careful, an opponent could just overbet on any weak board and pick up the pot unless the AI learns to call off sometimes with no hand. That’s tricky to do. That said, the CFR will quickly produce a player that plays every board with at least some semblance of balanced logic.
It’s not hard to imagine such a heads-up no limit player being hard to beat even if it only sticks with a few specific bet sizes, as long as it can handle all different bets made by its opponent. Even if it does fold too much when you overbet on weak boards, sometimes the AI will have a strong hand, which limits your ability to take away every pot, to a point. You’re also less likely to beat the no-limit CFR with small ball, as this is the type of game that you’d expect a balanced Nash equilibrium player to be good at.
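One simple way to handle bets the AI never trained against is to round the opponent’s actual bet to the nearest size in the AI’s own menu. The sketch below does exactly that. It is an assumption about how such a mapping could work, not a description of any specific bot, and the function name is made up.

```python
def translate_bet(actual_bet, abstract_sizes):
    """Map an opponent's real bet to the closest size the AI knows about."""
    return min(abstract_sizes, key=lambda size: abs(size - actual_bet))

menu = [2, 4, 5, 100]               # chips: min-raise, 2x min-raise, half-pot, all-in
seen_as = translate_bet(37, menu)   # a 37-chip overbet into a 10-chip pot
```

Here a 37-chip overbet gets treated as a half-pot bet, which hints at why crude rounding can leave the AI folding too much, or too little, against unusual sizes.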
Heads-up no-limit hold’em is like a passing down in the NFL. There are too many possible plays for the defense to be able to solve for every possible route that can be taken by the eligible receivers. The CFR approach to this situation would be like playing a zone defense. A pass catcher will always be open, but it won’t be easy for the offense to locate the holes. You can try, but you won’t be able to find the same throw every time, even if the zone defense is not actively adjusting to your play and is just mixing up looks with a balanced approximate Nash equilibrium strategy.
There will be systematic weaknesses, then, but it doesn’t mean you could exploit them on every play. At the very least, you’d have a tough opponent, even if that opponent doesn’t look like a real football team, with its bucketed game situations and Nash equilibrium blitzes.
Then again, if you removed the forward pass, and simplified the game to 5-on-5 football with a single set of downs on a narrow field, it might be possible to solve the game outright.
Imagining a Strong NLHE AI: The CFR Hybrid
We’ve spent a lot of space considering how Texas hold’em can be solved by an equilibrium-finding algorithm. More specifically, we looked at simplifying the game to something close to Texas hold’em but with an order of magnitude fewer game states, and then applying counter-factual regret minimization to the smaller problem. This yields a near-equilibrium to the faux-hold’em problem, and in practice, often a very good solution for the real Texas hold’em game.
The better we model the game in the simplified problem, the closer we get to an unbeatable strategy for real poker.
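As a reminder of what CFR does at each simplified game state, here is a minimal sketch of regret matching, the standard rule for turning accumulated regrets into a mixed strategy. The function name and the sample numbers are illustrative only.

```python
def regret_matching(cumulative_regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(regret, 0.0) for regret in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret yet, so mix uniformly.
    count = len(cumulative_regrets)
    return [1.0 / count] * count

# Made-up regrets for (fold, call, raise) at one bucketed state.
strategy = regret_matching([3.0, -1.0, 1.0])
```

In this made-up spot the strategy folds three times out of four and raises the rest, never calling, because calling has only ever looked worse in hindsight.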
However, this isn’t the only way to come up with a strong hold’em AI. Rather than taking another 5,000 words to examine the weaknesses of CFR and the strengths of other methods, let’s think about what we might want to see from a strong hold’em AI, once we have that bucketed situation, zone-defense approach described above.
You’d want the AI to play in three-handed and six-handed games, but before we get to that, I think you’d also want it to be able to adjust to opponents. I don’t mean to adjust on the fly, but at least to see what it can learn, say, over the course of playing 10,000 hands against a particular opponent. CFR has no methodology for doing that. By considering every possible response, albeit in a simplified way, there’s no scope for adjusting to the moves that are actually being made against it, and thereby giving those moves more weight going forward than game states that never develop.
Perhaps an even bigger problem is that the no-limit hold’em CFR is pre-trained to play every hand 200 big blinds deep. It could also be trained to play 100 BB deep, or just 10 BB deep, since CFR is a general algorithm, but each stack size would involve a separate week-long training process. Of course, if you have enough computers, you could run a dozen such processes in parallel, then apply the closest one, given the effective stacks. In practice, this should be good enough to play a wide range of stacks, and not really possible for a human to exploit by buying in short.
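Picking the closest pre-trained strategy could be as simple as the sketch below. The list of trained depths is invented for the example, and the function names are hypothetical.

```python
def effective_stack(our_stack_bb, their_stack_bb):
    """Only the shorter stack can actually be won or lost."""
    return min(our_stack_bb, their_stack_bb)

def closest_trained_depth(effective_stack_bb, trained_depths_bb):
    """Choose the pre-trained stack depth nearest to the effective stack."""
    return min(trained_depths_bb, key=lambda depth: abs(depth - effective_stack_bb))

trained = [10, 25, 50, 100, 200]   # assumed: one separate CFR run per depth
depth = closest_trained_depth(effective_stack(200, 35), trained)
```

Against an opponent who buys in for 35 big blinds, this picks the 25 BB strategy even though the AI itself sits 200 BB deep.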
As with the case of Claudico thinking for 20 seconds on the river, likely because it was running a simulation when it could not simply look up a strategy, the future of strong no-limit hold’em bots appears to be some sort of CFR hybrid. The unexploitable solution to an approximation of hold’em serves as a good baseline. With online search or other methods, it should be possible to fix many of CFR’s weaknesses, one by one, by tweaking that baseline.
One thing it’s hard for CFR to do is to play like a human. A simple tweak to CFR can’t get away from the fact that it’s based on an equilibrium strategy, which plays each hand in a vacuum and restricts itself to a fixed number of bet sizes. Otherwise the problem is too big for equilibrium solutions.
Trying Neural Networks
Another avenue to consider is how applying a neural network on top of CFR might help create what could be regarded as a more adaptable, more human-like player.
It wouldn’t actually have to be a neural network — it could be any machine-learning algorithm that learns to map a game state to a betting strategy. But a neural network is often used in such a context, so let’s ignore other function-learning algorithms and assume we’ll use a neural network to learn our betting function.
We need our betting function to give us one of two things: either a chip-value for each possible bet or a recommended betting policy. What we will ultimately need to use is a betting policy, but as I discussed in “Teaching an Artificial Intelligence System to Play 2-7 Triple Draw,” if you have a value estimate for each action, that also gives you a betting policy.
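Here is a short sketch of how a value estimate for each action turns into a betting policy. It shows two common choices: always taking the highest-value action, or mixing with a softmax so that close decisions are not played the same way every time. The chip values and the temperature parameter are assumptions for illustration.

```python
import math

def greedy_policy(action_values):
    """Put all the probability on the action with the highest estimated value."""
    best = max(action_values, key=action_values.get)
    return {action: (1.0 if action == best else 0.0) for action in action_values}

def softmax_policy(action_values, temperature=1.0):
    """Mix between actions, favoring those with higher estimated value."""
    top = max(action_values.values())  # subtract the max for numerical stability
    weights = {a: math.exp((v - top) / temperature) for a, v in action_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

values = {"fold": 0.0, "call": 1.5, "raise": 2.0}   # made-up chip values
mixed = softmax_policy(values)
pure = greedy_policy(values)
```

A lower temperature pushes the softmax toward the greedy choice, while a higher one makes the mix more even.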
A machine-learning algorithm needs training data, and in this case, we can get as much data as we want by playing against the CFR’s pretty good (and very fast) algorithm. Better yet, CFR can play itself, and we can train a neural network on the hand histories. The resulting neural network will learn to play much like the CFR.
Why not just use the CFR? The nice thing about a neural network that imitates the CFR strategy is that now you have something that can adapt to human play. For example, you can train the neural network for a week until it plays very close to the CFR. Then you can swap out that training data and keep training — say, just for an hour — on a sample of human hands, or even a single opponent’s hand histories. There’s an NFL comparison here, too. You’re taking a player with years of football experience, and adding a walk-through against this week’s opponent’s offense and defense.
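The train-then-keep-training idea can be sketched with something far smaller than a real neural network. Below, a two-weight logistic model stands in for the network, and both data sets are invented numbers: the first imitates how often a CFR player bets at different hand strengths, the second reflects one opponent who bets weak hands more often. Everything here is an assumption made for illustration.

```python
import math

def predict(weights, features):
    """Probability of betting, from a tiny logistic model."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def train(weights, examples, epochs, learning_rate=0.1):
    """Gradient descent on log loss. Each example is (features, bet_frequency)."""
    weights = list(weights)
    for _ in range(epochs):
        for features, target in examples:
            error = predict(weights, features) - target
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
    return weights

# features = [bias term, hand strength]; all targets are made-up frequencies
cfr_examples = [([1.0, 0.1], 0.2), ([1.0, 0.5], 0.5), ([1.0, 0.9], 0.9)]
human_examples = [([1.0, 0.1], 0.6)]

baseline = train([0.0, 0.0], cfr_examples, epochs=2000)   # the long training run
adapted = train(baseline, human_examples, epochs=50)      # the short walk-through

weak_hand = [1.0, 0.1]
before, after = predict(baseline, weak_hand), predict(adapted, weak_hand)
```

The adapted model starts from the baseline weights rather than from zero, so the short second pass nudges its play toward the opponent’s tendencies instead of relearning the game.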
They don’t explain how it’s done, but I assume this is how the “No-Limit Texas Hold’em” slot machine in Las Vegas that many played against at this year’s WSOP creates a “Phil Hellmuth” mode and a “Johnny Chan” mode that you can play against for real money. I know that they use a neural network for their player, and I’d bet they took the original amorphous neural net and trained it against Hellmuth and Chan hands to create slightly different versions of the network. The long-run ability is about the same, but the “personality” of each network appears different.
You might ask — do you really need to train a neural network to copy the CFR player, just to modify it? In a sense, you don’t. CFR is a specific method for solving for an equilibrium, and it’s very good at it, so you could just use the neural network to adjust those CFR outputs rather than needing the neural network to produce both the baseline and the final answers.
Suppose you have a strong CFR player, but you absolutely need to avoid it betting in 2x min-bet, half-pot, pot, and all-in bet sizes. You could just get an answer from CFR and add random noise to the bet, so that it’s effectively playing CFR but splashing the pot a little bit. Instead of the random noise, a neural network could learn better noise outputs in various cases. In the simplest version, you could play (CFR + noise) against (CFR + noise), and the neural network could use that data to learn what noise sizes worked in different cases over a moderate sample.
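The random-noise version of this idea might look like the sketch below. The size of the jitter, the choice to leave all-in bets untouched, and the function name are all assumptions. In the learned version, a network would supply the noise in place of the uniform draw.

```python
import random

def noisy_bet(cfr_bet, pot, min_bet, stack, noise_fraction=0.1, rng=random):
    """Perturb a CFR-recommended bet by up to +/- noise_fraction of the pot."""
    if cfr_bet >= stack:
        return stack  # assumption: all-in decisions are left alone
    jitter = rng.uniform(-noise_fraction, noise_fraction) * pot
    # Keep the result a legal whole-chip bet between the minimum and our stack.
    return max(min_bet, min(stack, round(cfr_bet + jitter)))

rng = random.Random(0)   # seeded so the example is repeatable
bet = noisy_bet(50, pot=100, min_bet=2, stack=1000, rng=rng)
```

With a 100-chip pot, a half-pot recommendation of 50 comes out somewhere between 40 and 60 chips, so the bet no longer sits on one of the few fixed sizes.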
There’s even a name for this: it’s called an “actor-critic” model. In the usual formulation, the “actor” network recommends an action policy, while the “critic” learns to estimate how well that policy’s actions turn out, and its feedback is what tweaks the policy.
Separating these two functionalities is especially useful when learning control over a continuous action space, where it might be possible to count all of the possible actions, but a bit silly to treat them as disconnected buttons. Scientists at Google have recently demonstrated an actor-critic neural network that learns to play a car-racing game just by observing the screen pixels and pressing random buttons. In this case the continuous control inputs are left/right and go/brake.
On the down side, neural networks are slow and not as accurate at solving for an equilibrium as CFR. However, with a neural network we can be more flexible with the inputs for training, and with how we respond. We haven’t even looked at how a recurrent neural network can remember information from previous hands against an opponent. But it will not be possible to traverse every game board with a neural network, and compute a balanced strategy as we do with CFR.
Perhaps a third method will emerge, but it looks like some combination of an equilibrium-solving strategy like CFR, and on top of it a neural network-based critic, might create unbeatable poker players that can also play a bit like humans, minus the trash talk.
Some players will be concerned about these advances in AI and what they might mean to the future of poker. There is no reason to be. Putting all of these pieces together takes a lot of work, and as an AI problem, poker is not very lucrative. Most of the cutting-edge poker AI work is done by academics (and by amateurs) for science and for the love of the game. Meanwhile beating the stock market with AI might be worth billions, and efficiently routing Ubers might be worth a lot of money as well.
Heads-up poker is not that kind of problem, although it’s a unique crucible in which to test the strength and adaptability of artificial intelligence, especially as poker AIs learn how to play full ring games, adapt to multiplayer dynamics, and deal with variable stack sizes. Perhaps later, they will also tackle games like Omaha, which consists of something like 100 times more game states than Texas hold’em.
I’m especially intrigued by the idea of an AI system that produces a baseline (be that CFR or a neural network that holds its own against a CFR), then uses another network as a critic to adjust that baseline for specific cases, against specific opponents, or to reflect something local over recent hands, be that tilt, the mood of the competitors at your table, or something else. There’s something nice about seeing a baseline, balanced over all possible hands, and then seeing how we might want to deviate from this “standard” play.
I think that best summarizes how we humans think about poker decisions, or at least how we talk about them with other players. Everyone’s a critic.
Nikolai Yakovenko is a professional poker player and software developer residing in Brooklyn, New York who helped create the ABC Open-Face Chinese Poker iPhone App.