Teaching an Artificial Intelligence System to Play 2-7 Triple Draw
Over the past couple of years, a phenomenon called “deep learning” has emerged as a promising approach for solving a variety of interesting computer intelligence problems. It may even be a step on the path toward generalized intelligence. It involves creating a single system that can solve a multitude of problems, all with the same “brain,” or at least by running the same computer program with limited changes.
You may already be interacting with deep learning, since these approaches are being used by Google and Facebook to tag your photos. They’ve also produced some cool demos, such as DeepMind (property of Google) teaching an artificial intelligence (AI) system to play Atari games from the 1980s. The computer hones its skill at the games by interacting directly with the screen in a simulated console. First it learns how to play the games, next it’s playing them well, and eventually it learns to play some of the games with nearly perfect precision.
A few months ago, I decided to apply some of these techniques to poker. The idea of general intelligence is interesting to a poker player, since different card games require different knowledge to play well. Even the difference between short-handed and full-ring hold’em presents a number of important distinctions.
But these are also related problems. Some players are better at some forms of poker than others, but we’ve read several stories over the years about tournament players winning bracelets in variants they’ve barely ever played before.
Bertrand “ElkY” Grospellier’s $10K Stud win at the 2011 World Series of Poker in his first ever live stud tournament, Peter Jetten’s open-face Chinese win at the 2013 PokerStars Caribbean Adventure after only having played a little small stakes OFC before, and Christian Pham’s “accidental” entry and eventual win in this summer’s $1,500 2-7 NL Draw event at the WSOP spring to mind.
The fact is, poker skills transfer between games. After all, we’re not talking about driving a car on the one hand, and playing piano on the other.
If a single AI can learn Breakout one day, and master Space Invaders the next, perhaps I could have some success putting together a “mixed games bot”? A few months and a lot of head-banging later, my mixed games AI is now playing... heads-up 2-7 triple draw.
Teaching AI 2-7 Triple Draw
Why triple draw? In short, “deuce” triple draw is an awesome game. It’s one of the most popular games online and live at the WSOP, after hold’em and the Omaha variants. Usually spread as part of a mix, limit triple draw is a fast action game that’s enjoyable for LAGs and nits alike.
The game is simple. You have five cards, all face down, and no community cards. The goal is to make the lowest hand, without any pairs, straights or flushes. Aces are always high. The best possible hand is a “wheel” seven-low — 7-5-4-3-2.
It’s also an ideal example of a “European-style” board game, in that whoever is the furthest behind (drawing the most cards) has the best chance to improve. Put a different way, this means that except for when a player makes a pat seven or eight-low (which is rare), nobody is very far behind. If you’re drawing to a “smooth” hand, you’re rarely drawing dead, and you likely have the odds to call.
At the same time, since a made hand is a favorite against just about any draw, it’s useful to sometimes “snow” a bad hand, by standing pat and betting out, without expecting to have a hand with showdown value. Much of the time, your opponent will miss his draw and fold. But snow too often, or in the wrong spots, and you are simply giving up your chance to improve.
Translating this to heads-up, you can and should play most of your hands for one bet, and often down to the final draw. You need to look for value with a good hand or a good draw, while avoiding putting in too many bets when behind. At the same time, you should not fold marginal hands often enough to get run over by a player who’s capable of snowing when your hand looks weak.
Heads-up limit poker is a natural place to start training a poker AI. The computer can play against itself, then try to improve its play from the results of those hands. In computer science, we call this “bootstrapping,” a reference to Baron Münchausen, a mythical German who got himself out of a bad situation by pulling on his own bootstraps, and thus didn’t need anyone else to help him.
Done right, the computer can learn to play the game at a high level by iteratively playing against itself, seeing what it can learn from those hands, and playing again. If the scientists at DeepMind were able to use this approach to teach their computer to master 50 Atari games at once, surely a system can learn in “deuce” how many cards to draw, when to check-raise, and occasionally when to snow. After all, limit poker is just a series of choices, each one involving pressing one of two or three buttons. Mechanically, it is the simplest of video games.
AI in Action: Dealing a Hand of “Deuce”
Let’s take a look at a triple draw hand that two versions of the deep learning poker AI played against each other. This hand is between Poker-CNN-7 on the button and Poker-CNN-76 out of position in the big blind. By the way, CNN stands for “convolutional neural network” (no newscasters were harmed in the making of this program).
Starting with the SB on the button, player Poker-CNN-7 looks down at
['', 3, True, 150.0, '', 0, 0, '']
He’s got a three-card draw to an eight-low or a two-card draw to a 9-8-low. He also has the button, which is nice. The AI decides what to do by estimating the value of each possible action. Here CNN-7 can call, raise, or fold. There is $150 in the pot, since the AI is always playing $100/$200 limits with $50/$100 blinds, and the pot is dead money. That way, the value of folding is always $0. The worth of any hand is an estimate of how much better playing it would be than laying it down.
- bet: N/A
- raise: $60
- check: N/A
- call: $67
- fold: $0.0
The AI predicts that in the long run, the value of either calling or raising this hand is about $60. Given that Poker-CNN-7 is on the button, this isn’t a great hand. But more importantly, it would be a significant mistake to fold instead of completing the SB for $50. The current value includes the downside of losing that $50, as well as the value of future betting.
The AI also outputs a suggested percentage with which to take each possible action. We’ll get into why you need both a value estimate and action percentages in order to play strong poker in an upcoming article about GTO (game theory optimal) play, but for now let’s just say that some spots are clear checks or clear bets, though if it’s close, you should avoid taking the same action every time. The AI suggests:
- raise: 4%
- call: 93%
- fold: 3%
In this case, the 4% hits, and Poker-CNN-7 raises. The BB then considers his options with the following hand:
['', 3, False, 300.0, '1', 0, 0, '1']
- raise: -$118 (0%)
- call: -$32 (74%)
- fold: $0.0 (26%)
The CNN-76 player in the BB has a three-card draw to a hand. Not bad. He’ll be out of position on every street, he doesn’t have a deuce, a seven, or an eight, but there is also now $300 of dead money in the pot. This is a marginal hand against a decent player who isn’t auto-raising the button.
Actually, the AI predicts that it’s slightly better to fold than to play, but folding here every time will make auto-raising too profitable, so the AI recommends calling 3/4 of the time, and folding the remaining 1/4. CNN-76 calls, and gets ready for the first draw.
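The difference between following the value estimates and following the action percentages can be shown with a short sketch. This is only an illustration, not the Poker-CNN's code: the function names are hypothetical, and the numbers are simply the BB's outputs listed above.

```python
import random

# The BB's outputs from the hand above (values in dollars, frequencies as fractions).
values = {"raise": -118.0, "call": -32.0, "fold": 0.0}
frequencies = {"raise": 0.0, "call": 0.74, "fold": 0.26}

def greedy_action(values):
    """Always take the action with the highest estimated value."""
    return max(values, key=values.get)

def sample_action(frequencies, rng=random):
    """Pick an action at random, weighted by the suggested frequencies."""
    actions = list(frequencies)
    weights = [frequencies[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = {a: 0 for a in frequencies}
for _ in range(10_000):
    counts[sample_action(frequencies, rng)] += 1

print(greedy_action(values))  # the value estimates alone would fold every time
print(counts)                 # sampling calls roughly three times as often as it folds
```

A player who always took the highest-value action would fold this hand every time, which is exactly the predictability that the percentages are there to avoid.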
His choice is obvious. He keeps the and takes three cards. Meanwhile, his opponent’s decision is a bit more interesting.
['', 3, True, 400.0, '10', 0, 2, '10']
- 'pat': $311
- 'draw_2': $222
- 'draw_3': $201
The AI isn’t wrong in that this is a decent spot to keep all five cards and “snow” the hand. His opponent called a raise before the draw and then took three cards, so it would be relatively profitable to stand pat, and pretend to have a hand. The CNN-76 player will often not improve and fold, either right away, or after one more draw.
At least that would be true, if CNN-7 and CNN-76 played one hand, and one hand only. Snowing may be profitable once, or once in a while, but an opponent would catch on quickly, and start calling down with weak draws. Meanwhile by snowing, the button here would be consistently declining to improve a decent hand.
Therefore, until the AI learns the right snowing ratio from even more self-play, I only allow it to snow on the first draw a third of the time, and this time is not part of that third. But if the AI player wanted to snow, this would not be a bad spot to do so.
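One way a cap like this might look in code is sketched below. It is a rough illustration, not the Poker-CNN's actual logic: `choose_draw`, `is_snow`, and `SNOW_CAP` are hypothetical names, and the sketch assumes something else has already decided that standing pat here would be a snow.

```python
import random

SNOW_CAP = 1 / 3  # assumed: fraction of first-draw spots where a snow is allowed

def choose_draw(values, is_snow, rng=random):
    """Pick the highest-value draw action, but allow a snow only part of the time.

    values:  dict mapping a draw action ('pat', 'draw_2', ...) to its estimated value
    is_snow: True if standing pat here would be a snow (assumed to be known)
    """
    best = max(values, key=values.get)
    if best == "pat" and is_snow and rng.random() >= SNOW_CAP:
        # Snow not allowed this time: fall back to the best action that draws cards.
        drawing = {a: v for a, v in values.items() if a != "pat"}
        best = max(drawing, key=drawing.get)
    return best

# Estimates from the first draw above.
values = {"pat": 311.0, "draw_2": 222.0, "draw_3": 201.0}
rng = random.Random(1)  # fixed seed so the run is repeatable
picks = [choose_draw(values, is_snow=True, rng=rng) for _ in range(9_000)]
print(picks.count("pat") / len(picks))  # close to one third
```

With these estimates the fallback is always the two-card draw, since it has the highest value among the actions that actually draw.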
It’s also interesting that the AI thinks drawing two cards to is a little bit better than drawing three cards to . Given the action so far, both of these draws are worth about half of the $400 pot.
I showed this hand to poker pro Yuval Bronshtein (who plays a lot of mixed games), and he said it isn’t much better to draw three cards to the . I’m not sure I agree. A 9-8-low is an average winning hand at showdown. So drawing to it isn’t great, but according to another pro whom I asked, two-time WSOP bracelet winner Rep Porter (who final-tabled the $10K 2-7 Triple Draw Championship this summer), drawing fewer cards than your opponent gives you an additional advantage. Players will check to the opponent who draws fewer cards, and will often fold if they miss. Remember, to the opponent drawing three cards, all two-card draws look the same, be they , or a smooth wheel draw.
In any case, this time around the AI keeps and draws two cards.
Out of position, Poker-CNN-76 pulls three good cards, and here is what he’s thinking:
['', 2, False, 400.0, '', 2, 3, '10']
- bet: $337 (73.4%)
- check: $313 (26.6%)
Bronshtein thinks that this is a clear check-raise. I might side with the AI here. The button will usually bet if checked to, but not always. As we just saw, the button’s hand isn’t always strong when he draws two cards. The pat 9-7-low loses a lot of value when the button checks behind.
In any case, CNN-76 values his hand at about $330 in a $400 pot, including the value of future bets. A pat wheel would be worth $600 or more here. CNN-76 flips a coin, and bets.
['', 2, True, 500.0, '1', 3, 2, '101']
- raise: $281 (4.8%)
- call: $289 (95.2%)
- fold: $0 (0%)
Poker-CNN-7 drew two cards and improved. He thinks the hand is worth about $280 in a $500 pot. Of course, it’s worth a lot less if you know that his opponent has a pat 9-7-low. But these estimates are based on the information that CNN-7 is given, which is that his opponent drew three, saw him draw two, then bet out. That pattern could match a number of hands, and not all of them are pat. Calling 95% of the time and seeing what CNN-76 does next sounds about right. Instead, the 5% coin comes up, and CNN-7 throws in a raise.
['', 2, False, 700.0, '11', 2, 3, '1011']
- raise: $559 (24.4%)
- call: $462 (75.6%)
- fold: $0
The pot continues to grow, and so does CNN-76’s estimate of his hand. I don’t know why the AI recommends just calling here more often than raising. Perhaps it needs more practice with three-bets. But the 24.4% choice comes in, and he does three-bet. The button calls.
['', 2, False, 1000.0, '1110', 2, 3, '101110']
- 'pat': $720
- 'draw_1': $486
Of course CNN-76 stands pat with his 9-7-low. It’s curious to see the AI’s internal value estimate, $720 in a $1000 pot. This is how computers and humans think about poker differently. A triple draw player can tell you that a pat 9-7-low is a good hand given that action, and that drawing one card to a seven-low, without a deuce, is much worse than standing pat. But can that player give you a dollar value for this mistake? The estimates that the Poker-CNN spits out are not perfect. It’s not even clear what kind of opponent ranges these values are supposed to be estimated against. But I think it’s nice to see a dollar value for each action.
Now CNN-7 is on the button and facing an important decision: draw two cards to an eight-low, or draw one for a 9-8-low, which happens to be drawing dead.
['', 2, True, 1000.0, '1110', 3, 5, '101110']
- 'draw_2': $389
- 'draw_1': $384
Bronshtein says it’s really bad not to draw two cards here, and he’s right. There are 16 outs to a 9-8-low draw for a 59.0% chance to make it with two draws to come, but how often is the 9-8-low good? Of course we know it’s no good in the actual hand, but even if the opponent is patting a 10-low here, and if he snows sometimes, how often is he doing that after a three-bet after the first draw? $380 in a $1000 pot for the one-card 9-8-low draw is optimistic.
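The 59.0% figure can be checked with a few lines of arithmetic. This is a sketch under simple assumptions: CNN-7 has seen seven cards (his original five plus the two he drew), so 45 cards are unseen, all 16 outs are still among them, and he draws one card on each of the two remaining draws. The function name is just for illustration.

```python
from fractions import Fraction

def hit_chance(outs, unseen, draws):
    """Chance of catching at least one out when drawing one card per draw.

    Assumes a missed card is discarded and is no longer among the unseen cards.
    """
    miss = Fraction(1)
    for i in range(draws):
        miss *= Fraction(unseen - outs - i, unseen - i)
    return 1 - miss

# 52 cards minus the 7 that CNN-7 has seen leaves 45 unseen.
p = hit_chance(outs=16, unseen=45, draws=2)
print(f"{float(p):.1%}")  # 59.0%
```

This ignores whatever the opponent is holding, which is why it is only the simple odds of making the hand and says nothing about how often the made hand wins.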
The real odds of making a two-card draw to an eight-low are harder to calculate since the opponent’s hidden cards are not random, but the simple odds are about 27.1% to make an eight-low. Given that making it will likely result in winning future bets, $380 for drawing two cards is about right.
In any case, after a slightly complicated process that I’ll skip describing, Poker-CNN-7 chooses to draw one in a decision that he thinks is close. He’s drawing dead.
['', 0, False, 1400.0, '', 5, 5, '10111010']
- bet: $1047 (95.6%)
- check: $969 (4.4%)
After the second draw, Poker-CNN-76 thinks his pat hand is worth $1,050 in a $1,400 pot if he bets it, so he does. CNN-7 makes his 9-8-low, and gladly calls.
['', 1, True, 1200.0, '1', 4, 5, '1011101']
- raise: $296 (0.5%)
- call: $420 (99.5%)
- fold: $0 (0.0%)
After all the enthusiasm that CNN-76 has shown for his hand out of position, standing pat and betting into CNN-7 on multiple streets, CNN-7 does not make his own hand a favorite to win. But it has substantial value for calling down in a $1,200 pot.
CNN-76 stands pat again, and CNN-7 should think about breaking the 9-8-low.
['', 1, True, 1400.0, '10', 4, 5, '10111010']
- 'pat': $770
- 'draw_1': $582
It’s a little frustrating to see that the same AI that just valued the 9-8-low at $420 in a $1200 pot, now estimates it at $350 higher by standing pat. However, both the pat and “draw_1” values are high. I’d probably stand pat at this point, albeit knowing that I have a low probability of winning.
CNN-76 bets out again.
['', 0, True, 1600.0, '1', 5, 5, '101110101']
- raise: -$460 (0.0%)
- call: $80 (66.0%)
- fold: $0 (34.0%)
By the end of the hand, CNN-7 realizes that his 9-8-low isn’t very good. He only assigns the call a value of $80 in a $1600 pot. Since he’s facing a $200 bet, that is equivalent to a 15.5% chance to win at showdown.
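That percentage follows from the pot odds. As a rough check, assume the call's value is measured against folding (worth $0), that winning collects the $1,600 pot, and that losing costs the $200 call. The function name here is just for illustration.

```python
def implied_win_chance(call_value, pot, bet):
    """Win probability p that gives the call this value relative to folding.

    Solves call_value = p * pot - (1 - p) * bet for p.
    """
    return (call_value + bet) / (pot + bet)

p = implied_win_chance(call_value=80, pot=1600, bet=200)
print(f"{p:.2%}")  # 15.56%, in line with the rough 15.5% figure
```

Under the same assumptions, a call worth exactly $0 would need to win $200 out of $1,800, or about 11% of the time, so the AI rates this call as only slightly better than break-even.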
Of course with a pot this size, and after standing pat, I’m probably sighing and calling. If playing live, I might try to get a read on my opponent, and if I don’t sense nervousness, I’d fold it sometimes. Paying off here 100% of the time isn’t great, but you also can’t fold a 9-8-low every time and expect not to get run over. The real mistake was taking one card instead of two, with two draws to come.
What's frustrating about playing poker with a neural network can be the AI’s lack of internal consistency. While I hope you’ll agree that most of the AI estimates look about right, and others might be interesting, some of the values are just plain wrong. On the last decision, it considers raising with 9-8-low to be worth -$460. This can’t be right at $100/$200 limit.
On the plus side, the Poker-CNN plays quickly, with each decision a direct result of the neural network. Unlike a chess engine, it doesn’t simulate the hand, look up odds, or search for a related solution. It just propagates the inputs through a “deep” eight-layer neural network, and interprets what the betting or drawing policy should be. This takes about half a second on my laptop.
Remember, the Poker-CNN is the first version of a general poker AI. It does not try to solve heads-up triple draw, nor does it use the rules of the game. I train it on hand histories. It’s given each hand in a format that allows it to consider the cards, the size of the pot, and the bets made so far. From that, the neural network can group together similar inputs, look for patterns, and thus estimate the expected result of any future spot that it might encounter.
It would be possible to feed the AI inputs such as “you have a flush” or just a #1-#50,000 ranking, telling it that the 7-5-4-3-2 hand is actually the nuts. But this goes against the spirit of the project. Instead, I feed the AI all of the cards that a player sees, in a common format. The cards are the same, whether it’s learning triple draw, hold’em, or pot-limit Omaha, and there are no assumptions about the rules of the game. The rules are not arbitrary, and I would hope that with enough sample hands, the AI could find the patterns necessary to play pretty good poker, and eventually very good poker.
For now, what we have is the heads-up triple draw. I don’t have it publicly available in an app or on a website yet. But if you’re a mixed games player and would like to help us find out how the Poker-CNN stacks up against human expert players, please contact the author, either in the comments below or via Twitter @ivan_bezdomny. (And thanks to the players who’ve already given it a shot.)
Photo: “Bluff for the win,” Jun. Creative Commons Attribution ShareAlike 2.0 Generic.
Nikolai Yakovenko is a professional poker player and software developer residing in Brooklyn, New York who helped create the ABC Open-Face Chinese Poker iPhone App.