
AI beats professionals at six-player Texas Hold ’Em poker

By Donna Lu

11 July 2019

A person playing online poker

cyano66/Getty

Artificial intelligence has finally cracked the biggest challenge in poker: beating top professionals in six-player, no-limit Texas Hold ’Em, the most popular variant of the game.

In 20,000 hands of online poker, the AI beat 15 of the world’s top poker players, each of whom has won more than $1 million playing the game professionally.

The AI, called Pluribus, was tested over 10,000 hands against five human players at once, as well as over 10,000 hands in which five copies of Pluribus played against one professional, and it did better than the pros in both.

Pluribus was developed by Noam Brown of Facebook AI Research and Tuomas Sandholm at Carnegie Mellon University in Pennsylvania. It is an improvement on their previous poker-playing AI, called Libratus, which in 2017 outplayed professionals at Heads-Up Texas Hold ’Em, a variant of the game that pits two players head to head.

Part of what makes poker so difficult for AI to master is the huge number of possible actions to make, says Tristan Cazenave at the Paris Dauphine University. There are more possibilities than there are atoms in the universe.

Poker also involves hidden information: each player sees only their own cards, which means an AI has to take into account how it would act with different cards so that it isn’t obvious when it has a good hand.

“If you look at real-world interactions, most of them involve hidden information, multiple participants or both,” says Brown. Pluribus’s approach could be applied to situations in cybersecurity, or in having self-driving cars navigate traffic, he says.

Pluribus learned to master the game by playing against five copies of itself, an approach that has been used by other AIs to master games such as Go, Dota 2 and StarCraft II. It started as a poker novice with no knowledge of strategy, playing at random, and improved over trillions of hands by reviewing each decision it made and working out how it would have fared had it acted differently.
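Pluribus’s self-play training is reported to be a Monte Carlo variant of counterfactual regret minimization (CFR): after each hand, the algorithm measures how much better every alternative action would have done (its “regret”) and shifts probability toward the actions it regrets not taking. The sketch below is not Pluribus’s code. It applies regret matching, the update at the heart of CFR, to rock-paper-scissors, a stand-in vastly smaller than poker, and it uses exact expected values instead of sampled hands. The starting bias and the iteration count are arbitrary assumptions.

```python
# Regret matching in two-player self-play on rock-paper-scissors.
# A toy stand-in for poker: NOT Pluribus's code or game.

ACTIONS = ("rock", "paper", "scissors")


def payoff(a: int, b: int) -> int:
    """Payoff to a player choosing action a against action b: +1 win, -1 loss, 0 tie."""
    if a == b:
        return 0
    # Paper (1) beats rock (0), scissors (2) beats paper (1), rock (0) beats scissors (2).
    return 1 if (a - b) % 3 == 1 else -1


PAYOFF = [[payoff(a, b) for b in range(3)] for a in range(3)]


def strategy_from_regrets(regrets: list[float]) -> list[float]:
    """Regret matching: play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # no positive regret yet: play uniformly


def self_play(iterations: int) -> list[list[float]]:
    """Run regret matching for both players and return each player's average strategy."""
    # Assumed starting point: player 0 begins biased toward rock so the dynamics are non-trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        # Both players' strategies are fixed before either one updates (simultaneous play).
        strategies = [strategy_from_regrets(r) for r in regrets]
        for me in (0, 1):
            opp = strategies[1 - me]
            # Expected value of each pure action against the opponent's current mixed strategy.
            values = [sum(opp[b] * PAYOFF[a][b] for b in range(3)) for a in range(3)]
            current = sum(strategies[me][a] * values[a] for a in range(3))
            for a in range(3):
                # "Review" the decision: how much better (or worse) would action a have done?
                regrets[me][a] += values[a] - current
                strategy_sums[me][a] += strategies[me][a]
    # The average strategy, not the final one, is what approaches the equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, avg in enumerate(self_play(100_000)):
        print(player, [round(p, 3) for p in avg])
```

The equilibrium of rock-paper-scissors is to play each action a third of the time, and both averaged strategies land close to that even though the round-by-round strategies keep cycling. This is why CFR-style methods report the average strategy rather than the last one.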

Plays like a bot

In games against five human professionals, Pluribus won by an average of 48 milli-big blinds per game, a measure of how many big blinds were won on average per thousand hands of poker.
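The win rate converts to totals with simple arithmetic: one milli-big blind is a thousandth of a big blind, so multiply by the number of hands (each “game” here is one hand) and divide by 1000. The sketch below assumes 10,000 hands, the length of the five-human experiment, and a hypothetical $100 big blind chosen only to put the figure in dollar terms.

```python
def big_blinds_won(mbb_per_hand: float, hands: int) -> float:
    """Total big blinds won, given a win rate in milli-big blinds (1/1000 big blind) per hand."""
    return mbb_per_hand * hands / 1000


def chips_won(mbb_per_hand: float, hands: int, big_blind: float) -> float:
    """Total winnings in chips or dollars for an assumed big-blind size."""
    # Multiply before dividing so whole-number inputs stay exact where possible.
    return mbb_per_hand * hands * big_blind / 1000


print(big_blinds_won(48, 10_000))   # → 480.0 big blinds
print(chips_won(48, 10_000, 100))   # → 48000.0 with a hypothetical $100 big blind
print(chips_won(48, 1, 100))        # → 4.8 per hand with the same assumption
```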

Each human player was given an alias for the duration of the tournament, to deter people who knew each other from potentially teaming up against Pluribus.

“We made no effort to hide who the bot was,” says Brown, partially because its play style was obvious: Pluribus plays the first few actions in a round instantaneously because it has already prepared its strategy for those moves, while a human player typically takes a few seconds to decide.

Knowing which player was Pluribus meant the human players could attempt to trick the AI, says Jason Les, a professional poker player who was involved in the tournament. He played in the rounds that pitted five humans against Pluribus, playing an estimated 2000 hands over 12 days.

“You really want to push the AI, try everything you can to find a weakness,” says Les. “Obviously we weren’t able to.”

Les also played against Libratus in 2017. “I was pretty amazed that they had made so much progress in just a couple of years,” he says. “What was particularly impressive about this challenge was that the AI played faster and on much less computing power.”

To reduce the number of potential choices that Pluribus needed to consider, the AI grouped similar hands (for example, a king-high flush and a queen-high flush) and only considered a few different sizes of bets for a given hand.

“At the end of the day, betting $150 is a lot like betting $151,” says Brown. Instead of treating those bets separately, Pluribus groups them together and treats them identically.
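Both kinds of grouping described above are forms of abstraction: many real situations are mapped onto a few representative ones, so a strategy only has to be learned for the representatives. The sketch below is a simplified illustration, not Pluribus’s actual scheme. The allowed bet sizes (as fractions of the pot), the bucket count, the $150 pot and the hand-strength figures are all hypothetical, and it buckets hands by a single estimated win probability, a much cruder measure than a real system would use.

```python
# Hypothetical abstraction parameters, chosen for illustration only.
BET_FRACTIONS = (0.5, 1.0, 2.0)  # allowed bet sizes as fractions of the pot
NUM_HAND_BUCKETS = 10            # coarse groups for hand strength


def abstract_bet(bet: float, pot: float) -> float:
    """Action abstraction: snap an arbitrary bet to the nearest allowed pot fraction."""
    fraction = bet / pot
    nearest = min(BET_FRACTIONS, key=lambda f: abs(f - fraction))
    return nearest * pot


def hand_bucket(equity: float) -> int:
    """Card abstraction: group hands whose estimated win probability (0 to 1) is similar."""
    return min(int(equity * NUM_HAND_BUCKETS), NUM_HAND_BUCKETS - 1)


# With an assumed $150 pot, $150 and $151 bets become the same abstract action.
print(abstract_bet(150, 150), abstract_bet(151, 150))  # → 150.0 150.0
# Hypothetical strengths for a king-high and a queen-high flush land in one bucket.
print(hand_bucket(0.95), hand_bucket(0.93))            # → 9 9
```

The trade-off is precision for size: fewer bet sizes and buckets make the game small enough to solve, at the cost of treating slightly different situations as identical.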

No guarantees

“We actually use very few computing resources to produce this AI,” says Brown. Training Pluribus required less than 512 gigabytes of memory, and the computing time it needed would cost less than $150 using cloud computing services.

Although Pluribus played better than the human professionals, there is no theoretical guarantee that it will always win, says Cazenave, and the reason lies in a game-theory concept called the Nash equilibrium.

A Nash equilibrium is a set of strategies, one for each player in a non-cooperative game, in which no player can improve their performance by switching to a different strategy on their own. While a Nash equilibrium strategy is unbeatable in expectation in two-player Heads-Up Texas Hold ’Em, there is still no known way of finding one for the six-player variant of the game.
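The definition can be checked mechanically for small games: a strategy profile is a Nash equilibrium if no player gains by switching to any single action while the others keep their strategies (checking single-action deviations is enough, because a mixed deviation can never beat the best pure one). Below is a minimal sketch for two-player games written as payoff tables. The prisoner’s dilemma and rock-paper-scissors tables are textbook examples chosen for illustration; six-player poker is far too large for anything like this brute-force check.

```python
from itertools import product

# payoffs[a0][a1] = (payoff to player 0, payoff to player 1)
PRISONERS_DILEMMA = [  # actions: 0 = cooperate, 1 = defect
    [(-1, -1), (-3, 0)],
    [(0, -3), (-2, -2)],
]
ROCK_PAPER_SCISSORS = [  # actions: 0 = rock, 1 = paper, 2 = scissors
    [(0, 0), (-1, 1), (1, -1)],
    [(1, -1), (0, 0), (-1, 1)],
    [(-1, 1), (1, -1), (0, 0)],
]


def expected_payoff(payoffs, player, profile):
    """Expected payoff to `player` when each player i mixes over actions per profile[i]."""
    total = 0.0
    for a0, a1 in product(range(len(profile[0])), range(len(profile[1]))):
        total += profile[0][a0] * profile[1][a1] * payoffs[a0][a1][player]
    return total


def is_nash_equilibrium(payoffs, profile, tol=1e-9):
    """True if no player can gain more than `tol` by unilaterally switching to one action."""
    for player in (0, 1):
        current = expected_payoff(payoffs, player, profile)
        n = len(profile[player])
        for a in range(n):
            deviation = list(profile)
            deviation[player] = [1.0 if i == a else 0.0 for i in range(n)]
            if expected_payoff(payoffs, player, deviation) > current + tol:
                return False
    return True


third = [1 / 3] * 3
print(is_nash_equilibrium(PRISONERS_DILEMMA, [[0, 1], [0, 1]]))       # → True (both defect)
print(is_nash_equilibrium(PRISONERS_DILEMMA, [[1, 0], [1, 0]]))       # → False (defecting pays)
print(is_nash_equilibrium(ROCK_PAPER_SCISSORS, [third, third]))       # → True
print(is_nash_equilibrium(ROCK_PAPER_SCISSORS, [[1, 0, 0], third]))   # → False (player 1 switches to paper)
```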

“This is actually why the AI community finds this so surprising,” says Brown. “A lot of people didn’t think that this would be possible, to beat top humans using these techniques.”

Cazenave says that similar approaches could be used to develop AIs that can play other complex multiplayer games such as mahjong and bridge.

Journal reference: Science, DOI: 10.1126/science.aay2400
