Montezuma’s Revenge is one of the most challenging Atari games ArcadeImages/Alamy
An artificial intelligence that can remember its previous successes and use them to create new strategies has achieved record high scores on some of the hardest video games on classic Atari consoles.
Many AI systems use reinforcement learning, in which an algorithm is given positive or negative feedback on its progress towards a particular goal after each step it takes, encouraging it towards a particular solution. This technique was used by AI firm DeepMind to train AlphaGo, which beat a world champion Go player in 2016.
Adrien Ecoffet at Uber AI Labs and OpenAI in California and his colleagues hypothesised that such algorithms often stumble upon encouraging avenues but then jump to another area in the hunt for something more promising, leaving better solutions overlooked.
“What do you do when you don’t know anything about your task?” says Ecoffet. “If you just wave your arms around, it’s unlikely that you’re ever going to make a coffee.”
To solve this problem, the team created an algorithm that remembers all the different approaches it has tried and keeps returning to moments in which it had a high score as a starting point from which to explore further.
The software stores screen grabs from a game as it plays to remember what it has tried, grouping together similar-looking images to identify points in the game that it should return to as jumping-off points. The algorithm’s aim is to maximise its score. When a starting point is used to reach a new high score, the algorithm updates its record of that point with a new screen grab from that part of the game.
Atari games don’t normally allow players to revisit any point in time, but the researchers used an emulator, software that mimicked the Atari system, with the added ability to save states and reload them at any time. This meant the algorithm could begin from any point without having to play the game from the start.
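The loop described above, remembering promising points, reloading them and exploring onwards, can be illustrated with a short Python sketch. It is not the researchers’ code. `ToyGame`, `State`, `cell_of` and `explore` are hypothetical names invented for this illustration, the “game” is a simple corridor rather than an Atari title, and starting points are chosen uniformly at random as a simplification.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Immutable snapshot of the toy game, standing in for an emulator save state."""
    position: int
    score: int


class ToyGame:
    """Hypothetical stand-in for an emulator that can save and reload states."""

    def __init__(self, length=50):
        self.length = length
        self.state = State(position=0, score=0)

    def save(self):
        return self.state  # frozen dataclass, so it is safe to store as-is

    def load(self, saved):
        self.state = saved

    def step(self, action):
        # Move left (-1) or right (+1), staying inside the corridor.
        position = min(self.length, max(0, self.state.position + action))
        # The score is the farthest point reached so far (an assumption of this toy).
        score = max(self.state.score, position)
        self.state = State(position, score)
        return self.state


def cell_of(state, width=5):
    """Group similar states together, a crude analogue of grouping similar screens."""
    return state.position // width


def explore(game, iterations=2000, steps_per_visit=10, seed=0):
    rng = random.Random(seed)
    start = game.save()
    archive = {cell_of(start): start}  # one remembered starting point per group
    for _ in range(iterations):
        # Simplification: pick a remembered point uniformly at random.
        cell = rng.choice(list(archive))
        game.load(archive[cell])  # jump straight back without replaying the game
        for _ in range(steps_per_visit):
            state = game.step(rng.choice((-1, 1)))
            key = cell_of(state)
            best = archive.get(key)
            # Record a new group, or update a known one reached with a higher score.
            if best is None or state.score > best.score:
                archive[key] = game.save()
    return archive
```

Calling `explore(ToyGame())` returns the archive of remembered starting points, and the highest `score` among its values is the best result found. Because every visit starts from a stored state, none of the exploration budget is spent replaying the route to that point.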
The team set the algorithm to playing a collection of 55 Atari games that has become a standard benchmark for reinforcement learning algorithms. It beat state-of-the-art algorithms in those games 85.5 per cent of the time.
In one particularly complex game, Montezuma’s Revenge, the algorithm scored higher than the previous record for reinforcement learning software and also beat the human world record.
Once the algorithm had reached a sufficiently high score, the researchers used the solution it came up with to train a neural network to replicate the strategy and play the game the same way, doing away with the need for reloading save states with an emulator. This alternative approach turned out to be more computationally intensive, as the neural network version of the algorithm created billions of screen grabs while solving each game.
Peter Bentley at University College London says the team’s approach of combining reinforcement learning with an archive of memories could be used to tackle more complex problems. “This is a nice new combination of techniques that seem to provide a real enhancement.”
Nature