Let's-a go! The algorithm is motivated by a desire to explore rather than to score points (Image: Nintendo)
I wonder what will happen if I press this button? Algorithms armed with a sense of curiosity are teaching themselves to discover and solve problems they've never encountered before.
Faced with level one of Super Mario Bros, a curiosity-driven AI learned how to explore, avoid pits, and dodge and kill enemies. This might not sound impressive – algorithms have been thrashing humans at video games for a few years now – but this AI's skills were all learned thanks to an inbuilt desire to discover more about the game world.
Conventional AI algorithms are taught through positive reinforcement. They are rewarded for achieving some kind of external goal, like upping the score in a video game by one point. This encourages them to perform actions that increase their score – such as stomping on enemies in the case of Mario – and discourages them from performing actions that don't increase the score, like falling into a pit.
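To make this concrete, here is a minimal, purely illustrative sketch of score-based reinforcement learning (tabular Q-learning on a toy level; the pit positions, actions and reward numbers are all invented for illustration, not taken from the research described here):

```python
import random

# Toy level: the agent walks left to right and at each position chooses
# "advance" or "jump". Walking into a pit ends the episode with -1;
# jumping over it earns +1, mimicking how an external score shapes
# behaviour. All positions and numbers are illustrative assumptions.
PIT_POSITIONS = {2, 5}
LEVEL_LENGTH = 8
ACTIONS = ["advance", "jump"]

def step(pos, action):
    """Return (next_pos, reward, done) for one move."""
    nxt = pos + 1
    if nxt in PIT_POSITIONS:
        if action == "jump":
            return nxt, 1.0, nxt >= LEVEL_LENGTH  # cleared the pit: +1
        return nxt, -1.0, True                    # fell in: -1, episode over
    return nxt, 0.0, nxt >= LEVEL_LENGTH

def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(p, a): 0.0 for p in range(LEVEL_LENGTH) for a in ACTIONS}
    for _ in range(episodes):
        pos, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best-known action,
            # occasionally try a random one.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(pos, x)])
            nxt, r, done = step(pos, a)
            target = r if done else r + gamma * max(q[(nxt, b)] for b in ACTIONS)
            q[(pos, a)] += alpha * (target - q[(pos, a)])
            pos = nxt
    return q

q = train()
```

After training, the table prefers "jump" just before each pit and "advance" elsewhere: the external score alone, with no notion of curiosity, is what teaches pit avoidance.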
This type of approach, called reinforcement learning, was used to create AlphaGo, the Go-playing computer from Google DeepMind that beat Korean master Lee Sedol by four games to one last year. Over thousands of real and simulated games, the AlphaGo algorithm learned to pursue strategies that led to the ultimate reward: a win.
But the real world isn't full of rewards, says Pathak, who led the work. "Instead, humans have an innate curiosity which helps them learn," he says, which may be why we are so good at mastering a wide range of skills without necessarily setting out to learn them.
So Pathak set out to give his own reinforcement learning algorithm a sense of curiosity to see whether that would be enough to let it learn a range of skills. Pathak's algorithm experienced a reward when it increased its understanding of its environment, particularly the parts that directly affected it. So, rather than looking for a reward in the game world, the algorithm was rewarded for exploring and mastering skills that led to it discovering more about the world.
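One simple way to turn "understanding the world" into a reward signal, sketched below, is to give the agent a small forward model that predicts the next observation from the current one and the action, and to pay it the model's prediction error as an intrinsic reward. Familiar transitions become predictable and stop paying out, pushing the agent toward the unfamiliar. This is a loose illustration of the idea, not Pathak's actual architecture; the class name, linear model and learning rate are all assumptions:

```python
import numpy as np

class CuriosityBonus:
    """Intrinsic reward from a learned forward model (illustrative sketch)."""

    def __init__(self, obs_dim, n_actions, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Linear forward model: predicts next_obs from [obs, one-hot action].
        self.W = rng.normal(scale=0.1, size=(obs_dim, obs_dim + n_actions))
        self.n_actions = n_actions
        self.lr = lr

    def _features(self, obs, action):
        one_hot = np.zeros(self.n_actions)
        one_hot[action] = 1.0
        return np.concatenate([obs, one_hot])

    def reward(self, obs, action, next_obs):
        """Return squared prediction error as the bonus, then update the model."""
        x = self._features(obs, action)
        pred = self.W @ x
        err = next_obs - pred
        bonus = float(err @ err)
        # Normalised gradient step: transitions seen often become
        # predictable, so their bonus shrinks over time.
        self.W += self.lr * np.outer(err, x) / (x @ x)
        return bonus

cb = CuriosityBonus(obs_dim=3, n_actions=2)
obs = np.array([1.0, 0.0, 0.0])
nxt = np.array([0.0, 1.0, 0.0])
first = cb.reward(obs, 0, nxt)
for _ in range(50):
    last = cb.reward(obs, 0, nxt)
```

Replaying the same transition repeatedly drives the bonus toward zero, while a transition the model has never seen yields a large one: novelty itself is the reward.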
This type of approach can speed up learning times and improve the efficiency of algorithms, says Jaderberg at Google's AI company DeepMind. The company used a similar approach last year to teach an AI to explore a virtual maze. Its algorithm learned much more quickly than conventional reinforcement learning approaches. "Our agent is far quicker and requires a lot less experience from the world to train, making it much more data efficient," he says.
Fast learner
Imbued with a sense of curiosity, Pathak's own AI learned to stomp on enemies and jump over pits in Mario and also learned to explore faraway rooms and walk down hallways in another game similar to Doom. It was also able to apply its newly acquired skills to further levels of Mario despite never having seen them before.
But curiosity could only take the algorithm so far in Mario. On average, it explored only 30 per cent of level one as it couldn't find a way past a series of pits that could only be overcome through a sequence of more than 15 button presses. Rather than jump to its death, the AI learned to turn back on itself and stop when it reached that point.
The AI may have been flummoxed because it had no idea that there was more of the level to explore beyond the pit, says Pathak. Nor did it learn to consistently take useful shortcuts in the game, since these led it to discover less of the level and so didn't satisfy its urge for exploration.
Pathak is now working on seeing whether robotic arms can learn through curiosity to grasp new objects. "Instead of it acting randomly, you could use this to help it move meaningfully," he says. He also plans to see whether a similar algorithm could be used in household robots similar to the Roomba vacuum cleaner.
But Jaderberg isn't so sure that this kind of algorithm is ready to be put to use just yet. "It's too early to talk about real-world applications," he says.