
This AI Learned Atari Games Like Humans Do - And Now It Beats Them

A new AI system outperforms people on a number of old Atari titles, but it learned how to do so just like a human - and may learn much more soon.
The state of the AI's learning can be visually inspected, which shows how it has clustered and categorized different types of data. (Google DeepMind)

Ever been totally dominated by the computer player in a video game? A new artificial intelligence system takes on all comers with a handful of old Atari titles, and it does so after learning the rules bit by bit like a human. Its creators claim this is just the very beginning of what it can do. In a few years, it may be driving you to work.

"What we're trying to do is use the human brain as an inspiration," Google DeepMind researcher Demis Hassabis told reporters in a telephone conference call about the research, published in Thursday's issue of the journal Nature. "This is the first rung of the ladder to showing that a general learning system can work end to end, from pixels to actions, even on tasks that humans find difficult."

And as anyone who played games in the '70s and '80s will remember, Atari 2600 games were definitely difficult. The AI outscored humans on 23 out of 49 games, such as Road Runner, Space Invaders and Breakout, and came close on many more.

The games at which AIs in general perform better than humans (in grey) and DQN performs exceptionally (blue) tended to be more action-oriented, lacking exploration or experimentation elements. The percentage is how much better the AI performed than a human. (Google DeepMind)

But the way it wins isn't through comprehensive and specific training, as is the case with the systems pitted against grandmasters in chess.

"The programmers and chess grandmasters distilled chess knowledge into a problem," Hassabis explained, "whereas what we've done is build algorithms that build from the ground up. They can learn and adapt from unexpected things."

The Deep Q-network agent, or DQN as the researchers at DeepMind call it, approaches things the way a person might. All it "knows" is that it wants to maximize the score, and by watching the game carefully and observing which actions increase that score, it learns how to play — then how to play better.
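The learning rule behind that score-chasing behavior is Q-learning: after each action, the agent nudges its estimate of that action's value toward the reward it just saw plus the best value the next state offers. Here is a minimal tabular sketch of that update; the states, actions and parameter values are illustrative assumptions, and the real DQN replaces the lookup table with a deep neural network reading raw pixels.

```python
# Toy Q-learning sketch. States and actions are made-up placeholders;
# DQN itself learns from 84x84-pixel screen images, not a small table.
states = range(3)          # hypothetical game situations
actions = range(2)         # hypothetical moves (e.g. dodge / fire)
Q = {(s, a): 0.0 for s in states for a in actions}

alpha, gamma = 0.1, 0.99   # learning rate and discount factor (assumed values)

def update(s, a, reward, s_next):
    """One Q-learning step: move the action-value estimate toward the
    observed reward plus the best value reachable from the next state."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

# Example: taking action 1 in state 0 scored 10 points and led to state 1,
# so the estimated value of that state-action pair rises.
update(0, 1, 10.0, 1)
```

Repeated over many plays, those small nudges are what turn "avoiding shots means more chances to fire" from a lucky accident into a learned strategy.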

Say a shot from a Space Invader is mere pixels away from striking the ship. On its first try the AI may simply allow the game to end. But on another run-through, it may find that by avoiding shots, it gets more chances to fire the ship's gun, destroying enemies and raising the score. DQN even hit on unique strategies, finding safe spots or getting easy points in ways humans never tried.

And what makes this system particularly powerful is that it can take that knowledge and apply it to other situations, in other games. It turns experiential data into knowledge that can be used in situations it's never been in — something artificial intelligence systems generally do poorly, but humans do very well.

Atari's revenge

Of course, the researchers aren't aiming to set the record high score in Breakout. At this stage, Atari games are just sophisticated enough to be a challenge to the system while still giving it a chance to excel. Breakout in particular was susceptible to its style.

But even some games we think of as basic elude DQN's strategy (in this study, at least) of trying random actions and remembering which work best.

"For many games this strategy will not work — they require more sophisticated exploration," said Volodymyr Mnih, co-author of the study. "Games where the system doesn't do well are ones that require long-term planning. For instance, in Ms. Pac-Man, if you have to get to the other side of the maze you have to perform quite sophisticated pathfinding and avoid ghosts to get there."

The Atari-playing iteration of DQN has a short memory, only looking at the last four frames of the game and making a decision based on those. Perhaps with a longer memory and more training, it could crack the Ms. Pac-Man code, but the team isn't worried about that right now. The results of their research were more than positive enough for them to move on to more complex games, where the system may have to learn such complex strategies from the start.
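That four-frame memory can be pictured as a fixed-size buffer: new frames push old ones out, and the agent decides from whatever is currently inside. A sketch, using placeholder strings where the real system stacks preprocessed screen images:

```python
from collections import deque

class FrameStack:
    """Holds only the most recent `size` frames; older ones fall off."""
    def __init__(self, size=4):
        self.frames = deque(maxlen=size)

    def push(self, frame):
        self.frames.append(frame)  # oldest frame is dropped if full

    def state(self):
        """The observation the agent bases its next decision on."""
        return tuple(self.frames)

stack = FrameStack()
for t in range(6):                 # six frames arrive over time...
    stack.push(f"frame-{t}")
print(stack.state())               # ...but only the last four remain
```

Stacking a few consecutive frames lets the agent infer motion (which way is the ball going?), but anything that happened more than four frames ago is simply gone, which is why long-term planning is out of reach for this version.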

"We are now moving towards the games of the '90s, racing games and other types of 3-D games, where the challenge is much greater," said Hassabis.

In particular, he said he looked forward to the AI mastering the arcade classic Yars' Revenge, then moving on to the far more sophisticated StarCraft and Civilization games. AI already exists for those games, of course; you can play against the computer. But as with chess, it's a different type of AI, one that follows a set of game-specific rules built in by a programmer who knows all the right moves. DQN would take everything it learned from all the other games it's played and bring it to bear on something new.

Robots that improvise

The ultimate goal isn't just to give computers more ways to beat their human opponents. An AI that can deal with unexpected circumstances is also a valuable thing when it comes to robotics and automation.

Think of industrial or household robots smart enough to react gracefully to the unexpected presence of a person or obstacle, or virtual assistants like Siri and Cortana that improvise intelligently on requests instead of simply failing to understand them. DQN doesn't require a supercomputer to run (though one helps during training), so it's the kind of thing you might find anywhere.

For now, the next step is hush-hush, but the team said they're already testing DQN on more complex data, and not just Super Nintendo games.

"Ultimately, if the agent can drive a car in a racing game then, with a few tweaks, it can drive a real car," hinted Hassabis. Whether people will like the idea of their autonomous vehicle going with its computerized gut is another question altogether.

Mnih and Hassabis are among 19 authors of the paper published in Nature, "Human-Level Control Through Deep Reinforcement Learning."