Researchers explored a fascinating phenomenon by training AI agents to play modified Atari games with added unpredictability. They were surprised to find a consistent “indoor training effect” across various games and modifications: agents trained in noise-free versions of a game often outperformed agents trained amid noise when both were tested in the noisy version.

The researchers believe these findings could inspire further investigations aimed at improving AI training methods.

“This opens up a completely new avenue for exploration. Instead of focusing solely on matching training and testing environments, we might be able to design simulated environments that enhance an AI agent’s learning,” stated co-author Spandan Madan, a graduate student at Harvard University.

Madan collaborated on this paper with lead author Serena Bono of MIT, along with Ishaan Grover from MIT, Mao Yasueda from Yale, Cynthia Breazeal, a professor at MIT and head of the Personal Robotics Group in the MIT Media Lab, Hanspeter Pfister, the An Wang Professor of Computer Science at Harvard, and Gabriel Kreiman, a professor at Harvard Medical School. Their research will be presented at the Association for the Advancement of Artificial Intelligence (AAAI) Conference.

Understanding AI’s Performance Issues in Novel Environments

The researchers aimed to uncover why reinforcement learning agents typically struggle when tested in environments that differ from those in which they were trained.

Reinforcement learning is a trial-and-error process where agents explore a training environment to learn actions that maximize their rewards.
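As a rough illustration (not the authors' setup), the sketch below shows tabular Q-learning, one of the simplest reinforcement learning algorithms. The environment interface used here (reset, step, actions) is an assumption made for the example:

```python
import random

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Minimal tabular Q-learning: try actions, observe rewards, and
    shift value estimates toward the returns that actually materialize."""
    Q = {}  # (state, action) -> estimated long-term reward
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Occasionally explore at random; otherwise exploit current estimates.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q.get((state, a), 0.0))
            next_state, reward, done = env.step(action)
            # Nudge the estimate toward reward + discounted best future value.
            best_next = max(Q.get((next_state, a), 0.0) for a in env.actions)
            target = reward + gamma * best_next * (1 - done)
            old = Q.get((state, action), 0.0)
            Q[(state, action)] = old + alpha * (target - old)
            state = next_state
    return Q
```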

To investigate, the team developed a technique for injecting a controlled amount of noise into the transition function, the component of a reinforcement learning problem that defines the probability of an agent moving from one state to another based on its chosen action.

For instance, in a game of Pac-Man, the transition function might determine the likelihood of ghosts moving in various directions on the game board. Traditionally, an AI is trained and tested using the same transition function.
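One simple way to perturb a transition function in this spirit, though not necessarily the authors' exact scheme, is to wrap the base dynamics so that a chosen fraction of moves are replaced with random outcomes:

```python
import random

def make_noisy_transition(base_transition, states, noise=0.1):
    """Wrap a transition function so that, with probability `noise`, the
    environment jumps to a uniformly random state instead of the one the
    base dynamics would produce. Names here are illustrative."""
    def transition(state, action):
        if random.random() < noise:
            return random.choice(states)  # e.g., a ghost "teleports"
        return base_transition(state, action)
    return transition
```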

When the researchers injected noise into this function in the conventional matched setup, training and testing the agent on the same noisy game, the agent’s performance in Pac-Man dropped. However, when an agent was trained on a noise-free version of the game and then tested in an environment with a noisy transition function, it outperformed agents trained on the noisy version.
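A minimal sketch of that comparison, reusing the q_learning function above with a hypothetical PacManEnv whose transition_noise parameter sets how often moves are randomized; the indoor training effect would show up as the second configuration outscoring the first:

```python
def evaluate(Q, env, episodes=20):
    """Average episode reward for the greedy policy derived from Q."""
    total = 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = max(env.actions, key=lambda a: Q.get((state, a), 0.0))
            state, reward, done = env.step(action)
            total += reward
    return total / episodes

# Matched setup: train and test on the same noisy game (0.1 is arbitrary).
# "Indoor" setup: train noise-free, then test on the noisy game.
for train_noise, test_noise in [(0.1, 0.1), (0.0, 0.1)]:
    agent = q_learning(PacManEnv(transition_noise=train_noise))
    score = evaluate(agent, PacManEnv(transition_noise=test_noise))
    print(f"train={train_noise}, test={test_noise}: avg score {score:.1f}")
```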

“The common belief is that you should accurately capture the transition function of the deployment condition during training for optimal results. We rigorously tested this idea because we found it hard to believe,” Madan explained.

Varying the degree of noise in the transition function let the researchers test many environments, but it did not produce realistic gameplay: the more noise they injected, the more likely ghosts were to randomly teleport around the board.

To determine whether the indoor training effect also held in more ordinary Pac-Man games, they adjusted the underlying probabilities so that ghosts moved normally but were more likely to travel vertically than horizontally. Remarkably, AI agents trained in noise-free settings still excelled in these realistic scenarios.
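In code, that adjustment amounts to reweighting the ghosts' direction distribution rather than allowing arbitrary jumps. The probabilities below are invented for illustration, not taken from the paper:

```python
import random

# Illustrative ghost policy: movement stays local (no teleporting), but
# the test-time distribution is skewed toward vertical moves.
TRAIN_PROBS = {"up": 0.25, "down": 0.25, "left": 0.25, "right": 0.25}
TEST_PROBS  = {"up": 0.35, "down": 0.35, "left": 0.15, "right": 0.15}

def sample_ghost_move(probs):
    """Draw one ghost move from the given direction distribution."""
    directions, weights = zip(*probs.items())
    return random.choices(directions, weights=weights, k=1)[0]
```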

“It wasn’t merely the way we added noise to create artificial environments. This appears to be a fundamental characteristic of the reinforcement learning problem, which was even more surprising,” Bono noted.

Exploring AI Learning Patterns: An Unexpected Revelation

As the researchers delved deeper for explanations, they identified correlations in how the AI agents navigated the training space.

When AI agents predominantly explore the same areas, the one trained in the noise-free environment tends to perform better. This could be because learning the game’s rules is easier without the interference of noise.

Conversely, if their exploration patterns diverge, the agent trained in the noisy environment often outperforms its counterpart. This might be due to the agent needing to grasp patterns that cannot be learned in a noise-free setting.
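The article does not say how the researchers quantified exploration overlap; one simple proxy is a histogram-intersection score over the states each agent visits during training:

```python
from collections import Counter

def visitation_overlap(visits_a, visits_b):
    """Compare two agents' state-visitation histories with a
    histogram-intersection score in [0, 1]: 1 means identical
    exploration patterns, 0 means disjoint ones."""
    counts_a, counts_b = Counter(visits_a), Counter(visits_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    return sum(min(counts_a[s] / total_a, counts_b[s] / total_b)
               for s in counts_a.keys() & counts_b.keys())
```

Under the paper's finding as described here, a high overlap score would favor the noise-free-trained agent, while a low score would favor the agent trained amid noise.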

“For example, if I only practice tennis with my forehand in a noise-free environment, but I have to use my backhand in a noisy one, I won’t perform as well in the noise-free setting,” Bono explained.

Future Implications: Harnessing the Indoor Training Effect

Looking ahead, the researchers intend to investigate whether the indoor training effect applies to more complex reinforcement learning environments and other AI methodologies, including computer vision and natural language processing. They also plan to create training environments that leverage this effect, potentially enhancing AI performance in unpredictable real-world situations.

Reference: “The Indoor-Training Effect: Unexpected Gains from Distribution Shifts in the Transition Function” by Serena Bono, Spandan Madan, Ishaan Grover, Mao Yasueda, Cynthia Breazeal, Hanspeter Pfister, and Gabriel Kreiman, January 8, 2025, arXiv preprint (Machine Learning).