Teaching computers to play video games like Atari requires them to understand complex environments and make quick decisions. Traditional methods often rely on continuous representations, which work in some cases but struggle with tasks that require distinct decisions. In recent years, transformers, a type of neural network, have proven effective at handling such challenges, particularly in areas like language and image processing. We used transformers trained with reinforcement learning to learn to play Atari games more efficiently and, in doing so, took a step toward potential real-life applications.
Why Use Transformers?
Transformers excel at capturing long-term patterns and relationships in data. Initially developed for language processing, where they help computers understand sentences by considering the meaning of each word in context, transformers have also been applied to images and other types of data. In video games like Atari, transformers help an AI agent understand how the game environment evolves and make smarter decisions by weighing both immediate actions and their long-term consequences.
Our model, DART (Discrete abstract representation for transformer-based learning), shown in Figure 1, combines a transformer-decoder that models the game environment with a transformer-encoder that makes decisions. This combination allows the AI agent to better understand the dynamics of the game and plan how to improve its performance.
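To make the difference between the two transformer types more concrete, here is a minimal sketch in plain Python. It is not DART's implementation: the function names and the toy token vectors are assumptions made up for illustration. It only shows the attention pattern that typically separates a decoder (each position looks at the past only, which suits predicting what happens next) from an encoder (each position looks at everything, which suits summarizing the current situation before choosing an action).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, causal):
    """Scaled dot-product attention weights for a sequence of token vectors.

    causal=True  -> decoder-style: position t attends only to positions <= t.
    causal=False -> encoder-style: every position attends to every position.
    """
    dim = len(queries[0])
    weights = []
    for t, q in enumerate(queries):
        scores = []
        for s, k in enumerate(keys):
            if causal and s > t:
                scores.append(float("-inf"))  # hide future tokens
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(dim))
        weights.append(softmax(scores))
    return weights


# Hypothetical toy embeddings for three consecutive game tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

decoder_weights = attention_weights(tokens, tokens, causal=True)
encoder_weights = attention_weights(tokens, tokens, causal=False)
```

In `decoder_weights`, the first token places all of its attention on itself because later tokens are masked out, while in `encoder_weights` every token gives some attention to every other token.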
DART Performance
We tested DART on the Atari 100k benchmark. As shown in Figure 2, DART outperformed previous models that did not use transformers, achieving higher scores in 9 out of 26 games. It showed particular strength in games with fast-moving objects, like Breakout and Seaquest, where understanding long-term actions is essential.
Why Learn to Play Atari Games?
Although video games may seem like a fun and isolated challenge, learning to master them has valuable real-world implications. Atari games—with their rich variety of environments and tasks—help models improve their ability to plan, strategize and make decisions quickly in changing environments. This type of learning is similar to what’s needed in real-world scenarios like robotics, autonomous driving or healthcare. For example:
- In robotics, the ability to predict and act efficiently can improve the way robots navigate spaces or manipulate objects.
- In autonomous driving, AI must understand its surroundings and anticipate the next steps to make safe decisions.
- In healthcare, AI can assist in planning treatments by learning from patterns in patient data and predicting outcomes.
By teaching AI agents to handle the fast pace and unpredictability of video games, we are building systems that can adapt and excel in real-world environments.
The Future of AI with Transformers
Transformers are already revolutionizing fields like natural language processing and computer vision. Our work shows that they can also enhance AI’s ability to handle dynamic and complex environments, such as video games. While DART is currently designed for tasks with discrete actions, like those in Atari, we plan to expand this approach to real-world tasks that involve continuous actions, such as controlling robots or operating autonomous vehicles. This would allow AI systems to become even more capable and versatile in real-world applications.
Conclusion
Transformers are helping AI systems learn and make decisions more efficiently by focusing on long-term patterns. Our model, DART, uses transformers to improve performance in Atari games, a step that can help AI systems tackle real-world challenges in areas like robotics, autonomous driving and healthcare.
Additional Information
For more information on this research, please read the following paper:
Agarwal, P., Andrews, S., & Kahou, S. E. (2024). Learning to Play Atari in a World of Tokens. Proceedings of the 41st International Conference on Machine Learning.
Acknowledgment
We are grateful to our collaborators and supporters who have made this research possible.