Our upcoming article will cover our implementation of a learning AI for Project Aleron. Because this topic is rather complicated, we have written an introduction to machine learning, and to reinforcement learning in particular.
Artificial Intelligence in Games
Many games let players fight computer-controlled enemies (bots).
Bots play by accepting an input (the game state) and reacting with a specified output (their chosen action).
In nearly all cases the AI consists of a large, strict ruleset, no matter how intelligent it appears to be.
This approach has proved itself over the years, but comes at a cost:
- The developers must write all rules, including corner cases
- The (usually huge) ruleset must be adapted by hand to every game change or update
- The bot’s behaviour is predictable and can be exploited by experienced players
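To make the ruleset idea concrete, here is a minimal sketch of such a bot in Python. The state fields (`health`, `enemy_distance`, `enemy_visible`), the thresholds and the action names are all made up for illustration and are not taken from Project Aleron.

```python
def rule_based_bot(state):
    """Map a game state to an action using fixed, hand-written rules.

    All field names, thresholds and actions are hypothetical.
    """
    if state["health"] < 30:          # corner case the developers had to think of
        return "retreat"
    if state["enemy_distance"] <= 1:  # enemy is adjacent
        return "attack"
    if state["enemy_visible"]:
        return "approach"
    return "patrol"                   # default when no rule matches


# The same input always produces the same output, which is what makes
# such a bot predictable for experienced players.
action = rule_based_bot({"health": 80, "enemy_distance": 5, "enemy_visible": True})
print(action)
```

Every new game mechanic would need another hand-written branch in a function like this, which is where the maintenance cost comes from.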
Wouldn’t it be handy if the AI could learn by itself and adapt to changes?
Machine Learning is a subfield of computer science that aims to generate knowledge from experience.
In short: Machine Learning enables bots to learn from experience.
When a bot plays, it receives rewards during or after play, depending on the machine learning method used.
The bot tries to increase its rewards and remembers what was good or bad for it. This way it gets better and better.
Out of all the available Machine Learning methods we chose Reinforcement Learning, because it makes the decision process of the bots in Project Aleron easier to understand and control than other machine learning methods do.
Here is a video of a bot that learns to drive a car via Reinforcement Learning.
(Neither the video nor the implementation is ours)
Reinforcement Learning is a field of Machine Learning.
Here the bot explores its environment by trial and error. When it does something good, it gets a reward; when it does something wrong, it is punished.
Let’s take chess as an example to describe the method.
For the implementation we need the following:
- Possible Actions – In order to play the game, the bot must first know what options it has. In chess these are the possible moves for each piece.
- State Representation – The bot must be able to learn situational behaviour, so it must be able to connect the actions it performs with the situation at hand. For this we need a state representation. In chess the state would contain the positions of all remaining pieces.
- Reward System – The bot must know whether its decisions were good or bad. For this we need a reward system which rewards or punishes (negative reward) particular situations. For example, the bot can be rewarded for winning and punished for losing. The system can be extended to accelerate the learning process: whenever the bot captures an enemy piece, it is rewarded, and whenever the enemy captures one of the bot’s pieces, it is punished.
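As a rough illustration of the state representation and the reward system, the sketch below stores a chess position as a hashable set of pieces and adds up rewards for the events of a match. The `Piece` type, the event names and the reward values are assumptions chosen for this example, not the values used in Project Aleron.

```python
from typing import NamedTuple


class Piece(NamedTuple):
    colour: str
    kind: str
    square: str


# State representation: the positions of all remaining pieces.
# A frozenset is hashable, so a state can later serve as a dictionary key.
state = frozenset({
    Piece("white", "king", "e1"),
    Piece("white", "queen", "d4"),
    Piece("black", "king", "e8"),
})

# Reward system (hypothetical values): large rewards for the final result,
# small ones for captures to speed up learning.
REWARDS = {
    "win": 100.0,
    "loss": -100.0,
    "captured_enemy_piece": 1.0,
    "lost_own_piece": -1.0,
}


def reward_for(events):
    """Sum the rewards of all events that happened since the last decision."""
    return sum(REWARDS[event] for event in events)


print(reward_for(["captured_enemy_piece", "lost_own_piece", "win"]))
```

Keeping the capture rewards much smaller than the win and loss rewards is one way to stop the bot from valuing captures over the outcome of the match.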
Once we have this information, the process is as follows:
First the bot accepts the current state as input. It then checks whether it has reward information (a rating) for this state. If so, the bot can make an informed decision. If not, it must choose an action at random. This is repeated for each input until the bot receives a reward. The reward is then stored in the bot’s memory, linked to the states and actions that led to it, and the bot plays another match. Once the bot has played enough matches, it has learned to achieve rewards and avoid punishments.
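The loop described above can be sketched on a toy game that is far simpler than chess. In this made-up game the bot starts at position 0 and moves forward by 1 ("step") or 2 ("jump"); landing exactly on position 4 wins, overshooting loses. The game, the `learning_rate` and the choice to treat unrated actions as neutral (0.0) are assumptions for this sketch, not a description of our implementation.

```python
import random

GOAL = 4
ACTIONS = ("step", "jump")  # move forward by 1 or by 2


def choose_action(state, memory, rng):
    """Pick the best-rated action, or a random one if the state is unknown."""
    rated = {a: memory[(state, a)] for a in ACTIONS if (state, a) in memory}
    if not rated:
        return rng.choice(ACTIONS)  # no reward information yet
    # Informed decision; actions without a rating count as neutral (0.0).
    return max(ACTIONS, key=lambda a: rated.get(a, 0.0))


def play_match(memory, rng, learn=True, learning_rate=0.5):
    """Play one match and, if learning, store the reward in memory."""
    state, history = 0, []
    while state < GOAL:
        action = choose_action(state, memory, rng)
        history.append((state, action))
        state += 1 if action == "step" else 2
    reward = 1.0 if state == GOAL else -1.0  # win or overshoot
    if learn:
        # Move the rating of every visited (state, action) towards the reward.
        for key in history:
            old = memory.get(key, 0.0)
            memory[key] = old + learning_rate * (reward - old)
    return reward


rng = random.Random(0)
memory = {}
for _ in range(200):
    play_match(memory, rng)

final_reward = play_match(memory, rng, learn=False)
print(final_reward, memory)
```

Here `memory` plays the role of the bot’s memory of rewards: a plain dictionary from (state, action) pairs to ratings. Real games need a far larger table or a more compact representation.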
This process has two problems:
- The first problem is the number of possible combinations of states and actions. The bot must visit every possible game state at least once per possible action, trying out all combinations to build up its memory of rewards. With a large number of states and actions this becomes very time-consuming (a problem also known as the curse of dimensionality).
- The second problem is the timing and amount of rewards. Ideally the bot should have as much freedom as possible to explore and try things out. As this is very time-consuming, the bot can be rewarded more often so that it builds up its memory faster. If these rewards are improperly distributed, badly weighted or conflicting, the bot cannot learn properly because its memory is disturbed repeatedly. Finding the mistake in such a system is difficult, because the memory is built up over many games and evolves gradually.
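To give a feeling for the first problem, the sketch below computes a crude upper bound on the number of state-action combinations for a board game. It assumes every cell can independently hold any of a fixed set of contents, so it also counts positions that can never occur in a real game. The figure of 30 actions per chess state is an assumed round number for illustration only.

```python
def table_entries(cells, contents_per_cell, actions_per_state):
    """Crude upper bound on the number of (state, action) ratings to learn."""
    states = contents_per_cell ** cells
    return states * actions_per_state


# Tic-tac-toe: 9 cells, each empty, X or O; at most 9 possible moves.
tic_tac_toe = table_entries(cells=9, contents_per_cell=3, actions_per_state=9)

# Chess: 64 squares, each empty or holding one of 12 piece types;
# 30 moves per state is an assumption.
chess = table_entries(cells=64, contents_per_cell=13, actions_per_state=30)

print(f"tic-tac-toe: {tic_tac_toe:,}")  # 3**9 * 9 = 177,147
print(f"chess:       {chess:.2e}")
```

Even as a rough bound, the jump from a few hundred thousand entries to an astronomically large number shows why visiting every combination is not feasible for complex games.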
As great as Machine Learning may sound, it is hardly used in the games industry. The effort of implementing a learning AI is very high, and when you run into problems, it’s hard to find the source.
After all, you cannot fix the faulty behaviour directly; you have to tune the learning process that led to it.
That’s quite an indirect approach.
We are up to the challenge though, as Machine Learning has great potential.
The benefits for a game as complex as Project Aleron would be fantastic!
Authors: Sergej Tihonov, Eike Wulff and Benjamin Justice