Archive for the 'Artificial Intelligence' Category

PacMan capture-the-flag: a fun game for artificial intelligence development and education

At the beginning of September I was invited to teach at a summer school on scientific programming. The whole experience was really rewarding, but it was the students' project that got me going: we had the students write artificial intelligence algorithms for the agents of a PacMan-like game, and organized a tournament for them to compete against each other.

The PacMan capture-the-flag game was originally written by John DeNero, and has been used to teach an artificial intelligence course by him at Berkeley and by Hal Daumé III at the University of Utah. Very often, games of this kind have a single strategy that dominates all others, and once you find it the interest fizzles out. In this case, I was impressed by how rich the game is. It offers a lot of opportunities to develop and test complex learning and planning algorithms, including cooperation strategies for games with multiple agents.

The rules of the game are quite simple: the board is a PacMan maze, divided into a red and a blue half. The two halves belong to two teams of agents, which are controlled by computer programs and try to eat the opponent's food while protecting their own. When in the opponent's half, the agents are PacMan (PacMen?), while in their own half the agents are ghosts and can kill the opponent's PacMan agents, in which case these are returned to their initial position. The players get one point for each food dot they eat; no points are assigned for eating the other team's agents. The game ends when one of the two teams eats all of the opponent's food, or after 3000 moves; the team with the highest score wins.

To make the game more interesting, one can only observe the position of the other team's agents when they are very close to one's own agents (within 5 squares); otherwise, one can only obtain a noisy estimate of their distance.

The game is written in Python, my programming language of choice, which makes it possible to write even sophisticated algorithms quickly. I recommend the game to anyone wanting to organize an artificial intelligence course, or simply to have fun writing AI agents. I plan to dedicate a couple of posts to the basic strategies for writing successful agents in this game.

Here’s a video of the best students’ agents (red team) playing against the best tutors’ agents (blue team). The tutors won, saving our reputation!

Update: The authors of the PacMan capture-the-flag game decided to keep the game closed-source, and in particular would prefer that the code of agents playing their game not be published, fearing that it might interfere with their course. It's a shame because I was planning to write some Genetic Programming agents for the game, but of course I respect their decision. I guess there will be no series of posts re: PacMan…

My AI reads your mind — Extensions (part 3)

In the previous two posts I showed how to make use of decision theory to write a game AI that forms a model of its opponent and adapts its strategy accordingly.

The AI could be improved in several ways:

  • The most obvious improvement would be to build a better model of the opponent. In the Karate game I used a 2nd order Markov model, i.e., I assumed that the next move of the player only depends on his previous two moves. It is of course possible to use a higher-order model that keeps track of three or more past moves. However, a longer history means a much larger number of parameters to estimate, so that it will take much longer for the AI to have a reasonable estimate of the player's behavior. An easy workaround would be to collect higher-order statistics, but only use them when enough data is available; this, however, would still fail if the player decides to adopt a new strategy. One could also use another class of models, for example a recurrent neural network. I prefer not to use neural networks, first of all to avoid the usual voodoo of choosing parameters like learning rate, network architecture, etc., and second because they work in a black-box fashion, which makes it difficult to extend them in principled ways, such as the ones suggested below.
  • When the game is started, the player statistics are blank, in the sense that all the player's moves are considered equally likely. This is our initial prior probability for the opponent's moves. However, it is probable that humans have common biases in the kind of action sequences they choose. One could adapt the game to let it learn these biases over many games with different players, and use them as the initial prior, thereby improving the initial phase of the game.
  • A related point is that at the moment I do not take into account the uncertainty in the estimate of the player's moves. At the start of the game the AI has only a few examples on which to base its prediction of the next move, and it should take this into account when making decisions. The formal way to capture this uncertainty is to define a hyperprior on the transition probabilities, and then integrate over it when predicting the next move. So far, the transition probability is given by p(n(t+1) | n(t)) = N_{n(t),n(t+1)}, where N is a matrix and, for simplicity, I'm only considering a first order Markov model. N is treated as a constant at every time point; in reality, N is itself a random variable that is being estimated from the player's moves. It is thus only natural to put a prior on N itself, p(N) (e.g., a Dirichlet distribution for every row of the matrix). The improved prediction, which takes into account our uncertainty about N, is given by p(n(t+1) | n(t)) = integral over N of p(n(t+1) | n(t), N) p(N).
  • In the karate game, the AI tries to maximize its score by picking the action with the highest expected score. Another strategy would be to choose every action with a probability related to its score, for example with p(choosing action a) proportional to exp(beta * score of a), where beta is a constant that controls the "softness" of the decision: for beta=0, all actions are chosen with the same probability; for large beta, only the action with the largest score is chosen. This seems to be closer to the way humans make decisions, the so-called probability matching rule. This strategy is suboptimal if the second order Markov model is the real model of the opponent, but since it is not, it appears to perform better in practice as the AI becomes less predictable (I updated the Karate game in the previous post to do probability matching).
  • A major improvement to the AI would be to take into account the so-called theory of mind, i.e., the fact that while the AI is building a model of the player, the player is doing the same for the AI, and trying to maximize his own score. Taking this aspect into account is quite complex, as one falls rapidly into a deep I-know-that-you-know-that-I-know kind of reasoning. Managing to do so, however, is likely to be highly rewarding for the player’s experience of the game: several studies have shown how humans activate areas of the brain that are associated with theory-of-mind when playing against a human opponent, but fail to do so against a computer (see Gallagher and Frith, Functional imaging of ‘theory of mind’, Trends in Cognitive Sciences, 7(2), 2003, for a review). It is thus possible that by writing computer programs that make use of theory of mind themselves, those centers would become engaged, giving the player the impression of playing against a human opponent.
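The hyperprior idea from the list above can be made concrete for a single row of the transition matrix. The following is a minimal Python sketch, not code from the game: it assumes a symmetric Dirichlet prior with a concentration parameter `alpha` (the name and the default value 1.0 are my own choices for illustration). With that prior, integrating over N given the observed counts has a closed form: each count is simply increased by `alpha` before normalizing.

```python
def dirichlet_predict(counts, alpha=1.0):
    """Predictive distribution for one row of the transition table.

    counts[k] is how often move k followed the current context.
    alpha is the concentration of a symmetric Dirichlet prior
    (an assumption made for this sketch, not a value from the post).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


if __name__ == "__main__":
    # No data yet: the prediction is uniform instead of undefined.
    print(dirichlet_predict([0, 0, 0, 0]))
    # A few observations: the estimate is pulled toward the data,
    # but unseen moves keep a non-zero probability.
    print(dirichlet_predict([3, 0, 1, 0]))
```

With few observations the prediction stays close to uniform, and it approaches the raw frequencies as the counts grow, which is the cautious early-game behavior the hyperprior is meant to provide.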

My AI reads your mind and kicks your ass (part 2)

[Embedded Flash game: Virtual Karate]

In the last post I discussed how it is possible to program a game Artificial Intelligence to exploit a player’s unconscious biases using a simple mathematical model. In the karate game above, the AI uses that model in order to do the largest amount of damage. Give it a try! You get 10 points if you hit your opponent with a punch or a kick, 0 points if you miss, and 5 points if you block your opponent’s move. As you play, the AI learns your strategy and adapts to knock you down as often as possible.

How does it work? According to decision theory, we need to maximize the expected score. To compute the expected score for an action 'x' (e.g., 'punch'), one needs to consider all of the player's possible moves, 'y', and weight each outcome by the probability of the player making that move, i.e.

E[score for x] = sum_y P(y) * Score(y,x)

where P(y) is the probability of the player choosing action ‘y’ (obtained using last post’s model), and Score(y,x) gives the score of responding ‘x’ to ‘y’.
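The formula above can be sketched in a few lines of Python. This is only an illustration: the move names and the payoff table `SCORE` are hypothetical values chosen to resemble the scoring rules described earlier, not the table used in the actual game.

```python
# Hypothetical payoff table: SCORE[y][x] is the AI's score when it
# responds with action x to the player's move y. Illustrative values only.
SCORE = {
    "punch":       {"punch": 10, "kick": 10, "block_punch": 5, "block_kick": 0},
    "kick":        {"punch": 10, "kick": 10, "block_punch": 0, "block_kick": 5},
    "block_punch": {"punch": 0,  "kick": 10, "block_punch": 0, "block_kick": 0},
    "block_kick":  {"punch": 10, "kick": 0,  "block_punch": 0, "block_kick": 0},
}


def expected_scores(p_player, score):
    """E[score for x] = sum_y P(y) * Score(y, x), for every AI action x."""
    actions = next(iter(score.values())).keys()
    return {
        x: sum(p_player[y] * score[y][x] for y in p_player)
        for x in actions
    }


def best_action(p_player, score):
    """Return the action with the highest expected score."""
    exp = expected_scores(p_player, score)
    return max(exp, key=exp.get)


if __name__ == "__main__":
    # A player who blocks kicks most of the time.
    p = {"punch": 0.1, "kick": 0.1, "block_punch": 0.1, "block_kick": 0.7}
    print(expected_scores(p, SCORE))
    print(best_action(p, SCORE))
```

In this made-up table, a player who mostly blocks kicks makes the punch the most profitable response, which mirrors the adaptation described in the text.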

For example, in the karate game using a low kick has a priori the highest chance of success: you score in 3 out of 4 cases, and only lose 5 points if the opponent decides to block your kick. This is why, at the beginning, the AI tends to choose that move. However, if you know that the AI uses that move often, you will choose the kick-blocking move more often, increasing P(kick-block). This change will make the punch more likely to score points. As you play, the optimal strategy changes and the AI continues to adapt to your style.

With a bit of practice, you'll notice that you can compete with the AI and sometimes even gain the upper hand over it. This shows that you are in turn forming an internal model of the computer's strategy. I think that the game dynamics that result from this interaction make the game quite interesting, even though it is extremely simple. Unfortunately, it's very rare to see learning AIs in real-life video games…

As always, you can download the code here.

Update: Instead of always making the best move, the AI now selects the move with a probability related to its score, which makes it less predictable. More details in the next post…
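One common way to select a move "with a probability related to its score" is the exponential rule p(a) proportional to exp(beta * score of a). The Python sketch below shows that rule under my own assumptions; the function names and the example scores are hypothetical, and the game's actual implementation may differ.

```python
import math
import random


def softmax_probs(scores, beta):
    """Probabilities proportional to exp(beta * score).

    Subtracting the maximum score before exponentiating does not change
    the result but avoids overflow for large beta.
    """
    m = max(scores)
    weights = [math.exp(beta * (s - m)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(scores, beta, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(scores, beta)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


if __name__ == "__main__":
    scores = [7.5, 9.0, 1.25]           # hypothetical expected scores
    print(softmax_probs(scores, 0.0))   # beta = 0: uniform choice
    print(softmax_probs(scores, 1.0))   # moderate beta: biased, still random
    print(choose_action(scores, 1.0))   # index of the sampled action
```

A small beta keeps the AI unpredictable at the cost of some expected score, while a large beta recovers the "always pick the best move" behavior.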

My AI reads your mind (part 1)

I regularly read about people complaining that AI in games should be improved. I definitely agree with them, but here's an argument why pushing it to the limits might not be such a good idea: computers can easily discover and exploit our unconscious biases.

Magic? ESP? More like a simple application of decision theory. In order to make an unbeatable AI one needs two steps: 1) build a model of a player’s response in order to predict his next move, and 2) choose actions that maximize the expected score given the prediction of the model.

The basic idea behind 1) is that even if we try to be unpredictable, our actions contain hidden patterns that can be revealed using a pinch of statistics. Formally, the model takes the form of a probability distribution: P(next move | past observations).

Try it out: In the Flash example below, you can type in a sequence of numbers from 1 to 4, and the AI will try to predict your next choice. If your choices were completely random, the AI would only be able to guess correctly 25% of the time. In practice, it often guesses 35-40% of the numbers correctly! (It might take a few numbers before the AI starts doing a decent job.)

[Embedded Flash example: number prediction]

In this example I used a 2nd order Markov model, i.e., I assumed that the next number, n(t+1), only depends on the past 2 choices: P(n(t+1) | past observations) = P(n(t+1) | n(t), n(t-1)). The rest is just book-keeping: I used two arrays, one to remember the past 3 numbers, and one to keep track of how many times the player chose number ‘k’, given that his past two moves were ‘i’ and ‘j’:

// last 3 moves
public var history:Array = [1, 3, 2];
// transition table: transitions[i][j][k] stores the number
// of times the player pressed 'i' followed by 'j' followed by 'k'
public var transitions:Array;

When the player makes a new choice, I update the history, and increment the corresponding entry in the transition table:

/*
 * Update history and transition tables with player's move.
 */
public function update(move:int):void {
    history = [history[1], history[2], move];
    transitions[history[0]][history[1]][history[2]] += 1;
}

The probability that the next choice will be n(t+1) is given by the number of times the player pressed n(t+1) after n(t-1) and n(t), normalized by the number of times the sequence n(t-1), n(t) occurred in the past:

/*
 * Return probability distribution for next move.
 */
public function predict():Array {
    // probability distribution over next move
    var prob:Array = new Array(4);

    // look up transition counts for the last two moves, n(t-1) and n(t)
    var tr:Array = transitions[history[1]][history[2]];

    // normalizing constant
    var sum:Number = 0;
    for (var k:int = 0; k < 4; k++) {
        sum += tr[k];
    }

    for (k = 0; k < 4; k++) {
        // if this pair of moves was never seen, fall back to a uniform guess
        prob[k] = (sum > 0) ? tr[k] / sum : 0.25;
    }

    return prob;
}

The best prediction is given by the choice with maximum probability. You’re welcome to have a look at the code!
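For readers who want to experiment outside Flash, here is a rough Python re-expression of the same idea, including the "pick the most probable move" step. It is a sketch under my own assumptions rather than a port of the original code: the class and method names are hypothetical, moves are numbered 0 to 3, and only the last two moves are stored instead of three.

```python
class MarkovPredictor:
    """2nd order Markov predictor over a small set of moves."""

    def __init__(self, n_moves=4):
        self.n_moves = n_moves
        self.history = []  # the last two moves, oldest first
        # transitions[i][j][k]: times move k followed the pair (i, j)
        self.transitions = [
            [[0] * n_moves for _ in range(n_moves)] for _ in range(n_moves)
        ]

    def update(self, move):
        """Record the player's move and update the transition counts."""
        if len(self.history) == 2:
            i, j = self.history
            self.transitions[i][j][move] += 1
        self.history = (self.history + [move])[-2:]

    def predict(self):
        """Return the probability distribution over the next move."""
        uniform = [1.0 / self.n_moves] * self.n_moves
        if len(self.history) < 2:
            return uniform
        i, j = self.history
        row = self.transitions[i][j]
        total = sum(row)
        if total == 0:
            return uniform  # context never seen before
        return [count / total for count in row]

    def guess(self):
        """Best prediction: the move with maximum probability."""
        probs = self.predict()
        return max(range(self.n_moves), key=probs.__getitem__)


if __name__ == "__main__":
    ai = MarkovPredictor()
    for move in [0, 1, 2, 0, 1, 2, 0, 1]:
        ai.update(move)
    print(ai.predict())
    print(ai.guess())
```

After the repeating sequence in the usage example, the counts for the context (0, 1) all point to move 2, so that is the AI's guess.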

In the next post, I’ll show how the AI can choose the best actions in order to maximize its expected score in a Virtual Karate game.