Master Baboon The sea of the simulation

12Sep/090

My AI reads your mind — Extensions (part 3)

In the previous two posts I showed how to make use of decision theory to write a game AI that forms a model of its opponent and adapts its strategy accordingly.

The AI could be improved in several ways:

  • The most obvious improvement would be to build a better model of the opponent. In the Karate game I used a 2nd order Markov model, i.e., I assumed that the next move of the player only depends on his previous two moves. It is of course possible to use an higher-order model, that would keep track of three or more past moves. However, a long history means a much larger number of parameters to estimate, so that it will take much longer for the AI to have a reasonable estimate of the player's behavior. An easy workaround would be to collect higher-order statistics, but only use them when enough data is available; this however would still fail if the player decides to adopt a new strategy. One could also use another class of models, like, for example a recurrent neural network. I prefer not to use neural networks, first of all to avoid the usual voodoo of choosing parameters like learning rate, network architecture, etc., and second because they work in a black-box fashion, which makes it difficult to extend them in principled ways, as for example suggested below.
  • When the game is started, the player statistics are blank, in the sense that all the player's moves are considered equally likely. This is our initial prior probability for the opponent's moves. However, it is probable that humans have common biases in the kind of action sequences they choose. One could adapt the game to allow it to learn this biases over many games with different players, and use them as the initial prior, thereby improving the initial phase of the game.
  • A related point is that at the moment I do not take into account the uncertainty about the player's move estimation. At the start of the game the AI has only a few examples on which to base its prediction of the next move, and it should take this into account when making decisions. The formal way to capture this uncertainty is to define an hyperprior on the transition probabilities, and then integrate over it when predicting the next move: so far, the transition probability is given by a matrix, p(n(t+1) | n(t)) = N_{n(t),n(t+1)}, where N is a matrix and for simplicity I'm only considering a first order Markov model. N is considered to be a constant at every time point; in reality, N is also a random variable that is being estimated from the player's move. It is thus only natural to put a prior on N itself, P(N) (e.g., a Dirichlet distribution for every row of the matrix). The improved prediction, which takes into account our uncertainty about N is given by p(n(t+1) | n(t)) = integral over N of p(n(t+1) | n(t), N) p(N) .
  • In the karate game, the AI tries to maximize its score by picking the action with the highest expected score. Another strategy would be to choose every action with a probability related to the score, for example using p(choosing action a) to be proportional to exp(beta * score of a), where beta is a constant that controls the "softness" of the decision: for beta=0, all actions are chosen with the same probability; for large betas, only the action with largest score is chosen. This seems to be closer to the way human takes decision, the so called probability matching rule. This strategy is suboptimal if the second order Markov model is the real model of the opponent, but since it is not, it appears to perform better in practice as the AI becomes less predictable (I updated the Karate game in the previous post to do probability matching).
  • A major improvement to the AI would be to take into account the so-called theory of mind, i.e., the fact that while the AI is building a model of the player, the player is doing the same for the AI, and trying to maximize his own score. Taking this aspect into account is quite complex, as one falls rapidly into a deep I-know-that-you-know-that-I-know kind of reasoning. Managing to do so, however, is likely to be highly rewarding for the player's experience of the game: several studies have shown how humans activate areas of the brain that are associated with theory-of-mind when playing against a human opponent, but fail to do so against a computer (see Gallagher and Frith, Functional imaging of ‘theory of mind’, Trends in Cognitive Sciences, 7(2), 2003, for a review). It is thus possible that by writing computer programs that make use of theory of mind themselves, those centers would become engaged, giving the player the impression of playing against a human opponent.
Tagged as: , , No Comments
21Feb/095

Advanced math functions for AS3

Did you ever need to sample from a Dirichlet distribution in Actionscript 3.0? What about computing a value of the Gamma function? Probably not 😉 In case you did, I wrote a class with a few math methods:

  • a function to sample from multinomial distributions; this is useful if you need to choose randomly from a set of elements, each with a different probability;
  • functions to draw samples from the Dirichlet and Gamma distribution;
  • the Gamma function
  • a factorial function that works even for large values (using the identity n! = Gamma(n+1))
  • other utility functions, e.g. to compute mean and variance of an array, or add two arrays together

You can download the AdvancedMath class here, it includes AsUnit tests.

This is an example of how to use the AdvancedMath class to simulate a rigged dice: the code below throws the dice 1000 times, and keeps a count of how many times one gets the different faces.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
import com.masterbaboon.AdvancedMath;

// probability of the faces; 6 is about twice more likely to appear
var p:Array = [0.14, 0.14, 0.14, 0.14, 0.14, 0.3];
           
// count the number of times a face appears
var count:Array = AdvancedMath.zeros(6);
           
// throw dice 1000 times
var face:int;
for (var i:int = 0; i < 1000; i++) {
    face = AdvancedMath.sampleMultinomial(p);
    count[face]++;
}
           
// show how many times we got which face
trace(count);

[ad#adpost]