Master Baboon The sea of the simulation


Tracking down the enemy

As another scientific Python course is approaching, I've been brushing up my PacMan skills. I decided to give a try to a strategy I had been thinking on, which relies upon having a good estimate of the enemy's position. I should remind the reader that in the PacMan capture-the-flag game, one team does not know the exact position of the other agents unless they are within 5 squares of one's own agents. The game does, however, return a rough estimate of the opponent's distance. Our agents-tracker will thus have to blindly make its best guess, and keep a probability distribution over possible positions.

To estimate the position of the opponent agents we need to apply some probability theory:

P(x(t)) = sum_{x(t-1)}  P( x(t-1) ) P( x(t) | x(t-1) )

or, in other words, the probability that the agent is at position x(t) at time t is equal to the sum of the probability of it being at x(t-1) at time t-1, times the probability of transitioning from x(t-1) to x(t). The first term is given simply by the previous estimate, while the second term is our model of the behavior of our opponent (*).

For example, a very conservative model could assume that the opponent could take any legal move at random:

P( x(t) | x(t-1) ) = 1/N

if x(t-1)->x(t) is a legal move, where N is the total number of legal moves from x(t-1), and

P( x(t) | x(t-1) ) = 0

otherwise. This video shows how such a model performs when the opponent behaves exactly as assumed; the red agent, in the bottom left corner, is estimating the position of the blue agent in the opposite corner; the area of the red squares is proportional to the probability P(x(t)):

The tracker is doing a good job in this case, but fails miserably for a more realistic opponent:

We clearly need to improve the opponent's model... luckily another simple model results in a large improvement: we can safely assume that the opponent tends to explore new parts of the maze in search of food. We can formalize this as

P( x(t) | x(t-1) ) = 1/Z exp(-beta * v(x(t))

if x(t-1)->x(t) is a legal move, and 0 otherwise. v(x) is the number of times the agent visited x in the past, and beta is a constant that controls the how exploratory the opponent is. When beta=0, the model is equivalent to the previous random model. Z is a normalizing constant such that P( x(t) | x(t-1) ) sums to 1.

Let's see how this model does in practice (in the video, beta = 10):

Much better, isn't it? We can do even better by using two other sources of information: first, the game gives us a noisy estimation of the distance of the opponent (actual distance +/- 6); second, we know that if the opponent is not visible, it must be at least 5 squares away. We can take this information into account by setting P(x(t)) to 0 for squares that lie outside the noisy distance range, and for those inside the visibility range.

The last video shows the complete tracker at work. The blue lines show the area in which the agent might be according to the noisy distance, and the green line shows the visibility range:

The biggest area for improvement here is the agent's model P(x(t)|x(t-1)). One possibility could be to simulate several common strategies, and to use the transition statistics for the simulated agents to estimate that probability...

Now, can we use the estimated position to program better AI agents? I'll give it a try, and report back soon!

(*) Strictly speaking, we are doing "filtering" here, i.e., we're estimating the current position assuming the past inferences are fixed. The alternative is to do "smoothing", where the full joint probability P(x(t), ..., x(1)) is estimated at each step. The information coming from the new observation is propagated back and forth at each to improve the past inferences. For example, knowing that the agent is at a given position at time t might exclude another position at time t-2 because of too large a distance, which in turn could improve the estimate at time t.

Facebook Twitter Email