Sunday, January 21, 2007

Optimizing A*

When adapting the A* algorithm to your game, you may find that it sometimes consumes too many CPU cycles – everything on screen just freezes for a while (especially when path finding fails). Though A* itself has been around and well studied for quite a long time, implementation flexibility gives us many choices that can lead to inefficiency. This article describes several bottlenecks that emerged while developing path finding for a tile-based RPG.

I didn’t start the algorithm from scratch – I began with Steve Rabin’s code in ‘Game Programming Gems 1’ (3.5 A* Speed Optimizations). Since using a priority queue (a binary heap) to implement the open table is already widely discussed, it won’t be covered here.

And let’s check the first one…

Check Cell Blocking
After fetching a node from the open table, A* iterates through all neighbor cells of that node, trying to add walkable ones to the open table. For a neighbor already in the open or closed table, if the new path is better (a lower G value than the old one), A* just refreshes the node’s state (G value, open/closed table membership, and parent). But how can we tell whether a cell is blocked or walkable?

In a tile-based game, path finding is usually performed on the fringe layer (walls, trees, scenery, NPCs, etc.), where a cell might contain multiple game objects. For example, an NPC can step into a cell and drop several items there. Each object held by the cell may affect its blocking state:

for each object in current cell
    if object can't walk thru then
        return cell is blocked
    end
end
return cell is walkable

This procedure has two drawbacks that may slow down the whole A* algorithm:
i) If a cell contains multiple objects, the number of loop iterations depends on the order in which those objects are packed into the cell.
ii) Checking an object’s walkable state might be slow – it might be set by the map designer and only be accessible through script.

During gameplay, we’ll notice that most cells’ blocking state is stable most of the time. We can eliminate the cost of querying by moving the blocking state to the cell: whoever leaves a cell, steps into it, drops something into it, takes an item from it, blows down a wall, or locks or unlocks a door in it is responsible for updating the cell’s blocking state (is_blocked).

And now checking whether a cell is blocked is simple:

return cell.is_blocked
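The update side of this scheme can be sketched as follows. This is a minimal illustration, assuming hypothetical Cell and GameObject types (not the article's actual classes):

```cpp
#include <cstddef>
#include <vector>

// Illustrative types only; real games would have richer object models.
struct GameObject {
    bool walkable;  // can others walk through this object?
};

struct Cell {
    std::vector<GameObject*> objects;
    bool is_blocked = false;

    // Whoever adds or removes an object is responsible for calling this,
    // so path finding can read is_blocked directly.
    void updateBlockingState() {
        is_blocked = false;
        for (std::size_t i = 0; i < objects.size(); ++i) {
            if (!objects[i]->walkable) {
                is_blocked = true;
                break;
            }
        }
    }
};
```

The per-object loop still exists, but it now runs only when the cell's contents change, not on every A* neighbor check.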

Bank: Find Node
Another step of A* is to determine whether a neighbor cell is already in the open or closed table (i.e., to find a node matching the neighbor cell’s coords). If neither table contains the neighbor, a new node is created from the cell’s coords and cost, then inserted into the open table. As the path finding goes on, the open and closed tables grow bigger. It’s not practical to iterate through the whole table by brute force to find a node; we need some indexing mechanism to speed this up. We also want to reuse the hundreds of nodes malloced during the last path finding.

A node bank is introduced into A* to attack the two issues above. It acts like a node pool to lighten the memory allocation burden of creating nodes, and a node in the bank can be indexed by coords quickly. We’ll focus on optimizing node finding in this section.

A bank is usually implemented as a hash map where the coords are the key and the node is the value (though hash_map is not a standard part of the C++ STL, it’s available as an extension in major STL implementations – SGI, Microsoft, etc.). If you don’t want to write the hash and compare functors for the key (the coords (x, y)) of hash_map, you can pack them into a double using a trick (assuming sizeof(double) == 2*sizeof(int)):

union TO_DOUBLE {
    double dbl;
    struct two_int {
        int x, y;
    } ints;
};

and the bank will be defined as hash_map<double, ANode*>. To find a node:

find(x, y):
    TO_DOUBLE key
    key.ints.x = x
    key.ints.y = y
    return bank.find(key.dbl)
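A runnable version of the packing trick, using std::unordered_map (the standardized descendant of the hash_map extension). ANode is reduced to a stub; note that type-punning through a union is technically undefined behavior in standard C++, though major compilers support it:

```cpp
#include <cstdint>
#include <unordered_map>

struct ANode { int g = 0; };  // stub node for illustration

// Pack (x, y) into the bit pattern of a double, assuming
// sizeof(double) == 2 * sizeof(std::int32_t).
union ToDouble {
    double dbl;
    struct { std::int32_t x, y; } ints;
};

double packKey(int x, int y) {
    ToDouble key;
    key.ints.x = x;
    key.ints.y = y;
    return key.dbl;  // reuse the bit pattern as a hashable key
}

// The bank, keyed by the packed coords:
typedef std::unordered_map<double, ANode*> NodeBank;
```

Usage: `bank[packKey(x, y)] = node;` and `bank.find(packKey(x, y))` for lookup.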

A hash map promises constant-time indexing, but that ‘constant time’ consists of several parts:
i) key hashing
ii) bucket list iterating
iii) exact matching within the bucket
When finding long paths, a profile may show that finding nodes in the hash map is the bottleneck.

We can use direct indexing – a 2D array instead of a hash map – to speed things up:

vector < vector < ANode* > > bank;

And to find the node which is indexed by (x, y):

bank[y][x]

For a map of, say, 300*300 cells, the bank itself will cost about 300*300*sizeof(ANode*), which is about 350KB with 4-byte pointers. Since the A* object is a singleton, this memory impact is acceptable.
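Setting up the direct-index bank is a one-time resize and zero-fill; a minimal sketch (ANode is a stub here):

```cpp
#include <vector>

struct ANode { int g = 0; };  // stub node for illustration

typedef std::vector<std::vector<ANode*>> Bank;

// Size the bank to the map dimensions, with every slot set to null
// (meaning "no node created for this cell yet").
Bank makeBank(int width, int height) {
    return Bank(height, std::vector<ANode*>(width, nullptr));
}
```

Lookup is then a plain `bank[y][x]` – no hashing, no bucket walk.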

Bank: Create Node and Reset
We do not fill up the bank with newed nodes at first. In A*’s constructor we just resize the bank to the size of the map and initialize it with 0. During path finding, A* asks the bank to create new nodes. The bank uses the x, y coords passed in to check whether the corresponding slot is occupied by a node created in a previous search. It has to create a new node if the slot is empty, or just returns the node already in the slot – this process warms up the bank with newly created nodes.

Before performing a new path finding, we have to reset the nodes created by the bank during the last search to make them available for this round. A hash map is still an appealing candidate for recording the nodes in the bank, to avoid brute-force iteration through the whole 2D array when resetting. But we don’t need the quick-indexing ability (or its cost) of a hash map here – a list will do better. The following code creates a node with coords (x, y), G value g, and open/closed table position place:

ANode* Bank::createNode(int x, int y, int g, int place) {
    ANode*& slot = bank_[y][x];
    if (slot) {
        // already in the bank, activate to make the node available
        slot->activate(x, y, g, place);
    } else {
        // not in the bank, create a new node
        ANode* node = new ANode(x, y, g, place);
        // set it into the bank
        slot = node;
        // record all nodes in the bank
        ptr_list_.push_back(&slot);
    }
    // record nodes used in this round of path finding
    ptr_list_busy_.push_back(&slot);
    return slot;
}

We use two lists above: ptr_list_ holds all nodes in the bank and ptr_list_busy_ just holds nodes used in this round of path finding.

To reset the bank, we only need to iterate through ptr_list_busy_ (instead of the whole bank) to deactivate all busy nodes. Then ptr_list_busy_ itself is cleared for the next round. As path finding goes on, the bank is warmed up with newed nodes – the allocation overhead gets lower while the memory cost grows. A threshold can be set here to limit the total number of nodes in the bank, balancing speed against space.
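The reset step can be sketched like this. It is simplified for illustration: ANode keeps only the state needed here, and deactivate() is a hypothetical helper that clears a node's open/closed status:

```cpp
#include <cstddef>
#include <vector>

struct ANode {
    bool active = true;
    void deactivate() { active = false; }  // clear open/closed state
};

struct Bank {
    std::vector<ANode*> busy_;  // nodes touched in the current search

    void reset() {
        // Deactivate only the busy nodes instead of scanning the 2D bank.
        for (std::size_t i = 0; i < busy_.size(); ++i)
            busy_[i]->deactivate();
        busy_.clear();  // ready for the next search
    }
};
```

The cost of a reset is thus proportional to the length of the last search, not to the map size.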

Another improvement is to define ptr_list_busy_ as a vector instead of a list. When packing pointers at the tail, a vector doubles its capacity when it runs out, while a list has to allocate a new list node for each pointer. Using a vector therefore reduces some malloc overhead. But the improvement may vary with path length – make sure you profile.

Iterating Neighbors
Iterating the neighbors of the current node can be another performance issue if not implemented properly. Since the number of neighbors differs between nodes (e.g., nodes on the edge of the map), you might choose to new a vector or list of neighbors to iterate over. After profiling, you may find that the new operation is costly.

The solution is simple – use a fixed-size container that can hold the maximum number of neighbors (usually 4 to 8, depending on the cells’ connectivity) to avoid allocation. Since neighbor iteration sits in A*’s hot loop, the improvement can be significant.

Of course, if you hard-code the neighbor iteration in A*’s main loop, this issue won’t bother you (at the cost of some flexibility).
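A fixed-size neighbor buffer for 8-connected cells can look like this. The function fills a caller-provided array instead of newing a container; the names and map size are illustrative:

```cpp
#include <cstddef>

struct Coord { int x, y; };

const int MAP_W = 300, MAP_H = 300;  // assumed map dimensions

// Writes the valid neighbors of c into out[] (at most 8) and returns
// how many were written; edge and corner cells get fewer than 8.
std::size_t getNeighbors(Coord c, Coord out[8]) {
    static const int dx[8] = {-1, 0, 1, -1, 1, -1, 0, 1};
    static const int dy[8] = {-1, -1, -1, 0, 0, 1, 1, 1};
    std::size_t n = 0;
    for (int i = 0; i < 8; ++i) {
        int nx = c.x + dx[i], ny = c.y + dy[i];
        if (nx >= 0 && nx < MAP_W && ny >= 0 && ny < MAP_H)
            out[n++] = Coord{nx, ny};
    }
    return n;
}
```

The caller declares `Coord buf[8];` once (or keeps it as a member) and reuses it every iteration, so no allocation happens inside the search loop.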

Profile
When optimizing, it’s essential to locate the hotspots by profiling instead of guessing. And after each modification we need to re-profile the code to make sure we really did improve performance.

You can use the clock() function (in time.h) to build a simple profiler – just divide the algorithm into several segments, count how many ticks each one costs, and how much each contributes to the overall cost. But as you fine-tune the bottleneck code and performance improves, you may reach the point where clock() is no longer fine-grained enough to reveal the hotspot – it reports that each part of the code costs 0 ticks.
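A minimal clock()-based timer of the kind described above; wrap one segment of the algorithm in a callable and compare the results across segments:

```cpp
#include <ctime>

// Times a single segment of code and returns the elapsed milliseconds.
// Resolution is limited by CLOCKS_PER_SEC, which is exactly the
// granularity problem mentioned above.
template <typename F>
double timeSegmentMs(F segment) {
    std::clock_t t0 = std::clock();
    segment();
    std::clock_t t1 = std::clock();
    return 1000.0 * (t1 - t0) / CLOCKS_PER_SEC;
}
```

Dividing each segment's time by the total gives the percentage breakdown that points at the hotspot.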

Commercial tools like DevPartner’s run-time performance analysis can do a much better job (by instrumenting stubs into the code): thorough tick percentages across the call graph, profile result comparison, etc.

Next?
After applying these micro-level techniques to A*, we can try some macro-level path finding optimizations. Several of them are:

i) limit the search space
limiting players’ or NPCs’ moving range from the whole map to several screens.
ii) cache
recording searched paths to avoid duplicate searching.
iii) pre-calculate
iv) two-tiered or multi-tiered search
searching on a lower-resolution version of the map, then refining the result.

Of course, these high-level strategies will benefit from the A* optimizations we’ve done. Check the references for more information.

And that’s it. I hope you find the strategies discussed here useful for your projects. Questions, comments, etc. – please email me.

References
[1] Game Programming Gems 1, Section 3 (Artificial Intelligence), contains several great articles on A*, ranging from the basics to advanced topics like aesthetic optimizations and speed optimizations.

[2] GameDev.net: Pathfinding and Searching. Justin Heyes-Jones’ and Patrick Lester’s articles also give some optimization tips.

[3] GameAI.com: Pathfinding.

[4] Two-Tiered A* Pathfinding.

[5] Amit’s Game Programming Page: A*.
