The key insight...well, maybe 'idea' is a better word - 'insight' is perhaps a bit presumptive...was that we can easily represent a board position as a bitmap, and patterns similarly, and sweep one across the other with logical rotations which boil down to a very small number of machine ops.

To illustrate, consider the slightly simpler problem whereby we just want to match patterns of occupation of the board, without regard to the colour of stone on each point. In that case each point on the board just needs a single bit to encode it (occupied or not) and we represent a row of the board as the first 19 bits (or whatever board size we're using) of a convenient word (32 bits say). Similarly each row of the pattern is represented by the first N-bits of a word (where N is the pattern width), together with a mask word which masks out parts of the pattern's bounding rectangle we don't care about. To perform a search for any given pattern sweep the first row of the pattern across the first row of the board by XORing them together (and ANDing out the mask) at each bit position (bit rotation of the pattern-row-encoding word sweeps the pattern across the board columns). On finding a match in pattern row 1 verify the next pattern row in the same positon against the next board row and so on. After finding or failing to find a match alligning the first row of the pattern to the first row of the board move down the board repeating the above procedure until the entire board is scanned.

The kernal of this loop is VERY efficient in terms of machine instructions required, and obviously the range of positions that need be checked is constrained by the size of the pattern and of the board (no point trying to match a 4-wide pattern within 3 intersections of the right hand edge and so on). It also exhibits highly localised data access patterns which are very cache-friendly.

In practise the matching problem for Go isn't just occupancy of course - board intersections may be occupied by either white or black stones, and similarly pattern points. However, that doesn't really change anything - we just need 2 bits per posiiton (I use the high and low order 32-bits of a 64-bit word, which these days is the CPU native size on most machines). In fact the 4th value available also turns out to be very handy - we will often be interested in matching patterns that relate to edges or corners of the board, and we can easily accomodate this by making our board-encoding array 2 virtual intersections larger than the actual board and storing the otherwise unused 4th value in the off-the-board 'edges', allowing patterns to match against them also (effectively pinning the pattern to the edge/corner).

So enough of the theory. How does it work out in practise?

I started off with a small shape library containing 4 common shapes that we'll need as part of any complete library - one-point jumps, bamboo-joints, diagonal connections, and what I chose to call L-joints. These are illustrated below:

Left to right: Bambo joint, L-joint, 1-point jump, Diagonal connection.

In each case the highlighted area is the set of intersections to be included in the pattern (so note that the L-joint, for example, is non-rectangular because we don't care about any stones that may or may not be present in two of the intersections).

The code that loads the pattern library automatically generates all symmetric partners (rotations, reflections, and colour-swaps) which gives a total of 28 raw pattern bitmaps for the above 4 canonical shapes. My initial implementation matched the full pattern library (28 shapes) against a full board position (19X19 so order of 300 or so possible pattern placements for each pattern) in a shade over 20 micro-seconds.

Considering this is single-threaded, implemented inside a virtual runtime (.NET) on a 2-year old desktop that seemed fast enough to be encouraging.

In a real game-tree analysis however, there is scope for significant optimisation of position anlysis (of which the pattern matching is ultimately a sub-component). Specifically as we traverse the tree, the analysis for a child node only differs from that of its parent by the effects of playing one move. For Go that means adding one stone most of the time (captures complicate things but are relatively rare). Hence if we can make our positional analysis routines operate on some result-state iteratively, then they only have to consider elements that have changed. For shape-matching this means that playing a stone at some point can only possibly influence pattern placements that overap that point.

This is very easy to take advantage of - instead of doing full-board pattern matches every move, build up a map of pattern-match instances (2-tuples of pattern and matched position) which persists between moves. When a move is played, invalidate all instances overlapping the played point, and then scan the pattern library over the subset of the board that overlaps (this is all very analogous to clipping rectangles in graphics).

Doing that got me down to a bit under 15 micro seconds to update the shape map for each move played (based on amortised mesurement over 100 random games of 250 moves each).

Now what are we going to do with our new-found ability to match shapes?

...a tale for another day...

## No comments:

## Post a Comment