Writing a Go Program: Under the Influence

A common concept in Go is influence. This is hard to formally define, but essentially a collection of stones tend to make the area of the board near them more likely to end up under the control of the owner of those stones. This may mean it eventually turns into territory for that player, or just that that player will have tactical advantage in any fights that take place in that part of the board.

Take this position for example:

In the lower left corner black has secured a small amount of territory in return for which white's wall of 3 stones gives him considerable influence on the bottom part of the board. This influence is not the same as territory, but it means that white will have considerable advantage in fights in this area. Collections of stones with strong influence are often referred to as being 'thick'. The white shape below is an example of thickness, which generates influence to the right:

Evaluating influence is useful in both evaluating a position (who has the greater control) and suggesting moves to play (often playing near influence boundaries will provide good ways to grow one's own area and reduce the opponent's).

Zobrist's original Go playing program included an influence generation algorithm. Zobrist's algorithm worked by considering each stone as an independent source of influence and iteratively 'radiating' from it by (at each generation) adding 1 to the influence value at each vacant neighbouring intersection of an intersection that had influence assigned by the previous generation (seeding the stone itself as being the 0th generation's result point). Zobrist's program employed 4 such generations (so the maximum 'range' of any stone's influence was a Manhattan distance of 4 from it). It summed the contributions from every stone on the board to arrive at a final influence map.

We can visualise an influence map by shading parts of the board (darker for black, lighter for white), as in the following picture (not using the Zobrist algorithm, but just to illustrate the idea):

A major problem with Zobrist's algorithm is that it's expensive to compute (especially if you try to extend its range by adding more generations), so computing it for every node you want to evaluate in the game tree is really going to slow things down. What I figured I needed was an iteratively computable influence map - i.e. - one in which computing the influence map for a position which results from adding one stone to a position, for which we have already calculated the influence map, is cheap. What I discovered was that this was harder than it sounds!

I spent a couple of days working on what ultimately turned into dead ends, trying to adjust Zobrist's algorithm to make it iteratively calculable, by maintaining 'flow' values of influence contributions arriving at each intersection from each neighbour. The idea was that when a stone is played the 'flows' into that intersection can be used to generate their downstream results and those results subtracted from the existing state before adding in the contribution from the new stone. Although I was able to get this working functionally (captures are a pain to handle though) it was still too slow to really be palatable (hundreds of micro seconds on average).

Eventually I started working on another approach, somewhat analogous to ray-tracing in graphical rendering. The idea was that if an intersection had a 'line of sight' to a stone then it should receive an influence contribution from that stone. In this context 'line of sight' really means a contiguous path through vacant intersections to that stone. For example the path below from the white stone to the terminal point:

Provided we restrict the paths to those which always move closer to their end point (so no doubling back on itself) what emerges is 'shadows' cast by stones, so for example, the lower white stone above has no path to the intersection to the immediate left of the black stone (the upper white stone does however). Notionally the algorithm considers the set of paths from any source point (a played stone) to a given vacant intersection, and assigns to that path a contribution of:

C = f(p1,p2)/N(p1,p2)

where p1 and p2 are the coordinates of the two points (the source point and the terminus of the path), f is a function determining overall contribution based on the distance from p1 to p2, and N is a function returning the number of paths between p1 and p2. Essentially this amortises the distance-based influence contribution between all the possible paths. This normalises things for differing path counts between points which arise from the fact that we are only considering paths which always move toward their destination (this means there are more paths between points 1 and 3 below than between 1 and 2 (3 paths, vs 1 path), so without this normalisation we'd see oddly distorted influence at 3 relative to 2)

The influence at a point is then the sum of all contributions from paths which are not blocked by non-vacant intersections. Using an exponential decay based on Manhattan distance for f seems to give reasonable results.

So far this all sounds very well, but how do we calculate it iteratively and with high performance?

The key observation is that the paths themselves are features of the basic board geometry, not of the position occupying the board. That is if we consider the complete set of all paths between all pairs of points on the board, all that changes as stones are played (or removed) is whether the path is actually extant in terms of making a contribution (i.e. - if it's origin point has a stone on it, and no other point on it does). Consequently we can (in principal) pre-calculate all paths and simply toggle them on and off as the position changes.

In practise there are a (very) large number of paths between all possible pairs of points on the board (there are 361 X 360 / 2 (over 360,000) pairs of points before we even start considering the multiple paths between them!) To reduce this to a manageable number we can do two things:

Only consider paths between points within a certain distance of one another (which will give our influence a maximum range, but remember we already had that issue with the Zobrist algorithm anyway so this is no worse in that regard)
Only consider 'reasonable' paths between pairs of points.

To illustrate what I mean by (2) look at the two paths below:

The yellow path looks like a not very reasonable way for influence to flow, whereas the red path seems more natural. Basically if we restrict max path length to 5, and path construction between two points operates in a 'greedy' fashion (always take the choice for the next point as the one that reduces the euclidean distance to the destination the most, which in a discrete grid will give at most two choices), then generating paths in both directions (to ensure a symmetric result) gives us a grand total of 143296 paths (that is the number debugging stats report so don't ask me to derive it mathematically!). This may sound like a lot, but it moves the bulk of the computation up front - playing any move requires (very inexpensive) processing of at most 188 paths. The memory cost is more substantial (a little over 8M), but on a modern PC is not really a big deal.

For each path I maintain:

Source coordinate
Terminus coordinate
Contribution
Current state of the source point (black, white, empty)
Count of other points occupied

For each point of the board I maintain a list of paths (references not instances) that:

Originate at that point
Terminate at that point
Pass through that point

I also maintain the current influence score for each point of the board.

When a new stone is played all we have to do is:

For each path originating at the played point set its origin state to the played stone, and if it has a 0 occupancy count, add the path's contribution (sign reversed for white) to the path's terminal point's current influence
For each path passing through the played point, increment its occupancy count, and if it previously had a 0 occupancy count and a stone at its origin, subtract its contribution from its terminus
For each path terminating in the point increment its occupancy count

Removing a stone is similar. The code is very simple. Here is what the routine to place a stone looks like (C#):

        public void PlaceStone(BoardCoordinate point, bool isBlack)
        {
            sourced[point] = true;
            values[point] = 0;

            //  All extant paths from this source provide influence
            bool signCalculated = false;
            bool reverseSign = false;

            foreach (Path path in sourcedFrom[point].paths)
            {
                if (!signCalculated)
                {
                    signCalculated = true;
                    reverseSign = ((path.contribution > 0) != isBlack);
                }

                if (reverseSign)
                {
                    path.contribution = -path.contribution;
                }

                path.sourced = true;

                if (path.occupationCount == 0)
                {
                    values[path.terminus] += path.contribution;
                }
            }

            //  Paths thru this point are no longer extant
            foreach (Path path in passThrus[point].paths)
            {
                if (path.occupationCount++ == 0 && path.sourced)
                {
                    values[path.terminus] -= path.contribution;
                }
            }

            //  Paths terminating here can now be marked occupied
            foreach (Path path in terminateIn[point].paths)
            {
                path.occupationCount++;
            }
        }

The end result is that maintaining the influence map for a 19 X 19 board costs me about 15 micro-seconds per move (amortised over 100 games of 250 random moves each). This is price I am prepared to pay. It also seems to give results that look reasonable, at least to my eye (the shaded map earlier on this page was in fact produced by this code). Furthermore it has the advantage that the path set can be tweaked arbitrarily without affecting the runtime code (just the path generation code), so (for example) I might choose to add a small number of longer paths to extend the influence range, and can do so without algorithmic changes, or dramatic runtime cost changes (provided I can choose a reasonably small subset of the set of possible paths

Writing a Go Program

Sunday, February 27, 2011

Under the Influence

No comments:

Post a Comment