So after a small diversion to give a bit of UI, it was back to the original proposition - that it should be possible to perform useful shape matching rapidly enough to not drag the time to perform a game tree node analysis into the mire.
The key insight...well, maybe 'idea' is a better word - 'insight' is perhaps a bit presumptive...was that we can easily represent a board position as a bitmap, and patterns similarly, and sweep one across the other with logical rotations which boil down to a very small number of machine ops.
To illustrate, consider the slightly simpler problem whereby we just want to match patterns of occupation of the board, without regard to the colour of stone on each point. In that case each point on the board just needs a single bit to encode it (occupied or not) and we represent a row of the board as the first 19 bits (or whatever board size we're using) of a convenient word (32 bits say). Similarly each row of the pattern is represented by the first N-bits of a word (where N is the pattern width), together with a mask word which masks out parts of the pattern's bounding rectangle we don't care about. To perform a search for any given pattern sweep the first row of the pattern across the first row of the board by XORing them together (and ANDing out the mask) at each bit position (bit rotation of the pattern-row-encoding word sweeps the pattern across the board columns). On finding a match in pattern row 1 verify the next pattern row in the same positon against the next board row and so on. After finding or failing to find a match alligning the first row of the pattern to the first row of the board move down the board repeating the above procedure until the entire board is scanned.
The kernal of this loop is VERY efficient in terms of machine instructions required, and obviously the range of positions that need be checked is constrained by the size of the pattern and of the board (no point trying to match a 4-wide pattern within 3 intersections of the right hand edge and so on). It also exhibits highly localised data access patterns which are very cache-friendly.
In practise the matching problem for Go isn't just occupancy of course - board intersections may be occupied by either white or black stones, and similarly pattern points. However, that doesn't really change anything - we just need 2 bits per posiiton (I use the high and low order 32-bits of a 64-bit word, which these days is the CPU native size on most machines). In fact the 4th value available also turns out to be very handy - we will often be interested in matching patterns that relate to edges or corners of the board, and we can easily accomodate this by making our board-encoding array 2 virtual intersections larger than the actual board and storing the otherwise unused 4th value in the off-the-board 'edges', allowing patterns to match against them also (effectively pinning the pattern to the edge/corner).
So enough of the theory. How does it work out in practise?
I started off with a small shape library containing 4 common shapes that we'll need as part of any complete library - one-point jumps, bamboo-joints, diagonal connections, and what I chose to call L-joints. These are illustrated below: