Scrabble® [1]Back to Contents Prev: [2]Lexicon Representation Next: [3]Move Selection Strategies _________________________________________________________________ Move Generation Algorithm Ideally, at each turn in the game, a player should be aware of every possible legal move. [4]Appel and Jacobson describe an algorithm that allows a computer player to achieve this goal; given a particular rack and board configuration, it efficiently and exhaustively generates all legal moves from a lexicon represented by a DAWG. This project implements a slightly modified version of this algorithm; the change involves the termination condition for finding a legal move, which in the published version of the Appel and Jacobson algorithm does not fully conform to the adjacency rules of Scrabble (namely, that all moves must include at least one point of adjacency). The algorithm classifies all playable moves in Scrabble as either across (all played tiles are in the same row) or down (all played tiles are in the same column). From an algorithmic point of view, it is sufficient to consider only the problem of generating across moves, since down moves are simply across moves with the board rotated 90 degrees. At first glance, the search space for moves is immense. Fortunately, the rules of the game provide a numerous opportunities for pruning this space. The algorithm starts with the game's adjacency rules to quickly establish locations on the board where legal moves can be made. Adjacency Constraints: Anchor Squares Except for the first move, every move in Scrabble must involve placing at least one tile on an empty square that is horizontally or vertically (but not diagonally) adjacent to an existing tile. Thus, the algorithm considers all empty squares that are so adjacent to an existing tile, and calls them anchor squares, because they represent squares off which new moves can be made (such moves are said to be anchored at that square). Furthermore, in order to avoid duplicate move generation, each anchor is only permitted to generate words where it is the leftmost anchor in the word (why this is so will soon become clear). Back-tracking: Building Prefixes and Suffixes Valid words built off a particular anchor square can be considered as having two parts: a left part (or "prefix", in order to maintain generality with respect to both across and down moves) and a right part (or "suffix"). The anchor square itself is included in the suffix. Essentially, the algorithm considers all anchor squares and for each one, attempts to find valid moves by building prefixes and extending each prefix with suffixes. It uses a recursive back-tracking procedure that is directed by the available tiles on the rack, as well as existing tiles encountered on the board while extending words. The entire procedure of word generation is guided by the DAWG; at each step it constrains possible prefixes and suffixes to those that have the potential to form valid words. Prefix Generation In this algorithm, prefixes must either be played completely from the rack, or come completely from existing tiles on the board. To see why this is the case, consider the construction of a left part anchored at anchor square x composed of both tiles from the rack and some existing tile y on the board. This would require some tile(s) to be placed adjacent to y. which is actually an anchor square! This is illegal, since by definition words anchored at x are required to have x as the leftmost anchor. Therefore, for a particular anchor square, prefixes can be played: 1. Off the rack, up to the available number of empty consecutive non-anchor squares. 2. Off the board, by taking the the consecutive letters preceding the anchor. Prefixes are generated by starting at the root of the DAWG and following edges as permitted by available tiles on the rack. Resulting prefixes have a DAWG node associated with them; the node where the recognition process stopped. By continuing the word search process from the sub-DAWG rooted at the node associated with a prefix, the search space is implicitly pruned to recognized words that begin with that prefix. This continued search process involves extending each prefix with suffixes. Pseudo-code for the prefix generating procedure is as follows (essentially unchanged from Appel and Jacobson's published version): prefix( PartialWord, node N in DAWG, limit ) { extendSuffix( PartialWord, N, AnchorSquare ); if ( limit > 0 ) { for ( each edge E out of N ) { if ( letter l labeling E is on rack ) { remove tile t labeled l from rack; N' = DAWG node reached by following E; prefix( PartialWord + l, N', limit-1 ); replace tile t back into rack; } } } } Suffix Generation Note that while an empty string qualifies as a valid prefix, suffixes must have at least one letter; this is because by definition, the first letter of a suffix is played on the anchor square, which must be played on to satisfy Scrabble's adjacency rules. Suffix generation starts at a sub-DAWG defined by some prefix and following edges as permitted by available tiles on the rack and tiles encountered on the board. After each edge is followed, the DAWG node is checked to see if it is terminal; if so, a valid move has been found (composed of the suffix appended to the prefix). Unlike prefixes, suffixes are not constrained to exclude anchor squares (indeed, they must include at least the anchor square that anchors the prefix). However, since anchor squares represent points of adjacency to existing tiles, there is an additional constraint when playing on one; the "word" formed in the perpendicular direction must also be valid (i.e. found in the lexicon). This constraint is expressed in terms of a structure called a cross-check set, which is pre-computed for each anchor square before move generation. It specifies what letters form valid perpendicular words on that square, and can be efficiently represented as a bitset of size 26 (cross-check computation is simply a matter of searching the DAWG for the existing word fragment, and enumerating edges leading away from the last node). Pseudo-code for the prefix generating procedure is as follows (with a change in the termination condition from Appel and Jacobson's published version, in order to ensure that the anchor square is played before a word can be considered as valid): extendSuffix( PartialWord, node N in DAWG, square ) { if ( square is empty ) { if ( N is a terminal node AND AnchorSquare has been played ) { validMoveFound( PartialWord ); } for ( each edge E out of N ) { if ( letter l labeling E is on rack AND l is in the cross-check set of square ) { remove tile t labeled l from rack; N' = DAWG node reached by following E; NextSquare = square to the right of square; extendSuffix( PartialWord + l, N', NextSquare ); replace tile t back into rack; } } } else { l = letter occupying square; if ( N has an edge labeled l leading to N' ) { NextSquare = square to the right of square; extendSuffix( PartialWord + l, N', NextSquare ); } } } The Bottom Line Anchor squares represent a required point of adjacency for valid moves. By taking each anchor square, and building all possible prefix and suffix combinations off them, the algorithm finds every possible valid move given a particular board and rack configuration. Furthermore, each move is found only once, due to the constraint that a word's anchoring square be the left most anchor in the word. The algorithm is efficient because the backtracking strategy is guided by the DAWG, which means every tile played has the potential to form a valid word somewhere on the board, and tile movement overhead between the rack and board is kept minimal by building words left to right. Score Computation Because the algorithm generates moves incrementally and recursively, it is appropriate that score computation be done in a similar fashion. Appel and Jacobson do not discuss score computation save to suggest the computation of so-called cross-sums representing the sum of values of all tiles in a continguous sequence above and below each anchor, as a method of accounting for the score contribution of perpendicular words. I have chosen to compute scores for partial words at each iteration of move generation, passing the computed value down to successive calls. While premium letter squares are easily handled in this manner (since they apply only to the letter being processed in that iteration), premium word squares are slightly more involved, since they need to be applied to the whole word, which is not known at the time the square is encountered. My solution is to pass a word multiplier factor by reference through the recursive calls; each time a premium word square is encountered, the factor is increased appropriately, and when validMoveFound() is called, it applies the factor. _________________________________________________________________ [5]Back to Contents Prev: [6]Lexicon Representation Next: [7]Move Selection Strategies References 1. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/contents.html 2. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/lex.html 3. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/movesel.html 4. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/refs.html#Appel 5. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/contents.html 6. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/lex.html 7. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/movesel.html