Scrabble® [1]Back to Contents Prev: [2]Lexicon Representation Next:
[3]Move Selection Strategies
_________________________________________________________________
Move Generation Algorithm
Ideally, at each turn in the game, a player should be aware of every
possible legal move. [4]Appel and Jacobson describe an algorithm that
allows a computer player to achieve this goal; given a particular rack
and board configuration, it efficiently and exhaustively generates all
legal moves from a lexicon represented by a DAWG. This project
implements a slightly modified version of this algorithm; the change
involves the termination condition for finding a legal move, which in
the published version of the Appel and Jacobson algorithm does not
fully conform to the adjacency rules of Scrabble (namely, that all
moves must include at least one point of adjacency).
The algorithm classifies all playable moves in Scrabble as either
across (all played tiles are in the same row) or down (all played
tiles are in the same column). From an algorithmic point of view, it
is sufficient to consider only the problem of generating across moves,
since down moves are simply across moves with the board rotated 90
degrees.
At first glance, the search space for moves is immense. Fortunately,
the rules of the game provide a numerous opportunities for pruning
this space. The algorithm starts with the game's adjacency rules to
quickly establish locations on the board where legal moves can be
made.
Adjacency Constraints: Anchor Squares
Except for the first move, every move in Scrabble must involve placing
at least one tile on an empty square that is horizontally or
vertically (but not diagonally) adjacent to an existing tile. Thus,
the algorithm considers all empty squares that are so adjacent to an
existing tile, and calls them anchor squares, because they represent
squares off which new moves can be made (such moves are said to be
anchored at that square). Furthermore, in order to avoid duplicate
move generation, each anchor is only permitted to generate words where
it is the leftmost anchor in the word (why this is so will soon become
clear).
Back-tracking: Building Prefixes and Suffixes
Valid words built off a particular anchor square can be considered as
having two parts: a left part (or "prefix", in order to maintain
generality with respect to both across and down moves) and a right
part (or "suffix"). The anchor square itself is included in the
suffix. Essentially, the algorithm considers all anchor squares and
for each one, attempts to find valid moves by building prefixes and
extending each prefix with suffixes. It uses a recursive back-tracking
procedure that is directed by the available tiles on the rack, as well
as existing tiles encountered on the board while extending words. The
entire procedure of word generation is guided by the DAWG; at each
step it constrains possible prefixes and suffixes to those that have
the potential to form valid words.
Prefix Generation
In this algorithm, prefixes must either be played completely from the
rack, or come completely from existing tiles on the board. To see why
this is the case, consider the construction of a left part anchored at
anchor square x composed of both tiles from the rack and some existing
tile y on the board. This would require some tile(s) to be placed
adjacent to y. which is actually an anchor square! This is illegal,
since by definition words anchored at x are required to have x as the
leftmost anchor. Therefore, for a particular anchor square, prefixes
can be played:
1. Off the rack, up to the available number of empty consecutive
non-anchor squares.
2. Off the board, by taking the the consecutive letters preceding the
anchor.
Prefixes are generated by starting at the root of the DAWG and
following edges as permitted by available tiles on the rack. Resulting
prefixes have a DAWG node associated with them; the node where the
recognition process stopped. By continuing the word search process
from the sub-DAWG rooted at the node associated with a prefix, the
search space is implicitly pruned to recognized words that begin with
that prefix. This continued search process involves extending each
prefix with suffixes.
Pseudo-code for the prefix generating procedure is as follows
(essentially unchanged from Appel and Jacobson's published version):
prefix( PartialWord, node N in DAWG, limit ) {
extendSuffix( PartialWord, N, AnchorSquare );
if ( limit > 0 ) {
for ( each edge E out of N ) {
if ( letter l labeling E is on rack ) {
remove tile t labeled l from rack;
N' = DAWG node reached by following E;
prefix( PartialWord + l, N', limit-1 );
replace tile t back into rack;
}
}
}
}
Suffix Generation
Note that while an empty string qualifies as a valid prefix, suffixes
must have at least one letter; this is because by definition, the
first letter of a suffix is played on the anchor square, which must be
played on to satisfy Scrabble's adjacency rules. Suffix generation
starts at a sub-DAWG defined by some prefix and following edges as
permitted by available tiles on the rack and tiles encountered on the
board. After each edge is followed, the DAWG node is checked to see if
it is terminal; if so, a valid move has been found (composed of the
suffix appended to the prefix).
Unlike prefixes, suffixes are not constrained to exclude anchor
squares (indeed, they must include at least the anchor square that
anchors the prefix). However, since anchor squares represent points of
adjacency to existing tiles, there is an additional constraint when
playing on one; the "word" formed in the perpendicular direction must
also be valid (i.e. found in the lexicon). This constraint is
expressed in terms of a structure called a cross-check set, which is
pre-computed for each anchor square before move generation. It
specifies what letters form valid perpendicular words on that square,
and can be efficiently represented as a bitset of size 26 (cross-check
computation is simply a matter of searching the DAWG for the existing
word fragment, and enumerating edges leading away from the last node).
Pseudo-code for the prefix generating procedure is as follows (with a
change in the termination condition from Appel and Jacobson's
published version, in order to ensure that the anchor square is played
before a word can be considered as valid):
extendSuffix( PartialWord, node N in DAWG, square ) {
if ( square is empty ) {
if ( N is a terminal node AND AnchorSquare has been played ) {
validMoveFound( PartialWord );
}
for ( each edge E out of N ) {
if ( letter l labeling E is on rack AND
l is in the cross-check set of square ) {
remove tile t labeled l from rack;
N' = DAWG node reached by following E;
NextSquare = square to the right of square;
extendSuffix( PartialWord + l, N', NextSquare );
replace tile t back into rack;
}
}
}
else {
l = letter occupying square;
if ( N has an edge labeled l leading to N' ) {
NextSquare = square to the right of square;
extendSuffix( PartialWord + l, N', NextSquare );
}
}
}
The Bottom Line
Anchor squares represent a required point of adjacency for valid
moves. By taking each anchor square, and building all possible prefix
and suffix combinations off them, the algorithm finds every possible
valid move given a particular board and rack configuration.
Furthermore, each move is found only once, due to the constraint that
a word's anchoring square be the left most anchor in the word. The
algorithm is efficient because the backtracking strategy is guided by
the DAWG, which means every tile played has the potential to form a
valid word somewhere on the board, and tile movement overhead between
the rack and board is kept minimal by building words left to right.
Score Computation
Because the algorithm generates moves incrementally and recursively,
it is appropriate that score computation be done in a similar fashion.
Appel and Jacobson do not discuss score computation save to suggest
the computation of so-called cross-sums representing the sum of values
of all tiles in a continguous sequence above and below each anchor, as
a method of accounting for the score contribution of perpendicular
words.
I have chosen to compute scores for partial words at each iteration of
move generation, passing the computed value down to successive calls.
While premium letter squares are easily handled in this manner (since
they apply only to the letter being processed in that iteration),
premium word squares are slightly more involved, since they need to be
applied to the whole word, which is not known at the time the square
is encountered. My solution is to pass a word multiplier factor by
reference through the recursive calls; each time a premium word square
is encountered, the factor is increased appropriately, and when
validMoveFound() is called, it applies the factor.
_________________________________________________________________
[5]Back to Contents Prev: [6]Lexicon Representation Next: [7]Move
Selection Strategies
References
1. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/contents.html
2. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/lex.html
3. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/movesel.html
4. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/refs.html#Appel
5. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/contents.html
6. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/lex.html
7. http://www.ece.uwaterloo.ca/~k2tam/Scrabble/movesel.html