Entry Summaries

The attached mail is the collection of write-ups I received. As usual they are mostly unedited (only personal POTM-master insults were removed). Write-ups are provided in order of finish (more or less).
_________________________________________________________
SplitMoo 24 ( 7/ 9/ 8) gC Frederic vanderPlancke
_________________________________________________________
=================
SplitMoo submitted by Frederic van der Plancke at fvdp@decis.be
=================

++ Please summarize the algorithm used by SplitMoo

The first two guesses are hardcoded; they were computed by a variant of Mad_Cow_Cheating (a DICTIONARY ENTRY - see below). For the remaining guesses: for 4 seconds SplitMoo tries to find all solutions (words that are still possible according to the outcome of the previous guesses). If it does not find all solutions, for the next 4 seconds it tries to come up with a random sample of the solution space. (Ideally this sample should be unbiased, but in my program it is not, because I think that would be too expensive.)

Each solution is assigned an "englishness" representing the likelihood of its being an English word, based on the probability of each sequence of two consecutive letters in my dictionary. Then each solution found is evaluated against the solution sample (part of the computation is done during the search). The heuristic aims at minimizing the expected number of guesses needed to find the word, by examining how each candidate splits the sample (hence the name "SplitMoo").

For the last 2 seconds, it searches for an improved guess by examining mutations of the previous candidates. This part plays a really important role! Doubling its running time has probably been one key to my success.
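None of these write-ups include source, so here is a small illustrative sketch (my reconstruction, not SplitMoo's actual implementation) of the split-based evaluation just described: score a candidate guess by how it partitions a sample of the remaining solutions according to the (bulls, cows) answer each would return. The standard Bulls-and-Cows scoring and the tiny sample are assumptions of mine.

```python
import math
from collections import Counter, defaultdict

def bulls_cows(guess, target):
    """Standard MOO scoring (assumed): bulls = right letter, right place;
    cows = right letter, wrong place (multiset overlap minus bulls)."""
    bulls = sum(g == t for g, t in zip(guess, target))
    common = sum((Counter(guess) & Counter(target)).values())
    return bulls, common - bulls

def split_value(guess, sample):
    """Expected log2-size of the part of `sample` that survives the guess.
    Lower is better; 0 means every possible answer pins down a unique word."""
    parts = defaultdict(int)
    for word in sample:
        parts[bulls_cows(guess, word)] += 1
    n = len(sample)
    return sum(s / n * math.log2(s) for s in parts.values())

# A guess that isolates every candidate scores 0; a guess that leaves
# big clumps of indistinguishable candidates scores higher.
sample = ["MOAT", "GOAT", "COAT", "MOOT"]
best = min(sample, key=lambda g: split_value(g, sample))
```

Minimizing this quantity is one way to read "minimizing the expectation of the number of guesses"; the real program additionally folds in englishness and also evaluates guesses that are not themselves candidate solutions.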
(It's important in words like "STAT" (doesn't typecheck but I did test it anyway): my program quickly finds it must be "S?A?", but then there remain many possibilities for the two remaining letters; if it only checked guesses of the form "S?A?" it would lose time. Of course "englishness" helps as well in this case.)

--- in more detail

The most involved part is the search for solutions. I wrote a (cut-down) constraint propagation engine for that purpose... Not spending too much source code on it was an interesting challenge! My program essentially maintains the following information during the whole search for a word:
- the set of letters that are possibly/surely present in the solution
- for each i, the set of possible/sure letters at position i
- for each guess, the set of positions for which it is possible/sure we have a hit (this is redundant with the previous sets, actually)
- "linear constraints", based on the number of misses for each guess

The engine reduces these sets according to the outcomes of the previous guesses (before searching for any solution) and the previously instantiated letters (during the search). It propagates simple deductions to do so. ("Simple" does not really mean "trivial" though, as witnessed by my classes CharLinCtr or LetterUnionCtr!)

The search uses heuristics so that it first places letters that are needed, according to the constraints: letters that are surely in the solution but not yet known to be at any position; as a second choice, other letters that can help reach the number of non-misses in a guess.

++ Did SplitMoo use any letter frequency tables? Where did you get them?

SplitMoo uses a table of the likelihood of all sequences of two letters (including the spaces after or before the word, to take into account the likelihood of the initial and final letters of the word). They were extracted from my dictionary. Where I got this dictionary is explained in the "Mad_Cow_Cheating" answers.

++ If I had it to do all over - here's what I would try ....
Most importantly, I should have shared my time more evenly among the different parts of my design, and kept much more time for developing the guess generation algorithm and for overall tuning. Two important parts that make my program as good as it is were added in the last 24 hours: the englishness measure, and the search for more-or-less random alternative guesses. I took almost no time for real tuning (although I tested PINEAPPLE with every version of my program). It is possible that single tuning decisions could improve my results even more. I didn't even try spending more than 2 seconds in the final phase, even though moving from 1 sec. to 2 sec. improved my results a lot! The deadline was close at the time and I was getting tired, with a wish to finish... (I did not realize I had _that_ much of a chance to win!)

Other remarks:
* Perhaps the constraint propagation engine is overkill. Perhaps not. It surely helped with maintainability. I could perhaps have tried to be less brute-force-ish in my approach to finding solutions.
* I should further develop the last two seconds of alternative-guess search, maybe by introducing a more sophisticated genetic algorithm (like the one in Mad_Cow_Cheating). Two weeks before the deadline I tried a purely genetic approach. After much refinement, it was able to find PINEAPPLE in 14 to... 36 guesses, so I gave up the idea... It was based on SplitMoo's design, but the task of figuring out a solution sample was handled by a genetic program. The problem then is to keep this sample even and not concentrated around one word or family of words, especially when no word matching all constraints is known.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live and work in Brussels, Belgium. My main work project is a program for automated staff planning. For fun: drawing, swimming, writing programs (my next big "home" project is a MineSweeper solver, for which I have good ideas), reading/doing some maths...
For my innermost secret, have a look at the "Mad_Cow_Cheating" answers... ;-) [With a wink to Caroline N - hello if by chance you're reading this!]

POTM ideas: Solving MineSweeper (that game that ships with Windows) or a variant thereof (like allowing the player to continue the game after (s)he has hit a bomb)... I can stay out of the contest if you think it's fairer... (but I'd prefer to stay in!)
_________________________________________________________
SchrodingersCow 25 ( 7/ 8/ 10) gC William Little
_________________________________________________________
=================
SchrodingersCow submitted by William Little
=================

++ Please summarize the algorithm used by SchrodingersCow

The algorithm has 3 phases...

PHASE 1: Generate a population, the members of which each match all of the information gathered so far about the target word. These members are generated by starting with a random sequence of letters and then applying mutations. The mutant is then compared to the original, and replaces it unless it gets a higher (worse) score from the fitness function. This process continues for 2000 cycles, or until the score reaches zero. If zero is successfully reached, then the word is satisfactory and is added to the population; otherwise it is discarded to avoid getting trapped in a local minimum.

The fitness function determines the candidate's score by calculating the number of hits and misses that would have been returned for all the guesses made thus far if the target word were the candidate. It totals the absolute deviation from the actual values, and unless desperation has kicked in, it also adds one for each letter triplet in the candidate which is not listed as viable in the triplets table.

PHASE 2: If the size of the population is zero, then turn on desperation, and make a random guess to buy time. Otherwise, the population is now a sample of words which might be the target word.
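(The phase 1 loop can be sketched roughly as follows - my reconstruction, not William's code. The fitness of a candidate is the total absolute deviation between the (hits, misses) answers it would have produced and the answers actually observed; I read "hits" as exact-position matches and "misses" as guessed letters absent from the target, and I omit the triplet-table penalty.)

```python
import random
from collections import Counter

def feedback(guess, target):
    # hits = exact-position matches; misses = guessed letters that do
    # not appear in the target at all (my reading of the terms)
    hits = sum(g == t for g, t in zip(guess, target))
    common = sum((Counter(guess) & Counter(target)).values())
    return hits, len(guess) - common

def fitness(candidate, history):
    """Total absolute deviation from the observed answers; 0 means the
    candidate is consistent with every guess made so far."""
    return sum(abs(h - hits) + abs(m - misses)
               for guess, (hits, misses) in history
               for h, m in [feedback(guess, candidate)])

def grow_member(length, history, cycles=2000, rng=random.Random(1)):
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    word = "".join(rng.choice(letters) for _ in range(length))
    for _ in range(cycles):
        if fitness(word, history) == 0:
            return word          # satisfactory: add to the population
        i = rng.randrange(length)
        mutant = word[:i] + rng.choice(letters) + word[i + 1:]
        if fitness(mutant, history) <= fitness(word, history):
            word = mutant        # replace unless the mutant is worse
    return None                  # discard: likely stuck in a local minimum
```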
One of them might be the best guess, so, for each member, calculate how its use as a guess would divide the whole population, and the size of the largest chunk. Choose the member that would result in the smallest largest chunk as the starting point for phase 3.

PHASE 3: The result of phase 2 is a good guess, chosen from among a population of possible target words, but the best guess need not be a possible target word. Therefore, phase 3 uses any remaining time until the 9-second mark to apply mutations to the guess, much as in phase 1, in the hope of reducing the largest chunk size even further, with the difference that draws go to the original, not the mutant. Once time is up, the guess is made, and the program terminates.

Those are the three phases of the program. As for the triplet table, it has one entry for each sequence of 3 letters, from AAA to ZZZ. Each entry has 3 relevant bits: one determines whether a triplet occurs at the beginning of any known word, one does the same for occurrences at the ends of words, and the other for occurrences anywhere other than the beginning or end.

++ Did SchrodingersCow use any letter frequency tables? Where did you get them?

No, it doesn't care a bit about letter frequency. All it cares about is whether a given letter triplet is viable. As for the triplet table, I wrote a separate program to generate that.

++ If I had it to do all over - here's what I would try ....

...fewer bad ideas and algorithms, before deciding on this one.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

WHERE I LIVE: A quaint post-industrial globe, the native inhabitants of which are still largely convinced that they live in a Euclidean 3-space... Ha-ha-ha!

WHAT I DO FOR FUN: Write programs, exploit technology, search for the perfect flashlight, and try to catch that blasted roadrunner.

WHAT I DO FOR WORK: Try to keep everything from falling apart.

INNERMOST SECRET: This extremely mysterious person's utmost secret?
It's something...most entertaining.

POTM IDEAS: Large cash prizes!
_________________________________________________________
MooIndigo 27 ( 7/ 9/ 11) c Scott Kennedy
_________________________________________________________
===============================================
MooIndigo submitted by Scott Kennedy at tsk@net2phone.com
===============================================

+-+- Please summarize the algorithm used by MooIndigo

MooIndigo generates each possible letter combination and compares it against each previous guess. If the comparison generates the same number of COWS and BULLS, then it is printed as the next guess. The letters are chosen by order of frequency in commonly occurring words.

The first set of guesses is selected to use the letters of the alphabet with no repeated letters. For example, if the word length is 6 letters, the first 4 guesses will contain different letters. The fifth guess will include the 2 unused letters and possible letters from previous guesses (based on the number of COWS). The remaining guesses are generated by trying combinations with no repeated letters, 2 repeated letters, and finally any number of repeated letters.

Two passes are made. On the first pass, letter combination frequency tables are used to reject bad letter combinations (e.g. all consonants, 'Q' but no 'U', words beginning with 'ZG', etc.). On the second pass, all combinations are tested (just in case the POTM-MASTER found an obscure word). The passes through the alphabet are made through a set of nested loops. The current indexes of the loops are saved in the temporary file so that each time the program starts up, it can reset the loop counters and pick up where it left off.

To fully utilize the 10 seconds allotted for each pass, MooIndigo looks ahead for other possible guesses. Using the letter frequency tables, it determines a score for each guess, and if time expires, it prints the guess with the highest score.
If the 10 seconds expire and MooIndigo has not found a new guess, it prints the current combination that it is evaluating. MooIndigo generates a list of currently available letters so that it can skip letters that could not possibly be in the word. For example, if a guess generates 0 COWS and 0 BULLS, none of the letters in the guess appear in the word. Similarly, knowing which letters are not used, it can also determine which letters must be used.

+-+- Did MooIndigo use any letter frequency tables? Where did you get them?

Yes. The UNIX 'spell' dictionary was used to generate the following letter frequencies.

1) The letter frequency for each letter, based on word length:
   4:  AEOLRITNSUDMPHCBKGFWYVJZXQ
   5:  EAROLSTINCUYHDPMBGKFWVZXJQ
   6:  EARNITLOSCUDMHPYBGFKWVXZJQ
   7:  EARINTOLSCUMDPHGBYFWKVXZJQ
   8:  EAIRONTLSCDUMHPBGYFKWVXQJZ
   9:  EAITRONSLCUDMPHGBYFWVKXQJZ
   10: EIAOTRNSLCUPMHDGBYVFWKXQJZ
2) Frequency of 2-letter combinations at the beginning of a word
3) Frequency of 2-letter combinations at the end of a word
4) Frequency of 2-letter combinations in the middle of a word
5) Frequency of N consonants in a row
6) A list of three-letter combinations that do not appear in words

+-+- If I had it to do all over - here's what I would try ....

I would have liked to do additional work on the scoring mechanism and some performance improvements to search more guesses within the 10-second allotment. I ran the program against a set of about 4000 words to make sure any changes made to the scoring and algorithms actually reduced the average number of guesses. Also, trying different approaches to determine which letters are repeated might reduce the number of guesses.

+-+- WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

By the C, by the C, by the beautiful C.
_________________________________________________________
dancing_letters 27 ( 7/ 10/ 10) gC Pavel Ouvarov
_________________________________________________________
=================
dancing_letters submitted by Pavel Ouvarov at uvarov@sirius.ihep.su
=================

++ Please summarize the algorithm used by dancing_letters

The algo: after the first guess, build the Candidate Word Space (CWS). Choose a word from the CWS, try it, and reduce the CWS. Repeat.

There are two problems:

First, how to organize the CWS? I invented a class that I called 'mask', which can contain a great number of words. It is easily constructed and can easily be overlapped (intersected) with another 'mask'. Thus, the CWS is a (relatively small) list of 'mask's.

Second, which word from the CWS to choose? I decided to choose the best word, based on statistics. The CWS may be very large, and often there isn't enough time to look through all the words. So I choose only the best 'mask's (based on statistics) and split them into words. Then I choose the best word. (For details ask me or look through the code.)

++ Did dancing_letters use any letter frequency tables? Where did you get them?

Yes, it uses three levels of statistics. (They were all taken from my Linux spell dictionary.) First, a simple letter frequency table for each place in the word. Second, a letter pair frequency table for each place in the word. Third, letter triple allowance: it helps to find out whether a letter triple may be used or not, without regard to place (for example, AAA is not allowed).

++ If I had it to do all over - here's what I would try ....

I'd follow better programming style... Perhaps I'd try to find the word that possibly reduces the CWS best.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live in Russia in the best town of the world - Protvino. Now I'm a second-year student of the physics faculty at Moscow State University. DO FOR FUN? Friends, girls, beer... DO FOR WORK? Physics... INNERMOST SECRET?
Tsss... I think Eugeniya Bagdasarova is jealous of you... After all, she's the prettiest and sexiest girl in the universe...
_________________________________________________________
Skunk 28 ( 8/ 8/ 12) gC Alexey Zhelvis
_________________________________________________________

Sorry ... No questionnaire was returned for Skunk
_________________________________________________________
StupidComp 29 ( 9/ 8/ 12) gC Trieu Nguyen
_________________________________________________________
=================
StupidComp submitted by Trieu Nguyen at trieu@ix.netcom.com
=================

++ Please summarize the algorithm used by StupidComp

In my program, I define an "English word" as strings of consonants and vowels next to each other. For example: E-ngl-i-sh. There is an array holding all such strings of consonants and vowels. At first the program will try some words; after that it will find the matching one among all English words.

++ Did StupidComp use any letter frequency tables? Where did you get them?

The array of consonants and vowels is generated from an English dictionary on the web.

++ If I had it to do all over - here's what I would try ....

Don't dare to do it again :-)

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

Berkeley, California. Working at Lawrence Lab. Interested in computer chess. I wrote a small Chinese chess playing program and let it play on the internet. It beats many people in fast games, heheh.
_________________________________________________________
QuackTheDuck 30 ( 5/ 17/ 8) c Wolfram Hinderer
_________________________________________________________
=================
QuackTheDuck submitted by Wolfram Hinderer at Wolfram.Hinderer@math.uni-karlsruhe.de
=================

++ Please summarize the algorithm used by QuackTheDuck

For my first try I used Pascal. I transformed the answers into linear equations and simplified them. The problem then was to find the guesses which would give the most information.
I was not very successful with this (often 20 guesses or more), and I didn't know how I should use information about English. So I started again from scratch, using C. Well, that should probably be _abusing_ C. I don't have much experience programming in C, but I had to use it (no Pascal support, and it's faaaaster). If you have a look at my source, you will see that it is just terrible. The recursion (see below) is the only place where I tried to write smart code: time is valuable!

But now the algorithm: I use lots of data about the English language. I use triples, like "ING". For every combination of two letters ("AA", "AB", ..., "ZY", "ZZ") there is a rank list of the letters that could possibly follow (like the "probability" that a certain letter is the next letter in the word). For every position (first three letters in the word, letters 2,3,4 in the word, etc.) I have separate lists. (I had different lists for different word lengths at the beginning, with significantly better results, but the size limit... I had to compress my data anyway!) The ranks for "AD" at the start of a word would be "VDJMOAHUERSILK", for example. (I have lists for the first letter, and for the second letter depending on the first letter of a word.)

My guess would be the first word found that is not yet ruled out by the guesses made. The first step is to look for words using only "rank 1" letters; if that's not possible, use "rank 2" letters and so on. More precisely: select the first letter. For every guess already made, calculate the hits, non-hits and non-misses for a word starting with this letter... If the hits, non-hits or non-misses for any guess exceed the values indicated by the answer to that guess, select another first letter (with lower rank); else continue with the second letter, and so on for every letter of the word (recursively). If the last letter of the word has been successfully selected, the misses still have to be checked, and then the word is printed. That's all.
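A compressed sketch of that recursive search (my reconstruction, not Wolfram's C: the real rank lists are conditioned on the two preceding letters, whereas here each position simply gets its own letter ranking): letters are tried in rank order, and a prefix is abandoned as soon as its positional hits against any past guess already exceed that guess's answer.

```python
from collections import Counter

def feedback(guess, target):
    # hits = exact-position matches; misses = guessed letters absent
    # from the target entirely (assumed scoring)
    hits = sum(g == t for g, t in zip(guess, target))
    common = sum((Counter(guess) & Counter(target)).values())
    return hits, len(guess) - common

def find_word(length, ranks, history):
    """ranks[i] is a string of candidate letters for position i, best first;
    history is a list of (guess, (hits, misses)) pairs."""
    def consistent_prefix(prefix):
        # prune: positional hits so far must not exceed the known answer
        return all(sum(a == b for a, b in zip(prefix, guess)) <= hits
                   for guess, (hits, _misses) in history)

    def rec(prefix):
        if not consistent_prefix(prefix):
            return None
        if len(prefix) == length:
            # full check: this is where the misses get verified
            ok = all(feedback(g, prefix) == ans for g, ans in history)
            return prefix if ok else None
        for letter in ranks[len(prefix)]:
            found = rec(prefix + letter)
            if found:
                return found
        return None

    return rec("")
```

With ranks `["MG", "O", "OA", "TS"]` and the single past guess MOOT answered (2 hits, 2 misses), the search prunes "MOO..." immediately (already 3 positional hits) and returns the first surviving word in rank order.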
QuackTheDuck does not try to make guesses which give a lot of information; it simply makes the next possible guess. This seemed so different from the techniques discussed on the message board that I decided to have a name with an animal very different from a bull or cow. Since the program simply utters some English-like words, and since I didn't think that my approach would work well, I named my program QuackTheDuck. After I got it to work, I realized that somehow it worked much better than I had thought, but I didn't change the name. It seemed funny at the time. But I just looked up what Fred says about it in the "Common POTM Rules", and there he says "something clever please!". Oops.

I have added something to the algorithm: if a guess gets wordlen-1 hits and there are more than 3 possible words left, I try to make guesses that eliminate more than one possibility.

My approach has a big problem: some words can't be found. For example, words starting with "ADX", because "X" is not in the list for "AD" at the start of a word. I find all words in my spell dictionary, and many more, but nevertheless I had to extend my algorithm: if no word can be found, fill all rank lists with the missing letters. So everything will be found, even "ASDJJFX", though it would often need many guesses... but at least it will be found! If time is short, I print the "best" word found up to now.

++ Did QuackTheDuck use any letter frequency tables? Where did you get them?

Yes. I used the rank lists described above. I used my spell dictionary to create them. That's why QuackTheDuck finds all the words in my dictionary (and many more).

++ If I had it to do all over - here's what I would try ....

Not sure.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

If there is ever a similar POTM contest, it would be fun to allow access to the dictionary - the dictionary could then be in any language, or be a fantasy or "complete" dictionary.
A dictionary with exactly one word would be interesting... In general, a first step where every program generates its own statistics, and a second step, the guessing or whatever it might be, could be used in other situations.
_________________________________________________________
DICTIONARY ENTRY: bigmoo 15 ( 5/ 4/ 6) c Susan Adams
_________________________________________________________
=================
bigmoo submitted by Susan Adams at buttgirl_69@yahoo.com
=================

++ Please summarize the algorithm used by bigmoo.

It was a walking dictionary. Over 63,000 words totaling about 650K. They were divided into lists by word length. The actual code was only about 1% of the total size. It used the idea that Fred so eloquently stated:

2. a most useful "trick" was making good use of the scores of previous guesses to limit the "next" guess ... as follows:
   a) I'm going to make a guess - there are a large number of possible guesses to choose from
   b) IF my guess is to be a winner, then it MUST yield the identical scores that my previous guesses yielded. If it doesn't, then it couldn't have been the target word - so perhaps I'm better off with another guess!

++ Did bigmoo use any letter frequency tables? Where did you get them?

No way. What use would I have for those? Go big or don't go at all. I got my dictionary from Paul Banta (Bozo).

++ If I had it to do all over - here's what I would try ....

1) I'd put my list of words in a comment. That way, they wouldn't count against me for size. The rules specifically say that comments don't count against you.
2) I'd copy the source code ("BigMoo.c"), comments and all, into my temp file ("TEMP.BigMoo"). I don't think there are any rules against that.
3) I'd read my temp file into a large array of guesses.

I'll have to check with Fred on this, but I think making these three changes would yield a legal entry by the letter of the rules. It might bend the spirit of the rules slightly, though.
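Fred's "trick" quoted above is just a consistency filter over the word list; a minimal sketch (standard bulls/cows scoring assumed - this is an illustration, not bigmoo's actual source):

```python
from collections import Counter

def bulls_cows(guess, target):
    # bulls = right letter, right place; cows = right letter, wrong place
    bulls = sum(g == t for g, t in zip(guess, target))
    common = sum((Counter(guess) & Counter(target)).values())
    return bulls, common - bulls

def next_guess(words, history):
    """Return the first dictionary word that would have yielded the
    identical scores that our previous guesses actually received."""
    for word in words:
        if all(bulls_cows(g, word) == ans for g, ans in history):
            return word
    return None
```

If MOOT scored (2 bulls, 0 cows), MOAT is ruled out (it would have scored 3 bulls) while GOAT and COAT both survive the filter.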
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I'm a perpetual student in Cheyenne, Wyoming. Do for fun? I don't know. Girls just wanna have fun! Innermost secret? That's a tough one. Help me out here, Fred. POTM ideas? None.
_________________________________________________________
DICTIONARY ENTRY: Mad_Cow_Cheating 17 ( 7/ 5/ 5) gC Frederic vanderPlancke
_________________________________________________________
=================
Mad_Cow_Cheating submitted by Frederic van der Plancke at frederic.vdplancke@writeme.com
=================

++ Please summarize the algorithm used by Mad_Cow_Cheating

The algorithm relies on a heuristic formula to evaluate a possible new guess: if the outcomes of the guess separate the set (of size S) of possible solutions into parts of sizes s_1, s_2, ... s_N, then the value of the guess is given by:

    bare_value(guess) := - sum_i s_i * log(s_i)
    value(guess) := log(S) + bare_value(guess) / S

where bare_value is used in the comparison, whereas value is the really meaningful quantity: it represents the expected quantity of information gained by the guess. (By quantity of information I mean the log of the size of the solution set; I expected this to be proportional to the number of guesses needed. This is roughly true in the beginning of a game, but becomes an underestimation towards the end.) (NB. I didn't take the time to re-check the maths, so I might be wrong.)

When comparing a new guess that is a possible solution to one which is not (or is not known to be one), the above value is adapted to take that into account.

(1) The "basic" algorithm (used in earlier versions): It considers all words in the dictionary (even words that cannot be the solution according to the current knowledge), and chooses the one that has the best heuristic value.

(2) The "genetic" algorithm (Mad_Cow_Cheating's algorithm): It starts with the whole dictionary (*), or a random sample of it if the dictionary is too big.
(I did not really bother with the 10 seconds limit, for quite obvious reasons ;-) but I still did not want to make Fred wait for minutes...) Then it creates new words from old ones, by mutations, permutations and cross-overs.

The innovative part is the structure I use for maintaining the genetic pool: a "heap" or "priority queue": that's a binary tree where each node is a better candidate than its two "son" nodes. This way, I can instantaneously get the best move so far, and for choosing words for reproduction I can weight them according to their depth in the tree, which is roughly correlated with their goodness. (To increase this correlation I insert the new guesses randomly in the bottom level, before restoring the heap invariant as usual.)

(*) reduced to words of the right length, this goes without saying...

++ Did Mad_Cow_Cheating use any letter frequency tables?

Mad_Cow_Cheating uses a dictionary, so it does not need frequency tables ;-) I made searches and found two sources of special interest (for MOO purposes): UKACD 1.5, and MOBY (especially the "common" dictionary in "Moby Words"; they have many things, including a pronunciation dictionary, the whole of Shakespeare's work and the Constitution of the U.S.A.!) The page where I found links to these is also where you'll find links to a LOT of dictionaries of various kinds... I took the intersection of UKACD 1.5 and MOBY (common), reduced to the 4- to 9-letter words, as my dictionary.

++ If I had it to do all over - here's what I would try ....

I think Mad_Cow_Cheating is quite good as is... If I had the time... I could implement alpha-beta min-max analysis of the end of game (and perhaps of the whole game if resources permit; D. Knuth has done it for the version of MasterMind that has 1296 possible solutions. A complication here is that MasterMind's symmetries are lost, and the possible moves are much more numerous...)
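Written out as code, the heuristic formula from the summary above looks like this (a sketch only; the bulls/cows scorer and the sample data are my assumptions, and the adjustment for guesses that are themselves possible solutions is omitted):

```python
import math
from collections import Counter, defaultdict

def bulls_cows(guess, target):
    bulls = sum(g == t for g, t in zip(guess, target))
    common = sum((Counter(guess) & Counter(target)).values())
    return bulls, common - bulls

def guess_value(guess, solutions):
    """value = log(S) + bare_value/S, with bare_value = -sum_i s_i*log(s_i):
    the expected information (in nats) gained by making this guess."""
    parts = defaultdict(int)
    for word in solutions:
        parts[bulls_cows(guess, word)] += 1
    S = len(solutions)
    bare = -sum(s * math.log(s) for s in parts.values())
    return math.log(S) + bare / S
```

A guess that splits S solutions into singletons gains the full log(S) nats; a guess whose outcome is identical for every solution gains nothing, since bare_value collapses to -S*log(S).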
I noticed the unexpected fact that the "basic" algorithm outperforms the "genetic" algorithm on lengthy words. This means dictionary words have an advantage that my heuristic does not capture, probably because they are more akin to the words to split, but I did not study this question further. (The "basic" algorithm should find most 9-letter words in no more than 4 guesses, even when using a bigger dictionary.) Oh yes, I could also weight each dictionary entry according to its probability of being chosen by Fred ;-)

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live in Brussels (Schaerbeek) in Belgium. Look at "SplitMoo" for the other answers...
_________________________________________________________
deepmoo 31 ( 7/ 11/ 13) gC Sean Barrett
_________________________________________________________
=================
deepmoo submitted by Sean Barrett at buzzard@world.std.com
=================

++ Please summarize the algorithm used by deepmoo

A sort-of diary of the development of deepmoo is at http://world.std.com/~buzzard/potm/journal.txt

OPENING: Fixed opening book (does not change in response to results), short-circuited if all letters are encountered. Binary encoding is used for most probes for 7+ letter words (consider the meaning of 5 misses for 'OAAEEEE'). The fixed opening book guarantees that every letter (or every letter but one) has been probed by the time the opening book is completed - or else all letters in the word are accounted for if short-circuited, but this cannot happen if there are repeats. (This avoids slow probes through lots of letters during the endgame.)

MIDGAME AND NORMAL ENDGAME: Depth-first search for strings that match all previous guesses. Heavily "pruned" for speed (i.e. without changing search results). Only matches for which every set of three letters matches some word in a precompiled dictionary are allowed. Search at each letter position goes in order of letter frequencies at that position.
(This is not actually using a dictionary, but a set of frequency tables that have only 1-bit frequencies. See below.)

LATE ENDGAME: If all words that "match" the trigraph frequencies are rejected, retraverse the entire tree without culling bogus trigraph words. (Typically deepmoo has enough information to take only one or two extra guesses in this case, if the word is actually English or English-like but not in the original dictionary used.)

++ Did deepmoo use any letter frequency tables? Where did you get them?

I compiled a dictionary by getting two different /usr/dict/words, removing all uppercase and non-alphabetic words, running through it by hand looking for anything too weird, and then passing the whole thing through 'spell' on a machine belonging to an acquaintance which was running the same OS version as the POTM-master. I then generated frequencies as follows:

For each letter position, compute the relative frequencies of each possible letter. I could have computed this independently for every possible word length, but I didn't bother.

For every trigraph position, compute a 1-bit value for every three-letter string, indicating whether that string ever appears in that position. Again, I could have computed this independently for each word length, and it would have made a big difference, but I couldn't afford that much data. I generated an equivalent set of information with quadgraphs (four-letter strings), but was unable to compress them sufficiently.

The core premise of deepmoo is to use 1-bit frequencies so as to effectively narrow the search space, rather than simply orient it. The final endgame guarantees that even words that don't match the 1-bit frequencies are eventually explored. In other words, multiply together the frequencies of each of the trigraphs from each position: in my case, the final answer is either 1 or 0, which can be understood as "much more likely" or "much less likely". All of the former are explored before any of the latter.
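The 1-bit trigraph tables can be sketched in a few lines (an illustration of the idea, not deepmoo's compressed representation; position indices here are shared across word lengths, as the write-up says):

```python
def trigraph_bits(words):
    """1-bit 'frequencies': the set of (position, three-letter string)
    pairs that ever occur in the dictionary."""
    bits = set()
    for w in words:
        for i in range(len(w) - 2):
            bits.add((i, w[i:i + 3]))
    return bits

def plausible(word, bits):
    """True iff every trigraph of `word` occurs at its position -
    'much more likely' vs 'much less likely'."""
    return all((i, word[i:i + 3]) in bits for i in range(len(word) - 2))
```

Note that a string like COAN passes a table built from GOAT, COAT, MOAN without being in the dictionary: the table accepts a wordlist larger than the fixed list it was built from, which is exactly the over-generation discussed below. The depth-first search explores all plausible strings first, and the late endgame drops the filter so implausible strings are still reachable.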
I have some serious concerns about the fact that there is no real distinction between frequencies and dictionaries, in some sense. For a discussion of this extracted from my journal (the most interesting/revealing thing in the journal, I think): http://world.std.com/~buzzard/potm

++ If I had it to do all over - here's what I would try ....

Nothing, because by the end all I was trying to do was compress tables for better narrowing the wordspace (see the aforementioned URL). There may be good alternative approaches, but in *practice* this is a good attack for making the program 'seem to know English', which is of course something all of us playing the game by hand do, but which programs are absolutely ruled against doing.

In some sense, I don't think the chosen POTM problem is an interesting problem space. The problem space "you can have a dictionary, but you should be able to solve even words which aren't real words" is the problem space humans solve. The problem space "the word must be in this dictionary, which you can use" is an interesting problem. The problem space "the word could be any string with equal odds" is an interesting problem space. The chosen POTM problem gives a problem space in which the goal is to model the English language without bothering to use a dictionary. As such, it's kind of boring.

For me, at least, it became the problem "write a 20K program that produces a wordlist as minimally larger than some fixed list of words as possible, while still containing all those words". The best I could come up with was 20,000 times as large. (I had one less than 100 times as large, but couldn't get it small enough.) As such, I could watch my program and see how it *sort of* knew English, but not really, and this seemed the main front on which it could be improved. (Indeed, there may simply be entirely different strategies, considering how far deepmoo is from the top of the rankings, but they're not coming to mind here.)
With the "less than 100 times as large" quadgraph-based structures, deepmoo generally guessed words that *were* in the original dictionary on the first guess after the opening book. Therefore, at about that point, it would become interesting to pursue shortening the opening book, or looking for other strategies without this dichotomy.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live in Boston. I program computers for fun and work, writing 3d graphics code and games.

_________________________________________________________
MooFight 32 ( 9/ 11/ 12) c Kris Kenis
_________________________________________________________
=================
MooFight submitted by Kris Kenis at kris.kenis@barco.com
=================

++ Please summarize the algorithm used by MooFight

Step 1: make random guesses with all "different" letters (wordlength > 6 letters --> 7 different letters, wordlength 4 different letters). Make guesses until all letters of the alphabet are used, or until all letters of the word are found (-> only when all the letters of the word are different). Sometimes some letters can be eliminated (misses = wordlength).

Step 2: make guesses with "only possible letters". Use 2 different letters for wordlength 6 letters. It is always possible to get the exact letters. Check with previous guesses and try to eliminate other letters.

Step 3: make guesses with exact letters, using frequency tables. The frequency table contains frequency information for each pair of letters. Keep the word with the highest score.

++ Did MooFight use any letter frequency tables? Where did you get them?

Yes. Somewhere in the potm message board.

++ If I had it to do all over - here's what I would try ....

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Gent - Belgium - Graphics R&D Engineer

_________________________________________________________
NoBullPeacePrize 32 ( 9/ 10/ 13) gC Randy Saint
_________________________________________________________
=================
NoBullPeacePrize submitted by Randy Saint at saint@phoenix.net
=================

++ Please summarize the algorithm used by NoBullPeacePrize

I borrowed this strategy from the Mastermind optimal algorithm. Try to generate a complete list of all possible solutions (candidates). Limit this list to 2000 words. Test all candidates against each other to see which one splits the list the "best". If time is left over, generate random guesses to see if they can split the list of candidates "better."

By "best" and "better": each candidate word is given a score (the sum of the frequencies of each letter pair). The "best" guess would be the one that generates the most even histogram of scores for each possible guess result. This way, the program would lean more towards finding guesses that would split the candidates that were most likely words.

I also kept a list of the letters and which locations the letters could not be in. This helped to reduce the number of possible candidates I had to search through.

++ Did NoBullPeacePrize use any letter frequency tables? Where did you get them?

Yes. At first I got them from some lists that people had posted on the original message board. With only a week to go, I ran those lists through the spell program and realized that there were many words in there that would fail. I culled those out of the list and used the remaining ones that were spelled correctly.

++ If I had it to do all over - here's what I would try ....

I couldn't figure out any way to generate my list of candidates faster. After a few guesses, the list of candidates was zero and I couldn't search through enough of the possibilities to find even one. Then the program takes a previous guess, changes one letter and uses that.
That way I get a little information about the two letters.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

Live: Houston, Texas.
Fun: Programming contests, and tinkering with my computer. I purchased a used motherboard to use in my second computer, but somehow screwed it up with less than 2 days to go before the contest deadline.
Work: Software Development Manager with Lockheed Martin, at NASA, Johnson Space Center.
Secret:
Ideas: Number guessing vs. the other contestants; there was a contest called Knobeln.

_________________________________________________________
MooBoo 33 ( 8/ 13/ 12) C Rolandas Gricius
_________________________________________________________

Sorry ... No questionnaire was returned for MooBoo

_________________________________________________________
one_guess 34 ( 8/ 12/ 14) c Ionel Santa
_________________________________________________________
=================
one_guess submitted by Ionel Santa at ionels@phobos.cs.unibuc.ro
=================

DESCRIPTION OF THE ALGORITHM

1. My algorithm tries at every guess to find a word that could be the correct one. The words are searched with a backtracking algorithm, taking care to eliminate from the beginning the words that contain invalid combinations of letters, and also the words that contradict the current knowledge obtained from the previous guesses and answers. If no word is found within seven seconds, the search is continued with a random search. (Hint: if the space of solutions is very small, then the knowledge already obtained makes backtracking a very easy task, and if the space of solutions is very large, a random search (also conforming to that knowledge) will be more likely to succeed.)

My first 3-5 guesses (depending on the word length) were predefined. For example, for length 7 I answered first something like XYYZZZZ, where X, Y, Z are frequent letters, which permitted me to know for each of these letters whether or not it is in the final word.
(For example, if MISSES = 3, I can take it for sure that Z is in the word, and X and Y aren't.)

I tried to make inferences about:
a) letters that can or cannot be at a specified position in the word;
b) the minimal and maximal number of letters that can be in the final word for different sets of letters. At this point I make inferences about intersections, differences, etc.

LETTER FREQUENCIES

2. I use frequencies very little, only for my first guesses. I built these tables by counting letters in a novel, regardless of the length of words (I counted every letter).

WHAT I WOULD TRY

3. Maybe I would try to work with schemes like: ['A'-'Z']['O']['R','Y']['Y'] - which means that the first letter is from 'A' to 'Z', the second is 'O', etc. First I would go with ['A'-'Z']['A'-'Z']..., and then after every guess-answer I would split the schemes that I have into others that conform to this guess-answer.

WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

4. I live in Romania; for the moment I'm studying in Bucharest (computer science). I like literature and music very much, I cannot make my secrets public, and for the moment I haven't any POTM ideas. Anyhow, I think this contest is very well organized, and I thank FRED for that.

_________________________________________________________
Bozo 35 ( 8/ 11/ 16) c Paul Banta
_________________________________________________________
=================
Bozo submitted by Paul Banta at pbanta@yahoo.com
=================

++ Please summarize the algorithm used by Bozo

Bozo is about 1/3 statistics and about 2/3 code. Bozo makes some initial "pre-canned" guesses. It then generates guesses based on the following idea. The word that I'm about to guess MIGHT be the word that Fred is thinking of. If it IS the word that Fred is thinking of, I should be able to compare it to all my previous guesses and get the same number of hits and misses that Fred reported. So, that's exactly what I do.
I compare the word that I want to guess with all my previous guesses and see if the hits and misses match Fred's hits and misses. If the new guess doesn't match all of Fred's hits and misses, then I don't guess it - it CAN'T be the right word. Instead, I keep looking for a word that matches ALL of Fred's hits and misses. When I find such a word, that becomes my next guess.

++ Did Bozo use any letter frequency tables? Where did you get them?

Yes, but they were only minimally helpful. I generated them myself from my 63,000+ word dictionary. A more interesting question is where I got my dictionary from. I found several dictionaries on the web. I combined 3: a list of English words, a Webster's dictionary, and the English half of an English/Spanish dictionary. I then threw out all the words that were not 4-9 characters long. That gave me a list of about 100,000 words. I then ran this list through a Sun spell program (special thanks to Ken Been in NY for doing this for me), and threw out the words that it didn't like. This gave me a list of about 63,000 words.

What was more helpful than letter statistics were other self-generated statistics that I use in guessing words. I have a table of which triple-letter combinations do and don't occur in my dictionary. For example, "QUI" does occur in my dictionary, but "BBB" doesn't. This helps me to never even consider guessing words with "BBB" in them. There are 17,576 possible three-letter combinations (26^3). I don't remember the specifics, but I think only about 1/3 of them occurred in my dictionary. So, 2/3 of them didn't occur. I also stored and used statistics for which sequences of 4, 5, and 6 consecutive consonants occur in my dictionary.

++ If I had it to do all over - here's what I would try ....

I had a somewhat different algorithm idea late in the contest, but didn't have time to implement it. Phase 1: You make an initial guess, say "ABCDEF" (6-letter word).
Fred gives you back hits and misses. Phase 2: Your next guess uses only those 6 letters, such that if it WAS the word that Fred is thinking of, it would get the same number of hits and misses as Fred reported. You keep making guesses this way until you can't make any more - any guess that you make wouldn't have the same number of hits and misses with all previous guesses that Fred reported. When you run out of guesses, you go back to phase 1 and guess another 6 letters, say "GHIJKL" - now you have 12 letters to work with. I think this algorithm is better than my algorithm, but I'm not sure how much better. My current algorithm averages about 12.3 guesses per word, which ain't too shabby.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

Live: Colorado Springs, Colorado, USA
Fun: Hiking, fishing, camping, backpacking, volleyball, breaking perfectly good automobiles under the guise of "fixing" them, POTM.
Work: An old computer programmer trying to learn new tricks (java).
Innermost Secret: I have no more innermost secret. I'm out of the closet (so to speak). I'm "Mr. Bozo".
POTM Ideas: None that I haven't already sent to Fred.

_________________________________________________________
zigzag 35 ( 7/ 13/ 15) c Jaco Cronje
_________________________________________________________
=================
zigzag submitted by Jaco Cronje at s9805027@student.up.ac.za
=================

++ Please summarize the algorithm used by zigzag

For words of length 4-6: For the first 4 picks, output something like ABCD, then EFGH, then IJKL, then MNOP. After that, do a DFS to find a solution that is legal. A "legal" solution is one that could be a possible answer in comparison with the previous picks. If the time is up and no "legal" solution was found, just output the last combination of letters tested.

For words of length 7-9: Find out exactly which letters are used in the word. Maximum of 9 picks required.
Pick something in the form of: ABBCCCC (for 7-letter words). Then check the resulting number of misses:

If 0 misses, then A, B and C are in the word.
If 1 miss, then B, C are in the word and A is not.
If 2 misses, then A, C are in the word and B is not.
If 3 misses, then C is in the word and A, B are not.
If 4 misses, then A, B are in the word and C is not.
If 5 misses, then B is in the word and A, C are not.
If 6 misses, then A is in the word and B, C are not.
If 7 misses, then A, B and C are not in the word.

(The same thing can be done for words of length 8 and 9.) When (number of letters found = letters in word) and (picks < 9), this process is stopped. Now again a DFS is done, only on "legal" solutions, and only the correct letters are used.

++ Did zigzag use any letter frequency tables? Where did you get them?

Yes, computed from the UNIX dictionary. Used to optimize the DFS, searching the most probable solution space first.

++ If I had it to do all over - here's what I would try ....

Maybe a combination of a GA and a DFS. My DFS can be optimized more, I think. Better combinations to figure out which letters are used and how many of each letter are used.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

Live in: South Africa, in Gauteng near Pretoria.
For fun: Visit my girlfriend, play Quake, program games.
For work: Student at the University of Pretoria, doing Computer Science. I am a 2nd year student now.
Secret: That's for you to find out.
Ideas:
* Playing a game against other people.
* Programming a robot who fights against the other bots.
* Quake-like deathmatch in a 2D arena.
* or maybe a chess-like board game.
* or Chinese Checkers...
* or maybe Capture the Flag !!
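The miss-decoding zigzag tabulates falls out of the power-of-two group sizes in ABBCCCC (A once, B twice, C four times): each absent letter adds its group size to the miss count, so the misses form a binary code. A minimal sketch (hypothetical, not zigzag's actual code):

```c
/* Decode the miss count for the guess ABBCCCC: group sizes 1, 2, 4
 * make the presence of each letter readable from one bit of 'misses'. */
struct presence { int a_in, b_in, c_in; };

static struct presence decode(int misses)
{
    struct presence p;
    p.a_in = !(misses & 1);   /* A contributes 1 miss if absent   */
    p.b_in = !(misses & 2);   /* B contributes 2 misses if absent */
    p.c_in = !(misses & 4);   /* C contributes 4 misses if absent */
    return p;
}
```

For example, 3 misses means A and B are absent and C is present, matching the table above; the 1-2-4-pattern generalizes to more groups for longer words.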
_________________________________________________________
scharade 36 ( 10/ 12/ 14) gC Michael Strauch
_________________________________________________________
=================
scharade submitted by Michael Strauch at michael.strauch@cityweb.de
=================

++ Please summarize the algorithm used by scharade

In general, "scharade" uses the following strategy:

1) The first guesses test the complete alphabet, using a hard-coded list for each word length from 4 to 9 (if these first guesses reveal that all characters of the word are identified, scharade continues immediately with step 2).

2) The next guesses are calculated by some kind of exhaustive search to find a word that could be a possible solution. This might not be the optimal strategy - it could be more efficient to use some guesses that are no longer possible - but I had no good idea how to accomplish this.

3) Sometimes, ten seconds are not enough for step 2 - if scharade exceeds a running time of 9 seconds, it stops immediately and chooses a random guess.

I think step 2 needs some more explanation. First, I set up a straightforward backtracking algorithm for this task - but though I optimized it a lot, scharade ran too often into step 3, especially for words of length 8 or 9. Then I tried to make use of the fact that Fred said he would only use "spell-checking" words (see below). That made the situation a little bit better, and decreased the average number of guesses.

Finally, I added an ingredient called the "Ordered Binary Decision Diagram" (OBDD). This kind of data structure allows scharade to quickly calculate all subsets of the alphabet that are possible candidates for Fred's hidden word, according to the information from the previous guesses and their numbers of "misses". Then the backtracking algorithm runs once for every possible subset, now testing only words with characters from this subset, to find one that meets the restrictions coming from the previous "hit" results.
This gave me the speed improvement I was looking for - step 3 was only reached in very rare cases (on my machine, and Fred's machine seems to be comparable).

An OBDD is a kind of finite state machine that represents a binary function. Scharade uses OBDDs to calculate the function f(x1,...,x26), where x1,...,x26 are binary input variables representing one subset of the alphabet, and where f(...)=1 if and only if (x1,...,x26) represents a subset that meets all of those "miss restrictions". As you might have noticed, the number of "misses" of a guess does not give you any information about the order of the letters you are looking for; it tells you only which letters are used. (Instead of using OBDDs, one could make a list of all ~5 million possible subsets of the alphabet of size

>> Yeah, I allowed it ... =Fred

++ If I had it to do all over - here's what I would try ....

- try to develop a better theory of how to find "good" guesses
- try to encode more statistical information into 25Kb of source code
- perhaps rewrite the thing in Perl, just for fun (well, that "10 secs per guess" rule gives me some headache about this)

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live in Germany, do POTM contests for fun, do software engineering for work, won't tell my innermost secret -- it wouldn't be a secret any more ;-) -- and wonder each time where Fred gets all those genius ideas for his contests from.
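The subset representation scharade builds on (a set of letters packed into the bits of one integer) can be sketched as below. This is only the plain miss-count test for a single candidate subset, with invented names - it is not the OBDD itself, which is what made enumerating all viable subsets fast:

```c
/* A subset of the alphabet as a 26-bit mask, and the "miss restriction"
 * test: if the hidden word's letter set were 'subset', each past guess
 * must produce exactly its reported number of misses, where a miss is
 * a guess position whose letter is outside the subset. */
static unsigned letter_bit(char c) { return 1u << (c - 'a'); }

static unsigned word_mask(const char *w)        /* set of letters in w */
{
    unsigned m = 0;
    while (*w) m |= letter_bit(*w++);
    return m;
}

static int subset_ok(unsigned subset, const char *guess, int misses)
{
    int count = 0;
    for (; *guess; guess++)
        if (!(subset & letter_bit(*guess)))
            count++;
    return count == misses;
}
```

A candidate subset survives only if `subset_ok` holds for every past guess; the backtracking search then needs to place only letters drawn from surviving subsets.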
_________________________________________________________
steak 37 ( 6/ 9/ 22) gc Ben Angrim
_________________________________________________________
=================
steak submitted by Ben Nye at angrim@montana.com
=================

++ Please summarize the algorithm used by steak

The main idea of interest was to estimate the reduction in search space resulting from a guess by creating a large sample of strings that would be valid based on the previous guesses and the letter triples, and then picking whichever guess partitioned that sample search space the best. This worked very well provided that I was able to find a large sample. In the final test (final word) my program ran into time trouble and created a very small sample search space, which resulted in BAD partitioning.

The letter frequency tables were used to weight each entry in the sample space based on how likely its letter pairs were: the 2 partitions (XZXZ, XXXZ, ZZZZ, ZOOX), (ZOOM) would be scored better than the 3 partitions (ZOOM, XZXZ, XXXZ), (ZZZZ), (ZOOX), for instance, because ZOOM would be weighted so high.

++ Did steak use any letter frequency tables? Where did you get them?

Yes, 2 tables: one was the frequency of pairs of letters, the other was the possibility of triples of letters. This resulted in very English-like words, some of them very funny :-) I got my letter frequencies from the dictionary that came with the hangman game on my Linux system.

++ If I had it to do all over - here's what I would try ....

Start coding 1 day earlier. I thought about the problem for months, but didn't start coding till Thursday evening... the first version that worked at all was around 4 PM Saturday. I knew that it was having speed troubles with (some) 9-letter words when I sent it in, but was out of time.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Still living in Missoula, MT.
Do for fun: program, play Starcraft; once it gets a little warmer, hike.
Work: Unix/C/X11 programmer.
Secret: a secret shared is no secret.

_________________________________________________________
EISAR 38 ( 10/ 15/ 13) c Alex Popiel
_________________________________________________________
=================
EISAR submitted by Alex Popiel at popiel@snugharbor.com
=================

++ Please summarize the algorithm used by EISAR

EISAR's logic is divided into two parts: a deductive engine and a guess generator. Most of my work went into the deductive engine, since that's what I understood the best.

The deductive engine worked to determine, for each letter: the maximum number of instances of it in the word, the minimum number of instances of it in the word, and whether or not it was in each of the possible positions. After each guess, it would cycle through all of the guess results so far, deriving as much information as it could from the guesses. Many different rules were used to extract information from the guess results... this is where about half of the 25K program lived. Based on the number of misses, it would figure out which letters used in the guess were present in the target word, raising the minimum count to 1 (it would use guesses with repeated letters in proportions like 4 E's, 2 S's, and 1 R, so that the miss count was unambiguous). Based on the number of hits, it would fill in the position grid and change the minimum counts. Based on unknown positions and other minima, it would set the maximum count (and also would set the maximum count based on hits if all unknowns were in a single guess). By iterating through this repeatedly (and also using knowledge like "there is exactly one letter in each position"), the deductive engine would fill in positively known data at a fairly brisk pace.

The guessing engine was a different beast entirely.
It ran in two phases: first, figure out what letters were in the word (and how many of each), and second, sort out what positions they were in.

To figure out what letters were present, it would generate guesses of two or three letters repeated in unambiguous proportions, as mentioned above, starting with the most likely occurring letters, and continuing until all letters were located. Thrown in here would be guesses just to count occurrences of a known present letter, if the probability of that letter being repeated was higher than the probability that the next few unknown letters would be present. (Likelihoods were determined via composites of raw frequency charts and frequency of bigrams with known present letters.) Also, during this phase, letters would be positioned in the _least_ likely positions to try and keep the hit count low, so that the negative info in the position matrix filled in faster.

The second phase of the guessing began when the minimum and maximum counts were equal to each other for all letters (that is, all the letters were known to be present or not, and in what quantity). EISAR would then begin to alternate a try for the real word (based on letter-in-position frequencies, bigram frequencies, and consonant run-lengths) and position determination for a single letter. Occasionally (as with DUCK), the first guess after figuring out what letters were present would be the actual word. After figuring out positively where various letters were, it was generally pretty good about getting English-like guesses.

The submission that actually ran in the finals is a rather old version of EISAR. I made a number of "improvements" to the deductive engine, generalizing the rules it used and allowing it to use combinatorial set theory to figure stuff out... but it ran very much slower, without consistent improvement to the guess count.
(I have lots and lots of histograms of running various versions of EISAR against my entire dictionary, if anyone's curious to see them.)

One last note: EISAR is guaranteed to find the solution in less than 100 guesses... it will know the letters present in no more than about 20 guesses, and it can easily find the positions in the remaining 80 guesses. This is true regardless of the English-ness of the word to be guessed.

++ Did EISAR use any letter frequency tables? Where did you get them?

Yes, EISAR used letter frequency tables of many forms. I had separate frequency tables for each length of word for each of "Does this letter occur in the word?" and "Does this letter occur more than once in the word?". I also had a bigram frequency table (which was not length-differentiated). All of this was encoded in my program to reduce source size. I generated all my tables myself based on a fairly nice dictionary that I got from reading the message board. Since the source of the final test words was unknown, I did not worry overmuch about whether my dictionary matched the word distribution of the final choices.

++ If I had it to do all over - here's what I would try ....

I'm really not sure what I would do. I probably spent too much time on the deductive engine, and not enough on the guess generator... but I was too interested in provably finding the answer in time, instead of finding English-like words quickly. I would have liked to differentiate my guessing algorithm by word length (for instance, when finding which letters were present in length-four words, it would have been beneficial to quickly eliminate the low-frequency letters in groups of four with the first couple of guesses).

For the (missing?) Hints and Tricks section: Once again, I never used malloc(). Didn't want to waste the time. Also, all my state information was in one big struct (with various arrays and such) which could be shifted to and from disk easily with a single read or write call.
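The one-big-struct trick can be sketched as below; the field names are invented for illustration. Because the struct is flat, with fixed-size arrays and no pointers, the whole game state moves to and from the scratch file in a single call:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flat state struct in the spirit of EISAR's: no pointers,
 * so one fwrite/fread persists everything between guesses. */
struct state {
    int  min_count[26], max_count[26];   /* per-letter occurrence bounds */
    char possible[9][26];                /* letter allowed at position?  */
    int  guesses_made;
};

static int save_state(const char *path, const struct state *s)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);   /* single write call */
    fclose(f);
    return n == 1 ? 0 : -1;
}

static int load_state(const char *path, struct state *s)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(s, sizeof *s, 1, f);    /* single read call */
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

This works because the program is re-run for every guess and only needs its own earlier output back; no serialization format is needed when the same binary reads what it wrote.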
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live in Mountain View, CA, USA. I code for fun, I code for work, code is my life. (Well, okay, I also play games like scrabble and othello and such with various friends, and go out to the occasional movie, but don't tell anyone!) My innermost secret? Umm... I'm obsessive-compulsive. Wait, everybody knows that...

POTM ideas? Given a set of shapes, how many different ways can you fit them in a box? Given a list of words, determine which are least likely to be confusable? Maybe a head-to-head game where you have three artillery pieces, and each turn you can move them a certain distance, or fire them, but not both, and the last one with a gun left wins?

- Alex

_________________________________________________________
SimpleMind 39 ( 12/ 14/ 13) c Edmond Bernardy
_________________________________________________________
=================
SimpleMind submitted by Edmond Bernardy at ebernardy@lucent.com
=================

A History of SimpleMind, by ebernardy@lucent.com

A good strategy for solving a problem like mastermind seemed to be to apply the concepts of information theory, and to try to maximize at each step the information obtained in response to a proposed word. The two numbers (hits and misses) can be considered as a message, and if the probabilities of the different possible messages are known (or can be computed/estimated), the information content of the received message can be computed. (The information of a message is computed as -SUM(p_i log(p_i)), where p_i is the probability of a message instance, and the sum is taken over all possible messages.)

The strategy is simple to state, but Fred took care to make the search space so large - up to 26^9, or about 5.4E12, at the beginning - that counting the numbers of possible answers is not a realistic proposition.
The idea was therefore to generate at each step a (representative) sample of possible words, and to determine in that restricted sample space the best proposal. It was also decided to use only words which were consistent with previous answers, although in some cases this is not the best choice.

An efficient method for generating possible words was obtained by creating two arrays, which are updated at each trial and saved in the temporary file. One array contains the possible letter sets of already-tried letters; the other array contains the possible letters for each letter position. Possible letter sets are coded as bit fields, 26 bits fitting nicely into a 4-byte integer. Sounds simple, but it took more time than expected for debugging and making it efficient, eliminating duplicates or impossible letter combinations.

This strategy seemed to work OK for short words, but for words of more than 6 letters it became obvious that letter and di-(tri?)graph frequencies must be taken into account. At this point, time was running short. Single-letter frequencies and frequencies of preceding and following letters in digraphs were obtained for all words of at least 4 letters from Webster's dictionary. The present version of SimpleMind generates a word by first choosing, among the allowed letter sets, the shortest set with the highest letter frequency, and putting the letters in allowed positions. The word is completed by adding the most probable letters, taking into account already-fixed preceding or following letters.

Weak spots are:
- different letter frequencies should be used for the first and last letters of a word: digraphs like 'rd' can appear at the end of a word, but not at the beginning
- in the first trials more weight should be given to using untried letters and avoiding (?) repeated letters

Conclusion: great fun, but next time START EARLIER!
_________________________________________________________
Nero_the_Horse 42 ( 12/ 16/ 14) PERL Mike Liddell
_________________________________________________________
=================
Nero_the_Horse submitted by Mike Liddell at mike_liddell@hotmail.com
=================

++ Please summarize the algorithm used by Nero_the_Horse

Constraint solver --- use previous guesses to determine all possible patterns. E.g. the pattern ABxxE would allow ABCDE (all letter restrictions are observed; e.g. the third letter might be A..P, Q, S..Z if we know that R isn't allowed). This bit works fine, but I did very little work on using this information -- I don't make particularly good guesses, and my initial guesses were made semi-randomly.

++ Did Nero_the_Horse use any letter frequency tables? Where did you get them?

Nup... it is not English-specific (dammit).

++ If I had it to do all over - here's what I would try ....

Same approach, possibly in C as speed is a concern. Spend time analysing the second part of the problem (choosing the best guess once full info is determined).

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

Melbourne, Aus. Wrote this potm entry in London, currently in Philadelphia. Disclaimer: I had to stop working on this potm at the end of March due to travel etc.... so I have an excuse! John Fitz: no entry?!?!? Thanks once again, Fred, for a fun comp.

_________________________________________________________
not_bullish 45 ( 10/ 11/ 24) PERL Eric Prestemon
_________________________________________________________
=================
not_bullish submitted by Eric Prestemon at ecp@io.com
=================

++ Please summarize the algorithm used by not_bullish

Dumb initial guesses. In the second phase, after sorting the previous guesses by hits and then misses, it randomly fiddled its current guess to match as many previous guesses as possible. Too slow on long words, so it would short-circuit out.

++ Did not_bullish use any letter frequency tables?
Where did you get them?

The program had no knowledge of the English language except the 26 letters. Not even its dumb guesses were optimized.

++ If I had it to do all over - here's what I would try ....

I did do it all over, with a best-first search algorithm that worked much better on most words, but was prohibitively slow. Then I lost interest.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I program at an internet startup in Silicon Valley. However, I still manage to lead a relatively sane life.

_________________________________________________________
TOEFL 46 ( 12/ 13/ 21) c Sergey Kondrachshenko
_________________________________________________________
=================
TOEFL submitted by Sergey Kondrachshenko at SKondras@iacgsz.pavlodar.kz
=================

++ Please summarize the algorithm used by TOEFL
++ Did TOEFL use any letter frequency tables? Where did you get them?

The algorithm uses the following statistical data:
- a ranked list of the symbols of the alphabet, in decreasing order of frequency of use;
- a ranked list of the most frequently occurring pairs of symbols (in decreasing order);
- a list of pairs of symbols that never occur in the processed list of English words;
- a ranked list of the most frequently occurring groups of three symbols (in decreasing order).

The first three lists were computed for each possible word length; the last one is general. To obtain this statistical information, the words from the following source were analysed: ftp://ftp.simtel.net/pub/simtelnet/msdos/lingstcs/ files words1.zip, words2.zip, words3.zip, words4.zip.

The algorithm consists of two parts:

1. Generation of a new sequence of symbols as the next guess. To check for the presence of symbols in the word, the scheme 4444221 is used, depending on the length of the word, with the longest component lengthened as necessary (for example 22221 or 44444221); the presence or absence of each symbol then follows strictly from the scheme.
Letters are chosen for testing in rank order from the first list, and later with the already-found letters and the letter pairs from the second list taken into account. When some letter's presence at a particular position needs checking, that check is carried out in parallel with the check of the next batch of letters. The letter to be position-checked is chosen in the following order: first from three-letter groups, then from letter pairs, then from the other variants in view of the absent pairs (take the letter with the fewest possible positions and check the position with the greatest number of possible variants), and last of all simply any possible variant (chosen as in the previous case). For N=7 the scheme 4444221 is used if there are no letters to position-check; otherwise the scheme 221 is used. When checking the presence of a letter in the word, the guess is constructed, whenever possible, so that the analysis can (under certain conditions) also uniquely determine the position occupied by a letter (for example, for a component of size 1 this is a position whose letter is not yet determined, and for size 2 it is one position already occupied and one free).
2. Analysing the results of the previous guess. From the MISS count and the scheme used, the letters present in the word are determined, and, when it is unambiguous, also the positions they occupy (the various combinations of HITS and MISS are interpreted relative to the offered guess). Additional analysis is then carried out:
a) None of the uniquely determined positions need to be checked again for the letters present.
b) If there is a letter with no determined position and only one possible unchecked position, that position is fixed exactly, and item a) is applied again.
c) If all letters have been checked for presence in the word and there is a position which can be occupied by only one of the letters, that position is fixed exactly, and items a) and b) are applied again.
If the sum of the determined positions and of the letters not yet having a position equals N, the remaining letters are not checked for presence. If exactly one position remains undetermined for some letter, the guess uses all the letters found so far plus the letter being position-checked, so that if it matches, the complete answer is obtained at once.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Where do you live : Republic of Kazakhstan, city of Pavlodar
Do for fun : programming, logical games
Do for work : programmer, temporarily unemployed
_________________________________________________________
Leap_Year_Baby 50 ( 12/ 20/ 18) gc Yoichi Kono
_________________________________________________________
================= Leap_Year_Baby submitted by Yoichi Kono at kono98@geocities.co.jp =================
++ Please summarize the algorithm used by Leap_Year_Baby
In the first several steps, try to figure out exactly which characters are used. After these steps, just try to figure out the positions from the history. (Anyway, it is not so smart.)
++ Did Leap_Year_Baby use any letter frequency tables? Where did you get them?
Yes. I got them from an English-Japanese online dictionary:-)
++ If I had it to do all over - here's what I would try ....
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
_________________________________________________________
Cowculator 55 ( 19/ 18/ 18) JAVA Ted Alper
_________________________________________________________
================= Cowculator submitted by Ted Alper at alper@turing.stanford.edu =================
++ Please summarize the algorithm used by Cowculator
Optimized for speed of writing the program, not quality of results...
I thought about the problem for a little while back in February, but never got around to coding. The day before the deadline, I had the chance to write something, so I tried to put the gist of the idea in, but didn't have time to do it all. The goal was:
(A) use initial guesses to get precise information about which letters occur -- for example, if you guess ABBCCCC, then the number of misses uniquely determines which letters among "A", "B" and "C" occur in the answer. Of course, you may also get some information about letter placement.
(B) once the number of possible letter-sequences that could be the answer becomes manageable, generate the collection of them and now make each guess one of the possible answers which does the best job of winnowing the set of possible answers in the worst case.
++ Did Cowculator use any letter frequency tables? Where did you get them?
NO, I had intended it to do so, but I didn't have time to implement that. Heck, I even just used alphabetical order to make my initial guesses.
++ If I had it to do all over - here's what I would try ....
I'd start coding more than 24 hours before the deadline! I'd use letter frequency information to improve (A) and (B) -- guess most common letters first, weight the possible answers by their likelihood of being a word, etc. I'd improve my measure of using the information from hits and misses, and of generating valid words.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Palo Alto, CA. No fun; I wish I had more of a programming job, lately I spend most of my work-time tutoring math students over the internet.
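[Editorial sketch] The miss-count trick in Cowculator's step (A) generalizes neatly: give each probed letter a power-of-two number of copies, and the judge's miss count becomes a bitmask of the absent letters. A minimal sketch (Python rather than Cowculator's Java; `misses` is a stand-in for the judge's reply, not the contest harness):

```python
def misses(guess, target):
    """Judge's miss count: positions whose letter never occurs in the target."""
    return sum(ch not in target for ch in guess)

def probe_letters(letters, target):
    """Probe several letters in one guess, with 1, 2, 4, ... copies of each.
    An absent letter contributes exactly its copy count to the miss total,
    so the miss count is a bitmask saying which letters are absent."""
    guess = "".join(ch * (1 << i) for i, ch in enumerate(letters))
    assert len(guess) == len(target), "copy counts 1+2+4+... must fill the word"
    m = misses(guess, target)
    return {ch: not (m >> i) & 1 for i, ch in enumerate(letters)}

# "ABC" builds the guess ABBCCCC; one reply settles all three letters.
print(probe_letters("ABC", "CABBAGE"))
print(probe_letters("ABC", "BEDTIME"))
```

With distinct copy counts 1, 2 and 4, every subset of {A, B, C} produces a different miss total, which is exactly why the answer decodes uniquely.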
_________________________________________________________
Revolution 58 ( 11/ 24/ 23) C Dave Krofssik
_________________________________________________________
================= Revolution submitted by Dave Krofssik at davek@soft-tek.com =================
++ Please summarize the algorithm used by Revolution
Save any previous match and what was hit and missed for later comparisons. If the word was all misses, eliminate all letters in that word from the alphabet. Output the letters in the alphabet (in frequency order) until all the letters have been used at least once. Go through all combinations of letters that have not been eliminated, searching for one that is compatible with the responses that were previously recorded (i.e. a guess that would produce all the previous responses if it were the correct one). If a word was not found before time expired, output a word that contains only the next letter in the alphabet that has not been absolutely included or eliminated.
++ Did Revolution use any letter frequency tables? Where did you get them?
I did use a table I found on a website, but I did not record which site.
++ If I had it to do all over - here's what I would try ....
Speed up the processing, so I could handle more combinations in the time allotted.
_________________________________________________________
MOOguess 59 ( 19/ 22/ 18) c Arkady Zhikharev
_________________________________________________________
================= MOOguess submitted by Arkady Zhikharev at arzhik@yorp.yaroslavl.su =================
++ Please summarize the algorithm used by MOOguess
I think that it isn't interesting to anybody (look at the finals :( )
++ Did MOOguess use any letter frequency tables? Where did you get them?
Look above
++ If I had it to do all over - here's what I would try ....
I would think a bit more... :) Unfortunately I don't have any free time
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Russia.
For fun I solved the last POTM problem (POTM is a GOOD IDEA and a GOOD PROGRAMMERS' TEST!). My work is to develop SCADA (Supervisory Control and Data Acquisition) programs. Secret ... shhhhhh - it's a secret ;) If any idea visits me, of course, I will let you know.
_________________________________________________________
MOOtPoint 61 ( 13/ 24/ 24) c Shawn Smith
_________________________________________________________
================= MOOtPoint submitted by Shawn Smith at smiths2zzyy@worldnet.att.net =================
++ Please summarize the algorithm used by MOOtPoint
There's nothing special about it. It has two phases: the first phase is used to find all the letters that exist in the word, and the second phase is used to find out where each letter goes. It doesn't use all the history in the game that would cut down the number of guesses, so it is not very optimal. The ONLY somewhat tricky part of the first phase is that it uses the number of misses, with a different count of each letter in the guess, to tell which letters are and are not in the target word.
++ Did MOOtPoint use any letter frequency tables? Where did you get them?
It used letter frequency tables in the sense that the first phase selects letters for its guesses from an examination of letter frequencies that I did on a dictionary that I downloaded from the internet; I can't remember where.
Here is a list of the initial guesses:

    static int NLetter[10] = {0,0,0,0,2,2,2,3,3,3};
    static int Dist[10][3] = {{0,0,0},{0,0,0},{0,0,0},{0,0,0},{1,3,0},
        {2,3,0},{2,4,0},{1,2,4},{1,2,5},{1,3,5}};
    static char const *IniGuesses[6][14] = {
        {"AAAE","OOOS","RRLR","NINN","DDDT","PUPP","MBBB",
         "CHHH","GKKK","WYYY","FVVV","JZZZ","QQQX",0},
        {"SSSEE","RAARR","LOOLL","TIITT","DDNND","CUUCU","PHHPH",
         "MYMYY","BGBGG","FFFKK","WVWVV","ZZXZX","JQJQQ",0},
        {"SESSES","RAARRR","NININN","OOLOLO","DDDTDT","CUUCCC","MGMGGG",
         "PHHPHH","BYBYYY","FFFKKF","WVWVVV","JXJXXX","QQZZQQ",0},
        {"SRRRRES","NANIINN","OOOTLLO","CGGCGGD","PUMPMMM",
         "BHYBYYY","FWWKKWW","XXVZZXX","JQJJQQQ",0,0,0,0,0},
        {"SIIIIIES","NARNNNRN","LOOTLLLL","CGCGGGGD","MUPMPPPP",
         "BHFFFBFF","VKVKKKKY","WXXZZZZZ","QQQJJJJJ",0,0,0,0,0},
        {"SSSSIIIES","NARNNRNRN","LOLLOTOLL","CUDDUUUUD","PMPPMMMMG",
         "BHBYYYBYY","FWVVWVWWW","XXXXKZZZX","QQQQJJJJJ",0,0,0,0,0}
    };
    static char const LetterOrder[6][28] = {
        "EASOLRINTDUPMBCHGKWYFVJZXQ\0",
        "ESAROLITNDCUPHMYBGKFWVXZJQ\0",
        "ESARINLOTDUCMGPHBYKFWVJXZQ\0",
        "ESRAINTLODCGUPMHBYFKWVZXJJQ",
        "ESIARNTOLDCGUMPHBFYVKWXZQQJ",
        "EISARNTOLCDUGPMHBYFVWKZXQQJ"
    };

++ If I had it to do all over - here's what I would try ....
A few things:
1. I was hoping to have a set of trees (1 for each word size) that would be used to select the next word that MOOtPoint would guess, given the current selections that it made and the results obtained.
2. Instead of trying to find where each letter goes individually, guessing possible solutions and examining the results for the next guess. For example: If I guessed PINETREES for PINEAPPLE, the result would have been 4 hits and 3 misses. I would try to find the "next" word that would have 4 letters in the same position, and three letters that were not in PINETREES.
3. Use neural nets to recognize English words, in order to cut down the number of possible guesses that would be available for #2.
There would have been one net for each size of word, and the inputs to each net would have been the letters, probably encoded in some way, and the output would have been 0 or 1, answering the question, "Is this an English word?"
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Las Vegas, Nevada. The POTM, bicycle riding (because I'm too cheap to have a car), and watching a few programs on television. Also, finding a completely optimal solution for the test words using my algorithm in the hopes that I could win a "Worst abuse of the rules" consolation prize. Maintain and modify part of the IR (information retrieval) system used on the Yucca Mountain Site Characterization Project, for TRW Environmental Safety Systems. See my response on the inverse boggle POTM. Ditto.
_________________________________________________________
Daily_Farmer 67 ( 11/ 33/ 23) C Gabriel Gambetta
_________________________________________________________
================= Daily_Farmer submitted by Gabriel Gambetta at ggambett@ucu.edu.uy =================
++ Please summarize the algorithm used by Daily_Farmer
Quite straightforward. I'll use the word DOGS as an example. I call LETTER a character that is neither a MISS nor a HIT (i.e., right letter in the wrong place).
1) Isolate the "groups" where the HITS and LETTERS are. DF does that by checking through the alphabet, in this case ABCD EFGH IJKL MNOP QRST.
ABCD 0 3 (1L)
EFGH 1 3 (1H)
IJKL 0 4 (nothing)
MNOP (1L)
QRST (1L) ---> Got all the letters, stop here
It saves the useful groups (ABCD 1L, EFGH 1H, MNOP 1L, QRST 1L). It also saves an "empty" letter, in this case I (the first letter in the first useless group).
2) Get the HITS. It does this by testing each letter in the group in its own place.
EIII 0 4
IFII 0 4
IIGI 1 3 => Current word = "__G_"
3) For each group containing letters, check each letter in a valid place (a place where it was not tested and is not used). This is for finding the letters and also finding a hit in a lucky guess.
It puts the current word's characters in for the case in which the last letter is found, so the final guess is the "full guess".
IAGI 0 4
BIGI 0 4
CIGI 0 4 => No need to test DIII, it MUST be the letter in this group. So, locate the letter by testing it in the remaining places.
DIGI 2 2 => Current word = "D_G_"
Continue this process until the word is found.
4) (optional) If all groups were tested and the game continues (the word was not guessed), there were repeated letters. So begin testing each already used letter in any unused space.
++ Did Daily_Farmer use any letter frequency tables?
Yes, actually it checked against EASR, INTO... I don't remember where I got these. I did my own research on a word list, but I don't remember which frequency chart I used in the end. The one I started using was posted on the Bulletin Board. The other language-specific optimization I used is to arrange the valid positions for testing words such that a test that would imply two consecutive vowels goes later.
++ If I had it to do all over - here's what I would try ....
I stopped coding and thinking about the problem after my program compiled for the first time. It was before Easter, I think. I had a couple of ideas anyway, like testing hits and letters from different groups in the same test. I don't think this could help much. Another idea was dividing the tests in halves or thirds, so if ABCD had a hit, then test AB with the second half of another group containing a hit, and so on...
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Montevideo, Uruguay, somewhere lost in South America
Play soccer, write programs, study Computer Science!
Write administrative-like programs for my company
Actually I'm The Brain, and this is just a small part of my newest plan to take over the world...
Something like an Improved Showdown, maybe a tank war, with terrain attributes, line of sight... I don't know...
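[Editorial sketch] Daily_Farmer's step 1 - sweeping the alphabet in blocks of four and reading off how many of the target's letters each block holds - can be sketched roughly as follows (Python rather than Gabriel's C; `feedback` simulates the judge, and the sketch assumes a 4-letter target with no repeated letters, as in the DOGS example; Y and Z would need one extra probe in a full version):

```python
def feedback(guess, target):
    """Judge's reply: hits = right letter in the right place; misses count
    the positions whose letter does not occur anywhere in the target."""
    hits = sum(g == t for g, t in zip(guess, target))
    miss = sum(g not in target for g in guess)
    return hits, miss

def scan_groups(target):
    """Sweep the alphabet in 4-letter blocks. Each block's reply splits into
    HITs and misplaced LETTERs; stop as soon as every target letter is
    accounted for ("Got all the letters, stop here")."""
    report, found = {}, 0
    for block in ("ABCD", "EFGH", "IJKL", "MNOP", "QRST", "UVWX"):
        hits, miss = feedback(block, target)
        present = len(block) - miss              # block letters are distinct
        report[block] = (hits, present - hits)   # (HITs, LETTERs)
        found += present
        if found == len(target):
            break
    return report

# DOGS: one LETTER in ABCD, one HIT in EFGH, nothing in IJKL, and so on.
print(scan_groups("DOGS"))
```

The output reproduces the little table in the write-up: `(0, 1)` for ABCD, `(1, 0)` for EFGH, and the sweep stops after QRST once all four letters are found.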
_________________________________________________________
oxmoomoo 73 ( 17/ 22/ 34) C Kwok Ngai
_________________________________________________________
================= oxmoomoo submitted by Kwok Ngai at kwok@esun.att.com =================
++ Please summarize the algorithm used by oxmoomoo
Fill all positions with a list of letters.
While there are GOOD letters on the list:
  loop through each position to determine the number of GOOD letters;
  for each GOOD letter, loop through each position to determine the GOOD spot.
Continue the above process until all GOOD letters are found. It also keeps track of all possible GOOD spots and absolute BAD spots to speed up or eliminate searching.
++ Did oxmoomoo use any letter frequency tables? Where did you get them?
No
++ If I had it to do all over - here's what I would try ....
Do not know yet.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
I live in NJ and I am a programmer.
_________________________________________________________
sieve 73 ( 15/ 29/ 29) JAVA Aivars Zogla
_________________________________________________________
================= sieve submitted by Aivars Zogla at fix.lv!uvsk@kcig2.att.att.com =================
++ Please summarize the algorithm used by sieve
Two stages: 1) determining the letters 2) determining the correct positions.
At the first stage, for words made of 4-6 letters, each guess looked like ABBB[BB]; for words longer than 6: ABBCCCC[CC]. If there appeared to be doubles, or triples, or so, the next few guesses were of the format AAAA[AAAAA] - a single letter used.
Second stage - each guess took 2 letters plus an additional letter which isn't in the word at all: ABXX[XXXXX]. Depending on the misses, a decision was taken to make the next approaching guess for determining the correct position of at least one letter. If a guess got 2 misses, the first letter (of the 2) was queued for further examination.
Here's "sieve's" fight for PINEAPPLE:

NUM GUESS     HITS MISSES   NUM GUESS     HITS MISSES
=== =====     ==== ======   === =====     ==== ======
  1 AEEIIIIII   0    0       14 EAFFFFFFF   0    7
  2 OUUPPPPPP   2    3       15 AIFFFFFFF   1    7
  3 RSSMMMMMM   0    9       16 AFFFFFFFF   0    8
  4 NGGWWWWWW   0    8       17 PFNFFFFFF   2    7
  5 KLLCCCCCC   0    7       18 FFFLEFFFF   0    7
  6 DHHVVVVVV   0    9       19 FFFEPFFFF   1    7
  7 TQQJJJJJJ   0    9       20 FFFEFFFFF   1    8
  8 YBBXXXXXX   0    9       21 FFFFPEFFF   0    7
  9 FZZYYYYYY   0    9       22 FFFFEAFFF   0    7
 10 EEEEEEEEE   2    0       23 FFFFALFFF   1    7
 11 AAAAAAAAA   1    0       24 FFFFAFFFF   1    8
 12 IIIIIIIII   1    0       25 FFFFFPPFF   2    7
 13 PPPPPPPPP   3    0       26 FFFFFFFEL   0    7
                             27 PINEAPPLE   9    0

The most interesting thing at this POTM was the temporary file. For such an algorithm it was enough to store at most about 30 bytes of "knowledge about the past". It's even possible to restore the whole sequence from this knowledge at any particular point of the game.
++ Did sieve use any letter frequency tables? Where did you get them?
I did. Vowels first, consonants after. What frequency could show up in 3 final words?
++ If I had it to do all over - here's what I would try ....
I got some new and fresh ideas for myself out of this POTM. I wouldn't like to forget them.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Ugale, Latvia. For fun I'm thinking about what I could do for fun. For work - still trying to pour knowledge about algorithms into high-school students' heads. The secret is very simple - there already exist perfect pictures in nature, we just have to make them visible. Fred's ideas for POTM are OK!
_________________________________________________________
Abacus 78 ( 17/ 28/ 33) c Igor Volkov
_________________________________________________________
================= Abacus submitted by Igor Volkov at i-volkov@bigfoot.com =================
++ Please summarize the algorithm used by Abacus
Using a letter frequency table I tried to eliminate unused symbols, then found duplicates.
++ Did Abacus use any letter frequency tables? Where did you get them?
It used a frequency table of letters.
I got it from a list of (approx. 100000) English words.
++ If I had it to do all over - here's what I would try ....
I would try to find a more elegant solution.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Belarus, Minsk. I work at the Software Engineering Dept., BSU.
_________________________________________________________
Max_Power 78 ( 20/ 23/ 35) gC Ed Mandy
_________________________________________________________
================= Max_Power submitted by Ed Mandy at erm@moby.atcorp.com =================
++ Please summarize the algorithm used by Max_Power
It's a brute-force approach that uses some "smarts" to cut down on the iterations necessary. It should always get any (legal) word in the allotted time/iterations. The algorithm used is based on the idea that if you had an N-letter word, you could try every letter of the alphabet in order to figure out which letters are in the word. This pass has guesses that look like:
AAAA
BBBB
CCCC
Then, once you are done with that pass, you know which letters are in and which letters are out. You can then start placing the letters in their respective positions. For example, if you knew that the letters A, C, S, and T were in the word, then you could go through iterations that looked like:
ABBB
BABB (now we know that A is in the 2nd space)
CBBB (now we know that C is in the 1st space)
BBSB
BBBS (now we know that the string is CATS)
I also do some combining of the 2 passes just to cut down on iterations.
++ Did Max_Power use any letter frequency tables? Where did you get them?
Yes, well, kind of. It uses a rearranged alphabet so that the most likely letters are tried first. I got the frequency table from a WWW site.
++ If I had it to do all over - here's what I would try ....
I would finish off my approach. I think it's a good solid approach, but I did not optimize it enough.
++ WHERE DO YOU LIVE?
In a suburb of Minneapolis, MN.
DO FOR FUN?
Play Magic (and now Pokemon), rollerblade, play computer games, play with computer software and hardware, play with my fiance, etc.
DO FOR WORK?
I program for Triticom (www.triticom.com).
_________________________________________________________
newbie 81 ( 17/ 30/ 34) gc Arlie Stephens
_________________________________________________________
================= newbie submitted by Arlie Stephens at arlie@netcom.com =================
++ Please summarize the algorithm used by newbie
Newbie was very much unfinished, due to not spending sufficient time on it... I'm embarrassed at the simple-minded algorithm... Very simpleminded:
- scan letters in bunches to determine whether or not they were in the word (mostly winnowing "not present" letters; some info saved for later)
- rescan the remaining letters individually to determine presence/absence and the number of instances present
- try each present letter in all possible positions (one guess per candidate position, using known-non-present fill)
++ Did newbie use any letter frequency tables? Where did you get them?
Only the basic "ETAIONSHR" is most frequent in English, in that order, from memory. (I used to solve ciphers for fun, and learned that list at that time.)
++ If I had it to do all over - here's what I would try ....
First and foremost, spend more time developing it. Second, read the rules more carefully. I had a better algorithm for determining letter presence that I didn't use because I got it from another entrant, and didn't realize that it was legal to use his algorithm, provided I didn't use his code. If I stayed with the simplistic algorithm above, a whole bunch of refinements were on my to-do list. This included switching to slightly different algorithm variants (including letter frequencies, batch sizes when excluding letters, etc.) for each word length, and a bunch of changes to garner more information with each guess.
Also, take more advantage of known properties of the English language; the algorithm I used took essentially no advantage of the fact that the target was known to be an English word. I'd also have experimented with an entirely different approach. (Or perhaps several.) I had an intuition that the algorithm style I used would max out (not be further improvable) at a relatively high number of guesses, whereas an algorithm based on letter patterns (arrangements, letter frequencies) within actual words of each length would potentially do much better ... but do much worse in its early stages, be significantly more work to develop, and like-as-not exceed the code size limit. Certainly I would have run dictionaries through various candidate algorithms looking for which worked best.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
I live in Sunnyvale CA (i.e. Silicon Valley). I work as a software engineer. I haven't done much actual programming lately at work (I spend my time writing documents, attending meetings, etc.) and have noticed that my skills are slipping; I'm doing the POTM in an attempt to get some of the rust off my programming skills, and keep it off.
_________________________________________________________
Mooniac 84 ( 22/ 30/ 32) JAVA Denis Konovalchik
_________________________________________________________
================= Mooniac submitted by Denis Konovalchik at denisk@compassplus.ru =================
++ Please summarize the algorithm used by Mooniac
My algorithm is simple enough. I ordered all Latin letters by their frequencies. Then I tried to compose a whole word from the current letter. If there were no "cows" in this word, it was necessary to take the next letter, and so on. When the letter appeared in the word, the algorithm tries to determine its positions in the word by the bisection method -- by splitting the set of the "unguessed" word's letters into two parts, and applying this method to each of them.
This routine runs until all occurrences of the current letter have been determined.
++ Did Mooniac use any letter frequency tables? Where did you get them?
Yeah, 6 vectors of single-letter frequencies (one for each word length, from 4 to 9 letters). These vectors were built on the dictionary placed on the net by one kind man (he also dropped the link to this site into the POTM message board). Many thanxx to him!
++ If I had it to do all over - here's what I would try ....
It was necessary to upgrade this algorithm, but I didn't have enough time for it ;-( The main idea was to try each new letter not alone but in the company of other letters. So, receiving "3 cows" in answer to the guess "ABBCCCC", we'll know that the word has letters "A" and "B" within but it doesn't contain the letter "C".
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
I live in Magnitogorsk, Russia. Our city is famous for its location (we live both in Europe and Asia; the border between these continents, the Ural river, splits our city into two parts), and for its metallurgical plant (the biggest in the world, which is why the ecology is poor enough ;-(. The ice-hockey team "Metallurg" of our city this year became champion of Russia and Europe (it's a pity that it will not clash this season with the champion of the NHL). I'm a 24 y.o. programmer, working in a software firm, menacing C++ with Java and writing code on Microsoft mustdie (c) Windows. In my vacation I write letters to Russian computer magazines, including "Computerra" (www.computerra.ru), where my opus about the POTM was published a year ago (Fred Great and Awful even placed the link to it on his pretty site ;-)) During this time I am also a postgraduate student of Magnitogorsk Technical University. I would be happy to work as a programmer in the United States some time later; it's the real dream for every serious IT specialist in my country. Anyway, for now I lead a fun life.
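[Editorial sketch] The bisection search Mooniac describes - splitting the unresolved positions in two and probing each half - can be sketched like this (Python rather than Mooniac's Java; `hits` simulates the judge, and `absent` is any letter already known not to be in the word):

```python
def hits(guess, target):
    """Judge's hit count: right letter in the right place."""
    return sum(g == t for g, t in zip(guess, target))

def locate(letter, positions, target, absent):
    """Find every position among `positions` that holds `letter`, by bisection:
    put `letter` on the candidate positions and the known-absent letter
    everywhere else, so the hit count says how many occurrences that
    candidate subset covers."""
    guess = "".join(letter if i in positions else absent
                    for i in range(len(target)))
    n = hits(guess, target)
    if n == 0:
        return set()
    if n == len(positions):
        return set(positions)        # every candidate slot holds the letter
    mid = len(positions) // 2        # otherwise split and recurse on each half
    return (locate(letter, positions[:mid], target, absent) |
            locate(letter, positions[mid:], target, absent))

# All three P's of PINEAPPLE, found with a handful of guesses.
print(locate("P", tuple(range(9)), "PINEAPPLE", "Q"))
```

Each recursion level spends one guess per half, so a letter with few occurrences is pinned down in roughly logarithmically many guesses rather than one guess per position.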
Thanx very much to everybody who entered this contest! I hope that we'll meet here once more.
_________________________________________________________
Brave_Cow_Word 90 ( 26/ 36/ 28) c Lourenco Arnasalon
_________________________________________________________
Sorry ... No questionnaire was returned for Brave_Cow_Word
_________________________________________________________
HitOrMiss 91 ( 17/ 29/ 45) JAVA Stuart Rowland
_________________________________________________________
================= HitOrMiss submitted by Stuart Rowland at swrowland@lucent.com =================
++ Please summarize the algorithm used by HitOrMiss
HitOrMiss used logic to try to determine the word, mostly by elimination. It could solve any 4-9 letter sequence whether it was an English word or not. It first tried to eliminate as many letters as possible by guessing words that contained the least frequently occurring letters. If the response was all misses, all the letters were eliminated. If the response showed that at least one letter was correct, but no letter was in the correct position, each letter in the guess could be eliminated in that position. The temporary file consisted of the set of possible letters for each position, the set of letters that were known to be in the word, the set of letters that were known not to be in the word, and the guess history. The history was reviewed each time. If a guess in the history had one hit and all but one of the letters were known not to be in the word, the remaining letter had to be correct. If a position was narrowed down to two possible letters, a guess was constructed using one of the choices and letters known not to be in the word. The response would either select the letter or reject it. If a letter known to be in the word occurred in only two positions, a guess was formed with the letter in one of the positions and letters known not to be in the word.
++ Did HitOrMiss use any letter frequency tables? Where did you get them?
I went through a dictionary to find the frequency of occurrence of the letters in each position over the sets of words of length 4 to 9.
++ If I had it to do all over - here's what I would try ....
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
New Jersey
POTM program
In the Seventh Guest adventure game there is a microbe game that would make an interesting competition. It is a territory game played on a square grid. Each player starts with a single piece in opposite corners. A move consists of either cloning or jumping. A clone move creates a new piece in one of the 8 empty spaces next to an existing piece. Jumping is moving a piece to an empty square two away in one of the 8 possible directions. At the end of a move, any of the opponent's pieces that touch the moved piece are changed to the player's color. The game ends when all positions are occupied. The winner is the player with the most pieces.
_________________________________________________________
Spelling-B 93 ( 21/ 38/ 34) c Andrew Gauld
_________________________________________________________
================= Spelling-B submitted by Andrew Gauld at andrew.gauld@att.com =================
++ Please summarize the algorithm used by Spelling-B
Two-pass method. Pass 1: find how many of which letters are contained in the word by using guesses containing only single letters (EEEEE, TTTTT, etc.); once all letters are identified, go to the second pass. The second pass identifies letter positions one by one by using a known bad letter to pad the end of the guesses (EZZZZ, TZZZZ); this proceeds until the last two letters, at which point I just guess their order.
++ Did Spelling-B use any letter frequency tables? Where did you get them?
Just the old traditional ETAOIN SHRDLU as a seed for the letter search
++ If I had it to do all over - here's what I would try ....
Winning the lottery so I'd have more time to spend on it.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
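[Editorial sketch] The two-pass scheme Spelling-B describes (and that rayka, Moot_Force and themoops below describe in almost the same words) fits in a few lines. A hedged sketch in Python - `feedback` simulates the judge, and the frequency order is just the classic ETAOIN ranking used as a seed:

```python
def feedback(guess, target):
    """Judge's reply: (hits, misses)."""
    hits = sum(g == t for g, t in zip(guess, target))
    miss = sum(g not in target for g in guess)
    return hits, miss

FREQ = "ETAOINSHRDLUCMFWYGPBVKQJXZ"   # seed order, a la ETAOIN SHRDLU

def solve(target):
    """Pass 1: single-letter guesses count each letter (hits of EEEE = number
    of E's). Pass 2: pin letters down one position at a time, padding with a
    letter known to be absent. Returns the full guess sequence."""
    n, guesses = len(target), []
    counts, bad = {}, None
    for ch in FREQ:                       # pass 1
        h, _ = feedback(ch * n, target)
        guesses.append(ch * n)
        if h:
            counts[ch] = h
        elif bad is None:
            bad = ch                      # remember a known-absent pad letter
        if sum(counts.values()) == n:
            break
    if bad is None:                       # counts are complete, so any letter
        bad = next(c for c in FREQ if c not in counts)   # not counted is absent
    word = [None] * n
    for ch, k in counts.items():          # pass 2
        for pos in range(n):
            if k == 0:
                break
            if word[pos] is None:
                probe = "".join(ch if i == pos else bad for i in range(n))
                h, _ = feedback(probe, target)
                guesses.append(probe)
                if h:                     # a single hit pins the letter here
                    word[pos] = ch
                    k -= 1
    guesses.append("".join(word))         # the assembled word, guessed last
    return guesses

game = solve("PINEAPPLE")
print(len(game), "guesses, ending in", game[-1])
```

This is the guaranteed-but-slow end of the field: correct on any letter sequence, but spending one guess per letter and nearly one per position, which is exactly why these entries cluster near the bottom of the table.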
_________________________________________________________
rayka 94 ( 23/ 37/ 34) c O.Jansen-Olliges A.Gerns
_________________________________________________________
========================= rayka was submitted by Olaf Jansen-Olliges and Arnd Gerns at Ojan_Band@t-online.de =========================
++ Please summarize the algorithm used by rayka
It's a really dumb one:
Step 1: collect the letters in the word by testing aaaa, eeee, ffff, and so on.
Step 2: find out the position of the letters with the help of a letter not in the word, e.g. q (eqqq, qeqq, qqeq, ...).
After this step you can build the word.
++ Did rayka use any letter frequency tables? Where did you get them?
Yes, we built them from a UNIX dictionary file.
++ If I had it to do all over - here's what I would try ....
1. Do it better ;-) (perhaps find more than one letter per try, e.g. abbb instead of aaaa)
2. Spend more time (but unfortunately this time we had a lot to do)
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Arnd lives in Braunschweig and I (Olaf) come from Hannover. And sometimes we like to sit together in my home-office (a Sun and some (5) LINUX boxes) wasting some time programming C or C++ (not for the job). rayka - by the way - is the name of my daughter (15 months), one of the reasons for me not to leave my home-office too late. I'm looking forward to the next POTM problem!!
_________________________________________________________
superfreq 94 ( 20/ 36/ 38) PERL Dan Schmidt
_________________________________________________________
================= superfreq submitted by Dan Schmidt at dfan@alum.mit.edu =================
++ Please summarize the algorithm used by superfreq
superfreq is incredibly simple; it's actually the first program I got working.
I then started a totally new program, superfreq2, which only ever worked on a subset of legal words, but it does get HONESTLY in 7, faster than any of the programs on the list :) I'll describe superfreq2 from now on, since it's more interesting.
It first determines what letters are in the word. It guesses words like EEEETTA, since we then know exactly what letters are in the word, given the number of misses. I then go through all possible permutations of those letters, checking for compatibility with the results of previous guesses and then scoring them. Scoring is done with the aid of a trigraph table; the table starts like this:
ED$ 3073
ING 2844
NG$ 2505
where each line is the number of times that trigraph occurred in /usr/dict/words (^ and $ stand for beginning- and end-of-string). Anyway, the score is just the sum of these numbers for each three-letter sequence in the word we're trying out; trigraphs not in the table get a score of -10000.
Here it is at its best:
d:\src\moo> sue HONESTLY
ASAEEEEE -> 1 hits, 2 misses
RRSRIRIN -> 0 hits, 6 misses
TLOOLEOO -> 0 hits, 0 misses
UECUDCCC -> 0 hits, 7 misses
PMGPOMMM -> 0 hits, 7 misses
BBBBHYHS -> 0 hits, 4 misses
HONESTLY -> 2625 / 8 hits, 0 misses
Got it in 7 guesses.
d:\src\moo> sue LATRINES
ASAEEEEE -> 1 hits, 0 misses
RRSRIRIN -> 2 hits, 0 misses
TLOOLEOO -> 0 hits, 4 misses
LATRINES -> 4426 / 8 hits, 0 misses
Got it in 4 guesses.
So it did great on 7+ letter words with no duplicates, but if there were too many duplicates then my permutation code was too slow, and I never even tried to handle 6-letter words (my EEEETTA trick for quickly finding all the letters only works for 7+ letter words). I had a fast multiset-permutation generator (stolen from Knuth), a scoring function, and a validity checker, but even so I couldn't handle more than one duplicate letter in a 9-letter word on my P166.
++ Did superfreq use any letter frequency tables? Where did you get them?
It used a trigraph table that I created from /usr/dict/words on my system.
The only interesting thing about it is that it uses ^ and $ as well as the
26 letters. I think that helped a lot.

++ If I had it to do all over - here's what I would try ....

I would finish my better program! But I never got a good idea for what to do
for short words or ones with too many duplicates.

++ WHERE DO YOU LIVE?  DO FOR FUN?  DO FOR WORK? INNERMOST SECRET?  POTM IDEAS?

I live in Cambridge, MA, and write music software. A POTM idea is
forthcoming...

--
Dan Schmidt -> dfan@harmonixmusic.com, dfan@alum.mit.edu
Honest Bob & the Factory-to-Dealer Incentives -> http://www2.thecia.net/users/dfan/hbob/
http://www2.thecia.net/users/dfan/
Gamelan Galak Tika -> http://web.mit.edu/galak-tika/www/

_________________________________________________________
 Moot_Force     97  ( 26/ 35/ 36)  SH    Brian Rapp
_________________________________________________________
=================
Moot_Force submitted by Brian Rapp at rapp@boutell.com
=================

++ Please summarize the algorithm used by Moot_Force

Moot_Force uses a method which is guaranteed to solve the problem in less
than 100 guesses, but it is a very primitive method indeed. It first guesses
words consisting of all one letter (e.g. AAAAA) and continues to do so until
it discovers all the letters in the word. Along the way it remembers one
letter that is not in the word (e.g. Q). In the next phase, it solves each
position of the word one at a time by guessing a valid letter in that
position and the invalid letter for all other positions (e.g. AQQQQ). When
all positions are known, it suddenly guesses the correct word.

Because Moot_Force follows such a simple strategy, it can almost completely
ignore MISS data and store its knowledge in a very small temp file. Clearly
such a brute force strategy cannot _win_ the contest, but it does represent
an interesting and at least technically successful extreme case.
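The position-solving phase described above can be sketched as follows. This is an illustrative Python sketch, not Brian's shell script; `score_hits` is my stand-in for the POTM judge, and it assumes the letters and an absent filler letter (the Q above) have already been discovered in the first phase.

```python
def score_hits(guess, target):
    """Judge stand-in: a hit is the right letter in the right position."""
    return sum(g == t for g, t in zip(guess, target))

def solve_positions(target, letters, filler):
    """Moot_Force's second phase: probe one position at a time, padding
    the rest of the guess with a letter known to be absent (e.g. Q)."""
    n = len(target)
    word = [None] * n
    for pos in range(n):
        for letter in letters:
            probe = filler * pos + letter + filler * (n - pos - 1)
            if score_hits(probe, target) == 1:  # only this slot can hit
                word[pos] = letter
                break
    return "".join(word)

word = solve_positions("wolf", "flow", "q")
# word == "wolf"
```

Since the filler letter can never score a hit, a probe returning exactly one hit pins the probed letter to that position, which is why the miss data can be ignored entirely.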
++ Did Moot_Force use any letter frequency tables?  Where did you get them?

Moot_Force does have a frequency list from which it guesses the letters in
order during the first phase. Because the first positions in the word will
have more possibilities to be guessed in the second phase, I biased the list
somewhat towards letters near the beginning of words. Because Moot_Force
isn't terribly concerned with winning, this list hasn't been optimized very
much, but it might be better than just using alphabetical order.

++ If I had it to do all over - here's what I would try ....

I found it very amusing to program Moot_Force in shell script, but if I were
to put in the effort for a serious attempt I'd certainly need to start over
from scratch.

++ WHERE DO YOU LIVE?  DO FOR FUN?  DO FOR WORK? INNERMOST SECRET?  POTM IDEAS?

Maryland. Games. Programming. Mercury. Battle-Royale of all programs.

_________________________________________________________
 themoops      101  ( 32/ 30/ 39)  c     Stephen Palmer
_________________________________________________________
=================
themoops submitted by Stephen Palmer at srpalmer@att.com
=================

++ Please summarize the algorithm used by themoops

Very simple algorithm. Loop through all letters (using a crude order based on
most common letters first) to find out how many of each letter are in the
word. Once all letters are known, search for one letter at a time to find its
position. Once all positions are known, you've got the word.

++ Did themoops use any letter frequency tables?  Where did you get them?

A frequency table was used, and there seem to be loads of examples depending
on the source. Mine came from a web site that ranked letter usage based on
several popular novels.

++ If I had it to do all over - here's what I would try ....

I never cared about the misses, and determined the letters and positions only
from hits. I'm sure I could have decreased the number of guesses if I had
tried to use all available information.
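The letter-counting loop that themoops (and rayka above) describe rests on one observation: a "solid" guess such as AAAAA scores exactly one hit per occurrence of that letter, so the hit count is the letter's multiplicity. A minimal Python sketch of the idea (the judge function and the frequency ordering are my stand-ins, not Stephen's C code):

```python
def score_hits(guess, target):
    """Judge stand-in: a hit is the right letter in the right position."""
    return sum(g == t for g, t in zip(guess, target))

def collect_letters(target, alphabet="etaoinshrdlucmfwypvbgkqjxz"):
    """Try solid guesses (eeee, tttt, ...) in frequency order until all
    letter slots of the target are accounted for."""
    n = len(target)
    found = {}      # letter -> multiplicity in the target
    guesses = 0
    for letter in alphabet:
        guesses += 1
        hits = score_hits(letter * n, target)  # hits == multiplicity
        if hits:
            found[letter] = hits
        if sum(found.values()) == n:
            break
    return found, guesses

letters, used = collect_letters("moo")
# letters == {'o': 2, 'm': 1}
```

Stopping as soon as the multiplicities sum to the word length is what makes the frequency ordering pay off: common letters are confirmed early and rare letters often never need to be asked about.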
++ WHERE DO YOU LIVE?  DO FOR FUN?  DO FOR WORK? INNERMOST SECRET?  POTM IDEAS?

I live in central New Jersey and work for AT&T designing web solutions using
Microsoft NT. And I love POTM problems.

_________________________________________________________
 pat           104  ( 24/ 36/ 44)  c     Bernard Hatt
_________________________________________________________
=================
pat submitted by Bernard Hatt at bmh@arkady.demon.co.uk
=================

++ Please summarize the algorithm used by pat

The algorithm was to change one letter at a time and learn which letters were
correct or could be eliminated. From previous POTM experience I have learned
that simple robust algorithms are better able to withstand Fred's choice of
test data :-)

++ Did pat use any letter frequency tables?  Where did you get them?

Letter frequency tables were generated from the expanded GNU ispell English
dictionary, which stores words in a root+affix format. For each position in
the word I had a table of the most likely letters for that position in order,
and for each letter in each position an ordered list of the previous and next
letters.

++ If I had it to do all over - here's what I would try ....

Changing only one letter at a time was never going to be a very fast
algorithm. (I think) a better solution would be to change several letters at
a time, store the information about what was learned (e.g. there is no 'Z' in
the word, or there is an 'H' or a 'K' in the word but not in position 3,
etc.), and pick the best guesses from the letter probabilities and the stored
info.

++ WHERE DO YOU LIVE?

I'm based in Hemel Hempstead in the UK (about 25 miles NW of London)

DO FOR FUN?

Programming/Gardening/Cooking/Eating/Wine making/Drinking.

DO FOR WORK?

I work as a sysadmin for a large database company, where I am one of the very
few people who knows very little about databases.

INNERMOST SECRET?  POTM IDEAS?

How about some type of shape-fitting problem like Tetris?
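The per-position tables pat describes can be sketched as follows. This is an illustrative Python sketch under my own naming, with a four-word toy dictionary standing in for the expanded ispell word list:

```python
from collections import Counter

def positional_tables(words, length):
    """For each position, rank letters by how often they occur there
    among dictionary words of the given length (most likely first)."""
    counts = [Counter() for _ in range(length)]
    for w in words:
        if len(w) == length:
            for i, ch in enumerate(w):
                counts[i][ch] += 1
    return [[ch for ch, _ in c.most_common()] for c in counts]

tables = positional_tables(["cat", "cot", "cup", "dog"], 3)
# tables[0][0] == 'c'  (most common first letter in this toy dictionary)
```

The previous/next-letter lists pat mentions would be built the same way, just keyed on (letter, position) pairs instead of position alone.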
_________________________________________________________
 simental      104  ( 26/ 40/ 38)  c     Marty Fouts
_________________________________________________________
=================
simental submitted by Marty Fouts at potm@fogey.com
=================

++ Please summarize the algorithm used by simental

Simental is a 'stupid guesser' algorithm that did about as poorly as I
expected it would. It works in two passes:

* Pass 1 -- try each letter of the alphabet in decreasing frequency order to
  find out what letters are in the target word.
* Pass 2 -- try each letter of the word in each possible position in the
  word, in left to right order, skipping those positions that have already
  been found.

The only 'cleverness' at all in simental is that the letters of the alphabet
were tried according to a frequency order, rather than alphabetically.

++ Did simental use any letter frequency tables?  Where did you get them?

Yes. I invented the frequency table by analysing several online word lists.

++ If I had it to do all over - here's what I would try ....

Almost anything. Unfortunately, schedule intervened and I never got around to
improving the algorithm that I sent in as a place holder.

++ WHERE DO YOU LIVE?  DO FOR FUN?  DO FOR WORK? INNERMOST SECRET?  POTM IDEAS?

California. Play classical guitar. That the *real* answer to the universe is
17, not 42. No, I'm not clever at making puzzles.

_________________________________________________________
 Fools_Gold    111  ( 32/ 35/ 44)  c     Jeffrey Jackson
_________________________________________________________
=================
Fools_Gold submitted by Jeffrey Jackson at jjackson@harmonic.com
=================

++ Please summarize the algorithm used by Fools_Gold

Step 1: Using my letter frequency array, make 'solid' guesses (e.g. SSSSSS or
HHHHHH for a 6-letter word) until I've found all the letters in the word.

Step 2: For each letter in the unknown word (left to right), go through all
the letters we know are hits and check if it hits at that particular spot.
I used a letter known NOT to hit in the other spaces so I could check just
the one letter at a time.

++ Did Fools_Gold use any letter frequency tables?  Where did you get them?

I did use one. It was just a simple array ordering the 26 letters. I pulled
several thousand lines of plain text off the web and wrote a little program
that counted each letter. I then ordered my array from the most commonly
occurring letter to the least.

++ If I had it to do all over - here's what I would try ....

I would come up with another dimension to my frequency table. This other
dimension would carry frequencies for the most common letters following
certain other letters. For example, the letter 'V' is almost always followed
by a vowel, so if I found a 'V' anywhere, I could check the vowels after that
instead of sticking to my one-dimensional frequency table.

++ WHERE DO YOU LIVE?  DO FOR FUN?  DO FOR WORK? INNERMOST SECRET?  POTM IDEAS?

I live in Minneapolis and write software for a living. A real shocker, I'm
sure. Haha. Fun? Is there time for that? :-)  Hey, if I told you my innermost
secret, it wouldn't be secret anymore. :-)  And it might even end up out on
the internet somewhere! Haha. Yikes. POTM ideas... hmmmm. Although I didn't
have time to code and submit an entry, I really liked the one a few contests
ago called 'scenic tour'. More problems similar to that would be fun.

_________________________________________________________
 naive         111  ( 30/ 37/ 44)  gc    Michael Chen
_________________________________________________________

Sorry ...
No questionnaire was returned for naive.

_________________________________________________________
 MereMOO      1021  (999/ 11/ 11)  c     Pavel Zaletskiy
_________________________________________________________
=================
MereMOO submitted by Pavel Zaletskiy at pavelz@dpt.ustu.ru
=================

++ Please summarize the algorithm used by MereMOO

My program works in two steps: 1) determine which letters are present in the
word; 2) determine the word consisting of those letters.

For the first step the following algorithm was used. If the number of letters
in the word is greater than or equal to 7, then each guess word consists of 3
different letters. The first letter appears there 1 time, the second - 2
times and the third - 4 times. The other positions are filled with letters
already known to be present in the word. Knowing the result of the guess, the
program can determine whether each of the three letters is in or out of the
word. If the word length is smaller than 7, the letters are divided into
groups, and for each group the program makes several guesses; in each guess,
information about one group and one letter of another group can be asked. If
the word length is >= 5, then in each guess the program also asks about one
more letter, placed in the word in several copies, to determine its presence
unambiguously.

In the second step the program generates all possible words. For each word it
calculates a score and then chooses the word with the best score. This score
estimates the probability that the word is an English one, and also the
number of words that the next guess would exclude from consideration.

++ Did MereMOO use any letter frequency tables?  Where did you get them?

Yes, it uses a table for letter pairs, and it also knows the most commonly
used triples and chains of four letters.

++ If I had it to do all over - here's what I would try ....

I can only suggest trying to add some heuristics for short words (n
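MereMOO's 1-2-4 probe for words of length >= 7 (the same idea as superfreq2's EEEETTA guesses) can be sketched as follows. This is an illustrative Python sketch, not Pavel's C code, and the scoring rule assumed here - each copy of a guessed letter that occurs nowhere in the word counts as one miss - is my simplification for illustration, not necessarily the exact POTM judge:

```python
def misses(guess, target):
    """Simplified judge (an assumption, not the POTM rule): each guess
    position whose letter occurs nowhere in the target is a miss."""
    return sum(ch not in target for ch in guess)

def probe_three(a, b, c, known, target):
    """One MereMOO-style guess: letter a once, b twice, c four times,
    padding any remaining positions with letters known to be present."""
    guess = (a + b * 2 + c * 4 + known * len(target))[:len(target)]
    m = misses(guess, target)
    # binary decomposition of the miss count says which letters are absent
    return {a: not (m & 1), b: not (m & 2), c: not (m & 4)}

# assuming 'e' is already known to be in the word:
present = probe_three("a", "b", "c", "e", "herbage")
# present == {'a': True, 'b': True, 'c': False}
```

Because the three candidate letters contribute 1, 2, and 4 misses respectively when absent, a single miss count resolves all three at once - three letters tested per guess instead of one.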