Entry Summaries

The attached mail is the collection of write-ups I received. As usual they are mostly unedited (only personal POTM-master insults were removed). Write-ups are provided in order of finish (more or less).
_________________________________________________________
SplitMoo 24 ( 7/ 9/ 8) gC Frederic vanderPlancke
_________________________________________________________
=================
SplitMoo submitted by Frederic van der Plancke at fvdp@decis.be
=================

++ Please summarize the algorithm used by SplitMoo

The first two guesses are hardcoded; they were computed by a variant of Mad_Cow_Cheating (a DICTIONARY ENTRY - see below). For the remaining guesses: for 4 seconds SplitMoo tries to find all solutions (words that are still possible according to the outcome of the previous guesses). If it does not find all solutions, for the next 4 seconds it tries to come up with a random sample of the solution space. (Ideally this sample should be unbiased, but in my program it is not, because I think that would be too expensive.)

Each solution is assigned an "englishness" representing the likelihood of its being an English word, based on the probability of each sequence of two consecutive letters in my dictionary. Then each solution found is evaluated against the solution sample (part of the computation is done during the search). The heuristic aims at minimizing the expected number of guesses needed to find the word, by examining how each candidate splits the sample (hence the name "SplitMoo").

For the last 2 seconds, it searches for an improved guess by examining mutations of the previous candidates. This part plays a really important role! Doubling its running time has probably been one key to my success.
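None of these write-ups include source, so here is a small illustrative sketch (my reconstruction, not SplitMoo's actual implementation) of the split-based evaluation just described: score a candidate guess by how it partitions a sample of the remaining solutions according to the (bulls, cows) answer each would return. The standard Bulls-and-Cows scoring and the tiny sample are assumptions of mine.

```python
import math
from collections import Counter, defaultdict

def bulls_cows(guess, target):
    """Standard MOO scoring (assumed): bulls = right letter, right place;
    cows = right letter, wrong place (multiset overlap minus bulls)."""
    bulls = sum(g == t for g, t in zip(guess, target))
    common = sum((Counter(guess) & Counter(target)).values())
    return bulls, common - bulls

def split_value(guess, sample):
    """Expected log2-size of the part of `sample` that survives the guess.
    Lower is better; 0 means every possible answer pins down a unique word."""
    parts = defaultdict(int)
    for word in sample:
        parts[bulls_cows(guess, word)] += 1
    n = len(sample)
    return sum(s / n * math.log2(s) for s in parts.values())

# A guess that isolates every candidate scores 0; a guess that leaves
# big clumps of indistinguishable candidates scores higher.
sample = ["MOAT", "GOAT", "COAT", "MOOT"]
best = min(sample, key=lambda g: split_value(g, sample))
```

Minimizing this quantity is one way to read "minimizing the expectation of the number of guesses"; the real program additionally folds in englishness and also evaluates guesses that are not themselves candidate solutions.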
(It's important in words like "STAT" (doesn't typecheck but I did test it anyway): my program quickly finds it must be "S?A?", but then there remain many possibilities for the two remaining letters; if it only checked guesses of the form "S?A?" it would lose time. Of course "englishness" helps as well in this case.)

--- in more detail

The most involved part is the search for solutions. I wrote a (cut-down) constraint propagation engine for that purpose... Not spending too much source code on it was an interesting challenge! My program essentially maintains the following information during the whole search for a word:
- the set of letters that are possibly/surely present in the solution
- for each i, the set of possible/sure letters at position i
- for each guess, the set of positions for which it is possible/sure we have a hit (this is redundant with the previous sets, actually)
- "linear constraints", based on the number of misses for each guess

The engine reduces these sets according to the outcomes of the previous guesses (before searching for any solution) and the previously instantiated letters (during the search). It propagates simple deductions to do so. ("Simple" does not really mean "trivial" though, as witnessed by my classes CharLinCtr or LetterUnionCtr!)

The search uses heuristics so that it first places letters that are needed, according to the constraints: letters that are surely in the solution but not yet known to be at any position; as a second choice, other letters that can help reach the number of non-misses in a guess.

++ Did SplitMoo use any letter frequency tables? Where did you get them?

SplitMoo uses a table of the likelihood of all sequences of two letters (including the spaces after or before the word, to take into account the likelihood of the initial and final letters of the word). They were extracted from my dictionary. Where I got this dictionary is explained in the "Mad_Cow_Cheating" answers.

++ If I had it to do all over - here's what I would try ....
Most importantly, I should have shared my time more evenly among the different parts of my design, and kept much more time for developing the guess generation algorithm and for overall tuning. Two important parts that make my program as good as it is were added in the last 24 hours: the englishness measure, and the search for more-or-less random alternative guesses. I took almost no time for real tuning (although I tested PINEAPPLE with every version of my program). It is possible that single tuning decisions could improve my results even more. I didn't even try spending more than 2 seconds in the final phase, even though moving from 1 sec. to 2 sec. improved my results a lot! The deadline was close at the time and I was getting tired, with a wish to finish... (I did not realize I had _that_ much of a chance to win!)

Other remarks:
* Perhaps the constraint propagation engine is overkill. Perhaps not. It surely helped with maintainability. I could perhaps have tried to be less brute-force-ish in my approach to finding solutions.
* I should further develop the last two seconds of alternative-guess search, maybe by introducing a more sophisticated genetic algorithm (like the one in Mad_Cow_Cheating). Two weeks before the deadline I tried a purely genetic approach. After much refinement, it was able to find PINEAPPLE in 14 to... 36 guesses, so I gave up the idea... It was based on SplitMoo's design, but the task of figuring out a solution sample was handled by a genetic program. The problem then is to keep this sample even and not concentrated around one word or family of words, especially when no word matching all constraints is known.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live and work in Brussels, Belgium. My main work project is a program for automated staff planning. For fun: drawing, swimming, writing programs (my next big "home" project is a MineSweeper solver, for which I have good ideas), reading/doing some maths...
For my innermost secret, have a look at the "Mad_Cow_Cheating" answers... ;-) [With a wink to Caroline N - hello if by chance you're reading this!]

POTM ideas: Solving MineSweeper (that game that ships with Windows) or a variant thereof (like allowing the player to continue the game after (s)he has hit a bomb)... I can stay out of the contest if you think it's fairer... (but I'd prefer to stay in!)
_________________________________________________________
SchrodingersCow 25 ( 7/ 8/ 10) gC William Little
_________________________________________________________
=================
SchrodingersCow submitted by William Little
=================

++ Please summarize the algorithm used by SchrodingersCow

The algorithm has 3 phases...

PHASE 1: Generate a population, the members of which each match all of the information gathered so far about the target word. These members are generated by starting with a random sequence of letters and then applying mutations. The mutant is then compared to the original, and replaces it unless it gets a higher (worse) score from the fitness function. This process continues for 2000 cycles, or until the score reaches zero. If zero is successfully reached, then the word is satisfactory and is added to the population; otherwise it is discarded to avoid getting trapped in a local minimum.

The fitness function determines the candidate's score by calculating the number of hits and misses that would have been returned for all the guesses made thus far if the target word were the candidate. It totals the absolute deviation from the actual values, and unless desperation has kicked in, it also adds one for each letter triplet in the candidate which is not listed as viable in the triplets table.

PHASE 2: If the size of the population is zero, then turn on desperation, and make a random guess to buy time. Otherwise, the population is now a sample of words which might be the target word.
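(The phase 1 loop can be sketched roughly as follows - my reconstruction, not William's code. The fitness of a candidate is the total absolute deviation between the (hits, misses) answers it would have produced and the answers actually observed; I read "hits" as exact-position matches and "misses" as guessed letters absent from the target, and I omit the triplet-table penalty.)

```python
import random
from collections import Counter

def feedback(guess, target):
    # hits = exact-position matches; misses = guessed letters that do
    # not appear in the target at all (my reading of the terms)
    hits = sum(g == t for g, t in zip(guess, target))
    common = sum((Counter(guess) & Counter(target)).values())
    return hits, len(guess) - common

def fitness(candidate, history):
    """Total absolute deviation from the observed answers; 0 means the
    candidate is consistent with every guess made so far."""
    return sum(abs(h - hits) + abs(m - misses)
               for guess, (hits, misses) in history
               for h, m in [feedback(guess, candidate)])

def grow_member(length, history, cycles=2000, rng=random.Random(1)):
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    word = "".join(rng.choice(letters) for _ in range(length))
    for _ in range(cycles):
        if fitness(word, history) == 0:
            return word          # satisfactory: add to the population
        i = rng.randrange(length)
        mutant = word[:i] + rng.choice(letters) + word[i + 1:]
        if fitness(mutant, history) <= fitness(word, history):
            word = mutant        # replace unless the mutant is worse
    return None                  # discard: likely stuck in a local minimum
```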
One of them might be the best guess, so, for each member, calculate how its use as a guess would divide the whole population, and the size of the largest chunk. Choose the member that would result in the smallest largest chunk as the starting point for phase 3.

PHASE 3: The result of phase 2 is a good guess, chosen from among a population of possible target words, but the best guess need not be a possible target word. Therefore, phase 3 uses any remaining time until the 9-second mark to apply mutations to the guess, much as in phase 1, in the hope of reducing the largest chunk size even further, with the difference that draws go to the original, not the mutant. Once time is up, the guess is made, and the program terminates.

Those are the three phases of the program. As for the triplet table, it has one entry for each sequence of 3 letters, from AAA to ZZZ. Each entry has 3 relevant bits: one determines whether a triplet occurs at the beginning of any known word, one does the same for occurrences at the ends of words, and the other for occurrences anywhere other than the beginning or end.

++ Did SchrodingersCow use any letter frequency tables? Where did you get them?

No, it doesn't care a bit about letter frequency. All it cares about is whether a given letter triplet is viable. As for the triplet table, I wrote a separate program to generate that.

++ If I had it to do all over - here's what I would try ....

...fewer bad ideas and algorithms, before deciding on this one.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

WHERE I LIVE: A quaint post-industrial globe, the native inhabitants of which are still largely convinced that they live in a Euclidean 3-space... Ha-ha-ha!

WHAT I DO FOR FUN: Write programs, exploit technology, search for the perfect flashlight, and try to catch that blasted roadrunner.

WHAT I DO FOR WORK: Try to keep everything from falling apart.

INNERMOST SECRET: This extremely mysterious person's utmost secret?
It's something...most entertaining.

POTM IDEAS: Large cash prizes!
_________________________________________________________
MooIndigo 27 ( 7/ 9/ 11) c Scott Kennedy
_________________________________________________________
===============================================
MooIndigo submitted by Scott Kennedy at tsk@net2phone.com
===============================================

+-+- Please summarize the algorithm used by MooIndigo

MooIndigo generates each possible letter combination and compares it against each previous guess. If the comparison generates the same number of COWS and BULLS, then it is printed as the next guess. The letters are chosen by order of frequency in commonly occurring words.

The first set of guesses is selected to use the letters of the alphabet with no repeated letters. For example, if the word length is 6 letters, the first 4 guesses will contain different letters. The fifth guess will include the 2 unused letters and possible letters from previous guesses (based on the number of COWS). The remaining guesses are generated by trying combinations with no repeated letters, 2 repeated letters, and finally any number of repeated letters.

Two passes are made. On the first pass, letter combination frequency tables are used to reject bad letter combinations (e.g. all consonants, 'Q' but no 'U', words beginning with 'ZG', etc.). On the second pass, all combinations are tested (just in case the POTM-MASTER found an obscure word). The passes through the alphabet are made through a set of nested loops. The current indexes of the loops are saved in the temporary file so that each time the program starts up, it can reset the loop counters and pick up where it left off.

To fully utilize the 10 seconds allotted for each pass, MooIndigo looks ahead for other possible guesses. Using the letter frequency tables, it determines a score for each guess, and if time expires, it prints the guess with the highest score.
If the 10 seconds expire and MooIndigo has not found a new guess, it prints the current combination that it is evaluating. MooIndigo generates a list of currently available letters so that it can skip letters that could not possibly be in the word. For example, if a guess generates 0 COWS and 0 BULLS, none of the letters in the guess appear in the word. Similarly, knowing which letters are not used, it can also determine which letters must be used.

+-+- Did MooIndigo use any letter frequency tables? Where did you get them?

Yes. The UNIX 'spell' dictionary was used to generate the following letter frequencies.

1) The letter frequency for each letter, based on word length:
   4:  AEOLRITNSUDMPHCBKGFWYVJZXQ
   5:  EAROLSTINCUYHDPMBGKFWVZXJQ
   6:  EARNITLOSCUDMHPYBGFKWVXZJQ
   7:  EARINTOLSCUMDPHGBYFWKVXZJQ
   8:  EAIRONTLSCDUMHPBGYFKWVXQJZ
   9:  EAITRONSLCUDMPHGBYFWVKXQJZ
   10: EIAOTRNSLCUPMHDGBYVFWKXQJZ
2) Frequency of 2-letter combinations at the beginning of a word
3) Frequency of 2-letter combinations at the end of a word
4) Frequency of 2-letter combinations in the middle of a word
5) Frequency of N consonants in a row
6) A list of three-letter combinations that do not appear in words

+-+- If I had it to do all over - here's what I would try ....

I would have liked to do additional work on the scoring mechanism and some performance improvements to search more guesses within the 10-second allotment. I ran the program against a set of about 4000 words to make sure any changes made to the scoring and algorithms actually reduced the average number of guesses. Also, trying different approaches to determine which letters are repeated might reduce the number of guesses.

+-+- WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

By the C, by the C, by the beautiful C.
_________________________________________________________
dancing_letters 27 ( 7/ 10/ 10) gC Pavel Ouvarov
_________________________________________________________
=================
dancing_letters submitted by Pavel Ouvarov at uvarov@sirius.ihep.su
=================

++ Please summarize the algorithm used by dancing_letters

The algo: after the first guess, build the Candidate Word Space (CWS). Choose a word from the CWS, try it, and reduce the CWS. Repeat.

There are two problems:

First, how to organize the CWS? I invented a class that I called 'mask', which can contain a great number of words. It is easily constructed and can easily be overlapped (intersected) with another 'mask'. Thus, the CWS is a (relatively small) list of 'mask's.

Second, which word from the CWS to choose? I decided to choose the best word, based on statistics. The CWS may be very large, and often there isn't enough time to look through all the words. So I choose only the best 'mask's (based on statistics) and split them into words. Then I choose the best word. (For details ask me or look through the code.)

++ Did dancing_letters use any letter frequency tables? Where did you get them?

Yes, it uses three levels of statistics. (They were all taken from my Linux spell dictionary.) First, a simple letter frequency table for each place in the word. Second, a letter pair frequency table for each place in the word. Third, letter triple allowance: it helps to find out whether a letter triple may be used or not, without regard to place (for example, AAA is not allowed).

++ If I had it to do all over - here's what I would try ....

I'd follow better programming style... Perhaps I'd try to find the word that possibly reduces the CWS best.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live in Russia in the best town of the world - Protvino. Now I'm a second-year student of the physics faculty at Moscow State University. DO FOR FUN? Friends, girls, beer... DO FOR WORK? Physics... INNERMOST SECRET?
Tsss... I think Eugeniya Bagdasarova is jealous of you... After all, she's the prettiest and sexiest girl in the universe...
_________________________________________________________
Skunk 28 ( 8/ 8/ 12) gC Alexey Zhelvis
_________________________________________________________

Sorry ... No questionnaire was returned for Skunk
_________________________________________________________
StupidComp 29 ( 9/ 8/ 12) gC Trieu Nguyen
_________________________________________________________
=================
StupidComp submitted by Trieu Nguyen at trieu@ix.netcom.com
=================

++ Please summarize the algorithm used by StupidComp

In my program, I define an "English word" as strings of consonants and vowels next to each other. For example: E-ngl-i-sh. There is an array holding all such strings of consonants and vowels. At first the program will try some words; after that it will find the matching one among all English words.

++ Did StupidComp use any letter frequency tables? Where did you get them?

The array of consonants and vowels is generated from an English dictionary on the web.

++ If I had it to do all over - here's what I would try ....

Don't dare to do it again :-)

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

Berkeley, California. Working at Lawrence Lab. Interested in computer chess. I wrote a small Chinese chess playing program and let it play on the internet. It beats many people in fast games, heheh.
_________________________________________________________
QuackTheDuck 30 ( 5/ 17/ 8) c Wolfram Hinderer
_________________________________________________________
=================
QuackTheDuck submitted by Wolfram Hinderer at Wolfram.Hinderer@math.uni-karlsruhe.de
=================

++ Please summarize the algorithm used by QuackTheDuck

For my first try I used Pascal. I transformed the answers into linear equations and simplified them. The problem then was to find the guesses which would give the most information.
I was not very successful with this (often 20 guesses or more), and I didn't know how I should use information about English. So I started again from scratch, using C. Well, that should probably be _abusing_ C. I don't have much experience programming in C, but I had to use it (no Pascal support, and it's faaaaster). If you have a look at my source, you will see that it is just terrible. The recursion (see below) is the only place where I tried to write smart code: time is valuable!

But now the algorithm: I use lots of data about the English language. I use triples, like "ING". For every combination of two letters ("AA", "AB", ..., "ZY", "ZZ") there is a rank list of the letters that could possibly follow (like the "probability" that a certain letter is the next letter in the word). For every position (first three letters in the word, letters 2,3,4 in the word, etc.) I have separate lists. (I had different lists for different word lengths at the beginning, with significantly better results, but the size limit... I had to compress my data anyway!) The ranks for "AD" at the start of a word would be "VDJMOAHUERSILK", for example. (I have lists for the first letter, and for the second letter depending on the first letter of a word.)

My guess would be the first word found that is not yet ruled out by the guesses made. The first step is to look for words using only "rank 1" letters; if that's not possible, use "rank 2" letters and so on. More precisely: select the first letter. For every guess already made, calculate the hits, non-hits and non-misses for a word starting with this letter... If the hits, non-hits or non-misses for any guess exceed the values indicated by the answer to that guess, select another first letter (with lower rank); else continue with the second letter, and so on for every letter of the word (recursively). If the last letter of the word has been successfully selected, the misses still have to be checked, and then the word is printed. That's all.
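A compressed sketch of that recursive search (my reconstruction, not Wolfram's C: the real rank lists are conditioned on the two preceding letters, whereas here each position simply gets its own letter ranking): letters are tried in rank order, and a prefix is abandoned as soon as its positional hits against any past guess already exceed that guess's answer.

```python
from collections import Counter

def feedback(guess, target):
    # hits = exact-position matches; misses = guessed letters absent
    # from the target entirely (assumed scoring)
    hits = sum(g == t for g, t in zip(guess, target))
    common = sum((Counter(guess) & Counter(target)).values())
    return hits, len(guess) - common

def find_word(length, ranks, history):
    """ranks[i] is a string of candidate letters for position i, best first;
    history is a list of (guess, (hits, misses)) pairs."""
    def consistent_prefix(prefix):
        # prune: positional hits so far must not exceed the known answer
        return all(sum(a == b for a, b in zip(prefix, guess)) <= hits
                   for guess, (hits, _misses) in history)

    def rec(prefix):
        if not consistent_prefix(prefix):
            return None
        if len(prefix) == length:
            # full check: this is where the misses get verified
            ok = all(feedback(g, prefix) == ans for g, ans in history)
            return prefix if ok else None
        for letter in ranks[len(prefix)]:
            found = rec(prefix + letter)
            if found:
                return found
        return None

    return rec("")
```

With ranks `["MG", "O", "OA", "TS"]` and the single past guess MOOT answered (2 hits, 2 misses), the search prunes "MOO..." immediately (already 3 positional hits) and returns the first surviving word in rank order.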
QuackTheDuck does not try to make guesses which give a lot of information; it simply makes the next possible guess. This seemed so different from the techniques discussed on the message board that I decided to have a name with an animal very different from a bull or cow. Since the program simply utters some English-like words, and since I didn't think that my approach would work well, I named my program QuackTheDuck. After I got it to work, I realized that somehow it worked much better than I had thought, but I didn't change the name. It seemed funny at the time. But I just looked up what Fred says about it in the "Common POTM Rules", and there he says "something clever please!". Oops.

I have added something to the algorithm: if a guess gets wordlen-1 hits and there are more than 3 possible words left, I try to make guesses that eliminate more than one possibility.

My approach has a big problem: some words can't be found. For example, words starting with "ADX", because "X" is not in the list for "AD" at the start of a word. I find all words in my spell dictionary, and many more, but nevertheless I had to extend my algorithm: if no word can be found, fill all rank lists with the missing letters. So everything will be found, even "ASDJJFX", though it would often need many guesses... but at least it will be found! If time is short, I print the "best" word found up to now.

++ Did QuackTheDuck use any letter frequency tables? Where did you get them?

Yes. I used the rank lists described above. I used my spell dictionary to create them. That's why QuackTheDuck finds all the words in my dictionary (and many more).

++ If I had it to do all over - here's what I would try ....

Not sure.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

If there is ever a similar POTM contest, it would be fun to allow access to the dictionary - the dictionary could then be in any language, or be a fantasy or "complete" dictionary.
A dictionary with exactly one word would be interesting... In general, a first step where every program generates its own statistics, and a second step, the guessing or whatever it might be, could be used in other situations.
_________________________________________________________
DICTIONARY ENTRY: bigmoo 15 ( 5/ 4/ 6) c Susan Adams
_________________________________________________________
=================
bigmoo submitted by Susan Adams at buttgirl_69@yahoo.com
=================

++ Please summarize the algorithm used by bigmoo.

It was a walking dictionary. Over 63,000 words totaling about 650K. They were divided into lists by word length. The actual code was only about 1% of the total size. It used the idea that Fred so eloquently stated:

2. a most useful "trick" was making good use of the scores of previous guesses to limit the "next" guess ... as follows:
   a) I'm going to make a guess - there are a large number of possible guesses to choose from
   b) IF my guess is to be a winner, then it MUST yield the identical scores that my previous guesses yielded. If it doesn't, then it couldn't have been the target word - so perhaps I'm better off with another guess!

++ Did bigmoo use any letter frequency tables? Where did you get them?

No way. What use would I have for those? Go big or don't go at all. I got my dictionary from Paul Banta (Bozo).

++ If I had it to do all over - here's what I would try ....

1) I'd put my list of words in a comment. That way, they wouldn't count against me for size. The rules specifically say that comments don't count against you.
2) I'd copy the source code ("BigMoo.c"), comments and all, into my temp file ("TEMP.BigMoo"). I don't think there are any rules against that.
3) I'd read my temp file into a large array of guesses.

I'll have to check with Fred on this, but I think making these three changes would yield a legal entry by the letter of the rules. It might bend the spirit of the rules slightly, though.
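Fred's "trick" quoted above is just a consistency filter over the word list; a minimal sketch (standard bulls/cows scoring assumed - this is an illustration, not bigmoo's actual source):

```python
from collections import Counter

def bulls_cows(guess, target):
    # bulls = right letter, right place; cows = right letter, wrong place
    bulls = sum(g == t for g, t in zip(guess, target))
    common = sum((Counter(guess) & Counter(target)).values())
    return bulls, common - bulls

def next_guess(words, history):
    """Return the first dictionary word that would have yielded the
    identical scores that our previous guesses actually received."""
    for word in words:
        if all(bulls_cows(g, word) == ans for g, ans in history):
            return word
    return None
```

If MOOT scored (2 bulls, 0 cows), MOAT is ruled out (it would have scored 3 bulls) while GOAT and COAT both survive the filter.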
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I'm a perpetual student in Cheyenne, Wyoming. Do for fun? I don't know. Girls just wanna have fun! Innermost secret? That's a tough one. Help me out here, Fred. POTM ideas? None.
_________________________________________________________
DICTIONARY ENTRY: Mad_Cow_Cheating 17 ( 7/ 5/ 5) gC Frederic vanderPlancke
_________________________________________________________
=================
Mad_Cow_Cheating submitted by Frederic van der Plancke at frederic.vdplancke@writeme.com
=================

++ Please summarize the algorithm used by Mad_Cow_Cheating

The algorithm relies on a heuristic formula to evaluate a possible new guess: if the outcomes of the guess separate the set (of size S) of possible solutions into parts of sizes s_1, s_2, ... s_N, then the value of the guess is given by:

    bare_value(guess) := - sum_i s_i * log(s_i)
    value(guess) := log(S) + bare_value(guess) / S

where bare_value is used in the comparison, whereas value is the really meaningful quantity: it represents the expected quantity of information gained by the guess. (By quantity of information I mean the log of the size of the solution set; I expected this to be proportional to the number of guesses needed. This is roughly true in the beginning of a game, but becomes an underestimation towards the end.) (NB. I didn't take the time to re-check the maths, so I might be wrong.)

When comparing a new guess that is a possible solution to one which is not (or is not known to be one), the above value is adapted to take that into account.

(1) The "basic" algorithm (used in earlier versions): It considers all words in the dictionary (even words that cannot be the solution according to the current knowledge), and chooses the one that has the best heuristic value.

(2) The "genetic" algorithm (Mad_Cow_Cheating's algorithm): It starts with the whole dictionary (*), or a random sample of it if the dictionary is too big.
(I did not really bother with the 10 seconds limit, for quite obvious reasons ;-) but I still did not want to make Fred wait for minutes...) Then it creates new words from old ones, by mutations, permutations and cross-overs.

The innovative part is the structure I use for maintaining the genetic pool: a "heap" or "priority queue": that's a binary tree where each node is a better candidate than its two "son" nodes. This way, I can instantaneously get the best move so far, and for choosing words for reproduction I can weight them according to their depth in the tree, which is roughly correlated with their goodness. (To increase this correlation I insert the new guesses randomly in the bottom level, before restoring the heap invariant as usual.)

(*) reduced to words of the right length, this goes without saying...

++ Did Mad_Cow_Cheating use any letter frequency tables?

Mad_Cow_Cheating uses a dictionary, so it does not need frequency tables ;-) I made searches and found two sources of special interest (for MOO purposes): UKACD 1.5, and MOBY (especially the "common" dictionary in "Moby Words"; they have many things, including a pronunciation dictionary, the whole of Shakespeare's work and the Constitution of the U.S.A.!) The page where I found links to these is also where you'll find links to a LOT of dictionaries of various kinds... I took the intersection of UKACD 1.5 and MOBY (common), reduced to the 4- to 9-letter words, as my dictionary.

++ If I had it to do all over - here's what I would try ....

I think Mad_Cow_Cheating is quite good as is... If I had the time... I could implement alpha-beta min-max analysis of the end of game (and perhaps of the whole game if resources permit; D. Knuth has done it for the version of MasterMind that has 1296 possible solutions. A complication here is that MasterMind's symmetries are lost, and the possible moves are much more numerous...)
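Written out as code, the heuristic formula from the summary above looks like this (a sketch only; the bulls/cows scorer and the sample data are my assumptions, and the adjustment for guesses that are themselves possible solutions is omitted):

```python
import math
from collections import Counter, defaultdict

def bulls_cows(guess, target):
    bulls = sum(g == t for g, t in zip(guess, target))
    common = sum((Counter(guess) & Counter(target)).values())
    return bulls, common - bulls

def guess_value(guess, solutions):
    """value = log(S) + bare_value/S, with bare_value = -sum_i s_i*log(s_i):
    the expected information (in nats) gained by making this guess."""
    parts = defaultdict(int)
    for word in solutions:
        parts[bulls_cows(guess, word)] += 1
    S = len(solutions)
    bare = -sum(s * math.log(s) for s in parts.values())
    return math.log(S) + bare / S
```

A guess that splits S solutions into singletons gains the full log(S) nats; a guess whose outcome is identical for every solution gains nothing, since bare_value collapses to -S*log(S).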
I noticed the unexpected fact that the "basic" algorithm outperforms the "genetic" algorithm on lengthy words. This means dictionary words have an advantage that my heuristic does not capture, probably because they are more akin to the words to split, but I did not study this question further. (The "basic" algorithm should find most 9-letter words in no more than 4 guesses, even when using a bigger dictionary.) Oh yes, I could also weight each dictionary entry according to its probability of being chosen by Fred ;-)

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live in Brussels (Schaerbeek) in Belgium. Look at "SplitMoo" for the other answers...
_________________________________________________________
deepmoo 31 ( 7/ 11/ 13) gC Sean Barrett
_________________________________________________________
=================
deepmoo submitted by Sean Barrett at buzzard@world.std.com
=================

++ Please summarize the algorithm used by deepmoo

A sort-of diary of the development of deepmoo is at http://world.std.com/~buzzard/potm/journal.txt

OPENING: Fixed opening book (does not change in response to results), short-circuited if all letters are encountered. Binary encoding is used for most probes for 7+ letter words (consider the meaning of 5 misses for 'OAAEEEE'). The fixed opening book guarantees that every letter (or every letter but one) has been probed by the time the opening book is completed - or else all letters in the word are accounted for if short-circuited, but this cannot happen if there are repeats. (This avoids slow probes through lots of letters during the endgame.)

MIDGAME AND NORMAL ENDGAME: Depth-first search for strings that match all previous guesses. Heavily "pruned" for speed (i.e. without changing search results). Only matches for which every set of three letters matches some word in a precompiled dictionary are allowed. Search at each letter position goes in order of letter frequencies at that position.
(This is not actually using a dictionary, but a set of frequency tables that have only 1-bit frequencies. See below.)

LATE ENDGAME: If all words that "match" the trigraph frequencies are rejected, retraverse the entire tree without culling bogus trigraph words. (Typically deepmoo has enough information to take only one or two extra guesses in this case, if the word is actually English or English-like but not in the original dictionary used.)

++ Did deepmoo use any letter frequency tables? Where did you get them?

I compiled a dictionary by getting two different /usr/dict/words, removing all uppercase and non-alphabetic words, running through it by hand looking for anything too weird, and then passing the whole thing through 'spell' on a machine belonging to an acquaintance which was running the same OS version as the POTM-master. I then generated frequencies as follows:

For each letter position, compute the relative frequencies of each possible letter. I could have computed this independently for every possible word length, but I didn't bother.

For every trigraph position, compute a 1-bit value for every three-letter string, indicating whether that string ever appears in that position. Again, I could have computed this independently for each word length, and it would have made a big difference, but I couldn't afford that much data. I generated an equivalent set of information with quadgraphs (four-letter strings), but was unable to compress them sufficiently.

The core premise of deepmoo is to use 1-bit frequencies so as to effectively narrow the search space, rather than simply orient it. The final endgame guarantees that even words that don't match the 1-bit frequencies are eventually explored. In other words, multiply together the frequencies of each of the trigraphs from each position: in my case, the final answer is either 1 or 0, which can be understood as "much more likely" or "much less likely". All of the former are explored before any of the latter.
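The 1-bit trigraph tables can be sketched in a few lines (an illustration of the idea, not deepmoo's compressed representation; position indices here are shared across word lengths, as the write-up says):

```python
def trigraph_bits(words):
    """1-bit 'frequencies': the set of (position, three-letter string)
    pairs that ever occur in the dictionary."""
    bits = set()
    for w in words:
        for i in range(len(w) - 2):
            bits.add((i, w[i:i + 3]))
    return bits

def plausible(word, bits):
    """True iff every trigraph of `word` occurs at its position -
    'much more likely' vs 'much less likely'."""
    return all((i, word[i:i + 3]) in bits for i in range(len(word) - 2))
```

Note that a string like COAN passes a table built from GOAT, COAT, MOAN without being in the dictionary: the table accepts a wordlist larger than the fixed list it was built from, which is exactly the over-generation discussed below. The depth-first search explores all plausible strings first, and the late endgame drops the filter so implausible strings are still reachable.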
I have some serious concerns about the fact that there is no real distinction between frequencies and dictionaries, in some sense. For a discussion of this extracted from my journal (the most interesting/revealing thing in the journal, I think): http://world.std.com/~buzzard/potm

++ If I had it to do all over - here's what I would try ....

Nothing, because by the end all I was trying to do was compress tables for better narrowing the wordspace (see the aforementioned URL). There may be good alternative approaches, but in *practice* this is a good attack for making the program 'seem to know English', which is of course something all of us playing the game by hand do, but which programs are absolutely ruled against doing.

In some sense, I don't think the chosen POTM problem is an interesting problem space. The problem space "you can have a dictionary, but you should be able to solve even words which aren't real words" is the problem space humans solve. The problem space "the word must be in this dictionary, which you can use" is an interesting problem. The problem space "the word could be any string with equal odds" is an interesting problem space. The chosen POTM problem gives a problem space in which the goal is to model the English language without bothering to use a dictionary. As such, it's kind of boring.

For me, at least, it became the problem "write a 20K program that produces a wordlist as minimally larger than some fixed list of words as possible, while still containing all those words". The best I could come up with was 20,000 times as large. (I had one less than 100 times as large, but couldn't get it small enough.) As such, I could watch my program and see how it *sort of* knew English, but not really, and this seemed the main front on which it could be improved. (Indeed, there may simply be entirely different strategies, considering how far deepmoo is from the top of the rankings, but they're not coming to mind here.)
With the "less than 100 times as large" quadgraph-based structures, deepmoo generally guessed words that *were* in the original dictionary on the first guess after the opening book. Therefore, at about that point, it would become interesting to pursue shortening the opening book, or looking for other strategies without this dichotomy.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live in Boston. I program computers for fun and work, writing 3d graphics code and games.

_________________________________________________________
MooFight 32 ( 9/ 11/ 12) c Kris Kenis
_________________________________________________________
=================
MooFight submitted by Kris Kenis at kris.kenis@barco.com
=================

++ Please summarize the algorithm used by MooFight

Step 1: make random guesses with all "different" letters (wordlength > 6 letters --> 7 different letters, wordlength 4 different letters). Make guesses until all letters of the alphabet are used, or until all letters of the word are found (-> only when all the letters of the word are different). Sometimes some letters can be eliminated (misses = wordlength).

Step 2: make guesses with "only possible letters". Use 2 different letters for wordlength 6 letters. It is always possible to get the exact letters. Check with previous guesses and try to eliminate other letters.

Step 3: make guesses with exact letters, using frequency tables. The frequency table contains frequency information for each pair of letters. Keep the word with the highest score.

++ Did MooFight use any letter frequency tables? Where did you get them?

Yes. Somewhere in the potm message board.

++ If I had it to do all over - here's what I would try ....

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Gent - Belgium - Graphics R&D Engineer

_________________________________________________________
NoBullPeacePrize 32 ( 9/ 10/ 13) gC Randy Saint
_________________________________________________________
=================
NoBullPeacePrize submitted by Randy Saint at saint@phoenix.net
=================

++ Please summarize the algorithm used by NoBullPeacePrize

I borrowed this strategy from the Mastermind optimal algorithm. Try to generate a complete list of all possible solutions (candidates). Limit this list to 2000 words. Test all candidates against each other to see which one splits the list the "best". If time is left over, generate random guesses to see if they can split the list of candidates "better."

By "best" and "better": each candidate word is given a score (the sum of the frequencies of each letter pair). The "best" guess would be the one that generates the most even histogram of scores for each possible guess result. This way, the program would lean more towards finding guesses that would split the candidates that were most likely words.

I also kept a list of the letters and which locations the letters could not be in. This helped to reduce the number of possible candidates I had to search through.

++ Did NoBullPeacePrize use any letter frequency tables? Where did you get them?

Yes. At first I got them from some lists that people had posted on the original message board. With only a week to go, I ran those lists through the spell program and realized that there were many words in there that would fail. I culled those out of the list and used the remaining ones that were spelled correctly.

++ If I had it to do all over - here's what I would try ....

I couldn't figure out any way to generate my list of candidates faster. After a few guesses, the list of candidates was zero and I couldn't search through enough of the possibilities to find even one. Then the program takes a previous guess, changes one letter and uses that.
That way I get a little information about the two letters.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

Live: Houston, Texas.
Fun: Programming contests, and tinkering with my computer. I purchased a used motherboard to use in my second computer, but somehow screwed it up with less than 2 days to go before the contest deadline.
Work: Software Development Manager with Lockheed Martin, at NASA, Johnson Space Center.
Secret:
Ideas: Number guessing vs. the other contestants; there was a contest called Knobeln.

_________________________________________________________
MooBoo 33 ( 8/ 13/ 12) C Rolandas Gricius
_________________________________________________________

Sorry ... No questionnaire was returned for MooBoo

_________________________________________________________
one_guess 34 ( 8/ 12/ 14) c Ionel Santa
_________________________________________________________
=================
one_guess submitted by Ionel Santa at ionels@phobos.cs.unibuc.ro
=================

DESCRIPTION OF THE ALGORITHM

1. My algorithm tries at every guess to find a word that could be the correct one. The words are searched with a backtracking algorithm, taking care to eliminate from the beginning the words that contain invalid combinations of letters, and also the words that contradict the current knowledge obtained from the previous guesses and answers. If no word is found within seven seconds, the search is continued with a random search. (Hint: if the space of solutions is very small, then the knowledge already obtained makes backtracking a very easy task, and if the space of solutions is very large, a random search (also conforming to that knowledge) will be more likely to succeed.)

My first 3-5 guesses (depending on the word length) were predefined. For example, for length 7 I answered first something like XYYZZZZ, where X, Y, Z are frequent letters, which permitted me to know for each of these letters whether or not it is in the final word.
(For example, if MISSES = 3, I can take it for sure that Z is in the word, and X and Y aren't.)

I tried to make inferences about:
a) letters that can or cannot be at a specified position in the word;
b) the minimal and maximal number of letters that can be in the final word for different sets of letters. At this point I make inferences about intersections, differences, etc.

LETTER FREQUENCIES

2. I use frequencies very little, only for my first guesses. I built these tables by counting letters in a novel, regardless of the length of words (I counted every letter).

WHAT I WOULD TRY

3. Maybe I would try to work with schemes like: ['A'-'Z']['O']['R','Y']['Y'] - which means that the first letter is from 'A' to 'Z', the second is 'O', etc. First I would go with ['A'-'Z']['A'-'Z']..., and then after every guess-answer I would split the schemes that I have into others that conform to this guess-answer.

WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

4. I live in Romania; for the moment I'm studying in Bucharest (computer science). I like literature and music very much, I cannot make my secrets public, and for the moment I haven't any POTM ideas. Anyhow, I think this contest is very well organized, and I thank FRED for that.

_________________________________________________________
Bozo 35 ( 8/ 11/ 16) c Paul Banta
_________________________________________________________
=================
Bozo submitted by Paul Banta at pbanta@yahoo.com
=================

++ Please summarize the algorithm used by Bozo

Bozo is about 1/3 statistics and about 2/3 code. Bozo makes some initial "pre-canned" guesses. It then generates guesses based on the following idea. The word that I'm about to guess MIGHT be the word that Fred is thinking of. If it IS the word that Fred is thinking of, I should be able to compare it to all my previous guesses and get the same number of hits and misses that Fred reported. So, that's exactly what I do.
I compare the word that I want to guess with all my previous guesses and see if the hits and misses match Fred's hits and misses. If the new guess doesn't match all of Fred's hits and misses, then I don't guess it - it CAN'T be the right word. Instead, I keep looking for a word that matches ALL of Fred's hits and misses. When I find such a word, that becomes my next guess.

++ Did Bozo use any letter frequency tables? Where did you get them?

Yes, but they were only minimally helpful. I generated them myself from my 63,000+ word dictionary. A more interesting question is where I got my dictionary from. I found several dictionaries on the web. I combined 3: a list of English words, a Webster's dictionary, and the English half of an English/Spanish dictionary. I then threw out all the words that were not 4-9 characters long. That gave me a list of about 100,000 words. I then ran this list through a Sun spell program (special thanks to Ken Been in NY for doing this for me), and threw out the words that it didn't like. This gave me a list of about 63,000 words.

What was more helpful than letter statistics were other self-generated statistics that I use in guessing words. I have a table of which triple-letter combinations do and don't occur in my dictionary. For example, "QUI" does occur in my dictionary, but "BBB" doesn't. This helps me to never even consider guessing words with "BBB" in them. There are 17,576 possible three-letter combinations (26^3). I don't remember the specifics, but I think only about 1/3 of them occurred in my dictionary. So, 2/3 of them didn't occur. I also stored and used statistics for which sequences of 4, 5, and 6 consecutive consonants occur in my dictionary.

++ If I had it to do all over - here's what I would try ....

I had a somewhat different algorithm idea late in the contest, but didn't have time to implement it. Phase 1: You make an initial guess, say "ABCDEF" (6-letter word).
Fred gives you back hits and misses. Phase 2: Your next guess uses only those 6 letters, such that if it WAS the word that Fred is thinking of, it would get the same number of hits and misses as Fred reported. You keep making guesses this way until you can't make any more - any guess that you make wouldn't have the same number of hits and misses with all previous guesses that Fred reported. When you run out of guesses, you go back to phase 1 and guess another 6 letters, say "GHIJKL" - now you have 12 letters to work with. I think this algorithm is better than my algorithm, but I'm not sure how much better. My current algorithm averages about 12.3 guesses per word, which ain't too shabby.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

Live: Colorado Springs, Colorado, USA
Fun: Hiking, fishing, camping, backpacking, volleyball, breaking perfectly good automobiles under the guise of "fixing" them, POTM.
Work: An old computer programmer trying to learn new tricks (java).
Innermost Secret: I have no more innermost secret. I'm out of the closet (so to speak). I'm "Mr. Bozo".
POTM Ideas: None that I haven't already sent to Fred.

_________________________________________________________
zigzag 35 ( 7/ 13/ 15) c Jaco Cronje
_________________________________________________________
=================
zigzag submitted by Jaco Cronje at s9805027@student.up.ac.za
=================

++ Please summarize the algorithm used by zigzag

For words of length 4-6: For the first 4 picks, output something like ABCD, then EFGH, then IJKL, then MNOP. After that, do a DFS to find a solution that is legal. A "legal" solution is one that could be a possible answer in comparison with the previous picks. If the time is up and no "legal" solution was found, just output the last combination of letters tested.

For words of length 7-9: Find out exactly which letters are used in the word. Maximum of 9 picks required.
Pick something in the form of: ABBCCCC (for 7-letter words). Then check the resulting number of misses:

If 0 misses, then A, B and C are in the word.
If 1 miss, then B, C are in the word and A is not.
If 2 misses, then A, C are in the word and B is not.
If 3 misses, then C is in the word and A, B are not.
If 4 misses, then A, B are in the word and C is not.
If 5 misses, then B is in the word and A, C are not.
If 6 misses, then A is in the word and B, C are not.
If 7 misses, then A, B and C are not in the word.

(The same thing can be done for words of length 8 and 9.) When (number of letters found = letters in word) and (picks < 9), this process is stopped. Now again a DFS is done, only on "legal" solutions, and only the correct letters are used.

++ Did zigzag use any letter frequency tables? Where did you get them?

Yes, computed from the UNIX dictionary. Used to optimize the DFS, searching the most probable solution space first.

++ If I had it to do all over - here's what I would try ....

Maybe a combination of a GA and a DFS. My DFS can be optimized more, I think. Better combinations to figure out which letters are used and how many of each letter are used.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

Live in: South Africa, in Gauteng near Pretoria.
For fun: Visit my girlfriend, play Quake, program games.
For work: Student at the University of Pretoria, doing Computer Science. I am a 2nd year student now.
Secret: That's for you to find out.
Ideas:
* Playing a game against other people.
* Programming a robot who fights against the other bots.
* Quake-like deathmatch in a 2D arena.
* or maybe a chess-like board game.
* or Chinese Checkers...
* or maybe Capture the Flag !!
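The miss-decoding zigzag tabulates falls out of the power-of-two group sizes in ABBCCCC (A once, B twice, C four times): each absent letter adds its group size to the miss count, so the misses form a binary code. A minimal sketch (hypothetical, not zigzag's actual code):

```c
/* Decode the miss count for the guess ABBCCCC: group sizes 1, 2, 4
 * make the presence of each letter readable from one bit of 'misses'. */
struct presence { int a_in, b_in, c_in; };

static struct presence decode(int misses)
{
    struct presence p;
    p.a_in = !(misses & 1);   /* A contributes 1 miss if absent   */
    p.b_in = !(misses & 2);   /* B contributes 2 misses if absent */
    p.c_in = !(misses & 4);   /* C contributes 4 misses if absent */
    return p;
}
```

For example, 3 misses means A and B are absent and C is present, matching the table above; the 1-2-4-pattern generalizes to more groups for longer words.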
_________________________________________________________
scharade 36 ( 10/ 12/ 14) gC Michael Strauch
_________________________________________________________
=================
scharade submitted by Michael Strauch at michael.strauch@cityweb.de
=================

++ Please summarize the algorithm used by scharade

In general, "scharade" uses the following strategy:

1) The first guesses test the complete alphabet, using a hard-coded list for each word length from 4 to 9 (if these first guesses reveal that all characters of the word are identified, scharade continues immediately with step 2).

2) The next guesses are calculated by some kind of exhaustive search to find a word that could be a possible solution. This might not be the optimal strategy - it could be more efficient to use some guesses that are no longer possible - but I had no good idea how to accomplish this.

3) Sometimes, ten seconds are not enough for step 2 - if scharade exceeds a running time of 9 seconds, it stops immediately and chooses a random guess.

I think step 2 needs some more explanation. First, I set up a straightforward backtracking algorithm for this task - but though I optimized it a lot, scharade ran too often into step 3, especially for words of length 8 or 9. Then I tried to make use of the fact that Fred said he would only use "spell-checking" words (see below). That made the situation a little bit better, and decreased the average number of guesses.

Finally, I added an ingredient called the "Ordered Binary Decision Diagram" (OBDD). This kind of data structure allows scharade to quickly calculate all subsets of the alphabet that are possible candidates for Fred's hidden word, according to the information from the previous guesses and their numbers of "misses". Then the backtracking algorithm runs once for every possible subset, now testing only words with characters from this subset, to find one that meets the restrictions coming from the previous "hit" results.
This gave me the speed improvement I was looking for - step 3 was only reached in very rare cases (on my machine, and Fred's machine seems to be comparable).

An OBDD is a kind of finite state machine that represents a binary function. Scharade uses OBDDs to calculate the function f(x1,...,x26), where x1,...,x26 are binary input variables representing one subset of the alphabet, and where f(...)=1 if and only if (x1,...,x26) represents a subset that meets all of those "miss restrictions". As you might have noticed, the number of "misses" of a guess does not give you any information about the order of the letters you are looking for; it tells you only which letters are used. (Instead of using OBDDs, one could make a list of all ~5 million possible subsets of the alphabet of size

>> Yeah, I allowed it ... =Fred

++ If I had it to do all over - here's what I would try ....

- try to develop a better theory of how to find "good" guesses
- try to encode more statistical information into 25Kb of source code
- perhaps rewrite the thing in Perl, just for fun (well, that "10 secs per guess" rule gives me some headache about this)

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live in Germany, do POTM contests for fun, do software engineering for work, won't tell my innermost secret -- it wouldn't be a secret any more ;-) -- and wonder each time where Fred gets all those genius ideas for his contests from.
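The subset representation scharade builds on (a set of letters packed into the bits of one integer) can be sketched as below. This is only the plain miss-count test for a single candidate subset, with invented names - it is not the OBDD itself, which is what made enumerating all viable subsets fast:

```c
/* A subset of the alphabet as a 26-bit mask, and the "miss restriction"
 * test: if the hidden word's letter set were 'subset', each past guess
 * must produce exactly its reported number of misses, where a miss is
 * a guess position whose letter is outside the subset. */
static unsigned letter_bit(char c) { return 1u << (c - 'a'); }

static unsigned word_mask(const char *w)        /* set of letters in w */
{
    unsigned m = 0;
    while (*w) m |= letter_bit(*w++);
    return m;
}

static int subset_ok(unsigned subset, const char *guess, int misses)
{
    int count = 0;
    for (; *guess; guess++)
        if (!(subset & letter_bit(*guess)))
            count++;
    return count == misses;
}
```

A candidate subset survives only if `subset_ok` holds for every past guess; the backtracking search then needs to place only letters drawn from surviving subsets.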
_________________________________________________________
steak 37 ( 6/ 9/ 22) gc Ben Angrim
_________________________________________________________
=================
steak submitted by Ben Nye at angrim@montana.com
=================

++ Please summarize the algorithm used by steak

The main idea of interest was to estimate the reduction in search space resulting from a guess by creating a large sample of strings that would be valid based on the previous guesses and the letter triples, and then picking whichever guess partitioned that sample search space the best. This worked very well provided that I was able to find a large sample. In the final test (final word) my program ran into time trouble and created a very small sample search space, which resulted in BAD partitioning.

The letter frequency tables were used to weight each entry in the sample space based on how likely its letter pairs were: the 2 partitions (XZXZ, XXXZ, ZZZZ, ZOOX), (ZOOM) would be scored better than the 3 partitions (ZOOM, XZXZ, XXXZ), (ZZZZ), (ZOOX), for instance, because ZOOM would be weighted so high.

++ Did steak use any letter frequency tables? Where did you get them?

Yes, 2 tables: one was the frequency of pairs of letters, the other was the possibility of triples of letters. This resulted in very English-like words, some of them very funny :-) I got my letter frequencies from the dictionary that came with the hangman game on my Linux system.

++ If I had it to do all over - here's what I would try ....

Start coding 1 day earlier. I thought about the problem for months, but didn't start coding till Thursday evening... the first version that worked at all was around 4 PM Saturday. I knew that it was having speed troubles with (some) 9-letter words when I sent it in, but was out of time.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Still living in Missoula, MT.
Do for fun: program, play Starcraft; once it gets a little warmer, hike.
Work: Unix/C/X11 programmer.
Secret: a secret shared is no secret.

_________________________________________________________
EISAR 38 ( 10/ 15/ 13) c Alex Popiel
_________________________________________________________
=================
EISAR submitted by Alex Popiel at popiel@snugharbor.com
=================

++ Please summarize the algorithm used by EISAR

EISAR's logic is divided into two parts: a deductive engine and a guess generator. Most of my work went into the deductive engine, since that's what I understood the best.

The deductive engine worked to determine, for each letter: the maximum number of instances of it in the word, the minimum number of instances of it in the word, and whether or not it was in each of the possible positions. After each guess, it would cycle through all of the guess results so far, deriving as much information as it could from the guesses. Many different rules were used to extract information from the guess results... this is where about half of the 25K program lived. Based on the number of misses, it would figure out which letters used in the guess were present in the target word, raising the minimum count to 1 (it would use guesses with repeated letters in proportions like 4 E's, 2 S's, and 1 R, so that the miss count was unambiguous). Based on the number of hits, it would fill in the position grid and change the minimum counts. Based on unknown positions and other minima, it would set the maximum count (and also would set the maximum count based on hits if all unknowns were in a single guess). By iterating through this repeatedly (and also using knowledge like "there is exactly one letter in each position"), the deductive engine would fill in positively known data at a fairly brisk pace.

The guessing engine was a different beast entirely.
It ran in two phases: first, figure out what letters were in the word (and how many of each), and second, sort out what positions they were in.

To figure out what letters were present, it would generate guesses of two or three letters repeated in unambiguous proportions, as mentioned above, starting with the most likely occurring letters, and continuing until all letters were located. Thrown in here would be guesses just to count occurrences of a known present letter, if the probability of that letter being repeated was higher than the probability that the next few unknown letters would be present. (Likelihoods were determined via composites of raw frequency charts and frequency of bigrams with known present letters.) Also, during this phase, letters would be positioned in the _least_ likely positions to try and keep the hit count low, so that the negative info in the position matrix filled in faster.

The second phase of the guessing began when the minimum and maximum counts were equal to each other for all letters (that is, all the letters were known to be present or not, and in what quantity). EISAR would then begin to alternate a try for the real word (based on letter-in-position frequencies, bigram frequencies, and consonant run-lengths) and position determination for a single letter. Occasionally (as with DUCK), the first guess after figuring out what letters were present would be the actual word. After figuring out positively where various letters were, it was generally pretty good about getting English-like guesses.

The submission that actually ran in the finals is a rather old version of EISAR. I made a number of "improvements" to the deductive engine, generalizing the rules it used and allowing it to use combinatorial set theory to figure stuff out... but it ran very much slower, without consistent improvement to the guess count.
(I have lots and lots of histograms of running various versions of EISAR against my entire dictionary, if anyone's curious to see them.)

One last note: EISAR is guaranteed to find the solution in less than 100 guesses... it will know the letters present in no more than about 20 guesses, and it can easily find the positions in the remaining 80 guesses. This is true regardless of the English-ness of the word to be guessed.

++ Did EISAR use any letter frequency tables? Where did you get them?

Yes, EISAR used letter frequency tables of many forms. I had separate frequency tables for each length of word for each of "Does this letter occur in the word?" and "Does this letter occur more than once in the word?". I also had a bigram frequency table (which was not length-differentiated). All of this was encoded in my program to reduce source size. I generated all my tables myself based on a fairly nice dictionary that I got from reading the message board. Since the source of the final test words was unknown, I did not worry overmuch about whether my dictionary matched the word distribution of the final choices.

++ If I had it to do all over - here's what I would try ....

I'm really not sure what I would do. I probably spent too much time on the deductive engine, and not enough on the guess generator... but I was too interested in provably finding the answer in time, instead of finding English-like words quickly. I would have liked to differentiate my guessing algorithm by word length (for instance, when finding which letters were present in length-four words, it would have been beneficial to quickly eliminate the low-frequency letters in groups of four with the first couple of guesses).

For the (missing?) Hints and Tricks section: Once again, I never used malloc(). Didn't want to waste the time. Also, all my state information was in one big struct (with various arrays and such) which could be shifted to and from disk easily with a single read or write call.
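The one-big-struct trick can be sketched as below; the field names are invented for illustration. Because the struct is flat, with fixed-size arrays and no pointers, the whole game state moves to and from the scratch file in a single call:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flat state struct in the spirit of EISAR's: no pointers,
 * so one fwrite/fread persists everything between guesses. */
struct state {
    int  min_count[26], max_count[26];   /* per-letter occurrence bounds */
    char possible[9][26];                /* letter allowed at position?  */
    int  guesses_made;
};

static int save_state(const char *path, const struct state *s)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);   /* single write call */
    fclose(f);
    return n == 1 ? 0 : -1;
}

static int load_state(const char *path, struct state *s)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(s, sizeof *s, 1, f);    /* single read call */
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

This works because the program is re-run for every guess and only needs its own earlier output back; no serialization format is needed when the same binary reads what it wrote.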
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I live in Mountain View, CA, USA. I code for fun, I code for work, code is my life. (Well, okay, I also play games like scrabble and othello and such with various friends, and go out to the occasional movie, but don't tell anyone!) My innermost secret? Umm... I'm obsessive-compulsive. Wait, everybody knows that...

POTM ideas? Given a set of shapes, how many different ways can you fit them in a box? Given a list of words, determine which are least likely to be confusable? Maybe a head-to-head game where you have three artillery pieces, and each turn you can move them a certain distance, or fire them, but not both, and the last one with a gun left wins?

- Alex

_________________________________________________________
SimpleMind 39 ( 12/ 14/ 13) c Edmond Bernardy
_________________________________________________________
=================
SimpleMind submitted by Edmond Bernardy at ebernardy@lucent.com
=================

A History of SimpleMind, by ebernardy@lucent.com

A good strategy for solving a problem like mastermind seemed to be to apply the concepts of information theory, and to try to maximize at each step the information obtained in response to a proposed word. The two numbers (hits and misses) can be considered as a message, and if the probabilities of the different possible messages are known (or can be computed/estimated), the information content of the received message can be computed. (The information of a message is computed as -SUM(p_i log(p_i)), where p_i is the probability of a message instance, and the sum is taken over all possible messages.)

The strategy is simple to state, but Fred took care to make the search space so large - up to 26^9, or about 5.4E12, at the beginning - that counting the numbers of possible answers is not a realistic proposition.
The idea was therefore to generate at each step a (representative) sample of possible words, and to determine in that restricted sample space the best proposal. It was also decided to use only words which were consistent with previous answers, although in some cases this is not the best choice.

An efficient method for generating possible words was obtained by creating two arrays, which are updated at each trial and saved in the temporary file. One array contains the possible letter sets of already-tried letters; the other array contains the possible letters for each letter position. Possible letter sets are coded as bit fields, 26 bits fitting nicely into a 4-byte integer. Sounds simple, but it took more time than expected for debugging and making it efficient, eliminating duplicates or impossible letter combinations.

This strategy seemed to work OK for short words, but for words of more than 6 letters it became obvious that letter and di-(tri?)graph frequencies must be taken into account. At this point, time was running short. Single-letter frequencies and frequencies of preceding and following letters in digraphs were obtained for all words of at least 4 letters from Webster's dictionary. The present version of SimpleMind generates a word by first choosing, among the allowed letter sets, the shortest set with the highest letter frequency, and putting the letters in allowed positions. The word is completed by adding the most probable letters, taking into account already-fixed preceding or following letters.

Weak spots are:
- different letter frequencies should be used for the first and last letters of a word: digraphs like 'rd' can appear at the end of a word, but not at the beginning
- in the first trials more weight should be given to using untried letters and avoiding (?) repeated letters

Conclusion: great fun, but next time START EARLIER!
_________________________________________________________
Nero_the_Horse 42 ( 12/ 16/ 14) PERL Mike Liddell
_________________________________________________________
=================
Nero_the_Horse submitted by Mike Liddell at mike_liddell@hotmail.com
=================

++ Please summarize the algorithm used by Nero_the_Horse

Constraint solver --- use previous guesses to determine all possible patterns. E.g. the pattern ABxxE would allow ABCDE (all letter restrictions are observed; e.g. the third letter might be A..P, Q, S..Z if we know that R isn't allowed). This bit works fine, but I did very little work on using this information -- I don't make particularly good guesses, and my initial guesses were made semi-randomly.

++ Did Nero_the_Horse use any letter frequency tables? Where did you get them?

Nup... it is not English-specific (dammit).

++ If I had it to do all over - here's what I would try ....

Same approach, possibly in C as speed is a concern. Spend time analysing the second part of the problem (choosing the best guess once full info is determined).

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

Melbourne, Aus. Wrote this potm entry in London, currently in Philadelphia. Disclaimer: I had to stop working on this potm at the end of March due to travel etc.... so I have an excuse! John Fitz: no entry?!?!? Thanks once again, Fred, for a fun comp.

_________________________________________________________
not_bullish 45 ( 10/ 11/ 24) PERL Eric Prestemon
_________________________________________________________
=================
not_bullish submitted by Eric Prestemon at ecp@io.com
=================

++ Please summarize the algorithm used by not_bullish

Dumb initial guesses. In the second phase, after sorting the previous guesses by hits and then misses, it randomly fiddled its current guess to match as many previous guesses as possible. Too slow on long words, so it would short-circuit out.

++ Did not_bullish use any letter frequency tables?
Where did you get them?

The program had no knowledge of the English language except the 26 letters. Not even its dumb guesses were optimized.

++ If I had it to do all over - here's what I would try ....

I did do it all over, with a best-first search algorithm that worked much better on most words, but was prohibitively slow. Then I lost interest.

++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?

I program at an internet startup in Silicon Valley. However, I still manage to lead a relatively sane life.

_________________________________________________________
TOEFL 46 ( 12/ 13/ 21) c Sergey Kondrachshenko
_________________________________________________________
=================
TOEFL submitted by Sergey Kondrachshenko at SKondras@iacgsz.pavlodar.kz
=================

++ Please summarize the algorithm used by TOEFL
++ Did TOEFL use any letter frequency tables? Where did you get them?

The algorithm uses the following statistical data:
- a ranked list of the symbols of the alphabet, in decreasing order of frequency of use;
- a ranked list of the most frequently occurring pairs of symbols (in decreasing order);
- a list of pairs of symbols that never occur in the processed list of English words;
- a ranked list of the most frequently occurring groups of three symbols (in decreasing order).

The first three lists were computed for each possible word length; the last one is general. To obtain this statistical information, the words from the following source were analysed: ftp://ftp.simtel.net/pub/simtelnet/msdos/lingstcs/ files words1.zip, words2.zip, words3.zip, words4.zip.

The algorithm consists of two parts:

1. Generation of a new sequence of symbols as the next guess. To check for the presence of symbols in the word, the scheme 4444221 is used, depending on the length of the word, with the longest component lengthened as necessary (for example 22221 or 44444221); the presence or absence of each symbol then follows strictly from the scheme.
Letters are chosen for testing in rank order from the first list, and later with the already-found letters and the letter pairs from the second list taken into account. When some letter's presence at a particular position needs checking, that check is carried out in parallel with the check of the next batch of letters. The letter to be position-checked is chosen in the following order: first from three-letter groups, then from letter pairs, then from the other variants in view of the absent pairs (take the letter with the fewest possible positions and check the position with the greatest number of possible variants), and last of all simply any possible variant (chosen as in the previous case). For N=7 the scheme 4444221 is used if there are no letters to position-check; otherwise the scheme 221 is used. When checking the presence of a letter in the word, the guess is constructed, whenever possible, so that the analysis can (under certain conditions) also uniquely determine the position occupied by a letter (for example, for a component of size 1 this is a position whose letter is not yet determined, and for size 2 it is one position already occupied and one free).
2. Analysing the results of the previous guess. From the MISS count and the scheme used, the letters present in the word are determined, and, when it is unambiguous, also the positions they occupy (the various combinations of HITS and MISS are interpreted relative to the offered guess). Additional analysis is then carried out:
a) None of the uniquely determined positions need to be checked again for the letters present.
b) If there is a letter with no determined position and only one possible unchecked position, that position is fixed exactly, and item a) is applied again.
c) If all letters have been checked for presence in the word and there is a position which can be occupied by only one of the letters, that position is fixed exactly, and items a) and b) are applied again.
If the sum of the determined positions and of the letters not yet having a position equals N, the remaining letters are not checked for presence. If exactly one position remains undetermined for some letter, the guess uses all the letters found so far plus the letter being position-checked, so that if it matches, the complete answer is obtained at once.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Where do you live : Republic of Kazakhstan, city of Pavlodar
Do for fun : programming, logical games
Do for work : programmer, temporarily unemployed
_________________________________________________________
Leap_Year_Baby 50 ( 12/ 20/ 18) gc Yoichi Kono
_________________________________________________________
================= Leap_Year_Baby submitted by Yoichi Kono at kono98@geocities.co.jp =================
++ Please summarize the algorithm used by Leap_Year_Baby
In the first several steps, try to figure out exactly which characters are used. After these steps, just try to figure out the positions from the history. (Anyway, it is not so smart.)
++ Did Leap_Year_Baby use any letter frequency tables? Where did you get them?
Yes. I got them from an English-Japanese online dictionary:-)
++ If I had it to do all over - here's what I would try ....
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
_________________________________________________________
Cowculator 55 ( 19/ 18/ 18) JAVA Ted Alper
_________________________________________________________
================= Cowculator submitted by Ted Alper at alper@turing.stanford.edu =================
++ Please summarize the algorithm used by Cowculator
Optimized for speed of writing the program, not quality of results...
I thought about the problem for a little while back in February, but never got around to coding. The day before the deadline, I had the chance to write something, so I tried to put the gist of the idea in, but didn't have time to do it all. The goal was:
(A) use initial guesses to get precise information about which letters occur -- for example, if you guess ABBCCCC, then the number of misses uniquely determines which letters among "A", "B" and "C" occur in the answer. Of course, you may also get some information about letter placement.
(B) once the number of possible letter-sequences that could be the answer becomes manageable, generate the collection of them and now make each guess one of the possible answers which does the best job of winnowing the set of possible answers in the worst case.
++ Did Cowculator use any letter frequency tables? Where did you get them?
NO, I had intended it to do so, but I didn't have time to implement that. Heck, I even just used alphabetical order to make my initial guesses.
++ If I had it to do all over - here's what I would try ....
I'd start coding more than 24 hours before the deadline! I'd use letter frequency information to improve (A) and (B) -- guess most common letters first, weight the possible answers by their likelihood of being a word, etc. I'd improve my measure of using the information from hits and misses, and of generating valid words.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Palo Alto, CA. No fun; I wish I had more of a programming job, lately I spend most of my work-time tutoring math students over the internet.
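[Editorial sketch] The miss-count trick in Cowculator's step (A) generalizes neatly: give each probed letter a power-of-two number of copies, and the judge's miss count becomes a bitmask of the absent letters. A minimal sketch (Python rather than Cowculator's Java; `misses` is a stand-in for the judge's reply, not the contest harness):

```python
def misses(guess, target):
    """Judge's miss count: positions whose letter never occurs in the target."""
    return sum(ch not in target for ch in guess)

def probe_letters(letters, target):
    """Probe several letters in one guess, with 1, 2, 4, ... copies of each.
    An absent letter contributes exactly its copy count to the miss total,
    so the miss count is a bitmask saying which letters are absent."""
    guess = "".join(ch * (1 << i) for i, ch in enumerate(letters))
    assert len(guess) == len(target), "copy counts 1+2+4+... must fill the word"
    m = misses(guess, target)
    return {ch: not (m >> i) & 1 for i, ch in enumerate(letters)}

# "ABC" builds the guess ABBCCCC; one reply settles all three letters.
print(probe_letters("ABC", "CABBAGE"))
print(probe_letters("ABC", "BEDTIME"))
```

With distinct copy counts 1, 2 and 4, every subset of {A, B, C} produces a different miss total, which is exactly why the answer decodes uniquely.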
_________________________________________________________
Revolution 58 ( 11/ 24/ 23) C Dave Krofssik
_________________________________________________________
================= Revolution submitted by Dave Krofssik at davek@soft-tek.com =================
++ Please summarize the algorithm used by Revolution
Save any previous match and what was hit and missed for later comparisons. If the word was all misses, eliminate all letters in that word from the alphabet. Output the letters in the alphabet (in frequency order) until all the letters have been used at least once. Go through all combinations of letters that have not been eliminated, searching for one that is compatible with the responses that were previously recorded (i.e. a guess that would produce all the previous responses if it were the correct one). If a word was not found before time expired, output a word that contains only the next letter in the alphabet that has not been absolutely included or eliminated.
++ Did Revolution use any letter frequency tables? Where did you get them?
I did use a table I found on a website, but I did not record which site.
++ If I had it to do all over - here's what I would try ....
Speed up the processing, so I could handle more combinations in the time allotted.
_________________________________________________________
MOOguess 59 ( 19/ 22/ 18) c Arkady Zhikharev
_________________________________________________________
================= MOOguess submitted by Arkady Zhikharev at arzhik@yorp.yaroslavl.su =================
++ Please summarize the algorithm used by MOOguess
I think that it isn't interesting to anybody (look at the finals :( )
++ Did MOOguess use any letter frequency tables? Where did you get them?
Look above
++ If I had it to do all over - here's what I would try ....
I would think a bit more... :) Unfortunately I don't have any free time
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Russia.
For fun I solved the last POTM problem (POTM is a GOOD IDEA and a GOOD PROGRAMMERS' TEST!). My work is to develop SCADA (Supervisory Control and Data Acquisition) programs. Secret ... shhhhhh - it's a secret ;) If any idea visits me, of course, I will let you know.
_________________________________________________________
MOOtPoint 61 ( 13/ 24/ 24) c Shawn Smith
_________________________________________________________
================= MOOtPoint submitted by Shawn Smith at smiths2zzyy@worldnet.att.net =================
++ Please summarize the algorithm used by MOOtPoint
There's nothing special about it. It has two phases: the first phase is used to find all the letters that exist in the word, and the second phase is used to find out where each letter goes. It doesn't use all the history in the game that would cut down the number of guesses, so it is not very optimal. The ONLY somewhat tricky part of the first phase is that it uses the number of misses, with a different count of each letter in the guess, to tell which letters are and are not in the target word.
++ Did MOOtPoint use any letter frequency tables? Where did you get them?
It used letter frequency tables in the sense that the first phase selects letters for its guesses from an examination of letter frequencies that I did on a dictionary that I downloaded from the internet; I can't remember where.
Here is a list of the initial guesses:

    static int NLetter[10] = {0,0,0,0,2,2,2,3,3,3};
    static int Dist[10][3] = {{0,0,0},{0,0,0},{0,0,0},{0,0,0},{1,3,0},
        {2,3,0},{2,4,0},{1,2,4},{1,2,5},{1,3,5}};
    static char const *IniGuesses[6][14] = {
        {"AAAE","OOOS","RRLR","NINN","DDDT","PUPP","MBBB",
         "CHHH","GKKK","WYYY","FVVV","JZZZ","QQQX",0},
        {"SSSEE","RAARR","LOOLL","TIITT","DDNND","CUUCU","PHHPH",
         "MYMYY","BGBGG","FFFKK","WVWVV","ZZXZX","JQJQQ",0},
        {"SESSES","RAARRR","NININN","OOLOLO","DDDTDT","CUUCCC","MGMGGG",
         "PHHPHH","BYBYYY","FFFKKF","WVWVVV","JXJXXX","QQZZQQ",0},
        {"SRRRRES","NANIINN","OOOTLLO","CGGCGGD","PUMPMMM",
         "BHYBYYY","FWWKKWW","XXVZZXX","JQJJQQQ",0,0,0,0,0},
        {"SIIIIIES","NARNNNRN","LOOTLLLL","CGCGGGGD","MUPMPPPP",
         "BHFFFBFF","VKVKKKKY","WXXZZZZZ","QQQJJJJJ",0,0,0,0,0},
        {"SSSSIIIES","NARNNRNRN","LOLLOTOLL","CUDDUUUUD","PMPPMMMMG",
         "BHBYYYBYY","FWVVWVWWW","XXXXKZZZX","QQQQJJJJJ",0,0,0,0,0}
    };
    static char const LetterOrder[6][28] = {
        "EASOLRINTDUPMBCHGKWYFVJZXQ\0",
        "ESAROLITNDCUPHMYBGKFWVXZJQ\0",
        "ESARINLOTDUCMGPHBYKFWVJXZQ\0",
        "ESRAINTLODCGUPMHBYFKWVZXJJQ",
        "ESIARNTOLDCGUMPHBFYVKWXZQQJ",
        "EISARNTOLCDUGPMHBYFVWKZXQQJ"
    };

++ If I had it to do all over - here's what I would try ....
A few things:
1. I was hoping to have a set of trees (1 for each word size) that would be used to select the next word that MOOtPoint would guess, given the current selections that it made and the results obtained.
2. Instead of trying to find where each letter goes individually, guessing possible solutions and examining the results for the next guess. For example: If I guessed PINETREES for PINEAPPLE, the result would have been 4 hits and 3 misses. I would try to find the "next" word that would have 4 letters in the same position, and three letters that were not in PINETREES.
3. Use neural nets to recognize English words, in order to cut down the number of possible guesses that would be available for #2.
There would have been one net for each size of word, and the inputs to each net would have been the letters, probably encoded in some way, and the output would have been 0 or 1, answering the question, "Is this an English word?"
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Las Vegas, Nevada. The POTM, bicycle riding (because I'm too cheap to have a car), and watching a few programs on television. Also, finding a completely optimal solution for the test words using my algorithm in the hopes that I could win a "Worst abuse of the rules" consolation prize. Maintain and modify part of the IR (information retrieval) system used on the Yucca Mountain Site Characterization Project, for TRW Environmental Safety Systems. See my response on the inverse boggle POTM. Ditto.
_________________________________________________________
Daily_Farmer 67 ( 11/ 33/ 23) C Gabriel Gambetta
_________________________________________________________
================= Daily_Farmer submitted by Gabriel Gambetta at ggambett@ucu.edu.uy =================
++ Please summarize the algorithm used by Daily_Farmer
Quite straightforward. I'll use the word DOGS as an example. I call LETTER a character that is neither a MISS nor a HIT (i.e., right letter in the wrong place).
1) Isolate the "groups" where the HITS and LETTERS are. DF does that by checking through the alphabet, in this case ABCD EFGH IJKL MNOP QRST.
ABCD 0 3 (1L)
EFGH 1 3 (1H)
IJKL 0 4 (nothing)
MNOP (1L)
QRST (1L) ---> Got all the letters, stop here
It saves the useful groups (ABCD 1L, EFGH 1H, MNOP 1L, QRST 1L). It also saves an "empty" letter, in this case I (the first letter in the first useless group).
2) Get the HITS. It does this by testing each letter in the group in its own place.
EIII 0 4
IFII 0 4
IIGI 1 3 => Current word = "__G_"
3) For each group containing letters, check each letter in a valid place (a place where it was not tested and is not used). This is for finding the letters and also finding a hit in a lucky guess.
It puts the current word's characters in for the case in which the last letter is found, so the final guess is the "full guess".
IAGI 0 4
BIGI 0 4
CIGI 0 4 => No need to test DIII, it MUST be the letter in this group. So, locate the letter by testing it in the remaining places.
DIGI 2 2 => Current word = "D_G_"
Continue this process until the word is found.
4) (optional) If all groups were tested and the game continues (the word was not guessed), there were repeated letters. So begin testing each already used letter in any unused space.
++ Did Daily_Farmer use any letter frequency tables?
Yes, actually it checked against EASR, INTO... I don't remember where I got these. I did my own research on a word list, but I don't remember which frequency chart I used in the end. The one I started using was posted on the Bulletin Board. The other language-specific optimization I used is to arrange the valid positions for testing words such that a test that would imply two consecutive vowels goes later.
++ If I had it to do all over - here's what I would try ....
I stopped coding and thinking about the problem after my program compiled for the first time. It was before Easter, I think. I had a couple of ideas anyway, like testing hits and letters from different groups in the same test. I don't think this could help much. Another idea was dividing the tests in halves or thirds, so if ABCD had a hit, then test AB with the second half of another group containing a hit, and so on...
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Montevideo, Uruguay, somewhere lost in South America
Play soccer, write programs, study Computer Science!
Write administrative-like programs for my company
Actually I'm The Brain, and this is just a small part of my newest plan to take over the world...
Something like an Improved Showdown, maybe a tank war, with terrain attributes, line of sight... I don't know...
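[Editorial sketch] Daily_Farmer's step 1 - sweeping the alphabet in blocks of four and reading off how many of the target's letters each block holds - can be sketched roughly as follows (Python rather than Gabriel's C; `feedback` simulates the judge, and the sketch assumes a 4-letter target with no repeated letters, as in the DOGS example; Y and Z would need one extra probe in a full version):

```python
def feedback(guess, target):
    """Judge's reply: hits = right letter in the right place; misses count
    the positions whose letter does not occur anywhere in the target."""
    hits = sum(g == t for g, t in zip(guess, target))
    miss = sum(g not in target for g in guess)
    return hits, miss

def scan_groups(target):
    """Sweep the alphabet in 4-letter blocks. Each block's reply splits into
    HITs and misplaced LETTERs; stop as soon as every target letter is
    accounted for ("Got all the letters, stop here")."""
    report, found = {}, 0
    for block in ("ABCD", "EFGH", "IJKL", "MNOP", "QRST", "UVWX"):
        hits, miss = feedback(block, target)
        present = len(block) - miss              # block letters are distinct
        report[block] = (hits, present - hits)   # (HITs, LETTERs)
        found += present
        if found == len(target):
            break
    return report

# DOGS: one LETTER in ABCD, one HIT in EFGH, nothing in IJKL, and so on.
print(scan_groups("DOGS"))
```

The output reproduces the little table in the write-up: `(0, 1)` for ABCD, `(1, 0)` for EFGH, and the sweep stops after QRST once all four letters are found.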
_________________________________________________________
oxmoomoo 73 ( 17/ 22/ 34) C Kwok Ngai
_________________________________________________________
================= oxmoomoo submitted by Kwok Ngai at kwok@esun.att.com =================
++ Please summarize the algorithm used by oxmoomoo
Fill all positions with a list of letters.
While there are GOOD letters on the list:
  loop through each position to determine the number of GOOD letters;
  for each GOOD letter, loop through each position to determine the GOOD spot.
Continue the above process until all GOOD letters are found. It also keeps track of all possible GOOD spots and absolute BAD spots to speed up or eliminate searching.
++ Did oxmoomoo use any letter frequency tables? Where did you get them?
No
++ If I had it to do all over - here's what I would try ....
Do not know yet.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
I live in NJ and I am a programmer.
_________________________________________________________
sieve 73 ( 15/ 29/ 29) JAVA Aivars Zogla
_________________________________________________________
================= sieve submitted by Aivars Zogla at fix.lv!uvsk@kcig2.att.att.com =================
++ Please summarize the algorithm used by sieve
Two stages: 1) determining the letters 2) determining the correct positions.
At the first stage, for words made of 4-6 letters, each guess looked like ABBB[BB]; for words longer than 6: ABBCCCC[CC]. If there appeared to be doubles, or triples, or so, the next few guesses were of the format AAAA[AAAAA] - a single letter used.
Second stage - each guess took 2 letters plus an additional letter which isn't in the word at all: ABXX[XXXXX]. Depending on the misses, a decision was taken to make the next approaching guess for determining the correct position of at least one letter. If a guess got 2 misses, the first letter (of the 2) was queued for further examination.
Here's "sieve's" fight for PINEAPPLE:

NUM GUESS     HITS MISSES   NUM GUESS     HITS MISSES
=== =====     ==== ======   === =====     ==== ======
  1 AEEIIIIII   0    0       14 EAFFFFFFF   0    7
  2 OUUPPPPPP   2    3       15 AIFFFFFFF   1    7
  3 RSSMMMMMM   0    9       16 AFFFFFFFF   0    8
  4 NGGWWWWWW   0    8       17 PFNFFFFFF   2    7
  5 KLLCCCCCC   0    7       18 FFFLEFFFF   0    7
  6 DHHVVVVVV   0    9       19 FFFEPFFFF   1    7
  7 TQQJJJJJJ   0    9       20 FFFEFFFFF   1    8
  8 YBBXXXXXX   0    9       21 FFFFPEFFF   0    7
  9 FZZYYYYYY   0    9       22 FFFFEAFFF   0    7
 10 EEEEEEEEE   2    0       23 FFFFALFFF   1    7
 11 AAAAAAAAA   1    0       24 FFFFAFFFF   1    8
 12 IIIIIIIII   1    0       25 FFFFFPPFF   2    7
 13 PPPPPPPPP   3    0       26 FFFFFFFEL   0    7
                             27 PINEAPPLE   9    0

The most interesting thing at this POTM was the temporary file. For such an algorithm it was enough to store at most about 30 bytes of "knowledge about the past". It's even possible to restore the whole sequence from this knowledge at any particular point of the game.
++ Did sieve use any letter frequency tables? Where did you get them?
I did. Vowels first, consonants after. What frequency could show up in 3 final words?
++ If I had it to do all over - here's what I would try ....
I got some new and fresh ideas for myself out of this POTM. I wouldn't like to forget them.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Ugale, Latvia. For fun I'm thinking about what I could do for fun. For work - still trying to pour knowledge about algorithms into high-school students' heads. The secret is very simple - there already exist perfect pictures in nature, we just have to make them visible. Fred's ideas for POTM are OK!
_________________________________________________________
Abacus 78 ( 17/ 28/ 33) c Igor Volkov
_________________________________________________________
================= Abacus submitted by Igor Volkov at i-volkov@bigfoot.com =================
++ Please summarize the algorithm used by Abacus
Using a letter frequency table I tried to eliminate unused symbols, then found duplicates.
++ Did Abacus use any letter frequency tables? Where did you get them?
It used a frequency table of letters.
I got it from a list of (approx. 100000) English words.
++ If I had it to do all over - here's what I would try ....
I would try to find a more elegant solution.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Belarus, Minsk. I work at the Software Engineering Dept., BSU.
_________________________________________________________
Max_Power 78 ( 20/ 23/ 35) gC Ed Mandy
_________________________________________________________
================= Max_Power submitted by Ed Mandy at erm@moby.atcorp.com =================
++ Please summarize the algorithm used by Max_Power
It's a brute-force approach that uses some "smarts" to cut down on the iterations necessary. It should always get any (legal) word in the allotted time/iterations. The algorithm used is based on the idea that if you had an N-letter word, you could try every letter of the alphabet in order to figure out which letters are in the word. This pass has guesses that look like:
AAAA
BBBB
CCCC
Then, once you are done with that pass, you know which letters are in and which letters are out. You can then start placing the letters in their respective positions. For example, if you knew that the letters A, C, S, and T were in the word, then you could go through iterations that looked like:
ABBB
BABB (now we know that A is in the 2nd space)
CBBB (now we know that C is in the 1st space)
BBSB
BBBS (now we know that the string is CATS)
I also do some combining of the 2 passes just to cut down on iterations.
++ Did Max_Power use any letter frequency tables? Where did you get them?
Yes, well, kind of. It uses a rearranged alphabet so that the most likely letters are tried first. I got the frequency table from a WWW site.
++ If I had it to do all over - here's what I would try ....
I would finish off my approach. I think it's a good solid approach, but I did not optimize it enough.
++ WHERE DO YOU LIVE?
In a suburb of Minneapolis, MN.
DO FOR FUN?
Play Magic (and now Pokemon), rollerblade, play computer games, play with computer software and hardware, play with my fiance, etc.
DO FOR WORK?
I program for Triticom (www.triticom.com).
_________________________________________________________
newbie 81 ( 17/ 30/ 34) gc Arlie Stephens
_________________________________________________________
================= newbie submitted by Arlie Stephens at arlie@netcom.com =================
++ Please summarize the algorithm used by newbie
Newbie was very much unfinished, due to not spending sufficient time on it... I'm embarrassed at the simple-minded algorithm... Very simpleminded:
- scan letters in bunches to determine whether or not they were in the word (mostly winnowing "not present" letters; some info saved for later)
- rescan the remaining letters individually to determine presence/absence and the number of instances present
- try each present letter in all possible positions (one guess per candidate position, using known-non-present fill)
++ Did newbie use any letter frequency tables? Where did you get them?
Only the basic "ETAIONSHR" is most frequent in English, in that order, from memory. (I used to solve ciphers for fun, and learned that list at that time.)
++ If I had it to do all over - here's what I would try ....
First and foremost, spend more time developing it. Second, read the rules more carefully. I had a better algorithm for determining letter presence that I didn't use because I got it from another entrant, and didn't realize that it was legal to use his algorithm, provided I didn't use his code. If I stayed with the simplistic algorithm above, a whole bunch of refinements were on my to-do list. This included switching to slightly different algorithm variants (including letter frequencies, batch sizes when excluding letters, etc.) for each word length, and a bunch of changes to garner more information with each guess.
Also, take more advantage of known properties of the English language; the algorithm I used took essentially no advantage of the fact that the target was known to be an English word. I'd also have experimented with an entirely different approach. (Or perhaps several.) I had an intuition that the algorithm style I used would max out (not be further improvable) at a relatively high number of guesses, whereas an algorithm based on letter patterns (arrangements, letter frequencies) within actual words of each length would potentially do much better ... but do much worse in its early stages, be significantly more work to develop, and like-as-not exceed the code size limit. Certainly I would have run dictionaries through various candidate algorithms looking for which worked best.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
I live in Sunnyvale CA (i.e. Silicon Valley). I work as a software engineer. I haven't done much actual programming lately at work (I spend my time writing documents, attending meetings, etc.) and have noticed that my skills are slipping; I'm doing the POTM in an attempt to get some of the rust off my programming skills, and keep it off.
_________________________________________________________
Mooniac 84 ( 22/ 30/ 32) JAVA Denis Konovalchik
_________________________________________________________
================= Mooniac submitted by Denis Konovalchik at denisk@compassplus.ru =================
++ Please summarize the algorithm used by Mooniac
My algorithm is simple enough. I ordered all Latin letters by their frequencies. Then I tried to compose a whole word from the current letter. If there were no "cows" in this word, it was necessary to take the next letter, and so on. When the letter appeared in the word, the algorithm tries to determine its positions in the word by the bisection method -- by splitting the set of the "unguessed" word's letters into two parts, and applying this method to each of them.
This routine runs until all occurrences of the current letter have been determined.
++ Did Mooniac use any letter frequency tables? Where did you get them?
Yeah, 6 vectors of single-letter frequencies (one for each word length, from 4 to 9 letters). These vectors were built on the dictionary placed on the net by one kind man (he also dropped the link to this site into the POTM message board). Many thanxx to him!
++ If I had it to do all over - here's what I would try ....
It was necessary to upgrade this algorithm, but I didn't have enough time for it ;-( The main idea was to try each new letter not alone but in the company of other letters. So, receiving "3 cows" in answer to the guess "ABBCCCC", we'll know that the word has letters "A" and "B" within but it doesn't contain the letter "C".
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
I live in Magnitogorsk, Russia. Our city is famous for its location (we live both in Europe and Asia; the border between these continents, the Ural river, splits our city into two parts), and for its metallurgical plant (the biggest in the world, which is why the ecology is poor enough ;-(. The ice-hockey team "Metallurg" of our city this year became champion of Russia and Europe (it's a pity that it will not clash this season with the champion of the NHL). I'm a 24 y.o. programmer, working in a software firm, menacing C++ with Java and writing code on Microsoft mustdie (c) Windows. In my vacation I write letters to Russian computer magazines, including "Computerra" (www.computerra.ru), where my opus about the POTM was published a year ago (Fred Great and Awful even placed the link to it on his pretty site ;-)) During this time I am also a postgraduate student of Magnitogorsk Technical University. I would be happy to work as a programmer in the United States some time later; it's the real dream for every serious IT specialist in my country. Anyway, for now I lead a fun life.
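[Editorial sketch] The bisection search Mooniac describes - splitting the unresolved positions in two and probing each half - can be sketched like this (Python rather than Mooniac's Java; `hits` simulates the judge, and `absent` is any letter already known not to be in the word):

```python
def hits(guess, target):
    """Judge's hit count: right letter in the right place."""
    return sum(g == t for g, t in zip(guess, target))

def locate(letter, positions, target, absent):
    """Find every position among `positions` that holds `letter`, by bisection:
    put `letter` on the candidate positions and the known-absent letter
    everywhere else, so the hit count says how many occurrences that
    candidate subset covers."""
    guess = "".join(letter if i in positions else absent
                    for i in range(len(target)))
    n = hits(guess, target)
    if n == 0:
        return set()
    if n == len(positions):
        return set(positions)        # every candidate slot holds the letter
    mid = len(positions) // 2        # otherwise split and recurse on each half
    return (locate(letter, positions[:mid], target, absent) |
            locate(letter, positions[mid:], target, absent))

# All three P's of PINEAPPLE, found with a handful of guesses.
print(locate("P", tuple(range(9)), "PINEAPPLE", "Q"))
```

Each recursion level spends one guess per half, so a letter with few occurrences is pinned down in roughly logarithmically many guesses rather than one guess per position.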
Thanx very much to everybody who entered this contest! I hope that we'll meet here once more.
_________________________________________________________
Brave_Cow_Word 90 ( 26/ 36/ 28) c Lourenco Arnasalon
_________________________________________________________
Sorry ... No questionnaire was returned for Brave_Cow_Word
_________________________________________________________
HitOrMiss 91 ( 17/ 29/ 45) JAVA Stuart Rowland
_________________________________________________________
================= HitOrMiss submitted by Stuart Rowland at swrowland@lucent.com =================
++ Please summarize the algorithm used by HitOrMiss
HitOrMiss used logic to try to determine the word, mostly by elimination. It could solve any 4-9 letter sequence whether it was an English word or not. It first tried to eliminate as many letters as possible by guessing words that contained the least frequently occurring letters. If the response was all misses, all the letters were eliminated. If the response showed that at least one letter was correct, but no letter was in the correct position, each letter in the guess could be eliminated in that position. The temporary file consisted of the set of possible letters for each position, the set of letters that were known to be in the word, the set of letters that were known not to be in the word, and the guess history. The history was reviewed each time. If a guess in the history had one hit and all but one of the letters were known not to be in the word, the remaining letter had to be correct. If a position was narrowed down to two possible letters, a guess was constructed using one of the choices and letters known not to be in the word. The response would either select the letter or reject it. If a letter known to be in the word occurred in only two positions, a guess was formed with the letter in one of the positions and letters known not to be in the word.
++ Did HitOrMiss use any letter frequency tables? Where did you get them?
I went through a dictionary to find the frequency of occurrence of the letters in each position over the sets of words of length 4 to 9.
++ If I had it to do all over - here's what I would try ....
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
New Jersey
POTM program
In the Seventh Guest adventure game there is a microbe game that would make an interesting competition. It is a territory game played on a square grid. Each player starts with a single piece in opposite corners. A move consists of either cloning or jumping. A clone move creates a new piece in one of the 8 empty spaces next to an existing piece. Jumping is moving a piece to an empty square two away in one of the 8 possible directions. At the end of a move, any of the opponent's pieces that touch the moved piece are changed to the player's color. The game ends when all positions are occupied. The winner is the player with the most pieces.
_________________________________________________________
Spelling-B 93 ( 21/ 38/ 34) c Andrew Gauld
_________________________________________________________
================= Spelling-B submitted by Andrew Gauld at andrew.gauld@att.com =================
++ Please summarize the algorithm used by Spelling-B
Two-pass method. Pass 1: find how many of which letters are contained in the word by using guesses containing only single letters (EEEEE, TTTTT, etc.); once all letters are identified, go to the second pass. The second pass identifies letter positions one by one by using a known bad letter to pad the end of the guesses (EZZZZ, TZZZZ); this proceeds until the last two letters, at which point I just guess their order.
++ Did Spelling-B use any letter frequency tables? Where did you get them?
Just the old traditional ETAOIN SHRDLU as a seed for the letter search
++ If I had it to do all over - here's what I would try ....
Winning the lottery so I'd have more time to spend on it.
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
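[Editorial sketch] The two-pass scheme Spelling-B describes (and that rayka, Moot_Force and themoops below describe in almost the same words) fits in a few lines. A hedged sketch in Python - `feedback` simulates the judge, and the frequency order is just the classic ETAOIN ranking used as a seed:

```python
def feedback(guess, target):
    """Judge's reply: (hits, misses)."""
    hits = sum(g == t for g, t in zip(guess, target))
    miss = sum(g not in target for g in guess)
    return hits, miss

FREQ = "ETAOINSHRDLUCMFWYGPBVKQJXZ"   # seed order, a la ETAOIN SHRDLU

def solve(target):
    """Pass 1: single-letter guesses count each letter (hits of EEEE = number
    of E's). Pass 2: pin letters down one position at a time, padding with a
    letter known to be absent. Returns the full guess sequence."""
    n, guesses = len(target), []
    counts, bad = {}, None
    for ch in FREQ:                       # pass 1
        h, _ = feedback(ch * n, target)
        guesses.append(ch * n)
        if h:
            counts[ch] = h
        elif bad is None:
            bad = ch                      # remember a known-absent pad letter
        if sum(counts.values()) == n:
            break
    if bad is None:                       # counts are complete, so any letter
        bad = next(c for c in FREQ if c not in counts)   # not counted is absent
    word = [None] * n
    for ch, k in counts.items():          # pass 2
        for pos in range(n):
            if k == 0:
                break
            if word[pos] is None:
                probe = "".join(ch if i == pos else bad for i in range(n))
                h, _ = feedback(probe, target)
                guesses.append(probe)
                if h:                     # a single hit pins the letter here
                    word[pos] = ch
                    k -= 1
    guesses.append("".join(word))         # the assembled word, guessed last
    return guesses

game = solve("PINEAPPLE")
print(len(game), "guesses, ending in", game[-1])
```

This is the guaranteed-but-slow end of the field: correct on any letter sequence, but spending one guess per letter and nearly one per position, which is exactly why these entries cluster near the bottom of the table.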
_________________________________________________________
rayka 94 ( 23/ 37/ 34) c O.Jansen-Olliges A.Gerns
_________________________________________________________
========================= rayka was submitted by Olaf Jansen-Olliges and Arnd Gerns at Ojan_Band@t-online.de =========================
++ Please summarize the algorithm used by rayka
It's a really dumb one:
Step 1: collect the letters in the word by testing aaaa, eeee, ffff, and so on.
Step 2: find out the position of the letters with the help of a letter not in the word, e.g. q (eqqq, qeqq, qqeq, ...).
After this step you can build the word.
++ Did rayka use any letter frequency tables? Where did you get them?
Yes, we built them from a UNIX dictionary file.
++ If I had it to do all over - here's what I would try ....
1. Do it better ;-) (perhaps find more than one letter per try, e.g. abbb instead of aaaa)
2. Spend more time (but unfortunately this time we had a lot to do)
++ WHERE DO YOU LIVE? DO FOR FUN? DO FOR WORK? INNERMOST SECRET? POTM IDEAS?
Arnd lives in Braunschweig and I (Olaf) come from Hannover. And sometimes we like to sit together in my home-office (a Sun and some (5) LINUX boxes) wasting some time programming C or C++ (not for the job). rayka - by the way - is the name of my daughter (15 months), one of the reasons for me not to leave my home-office too late. I'm looking forward to the next POTM problem!!
_________________________________________________________
superfreq 94 ( 20/ 36/ 38) PERL Dan Schmidt
_________________________________________________________
================= superfreq submitted by Dan Schmidt at dfan@alum.mit.edu =================
++ Please summarize the algorithm used by superfreq
superfreq is incredibly simple; it's actually the first program I got working.
I then started a totally new program, superfreq2, which only ever worked on a subset of legal words, but it does get HONESTLY in 7, faster than any of the programs on the list :) I'll describe superfreq2 from now on, since it's more interesting.
It first determines what letters are in the word. It guesses words like EEEETTA, since we then know exactly what letters are in the word, given the number of misses. I then go through all possible permutations of those letters, checking for compatibility with the results of previous guesses and then scoring them. Scoring is done with the aid of a trigraph table; the table starts like this:
ED$ 3073
ING 2844
NG$ 2505
where each line is the number of times that trigraph occurred in /usr/dict/words (^ and $ stand for beginning- and end-of-string). Anyway, the score is just the sum of these numbers for each three-letter sequence in the word we're trying out; trigraphs not in the table get a score of -10000.
Here it is at its best:
d:\src\moo> sue HONESTLY
ASAEEEEE -> 1 hits, 2 misses
RRSRIRIN -> 0 hits, 6 misses
TLOOLEOO -> 0 hits, 0 misses
UECUDCCC -> 0 hits, 7 misses
PMGPOMMM -> 0 hits, 7 misses
BBBBHYHS -> 0 hits, 4 misses
HONESTLY -> 2625 / 8 hits, 0 misses
Got it in 7 guesses.
d:\src\moo> sue LATRINES
ASAEEEEE -> 1 hits, 0 misses
RRSRIRIN -> 2 hits, 0 misses
TLOOLEOO -> 0 hits, 4 misses
LATRINES -> 4426 / 8 hits, 0 misses
Got it in 4 guesses.
So it did great on 7+ letter words with no duplicates, but if there were too many duplicates then my permutation code was too slow, and I never even tried to handle 6-letter words (my EEEETTA trick for quickly finding all the letters only works for 7+ letter words). I had a fast multiset-permutation generator (stolen from Knuth), a scoring function, and a validity checker, but even so I couldn't handle more than one duplicate letter in a 9-letter word on my P166.
++ Did superfreq use any letter frequency tables? Where did you get them?
It used a trigraph table that I created from /usr/dict/words on my system.
The only interesting thing about it is that it uses ^ and $ as well as the
26 letters. I think that helped a lot.

++ If I had it to do all over - here's what I would try ....

I would finish my better program! But I never got a good idea for what to do
for short words or ones with too many duplicates.

++ WHERE DO YOU LIVE?  DO FOR FUN?  DO FOR WORK? INNERMOST SECRET?  POTM IDEAS?

I live in Cambridge, MA, and write music software. A POTM idea is
forthcoming...

--
Dan Schmidt -> dfan@harmonixmusic.com, dfan@alum.mit.edu
Honest Bob & the Factory-to-Dealer Incentives -> http://www2.thecia.net/users/dfan/hbob/
http://www2.thecia.net/users/dfan/
Gamelan Galak Tika -> http://web.mit.edu/galak-tika/www/

_________________________________________________________
 Moot_Force     97  ( 26/ 35/ 36)  SH    Brian Rapp
_________________________________________________________
=================
Moot_Force submitted by Brian Rapp at rapp@boutell.com
=================

++ Please summarize the algorithm used by Moot_Force

Moot_Force uses a method which is guaranteed to solve the problem in less
than 100 guesses, but it is a very primitive method indeed. It first guesses
words consisting of all one letter (e.g. AAAAA) and continues to do so until
it discovers all the letters in the word. Along the way it remembers one
letter that is not in the word (e.g. Q). In the next phase, it solves each
position of the word one at a time by guessing a valid letter in that
position and the invalid letter for all other positions (e.g. AQQQQ). When
all positions are known, it suddenly guesses the correct word.

Because Moot_Force follows such a simple strategy, it can almost completely
ignore MISS data and store its knowledge in a very small temp file. Clearly
such a brute force strategy cannot _win_ the contest, but it does represent
an interesting and at least technically successful extreme case.
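The position-solving phase described above can be sketched as follows. This is an illustrative Python sketch, not Brian's shell script; `score_hits` is my stand-in for the POTM judge, and it assumes the letters and an absent filler letter (the Q above) have already been discovered in the first phase.

```python
def score_hits(guess, target):
    """Judge stand-in: a hit is the right letter in the right position."""
    return sum(g == t for g, t in zip(guess, target))

def solve_positions(target, letters, filler):
    """Moot_Force's second phase: probe one position at a time, padding
    the rest of the guess with a letter known to be absent (e.g. Q)."""
    n = len(target)
    word = [None] * n
    for pos in range(n):
        for letter in letters:
            probe = filler * pos + letter + filler * (n - pos - 1)
            if score_hits(probe, target) == 1:  # only this slot can hit
                word[pos] = letter
                break
    return "".join(word)

word = solve_positions("wolf", "flow", "q")
# word == "wolf"
```

Since the filler letter can never score a hit, a probe returning exactly one hit pins the probed letter to that position, which is why the miss data can be ignored entirely.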
++ Did Moot_Force use any letter frequency tables?  Where did you get them?

Moot_Force does have a frequency list from which it guesses the letters in
order during the first phase. Because the first positions in the word will
have more possibilities to be guessed in the second phase, I biased the list
somewhat towards letters near the beginning of words. Because Moot_Force
isn't terribly concerned with winning, this list hasn't been optimized very
much, but it might be better than just using alphabetical order.

++ If I had it to do all over - here's what I would try ....

I found it very amusing to program Moot_Force in shell script, but if I were
to put in the effort for a serious attempt I'd certainly need to start over
from scratch.

++ WHERE DO YOU LIVE?  DO FOR FUN?  DO FOR WORK? INNERMOST SECRET?  POTM IDEAS?

Maryland. Games. Programming. Mercury. Battle-Royale of all programs.

_________________________________________________________
 themoops      101  ( 32/ 30/ 39)  c     Stephen Palmer
_________________________________________________________
=================
themoops submitted by Stephen Palmer at srpalmer@att.com
=================

++ Please summarize the algorithm used by themoops

Very simple algorithm. Loop through all letters (using a crude order based on
most common letters first) to find out how many of each letter are in the
word. Once all letters are known, search for one letter at a time to find its
position. Once all positions are known, you've got the word.

++ Did themoops use any letter frequency tables?  Where did you get them?

A frequency table was used, and there seem to be loads of examples depending
on the source. Mine came from a web site that ranked letter usage based on
several popular novels.

++ If I had it to do all over - here's what I would try ....

I never cared about the misses, and determined the letters and positions only
from hits. I'm sure I could have decreased the number of guesses if I had
tried to use all available information.
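The letter-counting loop that themoops (and rayka above) describe rests on one observation: a "solid" guess such as AAAAA scores exactly one hit per occurrence of that letter, so the hit count is the letter's multiplicity. A minimal Python sketch of the idea (the judge function and the frequency ordering are my stand-ins, not Stephen's C code):

```python
def score_hits(guess, target):
    """Judge stand-in: a hit is the right letter in the right position."""
    return sum(g == t for g, t in zip(guess, target))

def collect_letters(target, alphabet="etaoinshrdlucmfwypvbgkqjxz"):
    """Try solid guesses (eeee, tttt, ...) in frequency order until all
    letter slots of the target are accounted for."""
    n = len(target)
    found = {}      # letter -> multiplicity in the target
    guesses = 0
    for letter in alphabet:
        guesses += 1
        hits = score_hits(letter * n, target)  # hits == multiplicity
        if hits:
            found[letter] = hits
        if sum(found.values()) == n:
            break
    return found, guesses

letters, used = collect_letters("moo")
# letters == {'o': 2, 'm': 1}
```

Stopping as soon as the multiplicities sum to the word length is what makes the frequency ordering pay off: common letters are confirmed early and rare letters often never need to be asked about.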
++ WHERE DO YOU LIVE?  DO FOR FUN?  DO FOR WORK? INNERMOST SECRET?  POTM IDEAS?

I live in central New Jersey and work for AT&T designing web solutions using
Microsoft NT. And I love POTM problems.

_________________________________________________________
 pat           104  ( 24/ 36/ 44)  c     Bernard Hatt
_________________________________________________________
=================
pat submitted by Bernard Hatt at bmh@arkady.demon.co.uk
=================

++ Please summarize the algorithm used by pat

The algorithm was to change one letter at a time and learn which letters were
correct or could be eliminated. From previous POTM experience I have learned
that simple robust algorithms are better able to withstand Fred's choice of
test data :-)

++ Did pat use any letter frequency tables?  Where did you get them?

Letter frequency tables were generated from the expanded GNU ispell English
dictionary, which stores words in a root+affix format. For each position in
the word I had a table of the most likely letters for that position in order,
and for each letter in each position an ordered list of the previous and next
letters.

++ If I had it to do all over - here's what I would try ....

Changing only one letter at a time was never going to be a very fast
algorithm. (I think) a better solution would be to change several letters at
a time, store the information about what was learned (e.g. there is no 'Z' in
the word, or there is an 'H' or a 'K' in the word but not in position 3,
etc.), and pick the best guesses from the letter probabilities and the stored
info.

++ WHERE DO YOU LIVE?

I'm based in Hemel Hempstead in the UK (about 25 miles NW of London)

DO FOR FUN?

Programming/Gardening/Cooking/Eating/Wine making/Drinking.

DO FOR WORK?

I work as a sysadmin for a large database company, where I am one of the very
few people who knows very little about databases.

INNERMOST SECRET?  POTM IDEAS?

How about some type of shape-fitting problem like Tetris?
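The per-position tables pat describes can be sketched as follows. This is an illustrative Python sketch under my own naming, with a four-word toy dictionary standing in for the expanded ispell word list:

```python
from collections import Counter

def positional_tables(words, length):
    """For each position, rank letters by how often they occur there
    among dictionary words of the given length (most likely first)."""
    counts = [Counter() for _ in range(length)]
    for w in words:
        if len(w) == length:
            for i, ch in enumerate(w):
                counts[i][ch] += 1
    return [[ch for ch, _ in c.most_common()] for c in counts]

tables = positional_tables(["cat", "cot", "cup", "dog"], 3)
# tables[0][0] == 'c'  (most common first letter in this toy dictionary)
```

The previous/next-letter lists pat mentions would be built the same way, just keyed on (letter, position) pairs instead of position alone.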
_________________________________________________________
 simental      104  ( 26/ 40/ 38)  c     Marty Fouts
_________________________________________________________
=================
simental submitted by Marty Fouts at potm@fogey.com
=================

++ Please summarize the algorithm used by simental

Simental is a 'stupid guesser' algorithm that did about as poorly as I
expected it would. It works in two passes:

* Pass 1 -- try each letter of the alphabet in decreasing frequency order to
  find out what letters are in the target word.
* Pass 2 -- try each letter of the word in each possible position in the
  word, in left to right order, skipping those positions that have already
  been found.

The only 'cleverness' at all in simental is that the letters of the alphabet
were tried according to a frequency order, rather than alphabetically.

++ Did simental use any letter frequency tables?  Where did you get them?

Yes. I invented the frequency table by analysing several online word lists.

++ If I had it to do all over - here's what I would try ....

Almost anything. Unfortunately, schedule intervened and I never got around to
improving the algorithm that I sent in as a place holder.

++ WHERE DO YOU LIVE?  DO FOR FUN?  DO FOR WORK? INNERMOST SECRET?  POTM IDEAS?

California. Play classical guitar. That the *real* answer to the universe is
17, not 42. No, I'm not clever at making puzzles.

_________________________________________________________
 Fools_Gold    111  ( 32/ 35/ 44)  c     Jeffrey Jackson
_________________________________________________________
=================
Fools_Gold submitted by Jeffrey Jackson at jjackson@harmonic.com
=================

++ Please summarize the algorithm used by Fools_Gold

Step 1: Using my letter frequency array, make 'solid' guesses (e.g. SSSSSS or
HHHHHH for a 6-letter word) until I've found all the letters in the word.

Step 2: For each letter in the unknown word (left to right), go through all
the letters we know are hits and check if it hits at that particular spot.
I used a letter known NOT to hit in the other spaces so I could check just
the one letter at a time.

++ Did Fools_Gold use any letter frequency tables?  Where did you get them?

I did use one. It was just a simple array ordering the 26 letters. I pulled
several thousand lines of plain text off the web and wrote a little program
that counted each letter. I then ordered my array from the most commonly
occurring letter to the least.

++ If I had it to do all over - here's what I would try ....

I would come up with another dimension to my frequency table. This other
dimension would carry frequencies for the most common letters following
certain other letters. For example, the letter 'V' is almost always followed
by a vowel, so if I found a 'V' anywhere, I could check the vowels after that
instead of sticking to my one-dimensional frequency table.

++ WHERE DO YOU LIVE?  DO FOR FUN?  DO FOR WORK? INNERMOST SECRET?  POTM IDEAS?

I live in Minneapolis and write software for a living. A real shocker, I'm
sure. Haha. Fun? Is there time for that? :-)  Hey, if I told you my innermost
secret, it wouldn't be secret anymore. :-)  And it might even end up out on
the internet somewhere! Haha. Yikes. POTM ideas... hmmmm. Although I didn't
have time to code and submit an entry, I really liked the one a few contests
ago called 'scenic tour'. More problems similar to that would be fun.

_________________________________________________________
 naive         111  ( 30/ 37/ 44)  gc    Michael Chen
_________________________________________________________

Sorry ...
No questionnaire was returned for naive.

_________________________________________________________
 MereMOO      1021  (999/ 11/ 11)  c     Pavel Zaletskiy
_________________________________________________________
=================
MereMOO submitted by Pavel Zaletskiy at pavelz@dpt.ustu.ru
=================

++ Please summarize the algorithm used by MereMOO

My program works in two steps: 1) determine which letters are present in the
word; 2) determine the word consisting of those letters.

For the first step the following algorithm was used. If the number of letters
in the word is greater than or equal to 7, then each guess word consists of 3
different letters. The first letter appears there 1 time, the second - 2
times and the third - 4 times. The other positions are filled with letters
already known to be present in the word. Knowing the result of the guess, the
program can determine whether each of the three letters is in or out of the
word. If the word length is smaller than 7, the letters are divided into
groups, and for each group the program makes several guesses; in each guess,
information about one group and one letter of another group can be asked. If
the word length is >= 5, then in each guess the program also asks about one
more letter, placed in the word in several copies, to determine its presence
unambiguously.

In the second step the program generates all possible words. For each word it
calculates a score and then chooses the word with the best score. This score
estimates the probability that the word is an English one, and also the
number of words that the next guess would exclude from consideration.

++ Did MereMOO use any letter frequency tables?  Where did you get them?

Yes, it uses a table for letter pairs, and it also knows the most commonly
used triples and chains of four letters.

++ If I had it to do all over - here's what I would try ....

I can only suggest trying to add some heuristics for short words (n
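MereMOO's 1-2-4 probe for words of length >= 7 (the same idea as superfreq2's EEEETTA guesses) can be sketched as follows. This is an illustrative Python sketch, not Pavel's C code, and the scoring rule assumed here - each copy of a guessed letter that occurs nowhere in the word counts as one miss - is my simplification for illustration, not necessarily the exact POTM judge:

```python
def misses(guess, target):
    """Simplified judge (an assumption, not the POTM rule): each guess
    position whose letter occurs nowhere in the target is a miss."""
    return sum(ch not in target for ch in guess)

def probe_three(a, b, c, known, target):
    """One MereMOO-style guess: letter a once, b twice, c four times,
    padding any remaining positions with letters known to be present."""
    guess = (a + b * 2 + c * 4 + known * len(target))[:len(target)]
    m = misses(guess, target)
    # binary decomposition of the miss count says which letters are absent
    return {a: not (m & 1), b: not (m & 2), c: not (m & 4)}

# assuming 'e' is already known to be in the word:
present = probe_three("a", "b", "c", "e", "herbage")
# present == {'a': True, 'b': True, 'c': False}
```

Because the three candidate letters contribute 1, 2, and 4 misses respectively when absent, a single miss count resolves all three at once - three letters tested per guess instead of one.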