The answer is not right. Here is the problem:
>calculate the number of different ways you could form the 3-letter
>combination in question with the letters in the Scrabble set - in the case
>of EIA, it would be 12 x 9 x 9. Then,
>you'd multiply that number by the number of ways that you could put
>together the other four tiles, which is "97 choose 4"
Problem: the other 97 tiles include instances of E, I and A, so you count
most instances twice.
The correct answer is the sum of certain coefficients of terms in a
multivariate polynomial expansion. Specifically,
Expand
(1 + A)**9 x (1 + E)**12 x (1 + I)**9 x (1 + X)**(100 - 12 - 9 - 9)
Then add up the coefficients of any term having a factor of AEI and having
total degree 7. (Do you see why this is so?)
I don't have a routine that computes exactly this function. But I know what
you are driving towards (a one-ply static lookahead that uses complete move
generation instead of simulation). (BTW: Maven contains such a method, but it
is only useful in the pre-endgame.) So I can see why you need it.
I will contribute some related code, which you can modify to your purposes.
The following routine generates all racks that can be made out of a given
group of unseen tiles.
// Worker routine. See external interface for explanation of arguments.
static void enumerate_them(
const short *UnseenCounts, char *q,
char *DistinctTiles, int k, void (*f)(const char *, int, void *),
void *Data) {
if (k > 7) return;
if (k == 7) {
// Make a string and back up to the beginning.
*q = 0;
q -= 7;
// Figure out the weight of this case.
int weight = 1;
for (char *p = q, *r = p - 1; *p; p++)
if (*p != *(p+1)) {
weight *= PascalsTriangle[UnseenCounts[*p]][p - r];
r = p;
}
(*f)(q, weight, Data);
return;
}
if (*DistinctTiles == 0) return;
// Put down each possible number of this tile, and enumerate.
int i = 0;
do {
enumerate_them(UnseenCounts, q, DistinctTiles+1, k, f, Data);
*q++ = *DistinctTiles;
k++;
} while (i++ < UnseenCounts[*DistinctTiles]);
}
// UnseenCounts is a count of each type of tile. PossibleTiles should be the
// aphabet plus blank, but Maven is language-independent so it needs this arg.
// (*f)() is a function that will receive each rack generated, and Data is a
block
// of arbitrary data that is passed into (*f)().
//
void EnumerateRacks(const short *UnseenCounts, const char *PossibleTiles,
void (*f)(const char *, int, void *), void *Data) {
// Enumerate the distinct tiles in unseen_array.
char DistinctTiles[128], *q = DistinctTiles;
for (const char *p = PossibleTiles; *p; p++) {
if (UnseenCounts[*p]) {
*q++ = *p;
}
}
*q = 0;
char word_buffer[32];
enumerate_them(UnseenCounts, word_buffer, DistinctTiles, 0, f, Data);
}
One trick you can use: Set the UnseenCounts for AEI to their frequency in the
bag, and the count for a fictitious tile (e.g. '*') to the count of all other
tiles combined. Another: set the worker routine to return if it passes 'A'
without placing at least as many 'A' tiles as you need. Another: Don't call
(*f), but instead return the accumulated weight. That should do it.
Disclaimer: this code is contributed "as is" and I don't support it.
Brian
-------------------------------------------------------------------------
>Brian seemed to be saying just count up the racks it generates which
>meet your criteria and divide by the total number of racks - but if that is
>indeed what he was saying (it wasn't entirely obvious to me) then it's not a
>feasible method because there are way too many racks to enumerate.
>(at most, 100!/93!.7!)
It is what I was saying, it is feasible, and the code that does so is
routinely deployed in Maven, computing the correct answers, and I have
sent it to you.
But combinatorial counting is a complicated problem, and if you aren't
familiar with certain techniques then it is truly head-spinning. I didn't
explain my answer in sufficient detail, so I will give more.
This e-mail will indicate two approaches to the calculation. Both will lead
to the same answer. The approach taken by Maven is the second.
The first solution applies the "inclusion / exclusion principle" of
combinatorics, as suggested by Alan Frank. It says that if you want to
know the size of a set that satisfies either of two properties (A or B) then
Total = Number satisfying A + number satisfying B - number satisfying both A
and B
The point is that in the initial sum of A + B we have double-counted the
members that satisfy both A and B. By subtracting them off we get an
exact count.
For example, let's take the AEI problem. In this case there are three
properties (contains A, contains E, contains I) and we want to satisfy all
simultaneously.
The inclusion/exclusion principle applies to unions of properties, whereas
here we have an intersection. No problem: we will look at the complementary
problem of counting the number of racks that either don't have an A, or
don't have an E, or don't have an I, which we can count using
inclusion/exclusion.
We can then subtract the total from the total number of racks.
A three-property version of inclusion/exclusion is
A or B or C = A + B + C - (A and B) - (A and C) - (B and C) + (A and B and C)
Suppose 10 E, 9 A, 8 I, in a bag of 40 tiles. The exact answer is
Answer = total racks - racks containing no E - racks containing no A -
racks containing no I + racks containing none of EA +
racks containing none of EI + racks containing none of AI -
racks containing none of AEI
Answer = C(40,7) - C(40-10, 7) - C(40-9, 7) -
C(40-8,7) + C(40-10-9,7) + C(40-10-8,7) +
C(40-9-8,7) - C(40-10-9-8,7)
= 18643560 - 2035800 - 2629575 -
3365856 + 116280 + 170544 +
245157 - 1716
= 11142594
If I have done my arithmetic correctly then this answer is sufficient to prove
that there is no simple product of C(N,K) numbers that yields a correct
answer.
For 11142594 factors into 2 * 3 * 3 * 619033, where the last factor has no
prime factor less than 100. It follows that it is impossible to make this number
using only multiplications and divisions of C(N, K) numbers where N, K <= 100.
I hope that you are satisfied that no simple formula exist, because you won't
find one and your effort is better spent on understanding the mechanics of
the correct solutions.
You can start with the principle of inclusion and exclusion, applied
recursively.
The example of AEI above shows how it can be done, but there are
complications when duplicate tiles are present. For example, suppose the
tiles were AEE instead of AEI. What then?
The answer must regard the EE combination as a unit, since the property
of containing a second E is dependent on the property of containing a first E.
By the inclusion/exclusion principle:
answer = total racks - racks not containing EE - racks not containing A +
racks not containing AEE
But now counting the non-EE racks is not so easy, because there are two cases:
racks not containing E = racks not containing E + racks containing one E
And there are even more cases for not containing AEE.
You can extend this into a correct algorithm. I will spare you the detail,
because there is a different way of going about the counting. The answer is the
solution I outlined in my code, which expresses the answer as a form of polynomial
multiplication.
Suppose that you have a polynomial in the variables A, E, I and X formed
by multiplying out the following terms
(1 + A)**9 x (1 + E)**10 x (1 + I)**8 x (1 + X)**(40-9-10-8)
(Notation: (1 + A)**9 means to multiply 1+A 9 times. (I.e. exponentiation))
In this polynomial are terms ranging in degree from 0 through 40, each of
the form
Coefficient x A**J x E**K x I**L x X**M
Here is my claim: I claim that if you draw "racks" of J + K + L + M tiles out
of this bag, then the number of racks that contain exactly J A's and exactly
K E's and exactly L I's is the Coefficient that multiplies that term in the
polynomial I just described.
Do you believe this? You can prove it by induction. It is obviously true
in boundary conditions (when the bag is empty, or when there is exactly
one tile that you care about). You can then suppose that it is true, and
add one more tile to the bag. This would correspond to multiplying by another
factor of, say, 1+A. The effect of the multiplication is to add together the
term that already satisfies the problem (i.e., multiplying by 1) and the term
that would solve the problem if you had one more factor of A (i.e., multiplying
by A).
The question is how to compute this product efficiently in the case where you
are interested in the terms of degree exactly 7 and there are a number of
coefficients that you care about (i.e., all of those with at least a factor
of AEI).
The code I gave you doesn't do that calculation directly, but it can be
adapted to do so. My code generates all distinct racks of 7 tiles together
with their relative weight. To adapt this routine to efficiently count AEI
racks you have to
1) Stop enumerating as soon as you prove that adding to the current rack
will not contain AEI. For instance, if you are adding B's to the rack you can
check whether you have added at least one A, and if not then return.
2) Regard all non-AEI tiles to be identical. For instance, count them all as
blanks.
These changes alone will make the routine produce the exact calculation.
I promise. Just follow the construction laid out by the routine I posted.
Warm Regards,
Brian
--------------------------------------------------------------------------
>Do folks see what I'm getting at here? I have information overload! -
>lots of results, but I'm not sure what to do with them.
Confronting the overload is not too bad. Given 16 tiles in the bag there
are at most 65536 different combinations, and at most 11440 combinations
of 7 tiles.
As you generate all of the moves that can be played using the unseen tiles
you will maintain an array of the best available move for each possible rack
that the opponent could have started with.
Zeroth step: create a standard order of the tiles. Represent a set of tiles by
16-bit integers where the 1's correspond to tiles in the set.
First step: Initialize the array by computing the best exchanging play for
each
possible opponent rack.
Second step: Given a move, compute a "heuristic value" by adding the score
together with a value that represents the quality of the tiles played.
(Problem:
the value of playing an E is -4 if it is your only E, but +3.5 if it is your
second E.
Ignore for now, but the model can be extended to cover this.)
Third step: compute the set of tiles played by a move. (I.e., the 16-bit
mask.)
Fourth step: Update an array of moves that is indexed by sets of tiles, as
follows:
void UpdateArray(TileSet s) {
// Check that we haven't processed s before:
if (Mark[s] == CurrentMark) return;
Mark[s] = CurrentMark;
// Make sure s is relevant:
if (s has 8 or more tiles) return;
if (current move is better than Array[s]) {
Array[s] = current move;
for (all tiles not in s) {
UpdateArray(s + tile)
}
}
}
When you are done you have an array that is indexed by a set of tiles, whose
value is the move that should be played if the opponent holds that set of
tiles.
(Provided that the opponent evaluates moves according to the heuristic value.)
You can then average the opponent's heuristic score in that position.
Brian
=========================================================================
From: z@catlover.com
Date: Tue, 27 Mar 2001 04:39:19 -0000
Subject: [wgp] Re: Need a second opinion... probability of drawing tiles
As we have seen the formula does not work because you are counting
extra combinations of tiles.
In fact the formula only works under the folowing conditions:
1. If the tiles you are trying to select from the bag are every
instance of those tiles in the bag, or
2. If the tiles you are selecting fill the entire rack.
Or you can put it conversely:
The formula won't work if:
There is a possibility of any unknown (or wildcard) tiles in the rack
that can be the same as the tiles you are choosing.
This program in VBA (within Excel) will give the actual probability
of each selection from each bag. I include it below. It is not
intended as a solution to your problem, just as something to check
any future outputs you get with an updated formula.
It works by doing trial draws from the bag and accumulating the
probabilites along the way. It does work correctly (barring any
coding errors) for all rack/bag combinations. It is very slow,
especially for a full rack of seven tiles selected from a full bag (a
situation where your formula gives the correct answer anyway). Try
it with a racksize of more than about 20, it will give the right
answer but it will also take almost forever to calculate! The run
time is proportional to Permuations(racksize, selection size).
To use it as is, open Excel and insert a new module. Cut and paste in
the code and run "test". If you don't have Excel then the VB code
should be quite easy to convert to something else. BTW I used
the "back quote" (`) for the blank since it comes one before "a" in
the ASCII character set.
----- code start -----
Dim PutCount As Long
Dim WantedCount As Long
Dim BagCount As Long
Dim RackCounts(26) As Long
Dim BagCounts(26) As Long
Dim BagThreshold As Long
Dim NonMatchCount As Long
Dim ResultRow As Integer
Sub test()
Dim FullRack As String, HalfRack As String
ResultRow = 1
FullRack = _
"aaaaaaaaabbccddddeeeeeeeeeeeeffggghhiiiiiiiiijkllllmmnnnnnn" + _
"ooooooooppqrrrrrrssssttttttuuuuvvwwxyyz``"
HalfRack = "aaaaabcddeeeeeefghiiiiikllmnnnooooprrrsstttuuvwxy`"
Call DrawProb("a", FullRack, 7, "Full rack")
Call DrawProb("e", FullRack, 7, "Full rack")
Call DrawProb("z", FullRack, 7, "Full rack")
Call DrawProb("aa", FullRack, 7, "Full rack")
Call DrawProb("az", FullRack, 7, "Full rack")
Call DrawProb("eia", FullRack, 7, "Full rack")
Call DrawProb("eia", FullRack, 6, "Full rack")
Call DrawProb("eia", Mid(FullRack, 2), 7, "Full rack - 'a'")
Call DrawProb("otarine", FullRack, 7, "Full rack")
Call DrawProb("eia", HalfRack, 7, "Half rack")
Call DrawProb("abcdefg", "abcdefgh", 7)
Call DrawProb("a", "abcdefgh", 7)
Call DrawProb("a", "aabcdefg", 7)
Call DrawProb("a", "aabcdefgh", 7)
Call DrawProb("a", "aabcdefghi", 7)
Call DrawProb("aa", "aabcdefg", 7)
Call DrawProb("ab", "aabcdefg", 7)
Call DrawProb("abc", "aabcdefg", 7)
Call DrawProb("abcd", "aabcdefg", 7)
Call DrawProb("abcde", "aabcdefg", 7)
Call DrawProb("abcdef", "aabcdefg", 7)
Call DrawProb("abcdefg", "aabcdefg", 7)
Call DrawProb("b", "aabcdefg", 7)
Call DrawProb("bc", "aabcdefg", 7)
Call DrawProb("bcd", "aabcdefg", 7)
Call DrawProb("bcde", "aabcdefg", 7)
Call DrawProb("bcdef", "aabcdefg", 7)
Call DrawProb("bcdefg", "aabcdefg", 7)
Call DrawProb("aabcdefg", "aabcdefg", 8)
Call DrawProb("abcdefg", "aabcdef", 7)
Call DrawProb("abcdefg", "abcdef", 7)
Call DrawProb("a", "abcdefg", 8)
Call DrawProb("i", "abcdefgh", 7)
Call DrawProb("muzjiks", FullRack, 7, "Full rack")
Call DrawProb("zy``yva", FullRack, 7, "Full rack")
Call DrawProb("sting", FullRack, 7, "Full rack")
Call DrawProb("retina", FullRack, 7, "Full rack")
End Sub
Sub DrawProb(Tiles As String, Bag As String, RackSize As Integer, _
Optional Text As String = "")
Dim p As Double
p = RackProb(Tiles, Bag, RackSize)
Cells(ResultRow, 1) = Tiles
If Text = "" Then
Cells(ResultRow, 2) = Bag
Else
Cells(ResultRow, 2) = Text
End If
Cells(ResultRow, 3) = RackSize
Cells(ResultRow, 4) = p
If p > 0 Then
Cells(ResultRow, 5) = 1 / p
Else
Cells(ResultRow, 5) = "Impossible"
End If
ResultRow = ResultRow + 1
End Sub
Function RackProb(Tiles As String, Bag As String, _
RackSize As Integer) As Double
If RackSize > Len(Bag) Then Exit Function
If Len(Tiles) > Len(Bag) Then Exit Function
If Len(Tiles) > RackSize Then Exit Function
Call GetCounts(LCase(Bag), LCase(Tiles))
PutCount = 0
If ((WantedCount * 2) > BagCount) And _
(WantedCount = RackSize) Then
'this is a feeble attempt at some speed up.
'if the rack is completely filled by the tiles and it is more
'than half the bag then we reverse everything and count the
'letters that are left on the bag instead of the letters
'on the rack. The difference in speed is only minor though!
For n = 0 To 26
RackCounts(n) = BagCounts(n) - RackCounts(n)
If RackCounts(n) < 0 Then Exit Function
'if there is no chance the return prob is zero by default
Next
WantedCount = BagCount - WantedCount
BagThreshold = RackSize + 1
RackProb = Prob()
For n = RackSize + 1 To BagCount
RackProb = RackProb / n
Next
Else
BagThreshold = BagCount - RackSize + 1
RackProb = Prob()
For n = BagCount - RackSize + 1 To BagCount
RackProb = RackProb / n
Next
End If
End Function
Function Prob() As Double
Dim CumulativeProb As Double
CumulativeProb = 0
If BagCount >= BagThreshold Then
If PutCount < WantedCount Then
For n = 0 To 26
If RackCounts(n) > 0 Then
RackCounts(n) = RackCounts(n) - 1
BagCounts(n) = BagCounts(n) - 1
NonMatchCount = NonMatchCount - 1
If RackCounts(n) = 0 Then NonMatchCount = _
NonMatchCount - BagCounts(n)
PutCount = PutCount + 1
BagCount = BagCount - 1
CumulativeProb = CumulativeProb + _
(BagCounts(n) + 1) * Prob()
BagCount = BagCount + 1
PutCount = PutCount - 1
If RackCounts(n) = 0 Then NonMatchCount = _
NonMatchCount + BagCounts(n)
NonMatchCount = NonMatchCount + 1
BagCounts(n) = BagCounts(n) + 1
RackCounts(n) = RackCounts(n) + 1
End If
Next
BagCount = BagCount - 1
CumulativeProb = CumulativeProb + (BagCount + 1 - _
NonMatchCount) * Prob()
BagCount = BagCount + 1
Else
If NonMatchCount <> 0 Then Call MsgBox("Oops", _
vbOKOnly, "Oops, something gone wrong!")
BagCount = BagCount - 1
CumulativeProb = CumulativeProb + (BagCount + 1) * Prob()
BagCount = BagCount + 1
End If
Else
If PutCount = WantedCount Then CumulativeProb = 1
End If
Prob = CumulativeProb
End Function
Sub GetCounts(Bag As String, Tiles As String)
Dim x As Integer
BagCount = Len(Bag)
WantedCount = Len(Tiles)
NonMatchCount = 0
For n = 0 To 26
BagCounts(n) = 0: RackCounts(n) = 0
Next
For n = 1 To Len(Bag)
x = Asc(Mid(Bag, n, 1)) - Asc("`")
BagCounts(x) = BagCounts(x) + 1
Next
For n = 1 To Len(Tiles)
x = Asc(Mid(Tiles, n, 1)) - Asc("`")
RackCounts(x) = RackCounts(x) + 1
If RackCounts(x) = 1 Then
NonMatchCount = NonMatchCount + BagCounts(x)
End If
Next
End Sub
---- code end ----
And the output from the above is:
a Full rack 7 0.494364551 2.02
e Full rack 7 0.603416322 1.66
z Full rack 7 0.070000000 14.29
aa Full rack 7 0.119599454 8.36
az Full rack 7 0.031104965 32.15
eia Full rack 7 0.116928034 8.55
eia Full rack 6 0.077148089 12.96
eia Full rack - 'a' 7 0.108950382 9.18
otarine Full rack 7 0.000104926 9530.49
eia Half rack 7 0.148475738 6.74
abcdefg abcdefgh 7 0.125000000 8.00
a abcdefgh 7 0.875000000 1.14
a aabcdefg 7 1.000000000 1.00
a aabcdefgh 7 0.972222222 1.03
a aabcdefghi 7 0.933333333 1.07
aa aabcdefg 7 0.750000000 1.33
ab aabcdefg 7 0.875000000 1.14
abc aabcdefg 7 0.750000000 1.33
abcd aabcdefg 7 0.625000000 1.60
abcde aabcdefg 7 0.500000000 2.00
abcdef aabcdefg 7 0.375000000 2.67
abcdefg aabcdefg 7 0.250000000 4.00
b aabcdefg 7 0.875000000 1.14
bc aabcdefg 7 0.750000000 1.33
bcd aabcdefg 7 0.625000000 1.60
bcde aabcdefg 7 0.500000000 2.00
bcdef aabcdefg 7 0.375000000 2.67
bcdefg aabcdefg 7 0.250000000 4.00
aabcdefg aabcdefg 8 1.000000000 1.00
abcdefg aabcdef 7 0.000000000 Impossible
abcdefg abcdef 7 0.000000000 Impossible
a abcdefg 8 0.000000000 Impossible
i abcdefgh 7 0.000000000 Impossible
muzjiks Full rack 7 0.000000018 55581808.33
zy``yva Full rack 7 0.000000001 889308933.33
sting Full rack 7 0.000838339 1192.84
retina Full rack 7 0.000957454 1044.44
--- In wordgame-programmers@y..., Graham Toal wrote:
> The function below purports to give an answer to questions such as
> If I draw 7 tiles from a bag of 50, what is the probability of
> me drawing tiles which include "ing"?
>
> If you are familiar with probability maths and C, and have some
free time
> anytime over the next week, and are willing to try this procedure
out and
> tell me if it generates any wrong answers, and why, I would be very
greatful.
>
> This one procedure is at the heart of the project I'm working on
and I have
> to be sure it is right. At the moment I'm not at all confident.
>
> Thanks
>
> Graham
>
> /*
>
> QUESTION: The first question I have is very basic. Let's say we've
> made a play that puts down 3 specific letters. What is the
probability
> of having picked a rack which contained those letters? Is is just
> Combs(3, 100), or is it something like Combs(3, Combs(7, 100))? I
suspect
> it makes a difference if you say that the rest of the rack is
different
> letters, as opposed to you don't care what the rest of the rack is -
> i.e. you could allow duplicated letters. I think for this
application
> I don't care what the other letters are.
>
> ANSWER: Well, it makes a heck of a lot of difference what the three
letters
> are. The chances of having a rack with the three letters EIA is a
whole lot
> greater than having JZX. What you'd do to figure this out is to
first
> calculate the number of racks that would meet the criteria. First,
you'd
> calculate the number of different ways you could form the 3-letter
> combination in question with the letters in the Scrabble set - in
the case
> of EIA, it would be 12 x 9 x 9. (In the case of JZX, it would be
1.) Then,
> you'd multiply that number by the number of ways that you could put
> together the other four tiles, which is "97 choose 4" (i.e., the
number of
> ways you could choose 4 items out of 97 items), which is 97!/(93! x
4!).
> Therefore, the numerator of your probability, in the case of
EIA???? would
> be (12 x 9 x 9 x 97!)/(93! x 4!). And the denominator of the
probability,
> as always, is "100 choose 7", or 100!/(93! x 7!) - the number of
ways you
> could choose an opening rack from a full bag.
>
> Expanding the above and cancelling where appropriate gives:
>
> 12.9.9.97!
> ----------
> 93!.4!
>
> -------------- => 12.9.9.97! * 93!.7! 12.9.9
* 7!
> ------------------- => ---------
-----
> 100! 93!.4! * 100! 4! *
100.99.98
> ------
> 93!.7!
>
>
> Let's do this again for a single 'z':
>
> 1.99! * 93!.7!
1.7! 7
> --------------- => ------
=> ---
> 93!.6! * 100!
6!.100 100
>
>
> So the probability of chosing a Z with a draw of 7 tiles is 7/100.
>
> */
>
>
> float drawprob(char *tile, char *bag, int drawsize)
> {
> int tot[256];
> /* I had to do these as floats because longs were overflowing */
> float numerator = 1.0;
> float denominator = 1.0;
> float max = (float)strlen(bag);
> int maxtiles = strlen(tile);
> int i;
>
> for (i = 0; i < 256; i++) tot[i] = 0;
> while (*bag != '\0') {
> tot[*bag] += 1;
> bag += 1;
> }
>
> for (i = 0; i < maxtiles; i++) {
> numerator *= (float)tot[tile[i]]; /* eg 12.9.9, or 1 */
> numerator *= (float)(drawsize-i); /* add the factorial in the
numerator (7!) */
> denominator *= max; /* eg 100.99.98 or 100 */
> max -= 1.0;
> }
>
> return(numerator/denominator);
> }
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From: z@catlover.com
Actually the number of possible racks is quite manageable.
In fact just today I have just been running such a program to draw
every possibility from the bag for each rack size. The output looks
like this:
01-27
02-373
03-3509
04-25254
05-148150
06-737311
07-3199724
08-12353822
09-43088473
10-137412392
11-404600079
12-1108793943
13-2847262062
14-6890404765
15-15792242064
16-34425824044
17-71646518736
The first number is the size of the rack and the second number is the
total number of possible unique racks from a standard english
Scrabble bag of 100 tiles.
I could not get my head around calculating these numbers so brute
force saved the day this time.
So for 7 tiles in the rack there are only 3199724 possible
combinations of different racks. Indeed you can enumerate every
rack, calculate its probability, sort the entire set and printout
the list sorted from most to least probable all within 1 minute on a
modern computer. I did it and now have a 53Meg file which has not
only to top 300 racks but the top 3 million racks. I shan't include
the code here since I did it in assembly this time (not VBA, it is
way too slow).
So go for it and don't be shy about the numbers, they are manageable.
Also, I see that Brian has approached the probabilites from an
exclusion point of view. I approached from the side of inclusion.
Perhaps Brian is a negative person and I am a positive person :).
Haha, just kidding Brian, please don't take any offense. Indeed you
can use a mixture of the two methods in a final optimised solution.
Use inclusion where the minimum wanted tiles is more than half the
rack size and use exclusion where there is less than half. Both
halves should then have a similar degree of complexity and run time.
Pascals triangle is in fact all the answers to the combo(x,y)
function. It is a way to store the precomputed probabilities. When
you are generating the table it is more efficient to add up the
preceeding two terms rather than multiply out the whole combo(x,y)
thingy.
--- In wordgame-programmers@y..., Graham Toal wrote:
> > >Is *this* program also returning wrong numbers???
> >
> > That program is right, but you cannot generalize the counting
> > technique to racks that are not full.
>
> Right, I understood that from your earlier explanation, and agree.
>
> What I don't understand is this: that program can - if you give it a
> full bag as input - generate the entire subset of all racks from
those
> 100 tiles. However it runs out of steam with bag sizes way down
> in the 40's. And I would expect that, because C(7,100) is a huge
> number. What is it about your code that allows you to work out the
> probabilities of draws from a rack of 100 which you do by
enumerating
> all C(7, 100) possibilities???
>
> I assume that in fact you *don't* actually enumerate all of them,
> by using various tricks to trim the search, but if you do prune vast
> quantities of racks without generating them, then how can you count
them?
>
> Is this something to do with the PascalsTriangle[][] array? Does
it have
> precomputed totals for subset problems? (I know what Pascals
Triangle is,
> but I haven't worked out its relation to this problem)
>
> Graham
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
HUH???? There are only 3 million racks possible from drawing 7 tiles
out of 100???!!!
Damn, no wonder Brian disagreed so strongly with me. I thought the
figure was some orders of magnitude greater.
Lets double check:
C(7,100) = 100! 100.99.98.97.96.95.94 80678106432000
------------- = --------------------- = --------------
(100-7)! * 7! 7.6.5.4.3.2.1 5040
= 1600616355.555'
OK, where did I go wrong?
Wait a minute... "Unique racks"...? Is that the clue? Yes, my generator
will generate two apparently similar racks which differ in the actual tiles
chosen. But in order to know the denominator of what I'll call the
"probability equation" (that this is all about) don't you need to count
those duplicated racks too?
-----------------------------------------------------------------------
To: Word Game Programmers
From: Alan Frank
Date: Wed, 4 Apr 2001 00:15:33 -0400 (EDT)
Reply-To: Andrew Latto
Subject: [wgp] Counting racks
I'm forwarding the following on behalf of Andy Latto, who is not a wg-p
subscriber...
Date: Sat, 24 Mar 2001 16:45:46 -0500 (EST)
From: Alan Frank
The first thing that struck me about your description of the answer
(excerpted below) is that you can't simply separate the desired letters
from the remaining ones; you might end up counting some racks more than
once; e.g., EIA+TOAL would be counted twice, considering each A once to be
part of EIA and once part of TOAL. Also, the number of EIA's is not
12x9x9, but 12x9x9/6. In general, you need to divide by the factorial of
the number of letters.
One way to do the computation would be to count the racks that don't
contain EIA and subtract from the total. I'm not sure which ends up being
easier. I'll think about it.
The easiest way I know to do this computation is via a technique
called inclusion-exclusion. I'll illustrate with an example.
> The function below purports to give an answer to questions such as
> If I draw 7 tiles from a bag of 50, what is the probability of
> me drawing tiles which include "ing"?
Let's say that the bag includes 4 I's, 3 N's, and 2 G's.
Notation: R(N) = N!/7!(N-7!). This is the number of different racks
that can be produced from a set of N tiles.
There are R(50) total racks. To determine how many contain "ING", we
subtract off the R(46) racks that contain no I's, the R(47) that contain
no N's, and the R(48) that contain no G's, giving
R(50) - R(46) - R(47) - R(48)
But this is too small; the racks containing neither an I nor an N have
been subtracted off twice. So we need to add these back in. Fortunately,
they are easy to count; there are 43 tiles that aren't I's or N's, so there
are R(43) of these. Similarly, we have double-subtracted the racks
that have neither N nor G, and the ones with neither I nor G. Adding these
back in, we get:
R(50) - R(46) - R(47) - R(48) + R(43) + R(45) + R(44)
But even this isn't quite right. Look at the racks containing none
of I, N, and G. These were counted originally, then subtracted off 3 times,
then added back in 3 times. So we need to subtract these racks
off once more. The result is
R(50) - R(46) - R(47) - R(48) + R(43) + R(45) + R(44) - R(41)
So the probability we are looking for is
(R(50) - R(46) - R(47) - R(48) + R(43) + R(45) + R(44) - R(41))/R(50)
This same technique works for any number of letters; you start with
combinations of all the letters, subtract off the number missing any 1
letter, add back in the ones missing any pair, subtract off the ones
missing any triple, add back in the ones missing any quadruple, and
so forth. This ends up counting each combination exactly once*
Now if you want to count racks containing ABCC, that's a bit trickier.
Now instead of subtracting off racks containing no C, you have to
subtract off racks containing at most one C. This is the sum of two
terms; racks containing no C's, which is easy, and racks containing
exactly 1 C, which is the number of C's, multiplied by the number
of 6-tile racks from the remaining non-C tiles. Ask me if you want
more details.
Andy.Latto@Pobox.com
*Cryptic footnote for math geeks: This works because the alternating sum of
the binomial coefficients is 0, because it's the expansion of
(1-1)^N).
- --EoCW9aWN80uvJB-x8Ksrf3vaWrGFEESwRe09wD1--
------- End of forwarded message -------
------------------------ Yahoo! Groups Sponsor ---------------------~-~>
Make good on the promise you made at graduation to keep
in touch. Classmates.com has over 14 million registered
high school alumni--chances are you'll find your friends!
http://us.click.yahoo.com/03IJGA/DMUCAA/4ihDAA/kmPZlB/TM
---------------------------------------------------------------------_->
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/