Remember
this one?
Well, this time, the question is:
Assuming that birthdays are evenly distributed around 365 days of the year...
what is the minimum number of people I must have in a room, such that the odds are that at least n people share the same birthday?
Let's limit this question to n values from 1 to 12.
We know that for n=1, 1 person is sufficient.
For n=2, as is described in Happy Birthday, 23 people are sufficient.
What are the minimum numbers for n=3 to 12?
The situation was simulated by
DEFDBL A-Z
DIM noDays(12, 600)
trials = 200000
FOR trial = 1 TO trials
REDIM bday(365)
maxRep = 0
FOR people = 1 TO 600
bd = INT(RND(1) * 365 + 1)
bday(bd) = bday(bd) + 1
IF bday(bd) > maxRep THEN
maxRep = bday(bd)
IF maxRep < 13 THEN
noDays(maxRep, people) = noDays(maxRep, people) + 1
END IF
END IF
NEXT
NEXT
FOR rep = 2 TO 12
ct = 0
FOR i = 1 TO 600
ct = ct + noDays(rep, i)
IF ct >= trials / 2 THEN PRINT rep, i, ct / trials: EXIT FOR
NEXT
NEXT
for 200,000 trials, producing
2 23 .50595
3 88 .51103
4 186 .50154
5 313 .504695
6 460 .50365
so apparently, likelihood of 7 or more repetitions of a birthday requires more than the 600 allowed for.
But with this information we have enough to check out answers that we get from more theoretic analysis, and then use that theoretic analysis to determine what size array to allow for.
The probability that one given birthday will appear exactly r times among n people is p(n,r) = C(n,r)*(1/365)^r*(364/365)^(n-r).
Then the probability that it appears at least r times among n people is P(n,r) = 1 - p(n,0) - p(n,1) - ... - p(n,r-1). (That's a capital P defined in terms of small p, in case the fonts don't make that clear.)
The difficulty is then finding the probability that any birthday occurs at least r times among n people. If the events were mutually exclusive, we could add the probabilities (that is just multiply the probability for a given birthday by 365). But they are not mutually exclusive--more than one date might occur r or more times. To allow for this, we could use inclusion/exclusion to subtract out the pairwise ANDed conditions (of two birthdays both being represented), as each had been counted though only one represented a new "success". Defining now P(m,n,r) as the event of m different dates each having at least r occurrences among n people, with P(1,n,r) being just the previously defined P(n,r), the overall sought probability is C(365,1)*P(1,n,r) - C(365,2)*P(2,n,r) + C(365,3)*P(3,n,r) - C(365,4)*P(4,n,r) + ... , where the first term is the oversimplified (exclusivity-assumed) 365*P(n,r).
The problem is that these pairwise (and higher multiple-wise) probabilities are hard to calculate. However, we can make an approximation by assuming that they are independent (though they are also not actually independent). For example, the probability that January 1 is present among a group of 50 people is 1-(364/365)^50 = 0.1281817426688633092, as is the probability that January 2 is present. The probability that both are present is 0.0161437857620377809. If we assumed their presences were independent, we'd have multiplied the individual probabilities (i.e., squared the probability of one), and used 0.0164305591536266927, which is not very different, but is indeed different. The exact calculation was possible for this any-occurrence case, but becomes more difficult for multiple-occurrence cases.
The following program calculates the probabilities as an approximation by treating the compound conditions as independent and carrying the inclusion/exclusion to the possibility of 5 different dates meeting the criteria:
5 point 10
10 for Needed=2 to 13
15 People=20
20 repeat
25 People=People+1
30 Prob=fnProbSucc(People,Needed)
40 until Prob>=0.5
50 print Needed,People,Prob
90 next
900 end
999 '
1000 fn1dApRinN(N,R)
1010 local P
1020 P=combi(N,R)*(1/365)^R*(364/365)^(N-R)
1030 return(P)
1099 '
1100 fnAtLeastRinN(N,R)
1110 local P
1111 local I
1120 P=1
1130 for I=0 to R-1
1140 P=P-fn1dApRinN(N,I)
1150 next
1160 return(P)
1199 '
1200 fnProbSucc(People,HowMany)
1210 local PAtLeast
1211 local P
1220 PAtLeast=fnAtLeastRinN(People,HowMany)
1250 P=365*PAtLeast-combi(365,2)*PAtLeast^2+combi(365,3)*PAtLeast^3-combi(365
,4)*PAtLeast^4+combi(365,5)*PAtLeast^5
1270 return(P)
and it finds
2 24 0.516892911505135457193064552705280701875673521049
3 88 0.500024238908916008751425954604806345304855138599
4 188 0.50121973720291497376553934602377971816400640167
5 315 0.503716145942477034440763359178102928683134514473
6 462 0.503692442151314437034595863350466252672421876767
7 624 0.50023923568884182819567190051858113365279781384
8 800 0.500195974660549653670527200105639158839275552604
9 987 0.500416377508527652709622327057735734378561391984
10 1183 0.500100393247427978545737350820694555937017652788
11 1388 0.501858680817548576454867836197064871061046632628
12 1599 0.501682818386226699143745804914276827734503416906
13 1816 0.501342696818721244403330383903198902341179454911
so the original simulation program was expanded to up to 2000 people per trial, with the following results:
2 23 .50643
3 87 .50081
4 187 .50653
5 312 .500095
6 460 .503405
7 622 .5020250000000001
8 798 .50124
9 985 .50041
10 1182 .50198
11 1386 .501405
12 1596 .501255
and our approximation calculations agree well with our simulation results.
We had already known the 23 for the case of 2 occurrences. Three occurrences looks like 87 or 88, ... twelve occurrences require something like 1596-1599. The approximation calculation went beyond the original scope and calculated for thirteen occurrences also.
Edited on March 11, 2004, 2:12 pm
|
Posted by Charlie
on 2004-03-11 14:10:44 |