All about flooble | fun stuff | Get a free chatterbox | Free JavaScript | Avatars
 perplexus dot info

 Happy Birthday (2) (Posted on 2004-03-11)
Remember this one?

Well, this time, the question is:
Assuming that birthdays are evenly distributed around 365 days of the year...
what is the minimum number of people I must have in a room, such that the odds are that at least n people share the same birthday?

Let's limit this question to n values from 1 to 12.

We know that for n=1, 1 person is sufficient.

For n=2, as is described in Happy Birthday, 23 people are sufficient.

What are the minimum numbers for n=3 to 12?

 See The Solution Submitted by SilverKnight Rating: 2.6667 (3 votes)

Comments: ( Back to comment list | You must be logged in to post comments.)
 Simulations and approximation | Comment 1 of 5

The situation was simulated by

DEFDBL A-Z
DIM noDays(12, 600)
trials = 200000
FOR trial = 1 TO trials
REDIM bday(365)
maxRep = 0
FOR people = 1 TO 600
bd = INT(RND(1) * 365 + 1)
bday(bd) = bday(bd) + 1
IF bday(bd) > maxRep THEN
maxRep = bday(bd)
IF maxRep < 13 THEN
noDays(maxRep, people) = noDays(maxRep, people) + 1
END IF
END IF
NEXT
NEXT
FOR rep = 2 TO 12
ct = 0
FOR i = 1 TO 600
ct = ct + noDays(rep, i)
IF ct >= trials / 2 THEN PRINT rep, i, ct / trials: EXIT FOR
NEXT
NEXT

for 200,000 trials, producing

2             23            .50595
3             88            .51103
4             186           .50154
5             313           .504695
6             460           .50365

so apparently, likelihood of 7 or more repetitions of a birthday requires more than the 600 allowed for.

But with this information we have enough to check out answers that we get from more theoretic analysis, and then use that theoretic analysis to determine what size array to allow for.

The probability that one given birthday will appear exactly r times among n people is p(n,r) =  C(n,r)*(1/365)^r*(364/365)^(n-r).

Then the probability that it appears at least r times among n people is P(n,r) = 1 - p(n,0) - p(n,1) - ... - p(n,r-1). (That's a capital P defined in terms of small p, in case the fonts don't make that clear.)

The difficulty is then finding the probability that any birthday occurs at least r times among n people.  If the events were mutually exclusive, we could add the probabilities (that is just multiply the probability for a given birthday by 365). But they are not mutually exclusive--more than one date might occur r or more times.  To allow for this, we could use inclusion/exclusion to subtract out the pairwise ANDed conditions (of two birthdays both being represented), as each had been counted though only one represented a new "success".  Defining now P(m,n,r) as the event of m different dates each having at least r occurrences among n people, with P(1,n,r) being just the previously defined P(n,r), the overall sought probability is C(365,1)*P(1,n,r) - C(365,2)*P(2,n,r) + C(365,3)*P(3,n,r) - C(365,4)*P(4,n,r) + ... , where the first term is the oversimplified (exclusivity-assumed) 365*P(n,r).

The problem is that these pairwise (and higher multiple-wise) probabilities are hard to calculate.  However, we can make an approximation by assuming that they are independent (though they are also not actually independent).   For example, the probability that January 1 is present among a group of 50 people is 1-(364/365)^50 = 0.1281817426688633092, as is the probability that January 2 is present.  The probability that both are present is 0.0161437857620377809.  If we assumed their presences were independent, we'd have multiplied the individual probabilities (i.e., squared the probability of one), and used 0.0164305591536266927, which is not very different, but is indeed different.  The exact calculation was possible for this any-occurrence case, but becomes more difficult for multiple-occurrence cases.

The following program calculates the probabilities as an approximation by treating the compound conditions as independent and carrying the inclusion/exclusion to the possibility of 5 different dates meeting the criteria:

5   point 10
10   for Needed=2 to 13
15    People=20
20    repeat
25     People=People+1
30     Prob=fnProbSucc(People,Needed)
40    until Prob>=0.5
50   print Needed,People,Prob
90   next
900   end
999   '
1000   fn1dApRinN(N,R)
1010     local P
1020     P=combi(N,R)*(1/365)^R*(364/365)^(N-R)
1030   return(P)
1099   '
1100   fnAtLeastRinN(N,R)
1110     local P
1111     local I
1120     P=1
1130     for I=0 to R-1
1140       P=P-fn1dApRinN(N,I)
1150     next
1160    return(P)
1199   '
1200   fnProbSucc(People,HowMany)
1210   local PAtLeast
1211   local P
1220   PAtLeast=fnAtLeastRinN(People,HowMany)
1250   P=365*PAtLeast-combi(365,2)*PAtLeast^2+combi(365,3)*PAtLeast^3-combi(365
,4)*PAtLeast^4+combi(365,5)*PAtLeast^5
1270   return(P)

and it finds

2       24      0.516892911505135457193064552705280701875673521049
3       88      0.500024238908916008751425954604806345304855138599
4       188     0.50121973720291497376553934602377971816400640167
5       315     0.503716145942477034440763359178102928683134514473
6       462     0.503692442151314437034595863350466252672421876767
7       624     0.50023923568884182819567190051858113365279781384
8       800     0.500195974660549653670527200105639158839275552604
9       987     0.500416377508527652709622327057735734378561391984
10      1183    0.500100393247427978545737350820694555937017652788
11      1388    0.501858680817548576454867836197064871061046632628
12      1599    0.501682818386226699143745804914276827734503416906
13      1816    0.501342696818721244403330383903198902341179454911

so the original simulation program was expanded to up to 2000 people per trial, with the following results:

2             23            .50643
3             87            .50081
4             187           .50653
5             312           .500095
6             460           .503405
7             622           .5020250000000001
8             798           .50124
9             985           .50041
10            1182          .50198
11            1386          .501405
12            1596          .501255

and our approximation calculations agree well with our simulation results.

We had already known the 23 for the case of 2 occurrences.  Three occurrences looks like 87 or 88, ... twelve occurrences require something like 1596-1599.  The approximation calculation went beyond the original scope and calculated for thirteen occurrences also.

Edited on March 11, 2004, 2:12 pm
 Posted by Charlie on 2004-03-11 14:10:44

 Search: Search body:
Forums (0)