All about flooble | fun stuff | Get a free chatterbox | Free JavaScript | Avatars    
perplexus dot info

Home > Probability
Happy Birthday (2) (Posted on 2004-03-11) Difficulty: 5 of 5
Remember this one?

Well, this time, the question is:
Assuming that birthdays are evenly distributed around 365 days of the year...
what is the minimum number of people I must have in a room, such that the odds are that at least n people share the same birthday?

Let's limit this question to n values from 1 to 12.

We know that for n=1, 1 person is sufficient.

For n=2, as is described in Happy Birthday, 23 people are sufficient.

What are the minimum numbers for n=3 to 12?

See The Solution Submitted by SilverKnight    
Rating: 2.6667 (3 votes)

Comments: ( Back to comment list | You must be logged in to post comments.)
Some Thoughts Simulations and approximation | Comment 1 of 6

The situation was simulated by

DEFDBL A-Z
DIM noDays(12, 600)
trials = 200000
FOR trial = 1 TO trials
 REDIM bday(365)
 maxRep = 0
 FOR people = 1 TO 600
  bd = INT(RND(1) * 365 + 1)
  bday(bd) = bday(bd) + 1
  IF bday(bd) > maxRep THEN
   maxRep = bday(bd)
   IF maxRep < 13 THEN
    noDays(maxRep, people) = noDays(maxRep, people) + 1
   END IF
  END IF
 NEXT
NEXT
FOR rep = 2 TO 12
  ct = 0
  FOR i = 1 TO 600
    ct = ct + noDays(rep, i)
    IF ct >= trials / 2 THEN PRINT rep, i, ct / trials: EXIT FOR
  NEXT
NEXT

for 200,000 trials, producing

 2             23            .50595
 3             88            .51103
 4             186           .50154
 5             313           .504695
 6             460           .50365

so apparently, likelihood of 7 or more repetitions of a birthday requires more than the 600 allowed for.

But with this information we have enough to check out answers that we get from more theoretic analysis, and then use that theoretic analysis to determine what size array to allow for.

The probability that one given birthday will appear exactly r times among n people is p(n,r) =  C(n,r)*(1/365)^r*(364/365)^(n-r).

Then the probability that it appears at least r times among n people is P(n,r) = 1 - p(n,0) - p(n,1) - ... - p(n,r-1). (That's a capital P defined in terms of small p, in case the fonts don't make that clear.)

The difficulty is then finding the probability that any birthday occurs at least r times among n people.  If the events were mutually exclusive, we could add the probabilities (that is just multiply the probability for a given birthday by 365). But they are not mutually exclusive--more than one date might occur r or more times.  To allow for this, we could use inclusion/exclusion to subtract out the pairwise ANDed conditions (of two birthdays both being represented), as each had been counted though only one represented a new "success".  Defining now P(m,n,r) as the event of m different dates each having at least r occurrences among n people, with P(1,n,r) being just the previously defined P(n,r), the overall sought probability is C(365,1)*P(1,n,r) - C(365,2)*P(2,n,r) + C(365,3)*P(3,n,r) - C(365,4)*P(4,n,r) + ... , where the first term is the oversimplified (exclusivity-assumed) 365*P(n,r).

The problem is that these pairwise (and higher multiple-wise) probabilities are hard to calculate.  However, we can make an approximation by assuming that they are independent (though they are also not actually independent).   For example, the probability that January 1 is present among a group of 50 people is 1-(364/365)^50 = 0.1281817426688633092, as is the probability that January 2 is present.  The probability that both are present is 0.0161437857620377809.  If we assumed their presences were independent, we'd have multiplied the individual probabilities (i.e., squared the probability of one), and used 0.0164305591536266927, which is not very different, but is indeed different.  The exact calculation was possible for this any-occurrence case, but becomes more difficult for multiple-occurrence cases.

The following program calculates the probabilities as an approximation by treating the compound conditions as independent and carrying the inclusion/exclusion to the possibility of 5 different dates meeting the criteria:

    5   point 10
   10   for Needed=2 to 13
   15    People=20
   20    repeat
   25     People=People+1
   30     Prob=fnProbSucc(People,Needed)
   40    until Prob>=0.5
   50   print Needed,People,Prob
   90   next
  900   end
  999   '
 1000   fn1dApRinN(N,R)
 1010     local P
 1020     P=combi(N,R)*(1/365)^R*(364/365)^(N-R)
 1030   return(P)
 1099   '
 1100   fnAtLeastRinN(N,R)
 1110     local P
 1111     local I
 1120     P=1
 1130     for I=0 to R-1
 1140       P=P-fn1dApRinN(N,I)
 1150     next
 1160    return(P)
 1199   '
 1200   fnProbSucc(People,HowMany)
 1210   local PAtLeast
 1211   local P
 1220   PAtLeast=fnAtLeastRinN(People,HowMany)
 1250   P=365*PAtLeast-combi(365,2)*PAtLeast^2+combi(365,3)*PAtLeast^3-combi(365
,4)*PAtLeast^4+combi(365,5)*PAtLeast^5
 1270   return(P)

and it finds

 2       24      0.516892911505135457193064552705280701875673521049
 3       88      0.500024238908916008751425954604806345304855138599
 4       188     0.50121973720291497376553934602377971816400640167
 5       315     0.503716145942477034440763359178102928683134514473
 6       462     0.503692442151314437034595863350466252672421876767
 7       624     0.50023923568884182819567190051858113365279781384
 8       800     0.500195974660549653670527200105639158839275552604
 9       987     0.500416377508527652709622327057735734378561391984
 10      1183    0.500100393247427978545737350820694555937017652788
 11      1388    0.501858680817548576454867836197064871061046632628
 12      1599    0.501682818386226699143745804914276827734503416906
 13      1816    0.501342696818721244403330383903198902341179454911

so the original simulation program was expanded to up to 2000 people per trial, with the following results:

 2             23            .50643
 3             87            .50081
 4             187           .50653
 5             312           .500095
 6             460           .503405
 7             622           .5020250000000001
 8             798           .50124
 9             985           .50041
 10            1182          .50198
 11            1386          .501405
 12            1596          .501255

and our approximation calculations agree well with our simulation results.

We had already known the 23 for the case of 2 occurrences.  Three occurrences looks like 87 or 88, ... twelve occurrences require something like 1596-1599.  The approximation calculation went beyond the original scope and calculated for thirteen occurrences also.

Edited on March 11, 2004, 2:12 pm
  Posted by Charlie on 2004-03-11 14:10:44

Please log in:
Login:
Password:
Remember me:
Sign up! | Forgot password


Search:
Search body:
Forums (1)
Newest Problems
Random Problem
FAQ | About This Site
Site Statistics
New Comments (6)
Unsolved Problems
Top Rated Problems
This month's top
Most Commented On

Chatterbox:
Copyright © 2002 - 2024 by Animus Pactum Consulting. All rights reserved. Privacy Information