perplexus.info :: Probability : Common Birth Dates II

Common Birth Dates II (Posted on 2021-11-17)

Common Birth Dates asked for the expected number of pairs of common birth dates for a cohort of 1000 people whose ages were clustered around a certain value.

Now consider a similar question to the traditional birthday problem: If you attend a concert or other event where people tend to be of a similar age, how large would such an event need to be in terms of number of attendees, so that you'd have at least a 50% probability that there would be at least two people born on the exact same day (year, month and day)?

Again to make things specific: assume the standard deviation about whatever the mean age to be is 12 years and it follows a normal distribution.

Again feel free to vary the assumptions about the distribution.

See The Solution Submitted by Charlie

Rating: 4.5000 (2 votes)

Comments: ( Back to comment list | You must be logged in to post comments.)

One more examination

| Comment 4 of 6 |

I was looking to understand the discrepancy between my analysis and simulations. In the last post, I blamed it on not considering conditional probability. But, reexamining the results, I have concluded that the analytic approximation is, in fact, very accurate.

In hindsight, it is remarkable that no simulation at all (i.e., one employing a normal distribution random number generator) is needed to solve this problem.

Here are the results for running a simulation of "sample" persons where the sample ranges from 7 to 447. The mean is the fraction of groups that had at least 1 shared birthday. The distribution of ages is P(i), a normal PDF with a mean of 60 years, a sigma of 12 years, and cutoffs at 0 and 120 years, with i being the day number. The distribution was renormalized to integrate to exactly 1.0 even with the tiny 5-sigma cutoff. I ran each sample 30,000 times, repeating 5 times to get a mean and sigma.

My approximation for the probability P of at least 1 birthday for sample n is given by:

P(n) = 1-P(pair)^(n(n-1)/2),

where P(pair)={sum, i=1,m of p(i)(1-p(i)}

and where m is the number of days in 120 years, m=43825.

p(i) is the usual gaussian.

The program is here. In the table below, I compare the simulated mean to the analytic approximation. 1-sigma for the simulation is given in parentheses

sample 7, mean= 0.0014 (0.0001), approx = 0.0013

sample 27, mean= 0.0218 (0.0008), approx = 0.0222

sample 47, mean= 0.0672 (0.0009), approx = 0.0669

sample 67, mean= 0.1342 (0.0024), approx = 0.1321

sample 87, mean= 0.2143 (0.0023), approx = 0.2132

sample 107, mean= 0.3092 (0.0033), approx = 0.3047

sample 127, mean= 0.4027 (0.0022), approx = 0.4011

sample 147, mean= 0.5005 (0.0019), approx = 0.4972

sample 167, mean= 0.5916 (0.0016), approx = 0.5886

sample 187, mean= 0.6750 (0.0004), approx = 0.6719

sample 207, mean= 0.7465 (0.0013), approx = 0.7449

sample 227, mean= 0.8107 (0.0008), approx = 0.8067

sample 247, mean= 0.8597 (0.0021), approx = 0.8573

sample 267, mean= 0.8999 (0.0016), approx = 0.8973

sample 287, mean= 0.9301 (0.0008), approx = 0.9279

sample 307, mean= 0.9522 (0.0004), approx = 0.9507

sample 327, mean= 0.9669 (0.0009), approx = 0.9671

sample 347, mean= 0.9791 (0.0009), approx = 0.9787

sample 367, mean= 0.9865 (0.0007), approx = 0.9865

sample 387, mean= 0.9915 (0.0004), approx = 0.9917

sample 407, mean= 0.9952 (0.0003), approx = 0.9950

Conditional probabilities can help explain why the approximation is a tiny bit low: strictly speaking, successive addition of more people to the sample leaves fewer "unclaimed" birthdays. But this fraction nearly 1.

Maximum fraction = (43825-407)/43825 = 0.99 for the samples above.

I am not sure why my previous simulation (with a different algorithm) gave the high number of 157 for the 50% probability. 147 is correct. Simulations with the new algorithm agree give the smaller values above.

An exact analytic analysis might require considering all possible current populated birthday configurations and ways adding one more person to yield a match. But this might alternatively take a probabilistic form. Perhaps there is some neat recursion trick to generate the equation governing the addition of each new person.

Edited on November 21, 2021, 9:51 pm

Posted by Steven Lord on 2021-11-21 14:58:59

Please log in:

Login:
Password:
Remember me:

Sign up! \| Forgot password

Search:

Search body:

Forums (0)
Newest Problems
Random Problem
FAQ | About This Site
Site Statistics
New Comments (5)
Unsolved Problems
Top Rated Problems
This month's top
Most Commented On

Chatterbox:


perplexus dot info