Suppose someone comes onto the site and is bored. That person starts spreading the rumor that levik is a monkey at exactly noon (on day 1) by sending an e-mail to one random person.
Then, each person sends an e-mail about this rumor (at exactly noon, on day 2) to one person. They can send it to anyone on perplexus (but themselves), even if that person already knows the rumor, or even if it was the person who told them about it.
Each successive day at noon, everyone that knows about the rumor sends a message to one other random person.
On average, on what day will everyone know about the rumor if there are 40 people (including the one that spread the rumor initially) at perplexus while the rumor is still spreading?
What if there were x people at perplexus while the rumor is still spreading?
This problem has some similarity to The Random Problem, except in that instance only one person was seeking out a new item (in that case problem, in the present case person), while now each already-selected person chooses another person at random. In "The Random Problem" we produced a Markov chain of seen vs unseen problems, calculating the transition probabilities (having seen x, transitioning to having seen x+1) as we went along.
The transition probabilities (conditional probabilities based on how many already have heard the rumor) for the present case are complicated enough that we should compute them ahead of time and save them in an array. (Oh, yes, did I mention: a computer is involved in the solution.)
Each time a given person selects someone to tell, the probability of the number having heard the rumor increasing by one is (40 - p)/39, where p is the number of people now knowing the rumor. This must be iterated as many times as the number of people who, starting that day, already knew the rumor, but on each iteration we do have to keep track of the current full number of people who know the rumor.
In the program, the array is kept as the row being the number already knowing the rumor and the column being the number knowing the rumor after the day is over. However, for brevity, in the following table (to much less precision than is really kept), the first probability on each line represents the probability of the number staying the same (x -> x); the next column has the probability of increasing by 1; then by 2, etc.:
2 0.001 0.075 0.924
3 0.000 0.012 0.202 0.786
4 0.000 0.003 0.053 0.333 0.611
5 0.000 0.001 0.017 0.133 0.418 0.432
6 0.000 0.000 0.006 0.057 0.234 0.427 0.275
7 0.000 0.000 0.003 0.027 0.130 0.317 0.366 0.157
8 0.000 0.000 0.001 0.014 0.076 0.220 0.343 0.266 0.079
9 0.000 0.000 0.001 0.008 0.047 0.153 0.287 0.304 0.165 0.035
10 0.000 0.000 0.001 0.005 0.031 0.110 0.233 0.299 0.221 0.086 0.013
11 0.000 0.000 0.000 0.004 0.022 0.082 0.190 0.277 0.249 0.133 0.038 0.004
12 0.000 0.000 0.000 0.003 0.017 0.065 0.158 0.251 0.258 0.167 0.065 0.014 0.001
13 0.000 0.000 0.000 0.002 0.014 0.053 0.136 0.229 0.257 0.190 0.090 0.026 0.004
14 0.000 0.000 0.000 0.002 0.012 0.046 0.120 0.210 0.251 0.203 0.109 0.038 0.008
15 0.000 0.000 0.000 0.002 0.011 0.042 0.110 0.197 0.244 0.209 0.123 0.049 0.012
16 0.000 0.000 0.000 0.002 0.010 0.039 0.104 0.188 0.238 0.212 0.131 0.056 0.016
17 0.000 0.000 0.000 0.002 0.010 0.039 0.101 0.183 0.234 0.211 0.135 0.061 0.019
18 0.000 0.000 0.000 0.002 0.011 0.040 0.102 0.182 0.231 0.209 0.136 0.062 0.020
19 0.000 0.000 0.000 0.002 0.012 0.042 0.105 0.184 0.230 0.206 0.133 0.061 0.020
20 0.000 0.000 0.000 0.003 0.013 0.046 0.111 0.189 0.230 0.201 0.126 0.057 0.018
21 0.000 0.000 0.000 0.003 0.016 0.052 0.121 0.197 0.230 0.193 0.117 0.051 0.016
22 0.000 0.000 0.001 0.004 0.019 0.060 0.133 0.206 0.229 0.183 0.105 0.043 0.012
23 0.000 0.000 0.001 0.005 0.024 0.071 0.148 0.217 0.227 0.170 0.091 0.034 0.009
24 0.000 0.000 0.001 0.007 0.030 0.086 0.167 0.228 0.221 0.153 0.074 0.025 0.006
25 0.000 0.000 0.002 0.010 0.040 0.104 0.188 0.237 0.210 0.131 0.057 0.017 0.003
26 0.000 0.000 0.002 0.014 0.053 0.128 0.211 0.242 0.192 0.106 0.040 0.010 0.002
27 0.000 0.000 0.004 0.021 0.071 0.156 0.233 0.239 0.167 0.079 0.024 0.005 0.000
28 0.000 0.001 0.006 0.031 0.096 0.190 0.252 0.225 0.134 0.052 0.012 0.002 0.000
29 0.000 0.001 0.011 0.047 0.129 0.226 0.260 0.197 0.096 0.029 0.005 0.000
30 0.000 0.002 0.018 0.071 0.171 0.260 0.253 0.155 0.058 0.012 0.001
31 0.000 0.005 0.030 0.106 0.221 0.282 0.223 0.104 0.026 0.003
32 0.001 0.009 0.052 0.156 0.272 0.281 0.168 0.053 0.007
33 0.001 0.018 0.088 0.222 0.312 0.244 0.098 0.016
34 0.003 0.036 0.147 0.298 0.316 0.166 0.034
35 0.008 0.072 0.236 0.360 0.256 0.068
36 0.020 0.143 0.351 0.358 0.128
37 0.052 0.273 0.447 0.229
38 0.135 0.475 0.390
39 0.363 0.637
40 1.000
I've truncated the right side so as not to upset by too much the formatting on perplexus.
As a sanity check on the above:
If there are already 2 knowers, the probability they each choose each other is (1/39)^2 = 0.0006574621959, which rounds to .001. The probability they each choose the same other person is (38/39)*(1/39) = 0.0249835634451; but there's the same probability that the first person will choose a new person and the second person will choose the first, or vice versa. These three equal number add up to the .075 shown (to the given precision).
On the other end, the probability that the lone non-hearer will be contacted by none of 39 knowers is (38/39)^39 = 0.3631119936942, which value is indeed the one stored internally for the column representing 39 staying 39 the next day.
Here's a probability distribution of the number of knowers by day:
(days 5-7 have been truncated at the right.)
2
1 1.000
2 3 4
2 0.001 0.075 0.924
4 5 6 7 8
3 0.002 0.018 0.108 0.308 0.565
7 8 9 10 11 12 13 14 15 16
4 0.001 0.004 0.014 0.042 0.094 0.170 0.237 0.242 0.150 0.045
12 13 14 15 16 17 18 19 20 21 22 23 24
5 0.001 0.002 0.005 0.012 0.026 0.048 0.079 0.115 0.147 0.162 0.151 0.118 0.075
19 20 21 22 23 24 25 26 27 28 29 30 31
6 0.001 0.002 0.005 0.010 0.019 0.033 0.055 0.082 0.111 0.134 0.145 0.136 0.111
26 27 28 29 30 31 32 33 34 35 36 37 38
7 0.001 0.003 0.006 0.013 0.027 0.049 0.082 0.122 0.159 0.176 0.160 0.115 0.061
31 32 33 34 35 36 37 38 39 40
8 0.001 0.003 0.008 0.022 0.054 0.113 0.196 0.260 0.235 0.107
34 35 36 37 38 39 40
9 0.001 0.003 0.012 0.049 0.158 0.355 0.422
36 37 38 39 40
10 0.001 0.005 0.040 0.231 0.723
38 39 40
11 0.007 0.106 0.887
38 39 40
12 0.001 0.042 0.957
39 40
13 0.016 0.984
39 40
14 0.006 0.994
39 40
15 0.002 0.998
39 40
16 0.001 0.999
40
17 1.000
40
18 1.000
...
40
42 1.000
40
43 1.000
40
44 1.000
where the numbers at the left are the day numbers and the number above each probability being the possible number of people knowing the rumor. Probabilities less than .0005 are not shown. The 1.000 probability of 40 is carried out to day 44 as it takes until then for the probability to reach 1.00000000000000 to 15 significant figures.
The program, keeping track of the expected day on which all 40 know the rumor, comes up with day 9.925828356622571.
The program producing these results is:
CLS
DEFDBL A-Z
OPEN "proboutp.txt" FOR OUTPUT AS #2
totPeople = 40
maxSub = totPeople + 1
DIM recd(maxSub), newRow(maxSub)
'recd contains the prob that that many people have received the rumor
DIM transit(maxSub, maxSub)
' transit(x,y) contains the transition probability that from one day when
' x people have seen the rumor, the next day y people will have seen it.
FOR tranRow = 1 TO totPeople
REDIM person(1, maxSub)
person(0, tranRow) = 1 ' prob = 1 of this number as that is the given
FOR i = 1 TO tranRow 'each person already w/rumor has a chance to spread
FOR j = tranRow TO totPeople ' possibilities
IF j > 2 * tranRow THEN EXIT FOR ' can't be more than doubled
person(1, j + 1) = person(1, j + 1) + person(0, j) * (totPeople - j) / (totPeople - 1)
person(1, j) = person(1, j) + person(0, j) * (j - 1) / (totPeople - 1)
NEXT j
FOR j = tranRow TO totPeople
person(0, j) = person(1, j)
person(1, j) = 0 'get ready for next generation (person's choice)
NEXT j
NEXT i
FOR j = tranRow TO totPeople
transit(tranRow, j) = person(0, j)
NEXT j
NEXT tranRow
FOR x = 1 TO totPeople
PRINT #2, USING "##"; x;
FOR y = x TO totPeople
IF y > 2 * x THEN EXIT FOR
PRINT #2, USING "##.###"; transit(x, y);
NEXT
PRINT #2,
totline = totline + 1
NEXT
recd(1) = 1
expectation = 1
day = 0
DO
day = day + 1
FOR i = 1 TO totPeople
FOR j = i TO totPeople
newRow(j) = newRow(j) + recd(i) * transit(i, j)
NEXT
NEXT
FOR j = 1 TO totPeople
recd(j) = newRow(j)
newRow(j) = 0
NEXT j
FOR i = 0 TO totPeople
IF recd(i) >= .0005 THEN
PRINT #2, USING " ##"; i;
END IF
NEXT
PRINT #2,
PRINT #2, USING "##"; day;
FOR i = 0 TO totPeople
IF recd(i) >= .0005 THEN
PRINT #2, USING " #.###"; recd(i);
END IF
NEXT
PRINT #2, : PRINT #2,
expectation = expectation + 1 - recd(40)
LOOP UNTIL ABS(recd(40) - 1) < 1E-15
PRINT #2, expectation
CLOSE
A similar program finds the following for various numbers of people who belong to the group:
2 1.000000000
3 2.333333333
4 3.190789474
5 3.933115646
6 4.484990456
7 4.969990399
8 5.377911218
9 5.731734652
10 6.047132263
20 8.041985614
30 9.155141733
40 9.925828357
50 10.514640503
60 10.990574356
70 11.389952004
80 11.733712559
90 12.035541692
100 12.304523881
110 12.546969522
120 12.767661160
130 12.970249630
140 13.157468101
150 13.331421386
160 13.493828416
170 13.646139475
180 13.789570366
190 13.925120576
200 14.053606496
As the number of people increases, the day number increases even more slowly than a logarithmic function: When the number of people doubles from 2 to 4, the day number increases by 2.19; when the number of people doubles from 5 to 10, the day number increases by 2.11; when the number of people doubles from 10 to 20, the day number increases by 1.99; from 50 to 100, by 1.79; from 100 to 200, by 1.75. Perhaps it's getting closer to matching a logarithmic curve; however, after 200 the array becomes too large to fit in the memory allocated to DOS.
|
Posted by Charlie
on 2004-04-30 14:29:03 |