Consider the following game: Roll a fair die of n sides. Continue rolling until some number has been rolled twice. You win if it takes 4 or more additional rolls (5 or more total).
What is the least number of sides the die could have if the expected number of additional rolls required is greater than 4?
The probability that the first number will appear on the very first additional roll is 1/n. That this will not happen, but that one of the first two will appear on the second added roll is (n1)/n * 2/n. That neither of these happens, but a match appears on the third added roll, is (n1)/n * (n2)/n * 3/n, etc.
In computing the expected value of the number of added rolls, each of these mutually exclusive probabilities must be multiplied by the respective series length and then the terms added together. Each term, i, is then i*Sum{j=1 to i1}(nj)/n * i/n.
CLS
DEFDBL AZ
FOR n = 2 TO 20
tot = 0
FOR i = 1 TO n
term = i * i / n
FOR j = 1 TO i  1
term = term * (n  j) / n
NEXT
tot = tot + term
NEXT
IF n >= 4 THEN
prob4 = (n  1) / n * (n  2) / n * (n  3) / n
ELSE
prob4 = 0
END IF
PRINT n, tot; TAB(40); prob4
NEXT
The program above implements the given expected added rolls, and also, for comparison, the probability that the repetition will happen only after at least 4 added rolls, i.e., that the first three added rolls will not produce a match.
n exp. number of prob. of needing
added rolls at least 4 added rolls
2 1.5 0
3 1.888888888888889 0
4 2.21875 .09375
5 2.510400000000001 .192
6 2.774691358024691 .2777777777777778
7 3.018138700711438 .3498542274052478
8 3.245018005371094 .41015625
9 3.458315744885656 .4609053497942387
10 3.66021568 .504
11 3.852372050737359 .540946656649136
12 4.036073675098951 .5729166666666666
13 4.21234791295252 .6008192990441511
14 4.382029424383519 .6253644314868805
15 4.545807285147228 .6471111111111111
16 4.704258247072678 .66650390625
17 4.857870820801628 .683899857520863
18 5.007063098992892 .6995884773662552
19 5.152196200957448 .7138066773582155
20 5.293584586000899 .72675
Note that the answer to the puzzle is that it takes a 12sided die to make the expected number of added rolls to exceed 4.
Further exploration:
But it takes only a 10sided die to give over 50% probability that at least 4 added rolls will be needed.
Simulation verification of formulas used:
A simulation program for the 20sided case verifies that we've used the correct formulas:
DEFDBL AZ
FOR trial = 1 TO 1000000
n = 20
REDIM h(20)
h(0) = INT(RND(1) * n + 1)
fin = 0
roll = 0
DO
roll = roll + 1
r = INT(RND(1) * n + 1)
h(roll) = r
FOR i = 0 TO roll  1
IF r = h(i) THEN fin = 1: EXIT FOR
NEXT
LOOP UNTIL fin
totrolls = totrolls + roll
IF roll >= 4 THEN tot4 = tot4 + 1
PRINT trial, totrolls / trial, tot4 / trial
NEXT
finding the averages after a million trials as 5.296289 and .726962, in statistical agreement with the table entries for n=20.

Posted by Charlie
on 20120525 14:30:40 