A regatta is a series of sailboat races. In the fleet where I race, the regatta winner is determined using a method called "Low Point Scoring". In any given race, the 1st place boat gets 1 point, the second place boat gets 2 points, the nth place boat receives n points. Individual races never have ties for any positions. The overall regatta is won by the boat with the lowest total number of points for all races. (If there is a tie for lowest total points, then the regatta is won by whichever of the tied boats had the better performance in the last race).
Consider a relatively small fleet of only 4 boats, each of which is equally likely to win any given race.
a) If there are only two races, the boat that wins the regatta will have a score of 2, 3, 4 or possibly even 5. What is the expected value of the winning score?
b) If there are three races, what is the expected value of the winning score? (I found even this simple case hard to calculate exactly, and I am hoping that somebody will come up with a better method than mine. And yes, I know that it is easy to simulate.)
c) If there is a large number of races, how might I approximate the expected winning score? (Among other things, I think I'd welcome a simulation here)
Any given regatta will have 1st, 2nd, 3rd and 4th place winners for the first race. The ultimate total of the winning boat will depend on how these boats finish in the remaining n-1 races. There are 24 equally likely orders of finish in each of the remaining races, identifying each boat by its order of finish in the first race.
If there are 2 races, then, there are 24 distinguishable outcomes, based on how the place-farers in the first race do in the second.
If there are 3 races, there are 24*24 = 576 distinguishable outcomes. With 4 races there would be 24^3 = 13824 distinguishable outcomes. ... etc.
The following table lists, per number of races: the expected value of the winning score, the expected value divided by the number of races (sort of a per-race average), the raw number of points accumulated over all possible equally likely outcomes, the number of those equally likely outcomes, and the latter two numbers reduced by their gcd:
2 3.125 1.5625 75 24 25 8
3 5.170138888888889 1.72337962962963 2978 576 1489 288
4 7.327690972222222 1.831922743055556 101298 13824 16883 2304
5 9.513979311342593 1.902795862268519 3156510 331776 526085 55296
6 11.73184618537809 1.955307697563014 93416280 7962624 3892345 331776
7 13.9726596199109 1.99609423141584 2670216836 191102976 667554209 47775744
so for example, for 3 races there are 24^2 = 576 possible outcomes (relative to the order of the first race's outcome), and the total of the winners scores was 2978, for an average winner's score of 2978/576 = 1489/288, or 5.17013888888... or 1.72337962962962... per race average.
The program producing this output was
DECLARE FUNCTION gcd# (a#, b#)
DECLARE SUB race ()
DECLARE SUB permute (a$)
CLEAR , , 25000
DEFDBL A-Z
CLS
DIM SHARED t(4), tt, n, lvl, races
FOR i = 1 TO 4
t(i) = i
NEXT
FOR races = 2 TO 6
lvl = 1: tt = 0: n = 0
race
g = gcd(tt, n)
PRINT races; tt / n; TAB(23); tt / n / races; TAB(42); tt; n; TAB(62); tt / g; n / g
NEXT
FUNCTION gcd (a, b)
x = a: y = b
DO
q = INT(x / y)
r = x - y * q
x = y: y = r
LOOP UNTIL r = 0
gcd = x
END FUNCTION
. . .
DEFDBL A-Z
SUB race
STATIC ctr
lvl = lvl + 1
a$ = "1234": h$ = a$
DO
FOR i = 1 TO 4
t(i) = t(i) + VAL(MID$(a$, i, 1))
NEXT
IF lvl = races THEN
low = 99999999
FOR i = 1 TO 4
IF t(i) < low THEN low = t(i)
NEXT
tt = tt + low
n = n + 1
ELSE
race
END IF
FOR i = 1 TO 4
t(i) = t(i) - VAL(MID$(a$, i, 1))
NEXT
permute a$
LOOP UNTIL a$ = h$
lvl = lvl - 1
END SUB
The permute function is shown elsewhere on the site. The result for 7 races used the equivalent Visual Basic program as the Quick Basic program was running too slowly.
Note that the average points for the winner per race can't exceed 2.5, so it's possible that 2.5 is the limit reached asymptotically, so the overall expected value might approach, but not reach 2.5 * n, where n is the number of races.
The following program simulates a larger number of races:
DEFDBL A-Z
RANDOMIZE TIMER
FOR races = 2 TO 7: GOSUB test: NEXT
races = 10: GOSUB test
races = 20: GOSUB test
races = 50: GOSUB test
races = 100: GOSUB test
END
test:
tt = 0: n = 0
FOR trial = 1 TO 100000
FOR i = 1 TO 4
t(i) = i
s(i) = i
NEXT
FOR race = 2 TO races
FOR i = 1 TO 3
r = i + INT(RND(1) * (5 - i))
IF r > i THEN
SWAP s(i), s(r)
END IF
NEXT
FOR i = 1 TO 4
t(i) = t(i) + s(i)
NEXT
NEXT race
lowest = 999999999
FOR i = 1 TO 4
IF t(i) < lowest THEN lowest = t(i)
NEXT
tt = tt + lowest
n = n + 1
NEXT trial
PRINT races; TAB(5); tt / n; TAB(30); tt / n / races
RETURN
For comparison with the exact figures, it simulates n=2 through 7, then the higher numbers:
average total avg. total per race
2 3.12473 1.562365
3 5.1698 1.723266666666667
4 7.32807 1.8320175
5 9.51294 1.902588
6 11.7317 1.955283333333333
7 13.97819 1.996884285714286
10 20.79182 2.079182
20 44.05422 2.202711
50 115.61174 2.3122348
100 236.69128 2.3669128
|
Posted by Charlie
on 2005-11-28 10:58:21 |