Suppose you have a function (or a magic ball) that is capable of producing a totally random integer between 1 and 5 (inclusive).
Using this, how would you generate a random number (also an integer) between 1 and 7 (inclusive)? (Note that the for the number to be random, all integers between 1 and 7 must have an equal chance of being generated)
Assume that using your 15 generator is pretty timeconsuming, so you want to minimize the number of times you are going to use it.
The best way to consider sequences of 5 possibilities or 7 possibilities is to consider them as base5 or base7 numbers. In order to do so, we'll reduce the 5possibility random digit by 1, to get 0  4, and after converting to base7, increment each digit by 1 to get the desired 1  7.
No power of 5 is an integral multiple of a power of 7 because any number is factorable uniquely into primes. So there is always going to be some loss of information. For example, if we take two random numbers from 15 (or a twodigit base5 number), to get 25 possibilities, that is not even enough for a twodigit base7 number, so if we express the result as a base5 number, 024, and convert to base7, we'll get, in that base, a number from 033 (base 7). If we were to just take the last digit, 0 through 3 would be overrepresented, any such digit appearing 4 times for every 3 times that a digit such as 4 through 6 appears. So we'd have to throw away numbers greater than 26 (base 7), which is 20 in decimal, or 40 in base5, then take the last base7 digit.
As the rand5 function, as I'll call it, is expensive to call, we want to minimize losses where we have to throw out a fraction of such results.
Ideally, if there were no losses due to the incommensurability of base5 and base7 numbers, we'd get log(5)/log(7) base7 digits for every base5 digit. That's about .8271, or about 83% as many random digits from 06 (or 17) as from 04 (or 15). That's because a base7 digit has more information than a base5 number.
So there are two items we need to take into consideration in deciding how many base5 digits we should group together: the number of base7 digits we get for every group of n base5 digits, and how many whole such groups we must throw away to avoid a bias in favor of some digits. We saw above that if we use just two base5 digits, we only get one base7 digit, but we also must throw away 4/25 or 16% of our entire throughput, so we actually get only .5*.84 = 42% as many base7 digits as we require base5 digits.
The following table shows the results for various choices in the grouping of base5 digits:
2 1 25 7 21 0.84000 0.42000
3 2 125 49 98 0.78400 0.52267
4 3 625 343 343 0.54880 0.41160
5 4 3125 2401 2401 0.76832 0.61466
6 4 15625 2401 14406 0.92198 0.61466
7 5 78125 16807 67228 0.86052 0.61466
8 6 390625 117649 352947 0.90354 0.67766
9 7 1953125 823543 1647086 0.84331 0.65591
10 8 9765625 5764801 5764801 0.59032 0.47225
11 9 48828125 40353607 40353607 0.82644 0.67618
12 9 244140625 40353607 242121642 0.99173 0.74380
13 10 1220703125 282475249 1129900996 0.92561 0.71201
14 11 6103515625 1977326743 5931980229 0.97190 0.76363
15 12 30517578125 13841287201 27682574402 0.90710 0.72568
16 13 152587890625 96889010407 96889010407 0.63497 0.51591
17 14 762939453125 678223072849 678223072849 0.88896 0.73209
18 14 3814697265625 678223072849 3391115364245 0.88896 0.69141
19 15 19073486328125 4747561509943 18990246039772 0.99564 0.78603
20 16 95367431640625 33232930569601 66465861139202 0.69695 0.55756
21 17 476837158203125 232630513987207 465261027974414 0.97572 0.78987
In the various columns:
The number of base5 digits that go in, the resulting number of base7 digits that result, five raised to the given power, and seven raised to the power of the number of base7 digits, followed by the largest integral multiple of that power of 7 that is not greater than the power of 5 shown. The fraction of that over the number that is the power of 5 is the portion of throughput that need not be thrown awaywe just throw away the first base7 digitis shown next. Finally is shown the ratio of base7 to base5 digits multiplied by the fraction kept.
In the twobase5digit example, we saw we can get only one base7 digit, as there are only 25 possible twobase5digit results. Then 7^1 is of course 7, but three times this (or 21) will fit into 25, so it doesn't matter we throw away the first base7 digit (02). But as a result we keep only 21/25 of our throughput, or 84%, and 84% of 50% is the 42% we figured earlier.
If we choose to accumulate 21 base5 digits, we'll get 17 base7 digits, and as you can see from the last column, get almost 79% as many base7 digits as base5 go in, not too far from the theoretic 82.7% of the ideal case. Going to 23 base5 digits would give us 78.991% instead of the 78.987 we get at 21, but accumulating such large numbers would exceed the precision available in the language I used.
The subroutine is:
DEFDBL AZ
FUNCTION rand7
STATIC seed
IF seed = 0 THEN
DO
seed = 0
FOR i = 1 TO 21
seed = seed * 5 + rand5  1
NEXT
LOOP WHILE seed > 465261027974413#
DO WHILE seed > 232630513987206#
seed = seed  232630513987207#
LOOP
END IF
q = INT(seed / 7)
r = seed  q * 7 + 1
seed = q
rand7 = r
END FUNCTION
Note the STATIC seed.
It keeps this valuethe built base5 number (actually stored in the internal binary format of the computer, but conceptually just a number)from call to call to this function, until it has been depleted. The first DO...LOOP WHILE throws away seeds greater than 2*7^211 (the subtraction of one as we're zerobased). The second DO WHILE...LOOP in effect gets rid of the first base7 digit.

Posted by Charlie
on 20030630 08:52:22 