In doing computer simulations, such as
the one I wrote for simulating the results of part 2 of Rumor Mill, one often uses a random number generator that's built into the computer language. These generators are based on a seed (some arbitrary number) and are mixed up at each step that calls for a random number. There are only a finite number of seeds, and each one is based mechanically on the previous, so they necessarily repeat after a while.
The seed is kept internally, away from the programmer's view, so the programmer can't ask for, say, the next random number after .753372.
If one suspects that the repetition cycle is actually occuring within the length of the run that he needs, what algorithm can you put into the program to find the period with which your results are repeating (and are therefore no longer random, or rather no longer independent trials)? Assume you do not have room to store all the numbers as they arrive, nor can you afford the time it would take to compare each new number to all the preceding numbers.
Then also, how do you determine where the repetition cycle begins (after what iteration of the loop of trials).
There is no general and fool proof solution to the problem you are posing.
A perfect solution would guard against the risk of a false positive as well as always detecting the onset of a repetition. Complete assurance against false positives is impossible, as can be easily seen by considering the performance of such an algorithm when processing a true random number generator - say a device which reports on the beta decay in a radioactive sample. At some point you will see an arbitrarily long repetition, but this is only indicative, not conclusive of the end of a cycle. This difficulty persists even if you remove the constraints and allow the algorithm to save and compare ever number.
Practically speaking, there are more or less good tests that can pronounce judgements of the kind "there is a 96% chance that the cycle began repeating 3 numbers ago". Designing a good test involves tradeoffs regarding e.g. the storage size and computational speed problems you already recognised, as well as the desired tolerance of the user to false positives, the wish to detect quickly, etc. A good test will probably also be based on reasonable guesses about the properties of the random number generating algorithm. No one test will be good for every situation.
|
Posted by FrankM
on 2008-01-19 21:21:25 |