M is an n-digit positive integer composed of a string of random decimal digits (no leading zero) selected from a uniform distribution. For each case below, determine the value of the smallest number of digits associated with a probability of at least 0.5 that M contains:
(A) all of the digits 0 to 9
(B) all of the 2-digit combinations 00, 01, ..., 99
(C) all of the 3-digit combinations 000, 001, ..., 999
As we're not asked to find the exact probability for each number of digits, I feel it's OK to allow sequences of digits that include the possibility of leading zeros. In fact the probability that a sequence of, say, 30 random digits has all 10 digits, given that the first digit is, say, 1, is the same as if given the first digit is a 2, or is zero.
The analytic solution (with calculation aided by computer) is possible only for part A. For parts B and C the lack of independence would make it difficult to give an analytic approach, and so will be done by simulation.
Part A:
This can be done by inclusion/exclusion. That is, the probability that all the digits will be represented is the sum of the probability that each one individually is present, minus the probability that each pair of digits is represented by at least one of its members, plus the probability that each triplet of digits is present by means of at least one of its members, etc.
We don't risk doing this by hand, so:
clearvars
for n=10:35
p0=1-sym(9/10)^n;
p01=1-sym(8/10)^n;
p012=1-sym(7/10)^n;
p0123=1-sym(6/10)^n;
p01234=1-sym(5/10)^n;
p012345=1-sym(4/10)^n;
p0123456=1-sym(3/10)^n;
p01234567=1-sym(2/10)^n;
p012345678=1-sym(1/10)^n;
p0123456789=1;
p1=p0*10;
p2=p01*nchoosek(10,2);
p3=p012*nchoosek(10,3);
p4=p0123*nchoosek(10,4);
p5=p01234*nchoosek(10,5);
p6=p012345*nchoosek(10,6);
p7=p0123456*nchoosek(10,7);
p8=p01234567*nchoosek(10,8);
p9=p012345678*nchoosek(10,9);
p10=p0123456789;
p=p1-p2+p3-p4+p5-p6+p7-p8+p9-p10;
fprintf('%4d %15.13f %s\n',n,eval(p),p)
end
n prob probability as rational
10 0.0003628800000 567/1562500
11 0.0019958400000 6237/3125000
12 0.0061871040000 193347/31250000
13 0.0142702560000 891891/62500000
14 0.0273158645760 26675649/976562500
15 0.0459502243200 143594451/3125000000
16 0.0703098107712 10985907933/156250000000
17 0.1000944296352 31279509261/312500000000
18 0.1346726200083 21042596876301/156250000000000
19 0.1732015476199 54125483631219/312500000000000
20 0.2147373231974 671054134991877/3125000000000000
21 0.2583238656586 1614524160366117/6250000000000000
22 0.3030569819183 236763267123649959/781250000000000000
23 0.3481253462769 271972926778836591/781250000000000000
24 0.3928324119812 122760128744112321/312500000000000000
25 0.4366039317526 54575491469075577/125000000000000000
26 0.4789854653699 1496829579280832563767/3125000000000000000000
27 0.5196335294646 3247709559153727376709/6250000000000000000000
28 0.5583032100726 34893950629539961690731/62500000000000000000000
29 0.5948342776687 74354284708591053185019/125000000000000000000000
30 0.6291371892527 24575671455185092431306663/39062500000000000000000000
31 0.6611798504170 103309351627654392774534981/156250000000000000000000000
32 0.6909756324468 1079649425698155662929610031/1562500000000000000000000000
33 0.7185728740243 2245540231325863045575464463/3125000000000000000000000000
34 0.7440459199736 1162571749958790105933707292969/1562500000000000000000000000000
35 0.7674876384302 479679774018898077792619889283/625000000000000000000000000000
n=27 is the lowest in which the probability of getting all the digits exceeds 0.5.
Parts B and C:
ct=0;
for trial=1:10000
M=num2str(randi(10,[1, 497 ])-1); % 498 seems good FOR 2-DIGIT
M=erase(M,' '); % 7266 seems good FOR 3-DIGIT
good=true; % 27 seems good FOR 1-DIGIT
for s2=0:99
ss=sprintf('%02d',s2);
if isempty(strfind(M,ss))
good=false;
break
end
end
if good
ct=ct+1;
end
end
ct
Runs with 497 and 498 digits find
>> allCombos2a
ct =
4994
>> allCombos2a
ct =
5066
though the variance from 5000 is borderline, so is not real proof, but I did more than just these two runs, including with 100,000 instead of 10,000 trials, and the 498 figure seems solid.
For C:
ct=0;
for trial=1:10000
M=num2str(randi(10,[1, 7266 ])-1); % 498 seems good FOR 2-DIGIT
M=erase(M,' '); % 7266 seems good FOR 3-DIGIT
good=true; % 27 seems good FOR 1-DIGIT
for s2=0:999
ss=sprintf('%03d',s2);
if isempty(strfind(M,ss))
good=false;
break
end
end
if good
ct=ct+1;
end
end
ct
The counts with 7265 and 7266-length strings of digits:
>> allCombos2a
ct =
4911
>> allCombos2a
ct =
5068
Doing the single-digit case this way, with 26- and 27-digit strings:
>> allCombos2a
ct =
4721
>> allCombos2a
ct =
5203
For curiosity I tried excluding leading zeros in the doubles case, by this substitution:
for trial=1:50000
while true
M=num2str(randi(10,[1, 497 ])-1);
if M(1)>'0'
break
end
end
497 and 498 gave
>> allCombos2a
ct =
24735
>> allCombos2a
ct =
25224
Summary:
A) 27
B) 498
C) 7266
|
Posted by Charlie
on 2024-05-10 14:32:53 |