The larger sample from the second box makes it more probable that that box is the one with the majority of red marbles, and that the proportion drawn from the first box was a statistical fluke.
Certainly, with fewer than 30 marbles in each box, 20 red and 10 white could not be drawn from either box, and with fewer than 60, 20 reds could not be drawn from the majority-white box.
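As a rough check of those two thresholds, here is a small Python sketch (not part of the original QuickBASIC program). The function name `feasible` and the labels A and B are my own: A means the first box (4 red, 1 white drawn) is the majority-red one, B means the second box (20 red, 10 white drawn) is. It assumes n is a multiple of 3, so the 2/3 and 1/3 splits are whole numbers.

```python
def feasible(n):
    """Return (a_possible, b_possible) for boxes of n marbles each."""
    majority, minority = 2 * n // 3, n // 3
    # A: first box has `majority` reds, second box has `minority` reds.
    a_possible = (majority >= 4 and minority >= 1        # 4 red, 1 white from box 1
                  and minority >= 20 and majority >= 10)  # 20 red, 10 white from box 2
    # B: first box has `minority` reds, second box has `majority` reds.
    b_possible = (minority >= 4 and majority >= 1
                  and majority >= 20 and minority >= 10)
    return a_possible, b_possible

for n in (27, 30, 57, 60):
    print(n, feasible(n))
```

Under these assumptions neither case is possible at n = 27, only B is possible from 30 through 57, and both become possible at 60.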
For larger numbers of marbles, the following table holds:
n (marbles   prob. of both observations     probability that
per box)     given A has   given B has      B has more reds,
             more reds     more reds        given the
                                            observations
30 0.000000000 0.029472443 1.000000000
33 0.000000000 0.014246340 1.000000000
36 0.000000000 0.011346332 1.000000000
39 0.000000000 0.010032430 1.000000000
42 0.000000000 0.009270018 1.000000000
45 0.000000000 0.008768876 1.000000000
48 0.000000000 0.008413313 1.000000000
51 0.000000000 0.008147555 1.000000000
54 0.000000000 0.007941237 1.000000000
57 0.000000000 0.007776355 1.000000000
60 0.000000002 0.007641534 0.999999686
63 0.000000012 0.007529226 0.999998406
66 0.000000035 0.007434220 0.999995330
69 0.000000076 0.007352801 0.999989633
72 0.000000141 0.007282248 0.999980613
75 0.000000233 0.007220521 0.999967753
78 0.000000353 0.007166061 0.999950726
81 0.000000503 0.007117658 0.999929385
84 0.000000681 0.007074354 0.999903728
87 0.000000887 0.007035385 0.999873870
90 0.000001120 0.007000130 0.999840008
93 0.000001377 0.006968084 0.999802400
96 0.000001656 0.006938828 0.999761335
99 0.000001956 0.006912012 0.999717126
102 0.000002273 0.006887344 0.999670088
105 0.000002606 0.006864576 0.999620535
108 0.000002952 0.006843497 0.999568770
111 0.000003311 0.006823926 0.999515081
114 0.000003679 0.006805706 0.999459739
117 0.000004055 0.006788702 0.999402997
120 0.000004438 0.006772797 0.999345088
123 0.000004827 0.006757888 0.999286224
126 0.000005220 0.006743884 0.999226600
129 0.000005615 0.006730704 0.999166393
132 0.000006013 0.006718279 0.999105763
135 0.000006412 0.006706546 0.999044852
138 0.000006811 0.006695447 0.998983788
141 0.000007210 0.006684934 0.998922688
144 0.000007607 0.006674960 0.998861652
147 0.000008003 0.006665487 0.998800771
150 0.000008397 0.006656476 0.998740126
inf 0.000049195 0.006296924 0.992248062
The case of an infinite number of marbles in each box is equivalent to sampling with replacement, and it's still over 99% probable that the 20/10 observed box is the one with more red marbles.
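The "inf" row can be reproduced exactly with binomial probabilities. This is a Python sketch rather than the original QuickBASIC; the helper name `binom_prob` is mine, and it assumes, as the program's p2 / (p1 + p2) does, that either box is equally likely beforehand to be the majority-red one.

```python
from fractions import Fraction
from math import comb

def binom_prob(k, n, p):
    """Probability of exactly k reds in n draws with replacement."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

two_thirds, one_third = Fraction(2, 3), Fraction(1, 3)

# First box (4 red of 5) is majority red, second box (20 red of 30) is not.
p_a = binom_prob(4, 5, two_thirds) * binom_prob(20, 30, one_third)
# Second box is majority red, first box is not.
p_b = binom_prob(4, 5, one_third) * binom_prob(20, 30, two_thirds)

ratio = p_b / p_a                # likelihood ratio in favor of the second box
posterior_b = p_b / (p_a + p_b)  # equal priors assumed
print(ratio, posterior_b, float(posterior_b))
```

The powers of 3 and the binomial coefficients cancel, leaving a likelihood ratio of 2^7 = 128, so the probability is 128/129.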
Note that the first two columns are the combined probabilities of both observations. For example, with 30 marbles per box and the majority of reds in box 2, the entry is 0.029472443 rather than 1. Box 2 (20 red, 10 white) is certain to produce the 20 red/10 white observation, but that probability of 1 is multiplied by 0.029472443, the probability that the 10 red/20 white box produces the 4 red/1 white observation.
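The 30-marble entry can be worked out directly as a hypergeometric probability. Below is a Python sketch of that arithmetic; the function name `hypergeom` is my own and is not in the original program.

```python
from math import comb

def hypergeom(red_drawn, white_drawn, red_in_box, white_in_box):
    """Probability of this red/white split when drawing without replacement."""
    total = comb(red_in_box + white_in_box, red_drawn + white_drawn)
    return comb(red_in_box, red_drawn) * comb(white_in_box, white_drawn) / total

first = hypergeom(4, 1, 10, 20)     # C(10,4) * C(20,1) / C(30,5) = 4200 / 142506
second = hypergeom(20, 10, 20, 10)  # drawing the entire box: certain
print(first, second, first * second)
```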
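DECLARE FUNCTION p30y20p10# (r#, w#)
DECLARE FUNCTION p5y4p1# (r#, w#)
DEFDBL A-Z
' Finite boxes of n marbles each: sampling without replacement
FOR n = 30 TO 150 STEP 3
  n1r = n * 2 / 3: n1w = n / 3   ' majority-red box: 2/3 red, 1/3 white
  n2r = n1w: n2w = n1r           ' majority-white box: 1/3 red, 2/3 white
  ' p1: both observations if the first box (4 red, 1 white) is majority red
  p1 = p5y4p1(n1r, n1w) * p30y20p10(n2r, n2w)
  ' p2: both observations if the second box (20 red, 10 white) is majority red
  p2 = p5y4p1(n2r, n2w) * p30y20p10(n1r, n1w)
  PRINT USING "### #.######### #.######### #.#########"; n; p1; p2; p2 / (p1 + p2)
NEXT
' Infinite boxes: binomial probabilities. Only the red/white ratios matter,
' so the counts left over from the last pass of the loop (n = 150) are reused.
p1 = 5 * (n1r / (n1w + n1r)) ^ 4 * n1w / (n1w + n1r) * 30045015# * (n2r / (n2w + n2r)) ^ 20 * (n2w / (n2w + n2r)) ^ 10
p2 = 5 * (n2r / (n2w + n2r)) ^ 4 * n2w / (n2w + n2r) * 30045015# * (n1r / (n1w + n1r)) ^ 20 * (n1w / (n1w + n1r)) ^ 10
PRINT
PRINT USING "& #.######### #.######### #.#########"; "inf"; p1; p2; p2 / (p1 + p2)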
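FUNCTION p30y20p10 (r, w)
  ' Probability of drawing 20 red and 10 white, without replacement,
  ' from a box holding r red and w white marbles
  n = r + w
  p = 1
  FOR i = 0 TO 19                ' 20 reds first
    p = p * (r - i) / n: n = n - 1
  NEXT
  FOR i = 0 TO 9                 ' then 10 whites
    p = p * (w - i) / n: n = n - 1
  NEXT
  p30y20p10 = p * 30045015#      ' comb(30,10) possible orders
END FUNCTION
FUNCTION p5y4p1 (r, w)
  ' Probability of drawing 4 red and 1 white, without replacement,
  ' from a box holding r red and w white marbles
  n = r + w
  p = 1
  FOR i = 0 TO 3                 ' 4 reds first
    p = p * (r - i) / n: n = n - 1
  NEXT
  FOR i = 0 TO 0                 ' then 1 white
    p = p * (w - i) / n: n = n - 1
  NEXT
  p5y4p1 = p * 5                 ' comb(5,1) possible orders
END FUNCTION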
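
For anyone without a QuickBASIC interpreter, the finite rows of the table can be cross-checked with a short Python sketch. This is a re-derivation using binomial coefficients, not a translation of the program above; the names `hypergeom` and `table_row` are mine, and n is assumed to be a multiple of 3.

```python
from math import comb

def hypergeom(red_drawn, white_drawn, red_in_box, white_in_box):
    """Probability of this red/white split when drawing without replacement."""
    total = comb(red_in_box + white_in_box, red_drawn + white_drawn)
    # comb(n, k) is 0 when k > n, so impossible draws get probability 0.
    return comb(red_in_box, red_drawn) * comb(white_in_box, white_drawn) / total

def table_row(n):
    """Return (p1, p2, p2 / (p1 + p2)) for boxes of n marbles each."""
    majority, minority = 2 * n // 3, n // 3
    # p1: the first box (4 red, 1 white drawn) is the majority-red box.
    p1 = hypergeom(4, 1, majority, minority) * hypergeom(20, 10, minority, majority)
    # p2: the second box (20 red, 10 white drawn) is the majority-red box.
    p2 = hypergeom(4, 1, minority, majority) * hypergeom(20, 10, majority, minority)
    return p1, p2, p2 / (p1 + p2)

for n in range(30, 151, 3):
    p1, p2, posterior = table_row(n)
    print(f"{n:3d} {p1:.9f} {p2:.9f} {posterior:.9f}")
```

The sequential products in p30y20p10 and p5y4p1, multiplied by the number of orderings, are algebraically the same as these hypergeometric ratios, so the two approaches should agree to the printed precision.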