Guess the Jellies in the Urn

This week's Riddler Classic presented a problem about sampling without replacement.

You have an urn with an equal number of red balls and white balls, but you have no information about what that number might be. You draw 19 balls at random, without replacement, and you get eight red balls and 11 white balls. What is your best guess for the original number of balls (red and white) in the urn?

Sampling without replacement

This rarely gets much treatment in introductory texts[1]. Most often, you are presented with very large or infinite sample spaces so that sampling without replacement can be treated as sampling with replacement, which is much simpler mathematically. 'Without replacement' means that each time you draw a ball, it is not returned (replaced) to the urn; each drawing is therefore made from a differently configured urn.
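To make this concrete, here is a minimal sketch of one such drawing in Julia (assuming StatsBase, which is not otherwise used in this post, and an example urn of my own choosing):

using StatsBase

urn = [fill(:red, 5); fill(:white, 5)]  # example urn: 5 red, 5 white balls
sample(urn, 3; replace = false)         # one drawing of 3 balls, e.g. [:white, :red, :red]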

Consider an urn with $N$ balls, $R$ of which are red and $N-R$ of which are white. Let $R_i$ be the event of drawing a red ball on the $i^{\text{th}}$ drawing, and $W_i$ that of drawing a white ball. It is then clear that $P(R_1) = R/N$ and that $P(W_i) = P(\overline{R_i})$. If the first ball drawn is red, there are now $R-1$ red balls left out of $N-1$ total, so the probability of drawing two reds in a row is \[
P(R_1R_2) = P(R_1) P(R_2|R_1) = \frac{R}{N} \frac{R-1}{N-1} ~.
\]
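For example, with $N=10$ and $R=5$, this gives $\frac{5}{10}\cdot\frac{4}{9} = \frac{2}{9} \approx 0.22$, slightly less than the $\frac14$ you would get sampling with replacement.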
And so the probability of drawing $r$ red balls consecutively from the urn is \begin{align}
P(R_1 R_2 \cdots R_r) &= \frac{R(R-1)\cdots(R-(r-1))}{N(N-1)\cdots(N-(r-1))} \\
&= \frac{R!(N-r)!}{(R-r)!N!}
\end{align}
After drawing $r$ red balls in a row, the probability of then drawing $w$ white balls in a row has the same form, except that now $R \to N-R$, $N\to N-r$, and $r\to w$: all $N-R$ white balls are still in the urn, but only $N-r$ balls remain in total. This gives \[
P(W_{r+1}\cdots W_{r+w}|R_1\cdots R_r) = \frac{(N-R)!(N-r-w)!}{(N-R-w)!(N-r)!} ~.
\]
Now, since $P(AB) = P(A)P(B|A)$, we have the probability of drawing $r$ red balls followed by $w$ white balls from the urn (letting $n=r+w$ be the total number drawn) \begin{align}
P(R_1\cdots R_r W_{r+1}\cdots W_n) &= \frac{R!(N-r)!}{(R-r)!N!} \frac{(N-R)!(N-r-w)!}{(N-R-w)!(N-r)!} \\
&= \frac{R!(N-r)!(N-R)!(N-n)!}{(R-r)!N!(N-R-w)!(N-r)!} \\
&= \frac{R!(N-R)!(N-n)!}{(R-r)!N!(N-R-w)!} ~.
\end{align}
By itself, this is not very interesting: we care less about $r$ reds in a row followed by $w$ whites than about $r$ reds and $w$ whites out of $n$ draws, in any order. Remarkably, every permutation of $(\overbrace{R,\dots,R}^{r},\overbrace{W,\dots,W}^{w})$ has exactly the same probability, because the same multiplicative factors arise in the numerator and denominator, just in a different order, so long as $r$ reds and $w$ whites are drawn in total. We then only need to count the permutations, which of course is the binomial coefficient ${n\choose r}$. The probability of exactly $r$ red balls in $n$ draws from an urn with $R$ red balls out of $N$ total is \begin{align}
P(r|N,R,n) &= {n\choose r} \frac{R!(N-R)!(N-n)!}{(R-r)!N!(N-R-w)!} \\
&= \frac{n!}{r!(n-r)!} \frac{R!(N-R)!(N-n)!}{(R-r)!N!(N-R-w)!} \\
&= \frac{R!}{r!(R-r)!} \frac{(N-R)!}{(n-r)!(N-R - (n - r))!} \frac{n! (N-n)!}{N!} \\
&= {N \choose n}^{-1} {R\choose r} {N-R \choose n-r} ~.
\end{align}
This is the hypergeometric distribution.
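As a quick sanity check, here is a minimal sketch (with example values of my own choosing) comparing the closed form above to the Hypergeometric pdf from Distributions.jl:

using Distributions

# Example urn: N = 10 balls, R = 4 red; draw n = 5 without replacement.
N, R, n = 10, 4, 5
ours(r) = binomial(R, r) * binomial(N - R, n - r) / binomial(N, n)
# Distributions.jl parameterizes Hypergeometric(s, f, n) by the number of
# successes s, failures f, and draws n.
theirs(r) = pdf(Hypergeometric(R, N - R, n), r)
all(ours(r) ≈ theirs(r) for r in 0:min(R, n))  # should print true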

Inference

We can calculate the probability of an outcome given knowledge of the urn, but in this problem we want to use knowledge of the outcome to infer properties of the urn. Think of $D=(n, r)$ as the data; we know that of $n=19$ draws, $r=8$ turned out to be red. Then the hypergeometric distribution as written above is $P(D|N,R)$, the likelihood of the data given an urn with $R$ red balls out of $N$ total.

In general, given this data, our posterior distribution in $N$ and $R$ would be \[
P(N,R|D) = P(N) P(R|N) \frac{P(D|N,R)}{P(D)}
\] with \[
P(D) = \sum_{N'=0}^\infty \sum_{R'=0}^{N'} P(N')P(R'|N')P(D|N',R') ~.
\]
To obtain a distribution in $N$ alone, we marginalize over $R$ \[
P(N|D) = \sum_{R'=0}^N P(N,R'|D) = \frac{P(N) \sum_{R'=0}^N P(R'|N) P(D|N,R')}{P(D)} ~.
\]

But we have this crucial bit of information from the problem:

… an urn with an equal number of red balls and white balls

This means that $R$ is exactly $\frac12 N$. In terms of distributions, this is the Kronecker delta $P(R|N) = \delta_{2R,N}$. So \begin{align}
P(N|D) &= \frac{P(N) \sum_{R'=0}^N \delta_{2R',N} P(D|N,R')}{P(D)} \\
&= \frac{P(N)\, P(D|N,R=\frac12 N)}{P(D)}
\end{align}
for even $N$ (for odd $N$, the Kronecker delta kills every term in the sum), and \begin{align}
P(D) &= \sum_{N'=0}^\infty \sum_{R'=0}^{N'} P(N')P(R'|N')P(D|N',R') \\
&= \sum_{N'=0}^\infty \sum_{R'=0}^{N'} P(N')\delta_{2R',N'}P(D|N',R') \\
&= \sum_{N' \text{ even}} P(N')P(D|N',R'=\frac12 N') ~.
\end{align}

Evidence

The factor $P(D)$, called the evidence, which we simplified above, actually won't affect our result at all. We would need it to calculate an actual probability, since it ensures the distribution is normalized, but if we are merely selecting the best choice of $N$, it does not affect the outcome: the probability of each choice of $N$ is scaled by the same constant $P(D)$. So we will not calculate it here.

Priors

If you were only told that there is an urn with some number of red and white balls, what values for the number of balls would you think are more likely? Any positive integer might have some probability of being correct; maybe 0 and 1 could be excluded, depending on your interpretation. This is what must be provided in $P(N)$. Jaynes liked writing $P(N|I)$, where $I$ represents your prior information. For instance, when we wrote $P(R|N) = \delta_{2R,N}$, you could conceivably have written it as $P(R|N,I)$: it is conditioned not just on a value of $N$ but also on the information that $R=\frac12 N$ from the wording of the problem.

In our case, we don't have much to go on at all. Of course, once you know that 19 balls have been drawn, clearly $N\ge19$. But further, 11 are white, so clearly $N\ge22$, since there must be at least 11 corresponding red balls. We also know that $N$ must be even; odd values should have zero probability. I do feel that larger values of $N$ are less likely, so a traditional choice of $1/N$ might be plausible (modulo evenness).

Without agonizing over the choice, you might just pick a large cutoff, say one million, and not believe there could be more balls in the urn than that, so that \[
P(N) = \begin{cases} A & 0 \le N \le N_{\text{max}}, N \text{ even} \\
0 & \text{otherwise} \end{cases}
\] where $A = 1/(N_\text{max}/2 + 1)$ normalizes the distribution (counting only the even values of $N$). This is quite simple, and has the nice property of being irrelevant for finding a point estimate of $N$. Additionally, in this scenario, the maximum a posteriori (MAP) point estimate coincides with the maximum likelihood estimate (MLE).
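As a minimal sketch, this prior is a one-liner in Julia (Nmax being the arbitrary one-million cutoff from above):

# Truncated uniform prior over even N; there are Nmax ÷ 2 + 1 even values
# in 0:Nmax, hence the normalization.
prior(N; Nmax = 1_000_000) = (0 <= N <= Nmax && iseven(N)) ? 1 / (Nmax ÷ 2 + 1) : 0.0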

Solution

If we consider $P(N)$ to be constant within some bounds for $N$, and also ignore the constant $P(D)$, we simply have \[
P(N|D) \propto P(D|N) = {N \choose 19}^{-1} {N/2\choose 8} {N/2 \choose 11} ~. \]
Using the maximum a posteriori (MAP) estimate, 34 balls originally in the urn is my best guess:

using Distributions

# P(D|N) for N = 2R: an urn with R red and R white balls, 19 drawn.
# By symmetry, the pdf at 11 (whites) equals the pdf at 8 (reds).
f(R) = pdf(Hypergeometric(R, R, 19), 11)
findmax(f.(11:10000))

# Output: (0.16210443165514007, 7)

In the above, I parameterized by $R$ instead of $N$ so as not to bother with evenness. Checking values of $R$ from 11 to 10,000, the highest value of the posterior is at the seventh index, corresponding to $R=17 \Longrightarrow N=34$.
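If we wanted an actual probability rather than just the argmax, we would need the evidence after all; here is a minimal sketch of normalizing over this truncated support (an aside of my own, not needed for the answer):

# Normalize the flat-prior posterior over the truncated support 11:10000.
post = f.(11:10000)
post ./= sum(post)
post[7]  # posterior probability of R = 17, i.e. N = 34, under this prior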

Simulation

As a simple check, we could just simulate 19 drawings from various urns and see which urn gives 8 reds / 11 whites most frequently. Sampling from Hypergeometric(R, R, 19) gives the number of successes (say, reds) in a drawing, and we can generate $n$ samples with rand(Hypergeometric(R, R, 19), n). Then we simply count how often a sample returned 8. (We could equally count 11s: the two counts sum to 19 and are equally likely by symmetry.) We can do that with

using BenchmarkTools  # for @btime

# Count how many of n simulated drawings of 19 balls yield exactly 8 reds.
sim(n, R) = count(==(8), rand(Hypergeometric(R, R, 19), n))

@btime sim(10_000, 20)
#   1.156 ms (2 allocations: 78.20 KiB)
# 1659

@btime sim(100_000, 20)
#   11.693 ms (2 allocations: 781.33 KiB)
# 15897

@btime sim(1_000_000, 20)
#   118.120 ms (2 allocations: 7.63 MiB)
# 161886

I have enough patience for about a few minutes, and I want to scan $R$ up to, say, 1,000, so I think 1,000,000 trials each will be fine.

sim1m(R) = sim(1_000_000, R)
findmax(sim1m.(11:1_000))
# (161992, 7)

Again, this is the seventh index in 11:1_000, which is $R=17$, implying $N=34$. The simulated frequency, $161992/10^6 \approx 0.162$, also matches the analytic maximum $0.16210\ldots$ found above. It's no surprise that we got this, since our point estimate was both the MAP and the MLE, and this is essentially obtaining the MLE by simulation.

References

  1. E. T. Jaynes. Probability Theory. 1st ed. New York: Cambridge University Press, 2003. ISBN: 978-0-521-59271-0.

  1. This discussion follows Jaynes, 2003.