Happy birthday to you! And you! And … you!

How often do you need to buy 3 cakes for the office party? Well, certainly 3 cakes is the minimum for a good party, but here I mean 3 different choices of cakes for 3 different birthday celebrants. Yes, this is another Riddler Classic write up :)

[Note: If you're on mobile, go landscape to see the equations properly.]


The puzzle

For what number of people do you have a greater than 50% chance that there is at least one trio who have the same birthday (not counting the year, of course). This is assuming that everyone's birthday is equally distributed throughout the year. (It's not.) And that February 29 doesn't exist. (It does.)

As they put it:

How many people do you need to have better-than-even odds that at least three of them have the same birthday? (Again, ignore leap years.)

Wissner-Gross, Z. https://fivethirtyeight.com/features/who-wants-to-be-a-riddler-millionaire/

Where does this even come from?

There's a reaonsably well known "paradox" called the birthday paradox which notes that essentially, you only need 23 people in a room (with the same assumptions as above) to have a better than 50% chance of a pair of people having the same birthday. Why is this a paradox? It's not really a paradox; it's just a surprising but true result.

That calculation is actually pretty easy. For $k$ people, you just see how many partial permutations of 365 days of length $k$ there are out of $365^k$. This gives you the probability of no birthday collisions, so the probability of a collision is the complement: \[ P(\text{At least one collision}) = 1 - \frac{k!}{(365-k)!365^k} \ .\] With $k=23$, we have $P = 50.7%$. (At $k=22$, it's 47.6%.)

Why is this harder?

For a single collision, the complement is simple: just invert no pairs. For a triple match, you have to complement no pairs, 1 pair, 2 pairs, … all possible pairs ($\text{floor}(k/2)$ pairs). Then, you're left with the probability of having at least one triple.

Let's calculate it!

Thus, we need the probability of having exactly $\ell$ pairs of collisions. That means $n\choose\ell$ possible sets of birthdays over $n$ days. There are $k\choose2$ ways to choose the first pair of people, ${k-2}\choose2$ ways for the second, ${k-4}\choose2$ for the third, and so on. Now all that's left is the $k-2\ell$ people that are not in pairs. For that, we have our old friend, the partial permutation. This gives \begin{multline}
{n\choose\ell} {k\choose2} {{k-2}\choose2} {{k-4}\choose2} \\
\times \cdots {{k-2(\ell-1)}\choose2} \frac{(n-\ell)!}{(n - \ell - (k-2\ell))!} \ .
\end{multline}
The binomial coefficients with $k$ can be greatly simplified:
\begin{align}
&{k\choose2} {{k-2}\choose2} {{k-4}\choose2} \cdots {{k-2(\ell-1)}\choose2} \\
=& \frac{k!}{2!(k-2)!} \frac{(k-2)!}{2!(k-4)!} \frac{(k-4)!}{2!(k-6)!} \\
&\times\cdots \frac{(k-2(\ell-2))!}{2!(k-2(\ell - 1))!} \frac{(k-2(\ell-1))!}{2!(k-2(\ell - 1) - 2)!} \\
=& \frac{k!}{2^\ell (k-2\ell)!}
\end{align}
Lastly we just need the $1/n^k$ factor for the probability. This all simplifies to just \begin{align}
\overline P &= {n\choose\ell} \frac{k!}{2^\ell(k-2\ell)!} \frac{(n-\ell)!}{(n - \ell - (k-2\ell))!} \frac1{n^k} \\
&= {n\choose\ell} \frac{k!}{2^\ell(k-2\ell)!} \frac{1}{(n - (k-2\ell))!} \frac1{n^k} \\
&= \frac{n!}{\ell!(n-\ell)!} \frac{k!}{2^\ell(k-2\ell)!} \frac{(n-\ell)!}{(n - (k - \ell))!} \frac1{n^k} \\
&= \frac{n!}{\ell!} \frac{k!}{2^\ell(k-2\ell)!} \frac{1}{(n - (k - \ell))!} \frac1{n^k}
\end{align}

This gives the probability of having at least one triple: \begin{multline} P(\text{At least 1 triple collision}) = \\ 1 - \sum\limits_{\ell=0}^{\text{floor}(k/2)} \frac{n!k!}{2^\ell n^k \ell! (n - (k-\ell))! (k - 2\ell)!} \ . \end{multline} With $n=365$, for $k=87$, we have 49.9%, and for $k=88$, we have 51.1%. So the tipping point for a single collision is 23, and for a triple, it's 88.

Let's compute it!

The real way to do this for a larger scale problem would be to apply an approximation. Either approach it from the beginning with a Poisson distribution, since each individual rate will be quite small, or at least apply Stirling's approximation. But for this specific problem we don't have any appreciable memory constraints, so let's calculate it for real!

First, let's define the probability of having exactly $\ell$ pairs of collisions over $n$ days with $k$ people. I.e., a single term in the sum above.

lpairsfromk(n, k, l) = ((factorial(n) * factorial(k)) /
    (2^l * n^k * factorial(l)
    * factorial(n - (k - l))
    * factorial(k - 2l)))

To use this function in sum, specifically the sum(f, itr) method, it'll be useful to define a method that unpacks a 3-tuple of arguments:

lpairsfromk(t::Tuple{R, S, T}) where {R, S, T} = lpairsfromk(t...)

This just takes a 3-tuple of any types, calls the unpacked arguments on the other lpairsfromk method. Finally, we can write the probability of having at least 1 trio of $k$ people over $n$ days.

p(n, k) = 1 - sum(lpairsfromk,
                  (big(n), big(k), big(l))
                  for l in 0:round(Int, k/2, RoundDown))
@btime findfirst(p -> p > 0.5, [p(365, k) for k in 1:100])
# Output:
#   17.678 ms (154551 allocations: 7.26 MiB)
# 88

Using the big function, we promote the integers to BigInts, and it's still pretty fast! This is how we get 88.

88 is the smallest group of people to have a greater than 50% chance of having 3 people with the same birthday.