Case of the Missing Birth Month

Riddler Express solution for the problem of an unrepresented birth month for a group of officemates.

[Image: several floating colored balloons. Photo by Adi Goldstein / Unsplash]

This week's Riddler Express was very simple. It asked:

What was the probability that none of the 40 people had birthdays [in March]? (For the purpose of this riddle, assume that a year consists of 12 equally long months. It’s a sufficiently good approximation!)

Solution

There are $40$ people, and we want the probability that none of them has a birthday in March. If each person is equally likely to be born in any of the twelve months, then the chance that one person is not born in March is simply \[ \text{Pr}(\text{Person not born in March}) = \frac{11}{12} \ . \] Assuming the forty birth months are independent, the probability for the whole group is \[ \text{Pr}(\text{Forty people not born in March}) = \left(\frac{11}{12}\right)^{40} \approx 3.079\% \ .\]

Thus, the probability is about $3.079\%$.
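
As a quick check of the arithmetic, the closed form can be evaluated directly in Julia. This is only a sketch; the names p_single and p_none are illustrative and do not appear elsewhere in this post.

```julia
# Probability that one person is not born in March (12 equally likely months).
p_single = 11 / 12

# Forty independent people: raise to the 40th power.
p_none = p_single^40

println(round(100 * p_none, digits = 3), "%")   # prints 3.079%
```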

Verification

We can verify this by simulation. In Julia, rand(1:12, 40) generates uniformly random birth months for $40$ people, which gives one sample of everyone's birth months. We check whether March is missing from the sample with 3 ∉ rand(1:12, 40). The following function counts the number of successes in n simulations:

single_month(n) = sum(3 ∉ rand(1:12, 40) ? 1 : 0 for _ in 1:n)

By symmetry, the choice of month (here $3$, for March) makes no difference. We can run this for different values of $n$ and compute a posterior distribution for the success probability, starting from a uniform prior, using the sim function:

using Distributions
using Printf

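function sim(f, n)
    w = f(n)              # number of successes in n simulated offices
    l = n - w             # number of failures
    d = Beta(w+1, l+1)    # posterior for the success probability under a uniform prior
    @printf "  %0.17f\n± %0.17f" mean(d) sqrt(var(d))
end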
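
For reference, the Beta(w+1, l+1) distribution used in sim is the standard posterior for a success probability $p$ under a uniform Beta(1, 1) prior, after observing $w$ successes and $l = n - w$ failures. Its mean and variance, which are what sim prints (the latter as a square root), are:

```latex
\[ p \mid w \sim \text{Beta}(w+1,\ l+1), \qquad
   \mathbb{E}[p \mid w] = \frac{w+1}{n+2}, \qquad
   \text{Var}(p \mid w) = \frac{(w+1)(l+1)}{(n+2)^2\,(n+3)} \ . \]
```

For large $n$ the standard deviation is roughly $\sqrt{p(1-p)/n}$, so each tenfold increase in $n$ should shrink the reported uncertainty by a factor of about $\sqrt{10} \approx 3.16$.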

We'll compute the posterior mean and standard deviation for different values of $n$:

julia> sim(single_month, 10_000)
  0.03139372125574885
± 0.00174353192717317

julia> sim(single_month, 100_000)
  0.03108937821243575
± 0.00054883444787154

julia> sim(single_month, 1_000_000)
  0.03072593854812290
± 0.00017257394329145

julia> sim(single_month, 10_000_000)
  0.03078269384346123
± 0.00005462152565854

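These agree nicely with our answer of $3.079\%$: in every run the analytic value lies within one posterior standard deviation of the posterior mean, and the estimates tighten as $n$ grows.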

Extra credit

An extra credit problem was added after I originally wrote this. This time, instead of asking that none of the $40$ people has a birthday in March, we want the probability that at least one month is no one's birth month. The one-argument form of the ∉ operator, ∉(v), does not return a Boolean but rather a function that tests whether its argument is not in v. For example, if we define f = ∉([1, 3, 4]), then f(1) is false and f(2) is true. We can therefore check whether any month is missing from a random sample of $40$ birthdays with the expression any(∉(rand(1:12, 40)), 1:12). This lets us define a function similar to single_month, but now for any month:

any_month(n) = sum(any(∉(rand(1:12, 40)), 1:12) ? 1 : 0 for _ in 1:n)
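
To make the behaviour of the one-argument form of ∉ concrete, here is a small deterministic check. The vectors months_seen and all_months are made-up samples for illustration only, chosen so that month 12 is missing from the first and no month is missing from the second.

```julia
f = ∉([1, 3, 4])      # f(x) is equivalent to x ∉ [1, 3, 4]
f(1)                  # false, because 1 is in the vector
f(2)                  # true, because 2 is not in the vector

# A hypothetical sample in which nobody is born in month 12.
months_seen = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11]
any(∉(months_seen), 1:12)    # true: month 12 is missing

# A hypothetical sample that covers every month.
all_months = collect(1:12)
any(∉(all_months), 1:12)     # false: no month is missing
```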

Now we can estimate the probability as before.

julia> sim(any_month, 10_000)
  0.32773445310937810
± 0.00469317061061426

julia> sim(any_month, 100_000)
  0.32668346633067341
± 0.00148308725472723

julia> sim(any_month, 1_000_000)
  0.32682234635530727
± 0.00046905099962754

julia> sim(any_month, 10_000_000)
  0.32688503462299306
± 0.00014833446759037

julia> sim(any_month, 100_000_000)
  0.32677092346458153
± 0.00004690327072209

Thus we estimate the probability that at least one month is missing to be about $32.7\%$, much larger than the $3.079\%$ for a single specified month.
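
The simulation can also be cross-checked against an exact value from inclusion-exclusion. The probability that a particular set of $k$ months is entirely missing is $\left(\frac{12-k}{12}\right)^{40}$, so the probability that at least one month is missing is $\sum_{k=1}^{11} (-1)^{k+1} \binom{12}{k} \left(\frac{12-k}{12}\right)^{40}$. The following is a minimal sketch of that sum; any_month_exact is a hypothetical helper name, and its default arguments simply mirror the numbers in the puzzle.

```julia
# Exact probability that at least one of `months` equally likely months is
# missing among `people` independent birthdays, by inclusion-exclusion.
# The k = months term is omitted because it is zero (everyone needs some month).
function any_month_exact(people = 40, months = 12)
    sum((-1)^(k + 1) * binomial(months, k) * (1 - k / months)^people
        for k in 1:(months - 1))
end

any_month_exact()   # roughly 0.3268
```

This lands at roughly $32.68\%$, in line with the simulated posterior means above.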