Case of the Missing Birth Month
This week's Riddler Express was very simple.
What was the probability that none of the 40 people had birthdays [in March]? (For the purpose of this riddle, assume that a year consists of 12 equally long months. It’s a sufficiently good approximation!)
Solution
So there are $40$ people, and we want the probability that none of them has a birthday in March. If each person is equally likely to be born in any of the $12$ months, then the chance that one person is not born in March is simply \[ \text{Pr}(\text{Person not born in March}) = \frac{11}{12} \ . \] Since the forty birth months are independent, the probability for all forty people is \[ \text{Pr}(\text{Forty people not born in March}) = \left(\frac{11}{12}\right)^{40} \approx 3.079\% \ .\]
Thus, the probability is about $3.079\%$.
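As a quick sanity check on the arithmetic, the value can be evaluated directly in Julia. This is only a minimal sketch; the names p_float and p_exact are illustrative and not from the original post. The exact rational is built on BigInt because $12^{40}$ does not fit in a 64-bit integer.

```julia
# Probability that none of 40 people is born in a given month,
# assuming 12 equally likely months and independent birthdays.
p_float = (11 / 12)^40           # floating-point evaluation
p_exact = (big(11) // 12)^40     # exact Rational{BigInt}, avoids Int64 overflow

println(p_float)
println(Float64(p_exact))        # should agree with p_float to rounding error
```

Both values should round to the $3.079\%$ quoted above.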
Verification
We can verify this by simulation. In Julia, we can generate uniformly random birth months for $40$ people with rand(1:12, 40). That is one sample of birth months for everyone. We check whether March is missing from the sample with 3 ∉ rand(1:12, 40). This function counts the number of successes in n simulations:
single_month(n) = sum(3 ∉ rand(1:12, 40) ? 1 : 0 for _ in 1:n)
The choice of month $3$ is arbitrary, since all months are equally likely. We can run this for different values of $n$ and compute a posterior distribution for the success probability using the sim function:
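using Distributions
using Printf
# f(n) must return the number of successes in n simulations
function sim(f, n)
    w = f(n)              # successes
    l = n - w             # failures
    d = Beta(w+1, l+1)    # posterior for the success probability
    @printf " %0.17f\n± %0.17f" mean(d) sqrt(var(d))
end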
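For reference, Beta(w+1, l+1) is the posterior for the success probability $p$ after $w$ successes and $l = n - w$ failures, assuming a uniform prior on $p$ (this is an interpretation of the code; the prior is not stated explicitly). Its mean and variance follow from the standard Beta formulas: \[ \text{E}[p] = \frac{w+1}{n+2} \ , \qquad \text{Var}[p] = \frac{(w+1)(l+1)}{(n+2)^2 (n+3)} \ , \] so the reported standard deviation shrinks roughly like $1/\sqrt{n}$ as the number of simulations grows.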
We'll derive the posterior mean and standard deviation for different values of $n$:
julia> sim(single_month, 10_000)
0.03139372125574885
± 0.00174353192717317
julia> sim(single_month, 100_000)
0.03108937821243575
± 0.00054883444787154
julia> sim(single_month, 1_000_000)
0.03072593854812290
± 0.00017257394329145
julia> sim(single_month, 10_000_000)
0.03078269384346123
± 0.00005462152565854
These agree nicely with our answer of $3.079\%$.
Extra credit
An extra credit problem was added after I originally wrote this. This time, instead of asking that none of the $40$ people has a birthday in March, what if we want the probability that there is at least one month in which no one has a birthday? The one-argument form of the notin function, ∉(v), does not return a Boolean but another function, which tests whether its argument is not in v. E.g. if we define f = ∉([1, 3, 4]), then f(1) is false and f(2) is true. We can therefore check whether any month is missing from a random sample of $40$ birthdays with the expression any(∉(rand(1:12, 40)), 1:12). Thus we can define a function similar to single_month, but now for any month, as follows:
any_month(n) = sum(any(∉(rand(1:12, 40)), 1:12) ? 1 : 0 for _ in 1:n)
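To see the one-argument form of ∉ at work on fixed inputs rather than random ones, here is a small sketch. The helper has_missing_month is a hypothetical name introduced only for illustration; it wraps the same any(∉(sample), 1:12) test used in any_month.

```julia
# Hypothetical helper (not in the original post): true when at least one
# of the months 1:12 does not appear in `sample`.
has_missing_month(sample) = any(∉(sample), 1:12)

f = ∉([1, 3, 4])                            # f(x) tests x ∉ [1, 3, 4]
println(f(1))                               # false: 1 is in the vector
println(f(2))                               # true: 2 is not in the vector
println(has_missing_month(collect(1:12)))   # false: every month appears
println(has_missing_month([1, 1, 2, 5]))    # true: most months are absent
```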
Now we can estimate the probability as before.
julia> sim(any_month, 10_000)
0.32773445310937810
± 0.00469317061061426
julia> sim(any_month, 100_000)
0.32668346633067341
± 0.00148308725472723
julia> sim(any_month, 1_000_000)
0.32682234635530727
± 0.00046905099962754
julia> sim(any_month, 10_000_000)
0.32688503462299306
± 0.00014833446759037
julia> sim(any_month, 100_000_000)
0.32677092346458153
± 0.00004690327072209
Thus we estimate the probability to be about $32.7\%$, much larger than in the single-month case.
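The simulation can also be cross-checked against an exact calculation. One way to do this, which is not part of the original write-up, is inclusion-exclusion over the set of empty months: \[ \text{Pr}(\text{Some month is empty}) = \sum_{k=1}^{11} (-1)^{k+1} \binom{12}{k} \left(\frac{12-k}{12}\right)^{40} \ . \] The sketch below evaluates this sum under the same assumptions as before (12 equally likely months, independent birthdays). The function name any_month_exact and its default arguments are illustrative.

```julia
# Probability that at least one of `months` months has no birthday among
# `people` people, by inclusion-exclusion over the set of empty months.
# Hypothetical helper, not from the original post.
function any_month_exact(people = 40, months = 12)
    sum((-1)^(k + 1) * binomial(months, k) * ((months - k) / months)^people
        for k in 1:months-1)
end

println(any_month_exact())
```

The result should land within the error bars of the simulated estimates above, at roughly $32.7\%$.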