# Case of the Missing Birth Month

Riddler Express solution for the problem of an unrepresented birth month for a group of officemates.

This week's Riddler Express was very simple.

What was the probability that none of the 40 people had birthdays [in March]? (For the purpose of this riddle, assume that a year consists of 12 equally long months. It’s a sufficiently good approximation!)

## Solution

So there are $40$ people, and we want the probability that *none of them* has a birthday in March. If each of the $12$ months is equally likely, then the chance of *one person* not being born in March is simply
\[ \text{Pr}(\text{Person not born in March}) = \frac{11}{12} \ . \]
For forty people, since their birth months are all independent, it's
\[ \text{Pr}(\text{Forty people not born in March}) = \left(\frac{11}{12}\right)^{40} \approx 3.079\% \ .\]

Thus, the probability is about $3.079\%$.
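As a quick numerical check of the closed form, here is a minimal Julia sketch. The variable name `p_single` is ours, introduced only for illustration.

```julia
# Probability that none of 40 people is born in one fixed month,
# assuming 12 equally likely months and independent birthdays.
p_single = (11 / 12)^40

# Print as a percentage rounded to three decimal places.
println(round(100 * p_single, digits = 3), "%")
```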

## Verification

We can verify this by simulation. In Julia, `rand(1:12, 40)` generates uniformly random birth months for $40$ people, which gives one sample of birth months for everyone. We check whether March is excluded from the sample with `3 ∉ rand(1:12, 40)`. This function counts the number of successes in `n` simulations:

```
single_month(n) = sum(3 ∉ rand(1:12, 40) ? 1 : 0 for _ in 1:n)
```

Obviously the choice of $3$ makes no difference. We can run this for different choices of $n$ and calculate a posterior distribution for the probability-of-success parameter using the `sim` function:

```
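using Distributions
using Printf

# Run `n` trials with the counting function `f`, then print the posterior mean
# and standard deviation of the success probability (uniform Beta(1, 1) prior).
function sim(f, n)
    w = f(n)                 # number of successes
    l = n - w                # number of failures
    d = Beta(w + 1, l + 1)   # posterior after w successes and l failures
    @printf " %0.17f\n± %0.17f" mean(d) std(d)
end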
```
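For reference, here is a short sketch of the formulas behind `sim`, assuming (as the `+1` terms in `Beta(w+1, l+1)` suggest) a uniform prior on the success probability $p$. With $w$ successes and $l = n - w$ failures, the posterior is $\text{Beta}(w+1,\, l+1)$, whose mean and variance are
\[ \text{E}[p] = \frac{w+1}{n+2} \ , \qquad \text{Var}[p] = \frac{(w+1)(l+1)}{(n+2)^2 (n+3)} \ . \]
For large $n$ the standard deviation is roughly $\sqrt{p(1-p)/n}$, so it should shrink by about $\sqrt{10} \approx 3.16$ each time $n$ grows tenfold.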

We'll compute the posterior mean and standard deviation for different values of $n$:

```
julia> sim(single_month, 10_000)
0.03139372125574885
± 0.00174353192717317
julia> sim(single_month, 100_000)
0.03108937821243575
± 0.00054883444787154
julia> sim(single_month, 1_000_000)
0.03072593854812290
± 0.00017257394329145
julia> sim(single_month, 10_000_000)
0.03078269384346123
± 0.00005462152565854
```

These agree nicely with our answer of $3.079\%$.

## Extra credit

An extra credit problem was added after I originally wrote this. This time, instead of having none of $40$ people with a birthday in *March*, what if we want the probability that there is *any* month that is no one's birthday? The one-argument form of the `∉` (`\notin`) operator, `∉(v)`, does not return a Boolean but another function, which tests whether *its* argument is not in `v`. E.g. if we define `f = ∉([1, 3, 4])`, then `f(1)` is `false` and `f(2)` is `true`. We can then see if there's *any* month missing from a random sample of $40$ birthdays with the expression `any(∉(rand(1:12, 40)), 1:12)`. Thus we can define a function similar to `single_month`, but now for any month, as follows:

```
any_month(n) = sum(any(∉(rand(1:12, 40)), 1:12) ? 1 : 0 for _ in 1:n)
```

Now we can estimate the probability as before.

```
julia> sim(any_month, 10_000)
0.32773445310937810
± 0.00469317061061426
julia> sim(any_month, 100_000)
0.32668346633067341
± 0.00148308725472723
julia> sim(any_month, 1_000_000)
0.32682234635530727
± 0.00046905099962754
julia> sim(any_month, 10_000_000)
0.32688503462299306
± 0.00014833446759037
julia> sim(any_month, 100_000_000)
0.32677092346458153
± 0.00004690327072209
```

Thus we estimate this probability to be about $32.7\%$, much larger than before.
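The simulated value can also be checked against an exact calculation. This is a sketch using inclusion-exclusion, a technique not used in the write-up above: the probability that a particular set of $k$ months is entirely missing is $\left(\frac{12-k}{12}\right)^{40}$, so
\[ \text{Pr}(\text{Some month missing}) = \sum_{k=1}^{11} (-1)^{k+1} \binom{12}{k} \left(\frac{12-k}{12}\right)^{40} \ . \]
The helper name `any_month_exact` and its default arguments are ours, introduced only for illustration.

```julia
# Exact probability that at least one of `m` equally likely months is
# missing among `n` independent birthdays, by inclusion-exclusion.
# `any_month_exact` is a hypothetical helper, not part of the original code.
function any_month_exact(n = 40, m = 12)
    # The k-th term counts the ways to choose k months, times the chance
    # that all n birthdays avoid those k months, with alternating sign.
    return sum((-1)^(k + 1) * binomial(m, k) * ((m - k) / m)^n for k in 1:m-1)
end

println(any_month_exact())
```

The result should land close to the simulated $32.7\%$. Its first term, $12 \cdot (11/12)^{40}$, is just twelve times the single-month answer; the remaining terms correct for samples in which more than one month is missing.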