Lecture 3: Bayesian Statistics

Julia Collins
9th October 2013
The lottery
What are the chances of winning the lottery jackpot? Suppose you’ve chosen your 6 numbers
and you’re watching the draw. There are 49 balls in the machine and 6 of them would be
a match if chosen. So we have a 6/49 chance of the first ball drawn matching. Now there are
48 balls in the machine, and 5 of them would be a match with the remaining numbers. So
this time there is a 5/48 chance of matching another number. There is a 4/47 chance of
matching the next ball, with the pattern continuing until there is a 1/44 chance of the
final 6th ball matching.
All of these events have to happen in order for you to win the jackpot, so we multiply
the probabilities together to get the final jackpot probability:
\[
P(\text{Winning the jackpot}) = \frac{6}{49} \times \frac{5}{48} \times \frac{4}{47} \times \frac{3}{46} \times \frac{2}{45} \times \frac{1}{44} = \frac{1}{13{,}983{,}816}
\]
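As a quick check of the product above, here is a minimal Python sketch that multiplies the ball-by-ball probabilities using exact fractions. The function name `jackpot_probability` and its parameters are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
from math import comb


def jackpot_probability(balls: int = 49, picks: int = 6) -> Fraction:
    """Multiply the ball-by-ball matching chances: 6/49 * 5/48 * ... * 1/44."""
    probability = Fraction(1)
    for drawn in range(picks):
        # After `drawn` matches, picks - drawn of your numbers remain
        # among the balls - drawn balls still in the machine.
        probability *= Fraction(picks - drawn, balls - drawn)
    return probability


p = jackpot_probability()
print(p)  # 1/13983816

# Cross-check: one winning combination out of C(49, 6) possible draws.
print(p == Fraction(1, comb(49, 6)))  # True
```

The cross-check uses the fact that the order of the drawn balls does not matter, so the jackpot is one combination out of all ways of choosing 6 balls from 49.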
So the chance of winning the UK lottery jackpot is about 1 in 14 million. If you bought
a lottery ticket every other second for a year you would be expected to win only once. In
fact, if you buy the ticket earlier than Friday for the Saturday draw, then your chances of
dying before the draw are higher than your chance of winning!
The chance of getting the minimum prize of £10 for getting 3 balls correct is still a measly
1/57. On average you would have to spend £57 to get a £10 return.
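The 1 in 57 figure can be reproduced by counting combinations. The sketch below assumes "3 balls correct" means exactly 3 of your 6 numbers are drawn, and that a ticket costs £1 (implied by the £57 figure but not stated in the notes). The helper `match_probability` is a hypothetical name.

```python
from fractions import Fraction
from math import comb


def match_probability(k: int, balls: int = 49, picks: int = 6) -> Fraction:
    """P(exactly k of your picks are among the drawn balls)."""
    # Choose which k of your numbers are drawn, then fill the rest of the
    # draw with numbers you did not pick.
    favourable = comb(picks, k) * comb(balls - picks, picks - k)
    return Fraction(favourable, comb(balls, picks))


p3 = match_probability(3)
print(float(p3))      # roughly 0.0177
print(float(1 / p3))  # roughly 56.7, i.e. about 1 in 57
```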
The Birthday Problem
How many people do you need there to be in a room before the chance of two people having
the same birthday is more than 50%? Most people will go for a figure of about 180, since there
are 365 days in a year and you need about half of that. The true figure is an astoundingly
low 23. How could this possibly be true?
The way to think about it is not how many individuals there are in the room, but how
many possible pairs of people there are. With 23 people there are 23 × 22 / 2 = 253 possible
pairings, which already makes the answer sound more believable. How do we calculate the
probabilities involved?
It turns out to be easier to calculate the chance that no two people share a birthday.
Let’s take it one person at a time. Suppose that one person is in a room and let’s ignore
leap years.
• A second person walks into the room. What is the chance that they do not share a
birthday with the first person? There are 364 other birthdays they could have out of
a possible 365, so the probability is simply 364/365.
• A third person walks into the room. What is the chance that they do not share a
birthday with either of the other two? There are now 363 birthdays left to choose from
(assuming the first two are different) out of 365, so the probability is 363/365.
• Continue similarly, so the fourth person has a 362/365 chance of not sharing a birthday
with any of the first three people.
As with the lottery, the total probability of no two people sharing a birthday is the
product of all these probabilities.
\[
P(n \text{ people having no birthdays in common}) = \frac{364}{365} \times \frac{363}{365} \times \cdots \times \frac{365 - n + 1}{365}
\]
Finally, to get the probability that there do exist two people with a common birthday:
P(2 people having a common birthday) = 1 − P(no two people have a common birthday).
When we put in some different values for n we get this table:
Number of people    Probability that two people share a birthday
10                  11.7%
20                  41.1%
23                  50.7%
30                  70.6%
50                  97%
57                  99%
100                 99.99997%
200                 99.9999999999999999999999999998%
366                 100%
So already with 50 people in a room you can be virtually certain that there will be some
pair of people with the same birthday.
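The entries in the table can be reproduced with a short loop over the product formula. This is a minimal sketch under the same assumptions as the notes (365 equally likely birthdays, no leap years). The function name `prob_shared_birthday` is illustrative.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday), all days equally likely."""
    p_no_match = 1.0
    for k in range(n):
        # Person number k + 1 must avoid the k birthdays already taken.
        p_no_match *= (days - k) / days
    return 1.0 - p_no_match


for n in (10, 20, 23, 30, 50, 57):
    print(f"{n:3d}  {prob_shared_birthday(n):.1%}")
```

Ordinary floating-point arithmetic cannot tell the value for 200 people apart from 1, so reproducing that row to the precision shown would need exact arithmetic, for example `fractions.Fraction`. For 366 or more people the product contains a zero factor, so the function returns exactly 1.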
The reason why this result seems so counterintuitive is that we very easily confuse the
probability that some two people share a birthday with the probability that someone shares
a birthday with a specific person. Of course, the probability that, in a room of 50 people,
someone shares a birthday with you is fairly low (about 13%).
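To see the contrast numerically, the following sketch computes the chance that at least one of the other people in the room shares your particular birthday. It assumes 365 equally likely birthdays and treats a room of 50 as you plus 49 others. The function name is hypothetical.

```python
def prob_someone_shares_with_you(others: int, days: int = 365) -> float:
    """P(at least one of `others` people has your specific birthday)."""
    # Each other person independently misses your birthday
    # with probability (days - 1) / days.
    return 1.0 - ((days - 1) / days) ** others


print(f"{prob_someone_shares_with_you(49):.3f}")  # roughly 0.126
```

Compare this with the 97% chance that some pair among the 50 shares a birthday: the specific-person question involves only 49 comparisons, whereas the any-pair question involves every possible pairing in the room.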
Example of Bayes’ Theorem in action
We saw in the lecture that in 1828 a Frenchman called A. Taillandier found that 67% of
prisoners were illiterate. He claimed that “ignorance is the mother of all vices”. However, in
order to justify his statement, he would need to know not the proportion of criminals that
were illiterate, but the proportion of illiterate people who were criminals.
Bayes’ Theorem tells us how to work out a conditional probability given its inverse. His
formula is
\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
\]
where P(A|B) means “the probability of A happening, given that we know B”.
So, to work out the proportion of illiterate people who are criminals, we need to work out
\[
P(\text{criminal} \mid \text{illiterate}) = \frac{P(\text{illiterate} \mid \text{criminal})\,P(\text{criminal})}{P(\text{illiterate})}.
\]
From the research Taillandier did, we know that P(illiterate|criminal) = 67%. We can
roughly guess that the proportion of illiterate people in the French population at that time
was about 40%. Another stab in the dark, as it were, gives us an estimate of criminality at
about 5%. Putting these numbers into the equation gives us
\[
P(\text{criminal} \mid \text{illiterate}) = \frac{0.67 \times 0.05}{0.4} \approx 8.4\%.
\]
A figure of 8% is much less than the headline-grabbing figure of 67%! However, it is still a
little bit significant: only 5% of the general public are assumed to be criminals, but 8% of
illiterate people are criminals.
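The same calculation can be written as a small function, which makes it easy to see how sensitive the answer is to the two guessed inputs. This is only a sketch: the function name `bayes` and its argument names are illustrative, and the 40% and 5% values are the rough guesses from the text, not measured data.

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


p_illiterate_given_criminal = 0.67  # Taillandier's figure
p_criminal = 0.05                   # rough guess from the text
p_illiterate = 0.40                 # rough guess from the text

p_criminal_given_illiterate = bayes(
    p_illiterate_given_criminal, p_criminal, p_illiterate
)
print(p_criminal_given_illiterate)  # roughly 0.084
```

For instance, keeping the other inputs fixed and changing the guessed illiteracy rate to 60% would give about 5.6% instead, so the conclusion depends noticeably on these estimates.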
Homework
I have a friend who has two children. At least one of these children is a boy who was born
on a Tuesday. What is the chance that the other child is a boy?
Homework solution
First we have to write down all the combinations of two children (gender and day-of-week)
in which at least one of the children is a boy born on a Tuesday. We have:
• The older child is a boy born on a Tuesday and the younger child is a girl. The girl
could be born on any of the days of the week so there are 7 possibilities here.
• The younger child is a boy born on a Tuesday and the older child is a girl. The girl
could be born on any of the days of the week so there are 7 possibilities here.
• The older child is a boy born on a Tuesday and the younger child is a boy. The second
boy could also be born on any day of the week so there are 7 possibilities here.
• The younger child is a boy born on a Tuesday and the older child is a boy. The older
boy could be born on any day of the week EXCEPT Tuesday, because the case where
the older child is a boy born on a Tuesday was already counted. Therefore there are 6
possibilities here.
In total, then, there are 7 + 7 + 7 + 6 = 27 different gender/day-of-week combinations
where at least one child is a boy born on a Tuesday. Out of these 27, there are 7 + 6 = 13 cases
where both children are boys. Assuming each combination is equally likely, the answer to the
homework question is 13/27 ≈ 0.48.
Interestingly, this answer is much closer to the intuitive value of 1/2 than the answer to the
problem in lectures where the day-of-the-week didn’t feature.
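The counting argument can be verified by brute-force enumeration. The sketch below assumes, as the solution does, that every (gender, day-of-week) combination is equally likely for each child, and that we simply filter all two-child families down to those with at least one Tuesday-born boy. The day numbering and helper names are arbitrary choices.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary label for Tuesday among days 0..6

# 14 equally likely (gender, day) types per child, 196 (older, younger) families.
child_types = list(product("BG", range(7)))
families = list(product(child_types, repeat=2))


def is_tuesday_boy(child) -> bool:
    return child == ("B", TUESDAY)


# Keep only families with at least one boy born on a Tuesday.
matching = [f for f in families if any(is_tuesday_boy(c) for c in f)]
# Among those, count the families where both children are boys.
both_boys = [f for f in matching if all(gender == "B" for gender, _ in f)]

print(len(matching), len(both_boys))            # 27 13
print(Fraction(len(both_boys), len(matching)))  # 13/27
```

Dropping the day-of-week condition (filtering on "at least one boy" only) is a one-line change to the `matching` filter, which makes it easy to compare the two versions of the puzzle.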
This Wikipedia article provides a good discussion of the difficulties associated with the
general boy/girl paradox: http://en.wikipedia.org/wiki/Boy_or_Girl_paradox
And here is a nice concise description of the different scenarios in which someone might tell
you they have two children, one of whom is a boy born on a Tuesday, and how the probabilities
associated to these scenarios change: http://blog.tanyakhovanova.com/?p=221.