
Sociology 6Z03
Topic 10: Probability (Part I)
John Fox
McMaster University
Fall 2014
Outline: Probability (Part I)
Introduction
Probability Basics
Random Variables
Introduction
Probability Theory
Probability theory is the area of mathematics that deals with random
phenomena.
Individual random events are intrinsically unpredictable, but repeated
random events are orderly and patterned.
It is the purpose of probability theory to describe these patterns —
literally to bring order to chaos.
Much of modern mathematics — for example, calculus, algebra, and
geometry — is of ancient origin, but probability theory did not exist
before the European Renaissance (specifically, the 17th century —
the late Renaissance or early Enlightenment).
One use of probability theory is to provide a foundation for statistical
inference.
Statistical inference is the process of drawing conclusions about
characteristics of a population based on a sample drawn at random
from the population.
Probability Basics
Experiment, Outcomes, Sample Space, Realization
In probability theory:
an experiment is a repeatable procedure for making an observation;
an outcome is a possible observation resulting from an experiment; and
the sample space of the experiment is the set of all possible outcomes.
Any specific realization of the experiment produces a particular
outcome in the sample space.
Finite and Continuous Sample Spaces
Sample spaces may be discrete or continuous.
If, for example, we flip a coin twice and record on each flip whether
the coin shows heads (H) or tails (T ), then the sample space of the
experiment is discrete and finite, consisting of the outcomes
S = {HH, HT , TH, TT }
If, in contrast, we burn a light bulb until it fails, recording the failure
time in hours and fractions of an hour, then the sample space of the
experiment is continuous and consists of all positive real numbers
(not bothering to specify an upper limit for the life of a bulb):
S = {x : x > 0}.
Sample Spaces
Thought Question
Suppose that we flip a coin only once and observe whether the coin
comes up H or T . What is the sample space of this experiment?
A S = {HH, TT }.
B S = {HH, HT , TH, TT }.
C S = {H, T }.
D S = {HT }.
E I don’t know.
Events
An event is a subset of the sample space of an experiment — that is,
a set of outcomes.
An event is said to occur in a realization of the experiment if one of
its constituent outcomes occurs.
For example, for S = {HH, HT , TH, TT }, the event E = {HH, HT },
representing a head on the first flip of the coin, occurs if we obtain
either the outcome HH or the outcome HT .
Axioms of Probability
Probabilities are numbers assigned to events in a manner consistent
with the following axioms (rules, as given by Moore):
P1: The probability of an event E is a number between 0 and 1:
0 ≤ P (E ) ≤ 1.
P2: The sample space S is exhaustive — some outcome must
occur: P (S ) = 1.
P3: Two events A and B are disjoint if they have no outcomes in
common; disjoint events cannot occur simultaneously. The
probability of occurrence of one or the other of two disjoint
events is the sum of their separate probabilities of
occurrence: For A and B disjoint, P (A or
B ) = P (A) + P (B ).
P4: The probability that an event E does not occur is the
complement of its probability of occurrence:
P (not E ) = 1 − P (E ).
Interpretation of Probability
Probabilities can be interpreted as long-run proportions.
For example, to say that the probability of an event is .5 means that
the event will occur approximately half the time if the experiment is
repeated a very large number of times, with the approximation
tending to improve as the number of repetitions of the experiment
increases.
This interpretation provides a way to estimate probabilities: Repeat the
experiment many times and observe the proportion of times that the
event occurs.
This “objective” interpretation of probability is the basis of the
“classical” approach to statistical inference.
There are “subjective” or “personal” approaches to probability as well,
where a probability is interpreted as strength of belief that an event will
occur or that a proposition is true.
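The long-run-proportion interpretation can be illustrated by simulation; a minimal Python sketch, using only the standard library:

```python
import random

random.seed(2014)  # fixed seed so the run is reproducible

# Estimate P(heads) for a fair coin by flipping it many times and
# recording the proportion of heads; the approximation tends to
# improve as the number of flips grows.
for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>9,} flips: proportion of heads = {heads / n:.4f}")
```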
Axioms of Probability, Probability Models
The fourth axiom, P4, is not really needed: It can be deduced from
the others. (Can you see how?)
Consider the event E = {oa, ob, ..., om}, where the oi's are outcomes — that is, elements of the sample space S.
Then, by the third axiom, the probability of E is the sum of the probabilities of its constituent outcomes,
P(E) = P(oa) + P(ob) + · · · + P(om).
Thus, if we know the probabilities of all of the outcomes in the sample
space, we can figure out the probability of any event.
A probability model for an experiment consists of the sample space
for the experiment and an assignment of probabilities to events in a
manner consistent with the axioms.
The axioms are not so restrictive as to imply a unique assignment of
probabilities to a sample space.
There are always infinitely many probability models for an experiment.
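A probability model can be represented directly as an assignment of probabilities to outcomes; the dictionary below is one of the infinitely many models consistent with the axioms:

```python
# One probability model for two flips of a coin: each outcome in the
# sample space S = {HH, HT, TH, TT} gets a probability.
model = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

def prob(event):
    """P(E): sum the probabilities of the event's constituent outcomes."""
    return sum(model[outcome] for outcome in event)

E = {"HH", "HT"}            # a head on the first flip
print(prob(E))              # 0.5
print(prob(model.keys()))   # P(S) = 1.0, as axiom P2 requires
```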
Probability Models: Examples
Suppose, for example, that all outcomes in the sample space
S = {HH, HT , TH, TT } are equally likely, so that
P (HH ) = P (HT ) = P (TH ) = P (TT ) = .25
This corresponds to a “fair coin flipped in a fair manner.”
Then, for E = {HH, HT } (“a head on the first flip”), the probability
of E is P (E ) = .25 + .25 = .5.
Let A = {TH, TT } be the event “a tail on the first flip,” and
B = {HH } the event “two heads.”
The events A and B are disjoint, and the event A or B is
{TH, TT , HH }; thus,
P (A or B ) = .75 = P (A) + P (B ) = .5 + .25
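The addition rule for disjoint events in this example can be checked in a few lines of Python:

```python
model = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

def prob(event):
    return sum(model[o] for o in event)

A = {"TH", "TT"}   # a tail on the first flip
B = {"HH"}         # two heads

assert A.isdisjoint(B)       # no outcomes in common
print(prob(A | B))           # 0.75
print(prob(A) + prob(B))     # 0.75, matching axiom P3
```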
Probability Models: Examples
Thought Question
Continuing with the preceding example, with
S = {HH, HT , TH, TT } and equally likely outcomes, as before let
A = {TH, TT } be the event “a tail on the first flip.” Now let C be
the event “a tail on the second flip.” What outcomes are in C ?
A C = {TH, TT }.
B C = {HT , TT }.
C C = {TT }.
D C = {HH, TH }.
E I don’t know.
Probability Models: Examples
Thought Question
Are the events A = {TH, TT } (“a tail on the first flip”) and
C = {HT , TT } (“a tail on the second flip”) disjoint?
A Yes.
B No.
C I don’t know.
Probability Models: Examples
Equally likely outcomes produce a simple example, but any assignment of probabilities to outcomes that sums to 1 is consistent with the axioms.
For example, a coin that is weighted to produce 2/3 heads yields the
following probabilities for two independent flips:
P (HH ) = 4/9, P (HT ) = P (TH ) = 2/9, P (TT ) = 1/9
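These probabilities come from multiplying the per-flip probabilities for two independent flips; a sketch using exact fractions:

```python
from fractions import Fraction

# A coin weighted to land heads 2/3 of the time, flipped twice
# independently: multiply the per-flip probabilities.
p_h = Fraction(2, 3)
p_t = 1 - p_h                 # 1/3

model = {
    "HH": p_h * p_h,          # 4/9
    "HT": p_h * p_t,          # 2/9
    "TH": p_t * p_h,          # 2/9
    "TT": p_t * p_t,          # 1/9
}
print(sum(model.values()))    # 1, so the assignment satisfies the axioms
```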
Probability Models: Examples
Thought Question
With this assignment of probabilities to outcomes,
P (HH ) = 4/9, P (HT ) = P (TH ) = 2/9, P (TT ) = 1/9
what is the probability of the event E = {HH, HT }?
A 6/9 = 2/3.
B 4/9.
C 2/4 = 1/2.
D 8/9.
E I don’t know.
Random Variables
A random variable is a rule that assigns a number to each outcome in
the sample space of an experiment.
For example, the following random variable X counts the number of
heads in each outcome of the coin-flipping experiment:
outcome   value x of X
HH        2
HT        1
TH        1
TT        0
It is sometimes useful to distinguish between the random variable X
(denoted by an upper-case letter) and a particular value of the random
variable x (denoted by a lower-case letter).
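Since a random variable is literally a rule, it can be written as a function from outcomes to numbers; a minimal sketch:

```python
# X is a rule assigning a number to each outcome: the number of heads.
def X(outcome):
    return outcome.count("H")

for outcome in ("HH", "HT", "TH", "TT"):
    print(outcome, X(outcome))
# HH 2
# HT 1
# TH 1
# TT 0
```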
Probability Distributions
The probability distribution of a discrete random variable lists all
possible values xi of the variable and shows the probability pi of
observing each.
For example, for the coin-flipping experiment with equally likely
outcomes, the probability distribution of the number of heads X is
            xi    pi = P(X = xi)
TT     =⇒   0     .25
HT, TH =⇒   1     .50
HH     =⇒   2     .25
       sum        1.00
Notice that although the outcomes in the original sample space of the
experiment are equally likely, the values of the random variable X are
not equally likely.
Probability Distributions
In general, a discrete, finite random variable has a number of possible values, x1, x2, ..., xk, with probabilities p1, p2, ..., pk.
Following from the axioms of probability theory, each probability pi is
a number between 0 and 1, and the sum of all probabilities is
p1 + p2 + · · · + pk = 1.
We can find the probability of particular events that refer to the
random variable by summing the probabilities for the values that
make up the event.
For example, the probability of getting at least one head is
P (X ≥ 1) = P (X = 1) + P (X = 2) = .50 + .25 = .75.
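Deriving the distribution of X from the outcome probabilities, and then P(X ≥ 1), can be sketched as:

```python
from collections import defaultdict

model = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

# The probability of each value x of X is the total probability of the
# outcomes that X maps to that value.
dist = defaultdict(float)
for outcome, p in model.items():
    dist[outcome.count("H")] += p

print(sorted(dist.items()))                        # [(0, 0.25), (1, 0.5), (2, 0.25)]
print(sum(p for x, p in dist.items() if x >= 1))   # 0.75
```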
Probability Distributions
The probability distribution of a discrete random variable can be
graphed as follows:
[Figure: bar chart of the distribution, with probability pi on the vertical axis (0 to 0.5) and number of heads xi = 0, 1, 2 on the horizontal axis.]
Probability Distributions
Thought Question
Continuing with the sample space S = {HH, HT , TH, TT }, define
the random variable Y = 1 when both flips are the same (i.e., HH or
TT ) and Y = 0 when they are different (i.e., HT or TH). If the four
outcomes are equally likely, each with probability 1/4 = 0.25, is the
following probability distribution for Y correct?
yi     pi = P(Y = yi)
0      0.5
1      0.5
sum    1.0
A Yes.
B No.
C I don’t know.
Mean of a Random Variable
The mean µ of the random variable X, also called the expectation or expected value of X, is defined in the following manner:
µ = x1 p1 + x2 p2 + · · · + xk pk
= ∑ xi pi
It is conventional to use Greek letters like µ to represent numerical
summaries of probability distributions.
The expected value of X is also written as E (X ).
For our example:
xi     pi      xi pi
0      .25     0.00
1      .50     0.50
2      .25     0.50
sum    1.00    µ = 1.00
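The weighted-sum computation in the table can be written directly:

```python
# Mean (expected value) of X: mu = sum of x_i * p_i.
dist = {0: 0.25, 1: 0.50, 2: 0.25}
mu = sum(x * p for x, p in dist.items())
print(mu)   # 1.0
```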
Mean of a Random Variable
The mean of X gives the average value of the random variable in the
following senses:
The mean µ is the average of the possible values of X , each weighted
by its probability of occurrence.
If you think of probabilities as weights arranged along a bar, the mean
µ is the point at which the bar balances:
[Figure: weights of .25, .50, and .25 placed at 0, 1, and 2 on the number-of-heads axis; the bar balances at µ = 1.0.]
If we repeat the experiment many times and calculate the value of X
for each realization, then the average of these values of X is
approximately µ, with the approximation tending to get better as the
number of repetitions increases.
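This long-run-average sense of µ can also be checked by simulation; a sketch in Python:

```python
import random

random.seed(6203)  # reproducible run

def realize_X():
    """One realization of the experiment: count heads in two fair flips."""
    return sum(random.random() < 0.5 for _ in range(2))

n = 100_000
average = sum(realize_X() for _ in range(n)) / n
print(average)   # close to mu = 1.0
```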
Mean of a Random Variable
Thought Question
Recall the random variable Y = 1 when both flips in the coin-flipping
experiment are the same and Y = 0 when they are different. With
equally likely outcomes, Y has the probability distribution
yi     pi = P(Y = yi)
0      0.5
1      0.5
sum    1.0
What is the mean of Y ?
A 0.
B 1.
C 0.5.
Variance and Standard Deviation of a Random Variable
The variance σ2 of X measures how spread out the distribution of X
is around its mean µ:
σ2 = (x1 − µ)2 p1 + (x2 − µ)2 p2 + · · · + (xk − µ)2 pk
= ∑(xi − µ)2 pi
The variance of X is also written as V (X ) or Var(X ).
The standard deviation σ of X is just the square root of the variance
(and restores the units of the variable).
Continuing the example (where µ = 1),

xi     pi      xi − µ    (xi − µ)² pi
0      .25     −1        0.25
1      .50      0        0.00
2      .25      1        0.25
sum    1.00              σ² = 0.50

σ = √0.50 = 0.707 heads
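The variance and standard deviation computed in the example can be reproduced in a few lines:

```python
import math

dist = {0: 0.25, 1: 0.50, 2: 0.25}
mu = sum(x * p for x, p in dist.items())    # 1.0

# Variance: squared deviations from mu, weighted by probability.
var = sum((x - mu) ** 2 * p for x, p in dist.items())
sd = math.sqrt(var)                         # restores the units (heads)
print(var)              # 0.5
print(round(sd, 3))     # 0.707
```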
Random Variables
Variance and Standard Deviation of a Random Variable
Thought Question
Recall the random variable Y with probability distribution
yi     pi = P(Y = yi)
0      0.5
1      0.5
sum    1.0
and mean µ = 0.5.
What are the variance and standard deviation of Y ?
A σ2 = 1 and σ = 1.
B σ2 = 0.25 and σ = 0.5.
C σ2 = 0.5 and σ = 0.25.
D I don’t know.
Mean, Variance and Standard Deviation
The formulas for µ, σ², and σ are very similar to the formulas for the mean x̄, variance s², and standard deviation s of a variable in a data set:

                     random variable            variable in a data set
mean                 µ = ∑ xi pi                x̄ = (1/n) ∑ xi = ∑ xi (1/n)
variance             σ² = ∑ (xi − µ)² pi        s² = (1/(n − 1)) ∑ (xi − x̄)² = ∑ (xi − x̄)² (1/(n − 1))
standard deviation   σ = √σ²                    s = √s²
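The parallel between the two columns can be made concrete; the four-observation data set below is a hypothetical illustration, not data from the course:

```python
# Probability-distribution side: mu and sigma^2 weight each value by p_i.
dist = {0: 0.25, 1: 0.50, 2: 0.25}
mu = sum(x * p for x, p in dist.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in dist.items())

# Data-set side: x-bar weights each observation by 1/n, and s^2 uses
# the n - 1 divisor.
data = [0, 1, 1, 2]
n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)

print(mu, sigma2)           # 1.0 0.5
print(xbar, round(s2, 4))   # 1.0 0.6667
```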
Random Variables
Continuous Random Variables
Random variables defined on continuous sample spaces may
themselves be continuous.
The probability distribution of a continuous random variable X is
described by a density curve, p (x ).
It is meaningless to talk of the probability of observing specific,
individual values of a continuous random variable, but areas under the
density curve give the probability of observing specific ranges of
values of the random variable.
Continuous Random Variables
A continuous random variable, like a discrete random variable, has a
mean, variance, and standard deviation.
The formulas for the mean and variance of a continuous random
variable are very similar to the corresponding formulas for a discrete
random variable (substituting integrals for sums):
µ=
σ2 =
Z
xp (x )dx
Zall x
(x − µ)2 p (x )dx
all x
R
(If you are unfamiliar with calculus, integrals are the continuous
analogs of sums — but don’t worry about these formulas.)
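For readers who want to see the integral in action, the mean of a continuous distribution can be approximated numerically. The exponential density p(x) = e^(−x) used here is an illustrative choice (a common model for failure times, in the spirit of the light-bulb example); its exact mean is 1:

```python
import math

def p(x):
    """An exponential density, p(x) = e^(-x) for x > 0."""
    return math.exp(-x)

# Approximate mu = integral of x * p(x) dx with a Riemann sum over a
# grid; the tail beyond x = 50 is negligible.
dx = 0.001
mu = sum(i * dx * p(i * dx) * dx for i in range(int(50 / dx)))
print(round(mu, 3))   # approximately 1.0
```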
Continuous Random Variables
Any density curve gives the distribution of a continuous random
variable.
A particularly important family of continuous distributions is the family of normal distributions, which is already familiar.
Recall that a normal distribution is uniquely specified by its mean µ and standard deviation σ.
An example, for the standard normal distribution, Z ∼ N(0, 1):

[Figure: standard normal density curve p(z), plotted for z from −4 to 4, with the shaded area P(0 < Z < 2) = .4772.]
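The area P(0 < Z < 2) = .4772 can be verified from the standard normal cumulative distribution function, which the Python standard library exposes through the error function:

```python
import math

def Phi(z):
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area under the density between 0 and 2.
print(round(Phi(2) - Phi(0), 4))   # 0.4772
```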