Contents

Part II  Probability
    1  Introduction
    2  Random Experiment
       2.1  Set Notation
Part II  Probability
1  Introduction
Randomness
Almost everything we observe is random. That does not mean we understand things poorly. We
have two types of knowledge about randomness:
• We know all the possible outcomes (i.e. we know the sample space).
– We knew that either Obama or McCain would win on November 4.
– We know a quarter, when flipped, will either come up tails or heads.
• We can guess the probability that a particular outcome will occur.
– People were prognosticating the elections throughout last year and quantifying the probability of
an Obama win in many ways (probability, odds, or even price at the Iowa Electronic Markets).
– Most of us would simply agree that a fair coin has equal probability of landing heads or tails.
Calculating Probabilities
Fair coins aside, how do we come up with numeric probabilities? It can be a tough process. In this class, we
will discuss situations where the probabilities can be computed exactly (in other words, we deal with very
controlled situations). In general, there are three ways to get at probabilities:
• Knowledge and Common Sense. A lot of probabilities can be computed simply by knowing. For
example, a team of physicists and biologists could probably give you some pretty convincing evidence
that there is a 50:50 probability of getting a head or a tail upon flipping a coin, but you already had that
common sense. You could also probably tell me my chances of drawing a red diamond from a deck
of cards without too much trouble. For the latter calculation, you are using a counting method that we
will discuss later.
• Experience and Experimentation. If you study something for a long time, you begin to see patterns.
You can estimate the future probability of a particular outcome by computing the proportion of time
the same outcome occurred in the past. For example, if you are anxious about taking tests, perhaps it is
because you’ve had a few bad experiences in the past; in other words, you fear that the probability of a
future failure is high.
• Theoretical Model. By making assumptions and rules, you can invent a theoretical model to emulate reality. The advantage is that you can compute probabilities of any outcome exactly. For example, we can develop a model of a repeatedly flipped fair coin that lands heads exactly 50% of the
time, otherwise tails. Then, we flip the coin 6 times and compute the probability of any outcome
{TTTTTT, TTTTTH, TTTTHH, . . .}. (You’ll do the calculations later; a brief sketch follows this list.)
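As a rough sketch of this third approach (not the calculations you will do later), the model says each flip is independent and lands heads with probability 0.5, so the probability of any one particular sequence of 6 flips is 0.5^6. R's built-in dbinom() then gives the probability of each possible number of heads:

# probability of one particular sequence of 6 flips, e.g. TTTTTH,
# under the model: 0.5 multiplied by itself six times
> 0.5^6
[1] 0.015625
# probability of seeing 0, 1, ..., 6 heads in 6 flips of a fair coin
> dbinom(0:6, size=6, prob=0.5)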
Estimating Probabilities from Long-Run Proportions
As an example of method two, we will use R to estimate the probability of a fair coin turning up heads. R
will help us by repeatedly running the coin tossing experiment so we don’t have to. In this case, we know
that the true probability is 0.5 (it is a fair coin), but let’s suppose we don’t know that. Instead, we flip a fair
coin multiple times and compute the proportion of times the coin turns up heads (coded as 1 below). That
will be our guess of the probability of flipping heads on the next flip.
# create a coin
> coin <- c(0,1) # 0 indicates tails, 1 indicates heads
# flip the coin (run the experiment) 100 times
> sample(x=coin, size=100, replace=T)
[1] 1 1 0 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 0 1 1 1 0 1 0 1 0 0 0 0 0 0 1 1 0 1 0
[38] 0 0 0 1 0 0 0 1 1 0 1 1 1 1 1 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 1 0 1 1 0 1 1
[75] 1 1 0 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1
# estimate the proportion of times a head turned up
> sum(sample(x=coin, size=100, replace=T))/100
[1] 0.52
# create an empty vector to hold some results
> p <- NULL
# repeat the estimation 10000 times
> for(i in 1:10000) {
+ p[i] <- sum(sample(x=coin, size=100, replace=T))/100
+ }
# plot a histogram of the results
> hist(p)
[Figure: two histograms of p, labelled a and b. In both panels the x-axis is p (roughly 0.3 to 0.7) and the y-axis is Frequency (0 to about 1500).]
Plot a shows the 10,000 estimates of the probability obtained by flipping the coin 100 times. We see that a good
fraction of our estimates fall below 0.4 or above 0.6. Since the truth is 0.5, our guess at the probability of
heads is quite lousy! But if we run the experiment (coin flipping) 10,000 times instead of just 100, we get
much closer to the truth, as shown by the tightness of histogram b.
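The notes do not show the code behind histogram b; one way it might have been produced is to reuse the coin vector from above but flip 10,000 times per experiment (the object name p2 is introduced here just for illustration):

# repeat the estimation, now flipping the coin 10,000 times per experiment
> p2 <- NULL
> for(i in 1:10000) {
+ p2[i] <- sum(sample(x=coin, size=10000, replace=T))/10000
+ }
# the resulting histogram is much tighter around 0.5
> hist(p2)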
Thus, we learn that the more information we collect, the more experiments we run, the better we get at
predicting a random outcome.
But I bet you already knew that!
R: sample()
A note about R’s sample(). It randomly selects one element from its argument x, which is a vector provided
by the user, and it does this size times. If replace=T, then it restores the vector before each
sampling event. If replace=F, then once an element is sampled, it is removed from the vector and will not
be sampled again.
For example, suppose I provide the vector (4, 3, 3, 1) and sample() selects 1 first. Without replacement, the
vector then becomes (4, 3, 3). Next, sample() selects 3, and the vector becomes (4, 3). Next, sample()
selects 3, leaving (4). Last, sample() selects 4. If size is smaller than the length of the vector, sample()
stops making random selections after size draws. When sample() is finished, it returns a vector of its
selections: (1, 3, 3, 4) in our example.
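A quick illustration of the difference between replace=T and replace=F (the vector v is made up here, and your draws will differ from run to run because the selection is random):

# a small vector to sample from
> v <- c(4, 3, 3, 1)
# with replacement: the same element can be drawn more than once
> sample(x=v, size=4, replace=T)
# without replacement: a random rearrangement of the four elements
> sample(x=v, size=4, replace=F)
# size smaller than the vector: only two draws are made
> sample(x=v, size=2, replace=F)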
2  Random Experiment
Definition: Random Experiment
A random experiment is the chain of circumstances leading up to an outcome.
In some cases, the circumstances are tightly controlled by an experimenter and can be repeated many times
(e.g. flipping a coin, growing corn in a greenhouse, making a compound in chem lab). In other cases, the
circumstances are largely out of our control and cannot be repeated (e.g. a plane landing in the Hudson, the
election of Obama).
Event
Definition: Event
An event is a collection of outcomes.
We’ll see this more formally later, but for example, the event “Obama wins” contains many election outcomes,
including “Obama wins with 270 electoral votes”, “Obama wins with 271 electoral votes”, “Obama wins with
272 electoral votes”, etc. The event “a plane crashes” contains so many possible outcomes I can’t even begin to
list them, but one is what happened last Thursday in the Hudson.
2.1  Set Notation
Because events are collections of outcomes, we need to review sets. We will refer to sets as A, B, C, . . . and
elements within sets as a₁, a₂, . . .. To define a particular set, we will write, for example,

A = {a₁, a₂, a₃}
Definition: universal set
The universal set, S, is the set of all possible elements.
In the parlance of probability, this reads, “The sample space, S, is the set of all possible outcomes.”
Definition: subset
The statement that A is a subset of B, written A ⊂ B, means that if a ∈ A, then a ∈ B.
Definition: null set, ∅
The null set or empty set is the set {} containing no outcomes.
Definition: union
The union of two sets A ∪ B is the set of all elements in A or B, i.e. a ∈ A ∪ B ⇔ a ∈ A OR
a ∈ B.
Definition: intersection
The intersection of two sets A ∩ B is the set of all elements in A and B, i.e. a ∈ A ∩ B ⇔ a ∈ A
AND a ∈ B.
Definition: complement
The complement of A, written Ā, is the set of all elements NOT in A, i.e. a ∈ Ā ⇔ a ∉ A.
Definition: mutually exclusive
Two sets are mutually exclusive or disjoint if A ∩ B = ∅.
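These definitions can be tried out with R's built-in set functions union(), intersect(), and setdiff(); the small sets S, A, and B below (a die roll as the sample space) are made up just for illustration:

# a small universal set and two events, for illustration only
> S <- 1:6 # outcomes of one die roll
> A <- c(1, 2, 3)
> B <- c(3, 4)
> union(A, B) # A ∪ B
[1] 1 2 3 4
> intersect(A, B) # A ∩ B
[1] 3
> setdiff(S, A) # Ā, the complement of A within S
[1] 4 5 6
> intersect(c(1, 2), c(5, 6)) # disjoint sets: empty intersection
numeric(0)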
Lemma 1. A ∪ Ā = S
Proof. To prove this result, you would need to show A ∪ Ā ⊂ S AND S ⊂ A ∪ Ā.
Lemma 2 (distributive laws).

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Proof. Easiest to prove this result with Venn diagrams.
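As a quick numerical sanity check (not a proof), the first law can be verified in R on small made-up sets, using the base function setequal():

# check the first distributive law on small made-up sets
> A <- c(1, 2, 3); B <- c(3, 4); C <- c(2, 4, 5)
> setequal(intersect(A, union(B, C)), union(intersect(A, B), intersect(A, C)))
[1] TRUE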
Lemma 3 (DeMorgan’s Law). The complement of A ∩ B equals Ā ∪ B̄.

Proof. Write D = A ∩ B, so the claim is D̄ = Ā ∪ B̄. First we must show D̄ ⊂ Ā ∪ B̄. We will do so by contradiction.
Suppose ∃a ∈ D̄ for which a ∉ Ā ∪ B̄.
If a ∉ Ā ∪ B̄, then a ∉ Ā and a ∉ B̄ (DeMorgan’s logic: ¬(P ∨ Q) = (¬P) ∧ (¬Q)).
If a ∉ Ā and a ∉ B̄, then a ∈ A and a ∈ B.
If a ∈ A and a ∈ B, then a ∈ A ∩ B = D, which contradicts our original premise that a ∈ D̄.
You can prove the other part, Ā ∪ B̄ ⊂ D̄, in the same way.
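Again only as a numerical sanity check, DeMorgan's Law can be tried on small made-up sets, with complements taken relative to a small universal set S:

# check DeMorgan's law numerically; complements are relative to S
> S <- 1:6; A <- c(1, 2, 3); B <- c(3, 4)
> setequal(setdiff(S, intersect(A, B)), union(setdiff(S, A), setdiff(S, B)))
[1] TRUE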