An Explanation of the Concept of Independent Events
The city of Metropolis is about to have an election. Let us say that 46% of its citizens support candidate
Larch for Mayor. To simplify matters, the population is divided into three racial groups: White, Asian, and
Latino.
Question 1 – Is being White independent of support for candidate Larch for Mayor? What this is asking
is: does support for candidate Larch among White citizens differ from support in the population as a whole?
Thus, the question of independence always asks whether a subgroup of the population differs in
proportion from the population as a whole.
Suppose that we answer that the White support for candidate Larch is 46%. Then the two events “a person
supports Larch” and “a person is White” are independent.
Suppose that it is not 46%. Then the two events are not independent, which means the proportion changes
when we look at the subgroup “White” versus the entire population.
Why is independence such an important concept? For our class the answer is twofold. In this
chapter, the main reason is that independence lets us calculate certain probabilities more easily.
In statistics as a whole, when variables are independent, no new information can be
gained by associating the variables; the examples that follow make this clear.
The table below divides the population of Metropolis by support for the candidate and by racial
group, with counts in hundreds. Metropolis holds a million people of voting age. Questions 2 through 5 refer to this table.
Group     For     Against   Total
Asian     805     945       1750
Latino    1311    1539      2850
White     2484    2916      5400
Total     4600    5400      10000
Question 2 – How many people are classified as “For” candidate Larch? Answer: 4600 (in hundreds), or 460,000
people.
Question 3 – If a person is chosen at random, what is the probability of them being “For” candidate Larch?
Answer: I will use function notation (this is important given what is about to follow) to represent the question:
P(For). When this question is asked, I assume it is answered with respect to the population in
question, which is the voting population of Metropolis.
P(For) = 4600/10000 = 0.46
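The same arithmetic can be sketched in Python. This is only an illustration, assuming the table is stored in a dictionary called race_table (a name chosen here, not part of the course material), with counts in hundreds.

```python
from fractions import Fraction

# Counts in hundreds, copied from the table above.
# The name race_table is an assumption made for this sketch.
race_table = {
    "Asian":  {"For": 805,  "Against": 945},
    "Latino": {"For": 1311, "Against": 1539},
    "White":  {"For": 2484, "Against": 2916},
}

# For P(For), the population is everyone in the table.
total_for = sum(row["For"] for row in race_table.values())
total_all = sum(row["For"] + row["Against"] for row in race_table.values())

p_for = Fraction(total_for, total_all)
print(total_for, total_all, float(p_for))  # 4600 10000 0.46
```

Using Fraction keeps the proportion exact, which matters later when two proportions are compared for equality.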
Question 4 – If a Latino voter is chosen at random, what is the probability that this person is for
candidate Larch?
In order to answer this question I am going to introduce another notation that represents the
question’s situation.
What is the situation? Read the sentence again, and ask yourself what population is being
discussed here. If you say voting citizens of Metropolis, you would be wrong. The population being
discussed is Latinos living in Metropolis. Because this is a subset of the Metropolis group, I need
notation depicting this. THIS IS IMPORTANT!! Do not bypass this, unless your goal is to fail the next
exam. Practice using and writing this notation when it is called for.
Let A and B be two events. The notation P(A | B) reads “the probability of A given B.” The
vertical line “|” is read “given.” Let me restate the interpretation: what is the probability of event A
when we consider event A only within the subpopulation called B, INSTEAD of
the original population?
So I would write question 4 as P(For | Latino), which says I want to calculate the probability of people
that are “for” candidate Larch, but only considering the subgroup from Metropolis called Latino; that is,
what proportion of Latinos are for candidate Larch. Now look at what numbers I am going to use
to answer the question, and hopefully it will make the meaning absolutely clear.
Group     For     Against   Total
Asian     805     945       1750
Latino    1311    1539      2850
White     2484    2916      5400
Total     4600    5400      10000
P(For | Latino) = 1311/2850 = 0.46
Notice the numbers I used to answer the question. I had to use 2850 for the denominator because that is
how many people are Latino. Of those, 1311 are for candidate Larch.
This is a different calculation from P(For) = 4600/10000 = 0.46. The fact that both produced the same
proportion, 0.46, says that the events “a person is for the candidate” and “a person is Latino” are
independent; that is, the proportion of Latinos in support of Larch is no different from that of the
population as a whole.
If event A and event B are independent, then
P(A | B) = P(A) and P(B | A) = P(B)
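The second equality, P(B | A) = P(B), is not worked out in the text, so here is a small Python sketch of it using the Latino row. It takes A = “For” and B = “Latino”; the variable names are illustrative only.

```python
from fractions import Fraction

# Counts in hundreds from the race table; variable names are illustrative.
latino_for = 1311      # Latino and for Larch
latino_total = 2850    # all Latino voters
for_total = 4600       # all voters for Larch
grand_total = 10000    # all voters

# P(For | Latino): restrict the population to Latino voters.
p_for_given_latino = Fraction(latino_for, latino_total)
# P(Latino | For): restrict the population to Larch supporters.
p_latino_given_for = Fraction(latino_for, for_total)

p_for = Fraction(for_total, grand_total)
p_latino = Fraction(latino_total, grand_total)

print(p_for_given_latino == p_for)      # True
print(p_latino_given_for == p_latino)   # True
```

Here P(Latino | For) = 1311/4600 = 0.285, which matches P(Latino) = 2850/10000 = 0.285. Restricting to either event leaves the proportion of the other unchanged.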
Question 5 – If a White person is chosen at random, what is the probability that this person is “for”
candidate Larch? If an Asian person is chosen at random, what is the probability that this person is “for”
candidate Larch?
P(For | White) = 2484/5400 = 0.46
P(For | Asian) = 805/1750 = 0.46
Group     For     Against   Total
Asian     805     945       1750
Latino    1311    1539      2850
White     2484    2916      5400
Total     4600    5400      10000
The results show that, overall, race is independent of support for the candidate. That is, race is not a
factor in candidate support, since each group’s proportional support is the same as that of the whole, 46%.
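The check on all three groups can be written as one loop. The sketch below is a hedged illustration; the dictionary groups and the flag race_independent_of_support are names invented for this example.

```python
from fractions import Fraction

# (For, Total) counts in hundreds for each group, from the race table.
groups = {"Asian": (805, 1750), "Latino": (1311, 2850), "White": (2484, 5400)}
p_for = Fraction(4600, 10000)  # support in the whole population

for name, (n_for, n_total) in groups.items():
    p_for_given_group = Fraction(n_for, n_total)
    # Print the group, its proportion, and whether it matches P(For).
    print(name, float(p_for_given_group), p_for_given_group == p_for)

# True only if every subgroup has the same proportion as the whole.
race_independent_of_support = all(
    Fraction(n_for, n_total) == p_for for n_for, n_total in groups.values()
)
print(race_independent_of_support)
```

If even one group had a different proportion, the flag would be False and race would carry information about candidate support.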
Now we divide the citizens of Metropolis according to age. Notice that 46% of citizens are in favor of
candidate Larch. Look carefully at how the questions are asked and how the answer relates to the question.
Age Group (years)   For     Against   Total
18 - 27             1058    1242      2300
27 - 38             1400    1900      3300
over 38             1800    2600      4400
Total               4600    5400      10000
Question 6 – What is the probability that a person chosen at random is 27 – 38 years of age?
P(27 – 38) = 3300/10000 = 0.33
Question 7 – Given that a person is 27 – 38 years of age, what is the probability that they are for the candidate?
P(For | 27 – 38) = 1400/3300 = 0.42
Question 8 – Are the events “a person is for candidate Larch” and “a person is 27 – 38 years of age”
independent?
No, since the proportion of 27 – 38 year olds that are for candidate Larch does not match that of the
population as a whole.
P(For | 27 – 38) ≠ P(For), since 0.42 ≠ 0.46
Two formulas are true statements only when the events are independent. One we have
been discussing: P(A | B) = P(A). The other appears in section 4.1: P(A and B) = P(A)P(B), which holds
only if the events are independent. Thus, if a question asks whether two events are independent, you must use
one of these formulas to determine if they are or are not independent. Let’s consider the Metropolis example
to see how the P(A and B) = P(A)P(B) formula works.
Age Group (years)   For     Against   Total
18 - 27             1058    1242      2300
27 - 38             1400    1900      3300
over 38             1800    2600      4400
Total               4600    5400      10000
Question 9 – Are the two events “a person is 18 – 27 years of age” and “a person is for
candidate Larch” independent?
One way to make the determination is to use the formula P(A and B) = P(A)P(B). I will need to calculate
each side separately.
P(18 – 27 AND For) = 1058/10000 = 0.1058 (To get my answer I need only look at the appropriate slot in the
table.)
Now I will calculate the other side: P(18 – 27) = 2300/10000 = 0.2300 and P(For) = 4600/10000 = 0.4600.
Lastly, we need to see if we have equality.
P(18 – 27 AND For) = P(18 – 27)P(For)
0.1058 = (0.2300)(0.4600) = 0.1058
Since both sides are equal, we see that the two events are independent.
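When this comparison is done on a computer, exact fractions avoid rounding surprises. The following is a minimal sketch, not part of the course material; the variable names are assumptions.

```python
import math
from fractions import Fraction

# Counts in hundreds from the age table (18 - 27 row and the Total row).
joint = Fraction(1058, 10000)   # P(18 - 27 AND For)
p_age = Fraction(2300, 10000)   # P(18 - 27)
p_for = Fraction(4600, 10000)   # P(For)

# Exact comparison of the two sides of P(A and B) = P(A)P(B).
independent = (joint == p_age * p_for)
print(independent)  # True

# With decimals stored as floats, compare within a tolerance rather than with ==,
# because a floating-point product may differ in its last digits.
print(math.isclose(0.1058, 0.23 * 0.46))  # True
```

By hand the decimals work fine; the tolerance only matters for floating-point arithmetic.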
Question 10 – Are the two events “a person is against candidate Larch” and “a person is over 38” independent?
P(Against AND over 38) = 2600/10000 = 0.26
P(Against) = 5400/10000 = 0.54, P(over 38) = 4400/10000 = 0.44
P(Against)P(over 38) = (0.54)(0.44) = 0.2376
0.26 ≠ 0.2376
Since we do not have equality, we don’t have independence.
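Both formulas always give the same verdict, which a short Python sketch can confirm for Questions 9 and 10. The helper independent_both_ways is hypothetical, written only for this illustration.

```python
from fractions import Fraction

def independent_both_ways(joint, a_total, b_total, grand_total):
    """Hypothetical helper: test independence of A and B with both formulas."""
    p_a = Fraction(a_total, grand_total)
    p_b = Fraction(b_total, grand_total)
    p_a_and_b = Fraction(joint, grand_total)
    p_a_given_b = Fraction(joint, b_total)
    by_product = (p_a_and_b == p_a * p_b)    # P(A and B) = P(A)P(B)
    by_conditional = (p_a_given_b == p_a)    # P(A | B) = P(A)
    return by_product, by_conditional

# Question 10: A = Against (5400), B = over 38 (4400), joint count 2600.
q10 = independent_both_ways(2600, 5400, 4400, 10000)
# Question 9: A = For (4600), B = 18 - 27 (2300), joint count 1058.
q9 = independent_both_ways(1058, 4600, 2300, 10000)
print(q10, q9)  # (False, False) (True, True)
```

For Question 10 the conditional form reads P(Against | over 38) = 2600/4400, about 0.59, which differs from P(Against) = 0.54, so it agrees with the product rule.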
Homework
Use the following table to answer the questions below. A poll asked 10,000 people selected at random in
the State of Texas to rate their job happiness. Assume that the table exactly represents the correct
proportions for Texas.
Age Group   Happy   Neutral   Not Happy   Total
18 - 27     1200    1026      474         2700
27 - 38     1722    1596      882         4200
over 38     1178    1178      744         3100
Total       4100    3800      2100        10000
a. A person is chosen at random from the state of Texas. What is the probability that this person is happy at
work?
b. A person within the age group 18 – 27 is chosen at random from the state of Texas. What is the
probability that this person is happy at work?
c. Are the two events “a person is Happy at work” and “a person is in the age group 18 – 27”
independent? Assume that only the Texas population is being considered.
d. What is the probability that someone in the state of Texas is in the age group 27 – 38?
e. A person is chosen at random from the state of Texas. That person is “Not Happy” at work. What is the
probability that they are in the age group 27 – 38?
f. Are the two events “a person is in the age group 27 – 38” and “a person is Not Happy at work”
independent?
g. A person is chosen at random from the state of Texas. That person is “over 38.” What is the probability
that they are “Neutral”?
h. Are the two events “a person is in the age group over 38” and “a person is Neutral at work” independent?
Use both formulas to check your result.
i. Use P(A and B) = P(A)P(B) to see if the events “a person is over 38” and “a person is Not Happy at work”
are independent.
Answers
a. P(Happy) = 0.41
b. P(Happy | 18 – 27) = 0.44
c. The two events are not independent.
d. P(27 – 38) = 0.42
e. P(27 – 38 | Not Happy) = 0.42
f. The two events are independent.
g. P(Neutral | over 38) = 0.38
h. Yes, the two events are independent.
i. The two events are not independent.
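The independence answers can be double-checked with whole numbers only. Multiplying both sides of P(A and B) = P(A)P(B) by the grand total squared gives (cell count)(grand total) = (row total)(column total). The sketch below applies that cross-multiplied form to the homework table; the dictionary jobs and the function names are assumptions made for this example.

```python
# Counts from the homework table; the dictionary layout is an assumption.
jobs = {
    "18 - 27": {"Happy": 1200, "Neutral": 1026, "Not Happy": 474},
    "27 - 38": {"Happy": 1722, "Neutral": 1596, "Not Happy": 882},
    "over 38": {"Happy": 1178, "Neutral": 1178, "Not Happy": 744},
}
grand_total = sum(sum(row.values()) for row in jobs.values())

def row_total(age):
    return sum(jobs[age].values())

def col_total(mood):
    return sum(row[mood] for row in jobs.values())

def independent(age, mood):
    # Cross-multiplied form of P(A and B) = P(A)P(B), using integers only.
    return jobs[age][mood] * grand_total == row_total(age) * col_total(mood)

# Parts c, f, h, and i of the homework, in that order.
for age, mood in [("18 - 27", "Happy"), ("27 - 38", "Not Happy"),
                  ("over 38", "Neutral"), ("over 38", "Not Happy")]:
    print(age, mood, independent(age, mood))
```

Because only integers are multiplied, no rounding is involved, so a result of False means the two sides really differ.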