Example: Prevention of deep vein thrombosis (DVT) is a critical issue in patients undergoing total hip
replacement surgery. Orthopedic surgeons recognize the importance of prophylaxis in the management
of their patients but do not agree on an optimal method. In this study, two different prophylaxis
methods are to be compared for the prevention of proximal DVT after total hip replacement surgery.
Patients undergoing total hip replacement were randomly assigned to one of the two prophylactics.
After surgery, it was noted whether patients had complications from proximal DVT or not. The results
are presented in the following contingency table.
              DVT Complications   No Complications   Total
Treatment 1           3                  72             75
Treatment 2          12                  68             80
Total                15                 140            155
Step 0: Define the research question
Is there evidence that the risk of DVT complications differs between the two prophylactic
treatments?
Step 1: Determine the null and alternative hypotheses
H0:
Ha:
Step 2: Finding the test statistic and p-value
Step 3: Report the conclusion in context of the research question
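For readers who want to check Step 2 by hand rather than in JMP, the following is a minimal sketch of a Pearson Chi-square test on the 2x2 table above, assuming no continuity correction is applied. The variable names are illustrative and not part of the course materials. The result should land close to the p-value of 0.0206 quoted in these notes.

```python
import math

# Observed counts from the contingency table above:
# rows = Treatment 1, Treatment 2; columns = DVT complications, no complications
observed = [[3, 72],
            [12, 68]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected,
# where expected = (row total * column total) / grand total under H0.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

# A 2x2 table gives 1 degree of freedom, and the upper-tail probability of a
# chi-square(1) variable at x equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.3f}, p-value = {p_value:.4f}")
```

The erfc shortcut only works for 1 degree of freedom; larger tables would need a general Chi-square distribution function.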
Observational Studies vs. Designed Experiments
Reconsider the above example. The Chi-square test provided evidence that the risk of complications
does differ between the two treatments (p-value = 0.0206). Now, the question is this: can we conclude
that the treatments themselves cause the risk to be higher for the Treatment 2 group?
The answer to this question lies in whether the experiment itself was a designed experiment or an
observational study.
• Observational study → Involves collecting and analyzing data ___________________
randomly assigning treatments to experimental units.
• Designed Experiment → A treatment is ____________________ imposed on individual subjects
in order to observe whether the treatment causes a change in the
response.
Key statistical idea:
The random assignment of treatments used by researchers in a designed experiment should balance out
between the treatment groups any other factors that might be related to the response variable.
Therefore, designed experiments can be used to establish a cause-and-effect relationship (as long as the
p-value is small).
On the other hand, observational studies establish only that an association exists between the predictor
and response variable. With observational studies, it is always possible that there are other lurking
variables not controlled for in the study that may be impacting the response. Since we can’t be certain
these other factors are balanced out between treatment groups, it is possible that these other factors
could explain the difference between treatment groups.
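The balancing effect of random assignment can be illustrated with a small simulation. This is only a sketch: the lurking variable (patient age), its distribution, and the sample size are all made-up assumptions chosen for illustration, not data from any study in these notes.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical lurking variable (say, patient age) for 10,000 subjects.
ages = [random.gauss(60, 10) for _ in range(10_000)]

# Random assignment: shuffle the subjects, then split them into two groups.
random.shuffle(ages)
group_1, group_2 = ages[:5_000], ages[5_000:]

mean_1 = statistics.mean(group_1)
mean_2 = statistics.mean(group_2)

print(f"mean age, group 1: {mean_1:.2f}")
print(f"mean age, group 2: {mean_2:.2f}")
print(f"difference:        {mean_1 - mean_2:.2f}")
```

With groups this large, the two mean ages should be nearly identical even though age was never used to form the groups. In an observational study there is no such guarantee, because subjects effectively choose their own group.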
Note that the “DVT complications” study is an example of a designed experiment since participants were
randomly assigned to the two groups. We were trying to show that there was a difference in risk of
complications between the two groups. The small p-value makes it unlikely that the difference in risk
between these two groups (4% vs. 15%) arose simply by chance, and the randomization of subjects to
treatment groups should have balanced out any other factors that might explain the difference. So, the
most plausible remaining explanation is that Treatment 1 is truly better than Treatment 2.
Example: Past research has suggested a high rate of alcoholism among patients with primary unipolar
depression. A study of 210 families of females with primary unipolar depression found that 89 had
alcoholism present. Among a set of 299 control families, 94 had alcoholism present. The data can be entered into JMP as
shown below.
Questions:
1. Identify the response variable.
2. Identify the explanatory variable.
Step 0: Define the research question
Is the alcoholism rate in females different among patients with primary unipolar depression
versus the control group? That is, is the proportion of the Depression group with Alcoholism
different from the proportion of the Control group with alcoholism?
Step 1: Determine the null and alternative hypotheses
H0:
Ha:
Step 2: Finding the test statistic and p-value
Step 3: Report the conclusion in context of the research question
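As a cross-check on the JMP output for Step 2, here is a minimal sketch of a two-proportion z-test using the counts stated in the example. Note that this swaps in a z-test rather than the Chi-square test; for a 2x2 table without continuity correction the two give the same two-sided p-value, since the Chi-square statistic is the square of z. Variable names are illustrative.

```python
import math

# Counts from the example: families with alcoholism present / total families
alc_dep, n_dep = 89, 210   # depression group
alc_ctl, n_ctl = 94, 299   # control group

p_dep = alc_dep / n_dep
p_ctl = alc_ctl / n_ctl

# Pooled proportion under H0 (same alcoholism rate in both groups)
p_pool = (alc_dep + alc_ctl) / (n_dep + n_ctl)
std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_dep + 1 / n_ctl))

z = (p_dep - p_ctl) / std_err

# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"proportions: {p_dep:.3f} vs. {p_ctl:.3f}")
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```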
Question:
3. Can we say that having unipolar depression causes alcoholism? Explain.
Methods for Two Categorical Variables – Relative Risks & Odds Ratios
Example: A field study was conducted to identify the natural predators of the gypsy moth
(Environmental Entomology, June 1995). For one part of the study, 24 black-capped chickadees were
captured in mist nets and individually caged. Each bird was offered a mass of gypsy moth eggs attached
to a piece of bark. Half the birds were offered no other food choice (no choice), and the other half were
offered a variety of other naturally occurring foods such as spruce and pine seed (choice). The raw data
can be found in the file chicadees.jmp on the course website.
The contingency table and mosaic plot for the data are given below.
Questions:
1. Using the contingency table, find the following marginal probability:
P(Choice) =
2. Using the contingency table, find the following marginal probability:
P(Eat gypsy moth eggs) =
3. Using the contingency table, find the joint probability that a chickadee has a food choice and
eats the gypsy moth eggs:
P(Choice and Eat gypsy moth eggs) =
4. Using the contingency table, find the following joint probability:
P(No Choice and Eat gypsy moth eggs) =
5. Using the contingency table, find the following conditional probability. Given that a chickadee
has a food choice, what is the probability they eat the gypsy moth eggs?
P(Eat gypsy moth eggs | Choice) =
6. Using the contingency table, find the following conditional probability:
P(Eat gypsy moth eggs | No Choice) =
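The marginal, joint, and conditional probabilities asked for in Questions 1 through 6 all come from the same four cell counts. The sketch below shows one way to compute them, using the chickadee counts given in the Relative Risk discussion of these notes. The helper function names are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

# Cell counts from the chickadee study, keyed by (food group, outcome)
counts = {
    ("Choice", "Ate"): 2,
    ("Choice", "Did not eat"): 10,
    ("No Choice", "Ate"): 8,
    ("No Choice", "Did not eat"): 4,
}
total = sum(counts.values())

def p_group(group):
    """Marginal probability of a food group: row total / grand total."""
    return Fraction(sum(c for (g, _), c in counts.items() if g == group), total)

def p_outcome(outcome):
    """Marginal probability of an outcome: column total / grand total."""
    return Fraction(sum(c for (_, o), c in counts.items() if o == outcome), total)

def p_joint(group, outcome):
    """Joint probability: single cell count / grand total."""
    return Fraction(counts[(group, outcome)], total)

def p_outcome_given_group(outcome, group):
    """Conditional probability: joint probability / marginal of the condition."""
    return p_joint(group, outcome) / p_group(group)

print("P(Choice) =", p_group("Choice"))
print("P(Eat) =", p_outcome("Ate"))
print("P(Choice and Eat) =", p_joint("Choice", "Ate"))
print("P(No Choice and Eat) =", p_joint("No Choice", "Ate"))
print("P(Eat | Choice) =", p_outcome_given_group("Ate", "Choice"))
print("P(Eat | No Choice) =", p_outcome_given_group("Ate", "No Choice"))
```

Using Fraction keeps the answers exact, which makes it easy to compare them with hand calculations from the table.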
Other summaries that are often used when investigating the relationship between categorical variables
are the risk difference, relative risk and odds ratio.
Relative Risk
Let’s again consider the data from the gypsy moth and chickadee example.
             Ate gypsy moth eggs   Didn’t eat gypsy moth eggs   Total
Choice                2                        10                 12
No Choice             8                         4                 12
Total                10                        14                 24
We have seen P(Eat gypsy moth eggs | No Choice) is ____________________ than
P(Eat gypsy moth eggs | Choice). Since these conditional probabilities differ, it appears there may be an
__________________ between Food choice and whether the gypsy moth eggs are eaten. One way to
compare the two groups (Choice and No Choice) is to look at the relative risk.
Relative Risk: This is the measure of how much a particular risk factor ________________ the
risk of a specified outcome.
For the chickadee and gypsy moth example, we can calculate the relative risk as follows:
RR = Relative Risk = P(Eat Gypsy Moth Eggs | No Choice) / P(Eat Gypsy Moth Eggs | Choice)
   = (Proportion that ate the eggs in the No Choice Group) / (Proportion that ate the eggs in the Choice Group)
   =
Interpretation of this value:
Comments:
• A relative risk of _____ is the reference value for making comparisons. That is, a relative risk of
_____ says there is _____ difference in the two probabilities.
• The relative risk is easily displayed in the following mosaic plot.
• Alternatively, we could have calculated the relative risk as follows:
RR = P(Eat Gypsy Moth Eggs | Choice) / P(Eat Gypsy Moth Eggs | No Choice) =
Interpretation: For the _________ group, the risk of the eggs being eaten is _____ times
the risk of the eggs being eaten for the _________ group.
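The two ways of forming the relative risk can be checked with a few lines of code. This is a minimal sketch using the chickadee counts from the table above; the function names are illustrative assumptions.

```python
from fractions import Fraction

# Number of birds that ate the eggs, and group sizes, from the chickadee table
ate = {"Choice": 2, "No Choice": 8}
group_size = {"Choice": 12, "No Choice": 12}

def risk(group):
    """Proportion of birds in the group that ate the gypsy moth eggs."""
    return Fraction(ate[group], group_size[group])

def relative_risk(numerator_group, denominator_group):
    """Risk in the numerator group divided by risk in the denominator group."""
    return risk(numerator_group) / risk(denominator_group)

rr_no_choice_vs_choice = relative_risk("No Choice", "Choice")
rr_choice_vs_no_choice = relative_risk("Choice", "No Choice")

print("RR (No Choice relative to Choice):", rr_no_choice_vs_choice)
print("RR (Choice relative to No Choice):", rr_choice_vs_no_choice)
```

The two values are reciprocals of each other, so the choice of numerator group changes the wording of the interpretation but not the underlying comparison.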
Odds Ratio
The relative risk is frequently used when investigating the relationship between two categorical
variables. Although this quantity is relatively easy to calculate and interpret, statisticians often use
another quantity known as an odds ratio in this situation.
Odds: With counts given for two _________ categories (Choice and No Choice), the odds of
“yes” versus “no” is computed as the number of “yes” events versus the number of “no”
events for each group.
Let’s again consider the data from the gypsy moth and chickadee example.
             Ate gypsy moth eggs   Didn’t eat gypsy moth eggs   Total
Choice                2                        10                 12
No Choice             8                         4                 12
Total                10                        14                 24
The odds of eggs eaten for No Choice
   = (Number that ate eggs in No Choice Group) / (Number that didn't eat eggs in No Choice Group)
   =
The odds of eggs eaten for Choice
   = (Number that ate eggs in Choice Group) / (Number that didn't eat eggs in Choice Group)
   =
Odds Ratio: This is simply the _________ of the odds for the two groups:
OR = Odds Ratio = (Odds of eating eggs for No Choice Group) / (Odds of eating eggs for Choice Group)
Interpretation of this value:
We could have also calculated the odds ratio in the following manner:
OR = (Odds of eating eggs for Choice Group) / (Odds of eating eggs for No Choice Group)
Interpretation of this value:
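The odds and odds ratio calculations can be sketched in the same way as the relative risk. The code below uses the chickadee counts from these notes; the names are illustrative, and the cross-product shortcut at the end is a standard identity for 2x2 tables rather than something stated in the notes.

```python
from fractions import Fraction

# Counts from the chickadee table: (ate eggs, didn't eat eggs) per group
counts = {"Choice": (2, 10), "No Choice": (8, 4)}

def odds(group):
    """Odds of eating the eggs: number that ate / number that didn't eat."""
    ate, did_not_eat = counts[group]
    return Fraction(ate, did_not_eat)

odds_no_choice = odds("No Choice")
odds_choice = odds("Choice")

or_no_choice_vs_choice = odds_no_choice / odds_choice
or_choice_vs_no_choice = odds_choice / odds_no_choice

# Cross-product shortcut for a 2x2 table: (a * d) / (b * c)
cross_product = Fraction(8 * 10, 4 * 2)

print("Odds, No Choice:", odds_no_choice)
print("Odds, Choice:", odds_choice)
print("OR (No Choice vs. Choice):", or_no_choice_vs_choice)
print("OR (Choice vs. No Choice):", or_choice_vs_no_choice)
```

Note that odds compare "yes" counts to "no" counts, whereas the risk used in the relative risk compares "yes" counts to the group total, so the two summaries generally give different numbers.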
Comments:
• An odds ratio of _____ implies there is no observable difference between the two odds.
• The odds can be visualized using the mosaic plot.
Relative Risks and Odds Ratios in JMP
We can get these values from JMP using the following directions.
• Once the mosaic plot, contingency table and Tests output have been created by choosing Analyze
→ Fit Y by X, click on the little red arrow next to Contingency Analysis of Feb on Egg Mass? By
Food Choice. Then choose Relative Risk.
• The options to choose can be determined from the question, or you can ask for all combinations
to be output. However, only one of the ratios is most appropriate for each scenario.
• The following output will be given at the bottom of the JMP output window.
• If you check the Calculate all combinations box in the dialogue box given above, you’ll get the
following output.
• If you select odds ratio from the same drop-down menu you should get the following output.
Note: JMP always alphabetizes the category names in the contingency table, and then divides the
columns left to right.
OR = (Odds of NOT eating eggs for the Choice group) / (Odds of NOT eating eggs for the No Choice group)
   =
Example: According to research reported in the Journal of the National Cancer Institute (April 1991),
eating foods high in fiber may help protect against breast cancer. The researchers randomly divided 120
laboratory rats into two groups of 60 each. All of the rats were injected with a drug that causes breast
cancer. Then each rat was fed a diet of fat and fiber for 15 weeks. However, the levels of fat and fiber
varied between the two groups. At the end of the feeding period, the number of rats with cancer
tumors was determined for each group. The data are summarized in the contingency table below.
             No Fiber   Fiber   Total
Tumors          46        34      80
No Tumors       14        26      40
Total           60        60     120
Questions:
7. Find the probability a rat had a tumor given it ate a no fiber diet.
8. Find the probability a rat had a tumor given it ate a fiber diet.
9. Find the relative risk of having a tumor for rats who had no fiber compared with those who had
fiber.
10. Interpret the relative risk found in Question 9.
11. Using JMP, find and interpret the odds ratio for this scenario.
12. Looking at the relative risk found in Question 9 and the odds ratio found in Question 11, is
there a relationship/association between eating fiber and getting cancer tumors? Explain.
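To check hand or JMP results for this example, the relative risk and odds ratio can both be computed from the four cell counts. This is a minimal sketch; the function name and argument order are assumptions made for illustration, with the No Fiber group in the numerator as Question 9 requests.

```python
from fractions import Fraction

def rr_and_or(events_1, non_events_1, events_2, non_events_2):
    """Return (relative risk, odds ratio) of group 1 relative to group 2."""
    risk_1 = Fraction(events_1, events_1 + non_events_1)
    risk_2 = Fraction(events_2, events_2 + non_events_2)
    odds_1 = Fraction(events_1, non_events_1)
    odds_2 = Fraction(events_2, non_events_2)
    return risk_1 / risk_2, odds_1 / odds_2

# Group 1 = No Fiber (46 tumors, 14 no tumors)
# Group 2 = Fiber    (34 tumors, 26 no tumors)
relative_risk, odds_ratio = rr_and_or(46, 14, 34, 26)

print(f"Relative risk (No Fiber vs. Fiber): {float(relative_risk):.3f}")
print(f"Odds ratio (No Fiber vs. Fiber):    {float(odds_ratio):.3f}")
```

Both values being above 1 points in the same direction, but whether the association is statistically convincing is a separate question that the ratios alone do not answer.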
Example: A study was conducted in 1991 by the University of Wisconsin and the Wisconsin Department
of Transportation in which linked police reports and discharge records were used to assess, among other
things, the risk of head injury for motorcyclists in motor-vehicle crashes. The data shown below can be
used to examine the relationship between helmet use and whether brain injury was sustained in the
accident.
             Brain Injury   No Brain Injury   Total
Helmet            17              977           994
No Helmet         97             1918          2015
Total            114             2895          3009
Questions:
13. Using JMP, find and interpret the relative risk of brain injury.
14. Using JMP, find and interpret the odds ratio.
15. Looking at the relative risk found in Question 13 and the odds ratio found in Question 14, is
there a relationship/association between helmet use and brain injury? Explain.
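The helmet data offer a chance to compare the relative risk and odds ratio side by side. The sketch below assumes the No Helmet group goes in the numerator; the questions do not specify a direction, so this is an illustrative choice, and JMP may report the reciprocal depending on how the categories are ordered.

```python
# Counts from the helmet table: (brain injury, no brain injury)
helmet = (17, 977)
no_helmet = (97, 1918)

def risk(group):
    """Proportion of riders in the group who sustained a brain injury."""
    injured, not_injured = group
    return injured / (injured + not_injured)

def odds(group):
    """Odds of brain injury: injured count / not-injured count."""
    injured, not_injured = group
    return injured / not_injured

# Assumed direction: No Helmet relative to Helmet
relative_risk = risk(no_helmet) / risk(helmet)
odds_ratio = odds(no_helmet) / odds(helmet)

print(f"Risk of brain injury, helmet:    {risk(helmet):.4f}")
print(f"Risk of brain injury, no helmet: {risk(no_helmet):.4f}")
print(f"Relative risk: {relative_risk:.3f}")
print(f"Odds ratio:    {odds_ratio:.3f}")
```

Because brain injury is uncommon in both groups, the odds are close to the risks, so the odds ratio and relative risk come out close to each other here. That was not the case in the chickadee example, where the outcome was common.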