Stat 230
Final Exam
Spring 2012
NAME:
1. Suppose that we will do a retrospective case-control study of the relationship between smoking
(or not) and the risk of heart disease (yes or no).
We want to study a sample of 1000 women in the 50 to 70 year-old age group. Briefly describe:
- How you would collect data for this study:
- Advantage of using retrospective sampling:
- Disadvantage of using retrospective sampling:
2. A randomized experiment will be done to compare the effectiveness of three methods for memorizing
information. Ninety volunteers will participate. Each will use one of the methods to memorize a list of
facts. All participants will then be given a test with questions about the facts.
a) Describe a completely randomized design for this experiment.
b) Suppose there are 60 “young” volunteers and 30 “old” volunteers and we wish to block by age.
Describe how you would carry out a randomized block design in which the blocking factor is age
group.
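The two assignment schemes in this question can be sketched in Python as follows. This is only an illustration: the volunteer IDs, the method labels, the fixed seed, and the function names are assumptions that do not appear in the exam.

```python
import random

METHODS = ["method 1", "method 2", "method 3"]  # hypothetical labels


def completely_randomized(volunteers, methods, rng):
    """Shuffle all volunteers, then deal them into equal-sized groups."""
    shuffled = list(volunteers)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(methods)
    return {m: shuffled[i * size:(i + 1) * size] for i, m in enumerate(methods)}


def randomized_block(blocks, methods, rng):
    """Run a separate completely randomized assignment inside each block."""
    assignment = {m: [] for m in methods}
    for members in blocks.values():
        within = completely_randomized(members, methods, rng)
        for m in methods:
            assignment[m].extend(within[m])
    return assignment


rng = random.Random(230)  # fixed seed only so the sketch is reproducible

# Hypothetical IDs: Y1..Y60 are the "young" volunteers, O1..O30 the "old" ones.
young = [f"Y{i}" for i in range(1, 61)]
old = [f"O{i}" for i in range(1, 31)]

crd = completely_randomized(young + old, METHODS, rng)
rbd = randomized_block({"young": young, "old": old}, METHODS, rng)

for m in METHODS:
    n_young = sum(v.startswith("Y") for v in rbd[m])
    print(m, "| CRD size:", len(crd[m]), "| RBD young/old:", n_young, len(rbd[m]) - n_young)
```

In the blocked version every method receives the same age mix, whereas the completely randomized version balances age only on average.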
3. Data are gathered from students to see if students who procrastinate are more likely to get colds and the
flu during the semesters. In five different statistics classes, students are asked how often they
procrastinate and how many times they had colds or the flu during the previous semester. Pick the type
of data collection used for this research from the following list AND explain briefly why you think this
is the correct answer.
a) Randomized block design
b) Randomized experiment
c) Case-control study
d) Observational study
4. Research is done to see whether taking oral contraceptives increases women’s blood pressure. The
blood pressures of women who take oral contraceptives are compared to the blood pressures of
women who don’t take oral contraceptives. A complicating factor is that the women who take oral
contraceptives tend to be younger than others. This must be taken into account because blood pressure
increases with age. Using only the information given in this problem, explain whether age is a
confounding variable, an interacting variable, or both.
5. What does the phrase “95% confident” mean?
(Just circle correct statements – could be one or more than one)
a) If we sample 100 times, 95 of the confidence intervals will cover the true population mean μ.
b) There is a 0.95 probability that the true population mean μ will be included in the confidence
interval computed above.
c) If we sample repeatedly (if we take all possible samples), about 95% of the confidence intervals
will contain the true population mean μ.
d) There is a 0.95 probability that the sample mean x̄ will be included in the confidence
interval computed above.
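The repeated-sampling idea behind these statements can be explored with a small simulation. The sketch below assumes a normal population with known standard deviation and uses a z-interval. The population values, the sample size, the number of repetitions, and the seed are all illustrative assumptions.

```python
import random
from statistics import NormalDist


def coverage(n_intervals=10_000, n=25, mu=50.0, sigma=10.0, level=0.95, seed=230):
    """Share of simulated z-intervals for the mean that contain the true mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(n_intervals):
        # Draw one sample of size n and centre the interval at its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / n_intervals


rate = coverage()
print(f"share of intervals covering mu: {rate:.3f}")
```

Each simulated interval is centred at its own sample mean, so the only random event being counted is whether the fixed value μ falls inside it.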
6. Blood samples from 40 patients are taken before and after they drink a liter of a certain orange juice.
We are interested in knowing if there was a significant change in the amount of glucose in the blood. The
appropriate method to be used would be:
a) Z-test of proportions
b) Two-sample t-test
c) Paired t-test
d) Chi-square test
7. Matching: Each lettered answer is used exactly once.
A. One-way ANOVA
B. Two-way ANOVA
C. Simple linear regression
D. ANCOVA
E. Multiple regression
……… Quantitative continuous response & one quantitative continuous explanatory variable
……… Quantitative continuous response & one categorical explanatory variable
……… Quantitative continuous response & one quantitative continuous and one categorical
explanatory variable
……… Quantitative continuous response & at least two quantitative continuous explanatory
variables
……… Quantitative continuous response & two categorical explanatory variables
8. The multiple comparison procedure designed for comparing other groups to a control group is
a) Tukey’s
b) Scheffé’s
c) Dunnett’s
d) Fisher’s
9. Data from n=1594 Penn State students were used to create the following plot of mean hours studied per
week (the vertical axis in the plot). Students are classified by gender and by where they like to sit in a
classroom (back, front, middle).
a) On the basis of the graph just given, discuss the nature of the main effects of preferred seat area and
gender on mean hours studied per week. That is, why does it look as though there may be significant main
effects here?
b) Discuss whether you think the interaction might be significant or not. Refer to the various
classifications in this problem; don’t just talk about lines being parallel or not.
c) Recall that n=1594 students. Also remember that there were three seating locations and two genders.
Suppose that we were to do a two-way analysis of variance. Give df values for each of the following
sources:
Seat Area df=
Gender df=
Interaction df=
Error df=
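The degrees-of-freedom bookkeeping for a two-way layout can be sketched as a small Python helper. It assumes a model with an interaction term and at least one observation in every seat-by-gender cell. The function name and dictionary keys are illustrative.

```python
def two_way_anova_df(levels_a, levels_b, n_total):
    """Degrees of freedom for a two-factor ANOVA that includes interaction."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_ab = df_a * df_b
    df_error = n_total - levels_a * levels_b  # n minus the number of cells
    return {"A": df_a, "B": df_b, "A:B": df_ab,
            "Error": df_error, "Total": n_total - 1}


# Three seating locations (A), two genders (B), n = 1594 students.
df = two_way_anova_df(levels_a=3, levels_b=2, n_total=1594)
print(df)
```

A quick check on any such table is that the four component df values add up to the total df, n - 1.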
10. Suppose that we compare the success rates of two treatments for a quite rare, hard to treat eye
disease. In an experiment, 15 people use treatment A: there are 2 successes and 13 failures. In the
same experiment, a different 13 people use treatment B: there are 4 successes and 9 failures.
Fisher’s Exact test will be used to analyze the data in order to see if we can decide that treatment B
is better than treatment A.
a) Write Ho and Ha
b) Write how to calculate the p-value in this situation. Give some details: don’t just say “use the
hypergeometric”, but actually set up the numbers. You don’t need to calculate the probabilities.
c) The p-value = 0.2550725. What is your conclusion?
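One way to set up the hypergeometric calculation is sketched below using only the Python standard library. It assumes the one-sided alternative that treatment B has the higher success probability, conditions on the table margins (28 patients, 6 successes in total, 13 patients on treatment B), and sums the probabilities of tables with at least 4 successes in group B. The helper name `hypergeom_pmf` is illustrative.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes among `draws` items taken without replacement)."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


n_a, n_b = 15, 13            # patients on treatment A and treatment B
succ_a, succ_b = 2, 4        # observed successes in each group
population = n_a + n_b       # 28 patients in total
successes = succ_a + succ_b  # 6 successes in total

# One-sided p-value: P(X >= 4), where X is the number of successes that land
# in the treatment B group when the 6 successes are spread at random.
upper = min(successes, n_b)
p_value = sum(hypergeom_pmf(k, population, successes, n_b)
              for k in range(succ_b, upper + 1))
print(round(p_value, 7))
```

The sum has three terms (k = 4, 5, 6) and agrees with the p-value quoted in part c).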
11. In the last few years, many research studies have shown that the purported benefits of hormone
replacement therapy do not exist, and in fact, that hormone replacement therapy actually increases
the risk of several serious diseases. A four year experiment involved 2000 women. Half of the
women took placebo and half took Prempro, a widely prescribed type of hormone replacement
therapy. There were 40 cases of dementia in the hormone group and 20 cases in the placebo group.
Is there sufficient evidence to indicate that the risk of dementia is higher for patients using Prempro?
Test at the 1% level of significance.
a) State Ho and Ha, check the assumptions, and state your conclusion.
b) Compute a 99% CI for the difference in proportions. Write the interval and write a sentence that
interprets the interval.
c) To match the test in part a), you should really compute not a 99% CI but ………………
d) Compare the odds of having dementia for females in the placebo group with the odds for females using
Prempro. Calculate the odds ratio and write a sentence that interprets the value.
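The arithmetic for this question can be sketched in Python with the standard library alone. The sketch assumes 1000 women per group (half of 2000), a pooled standard error for the one-sided z-test, an unpooled standard error for the confidence interval, and the large-sample normal approximation. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n_h, x_h = 1000, 40  # Prempro (hormone) group: size, dementia cases
n_p, x_p = 1000, 20  # placebo group: size, dementia cases

p_h, p_p = x_h / n_h, x_p / n_p
diff = p_h - p_p

# One-sided z-test of Ho: p_h = p_p against Ha: p_h > p_p (pooled SE).
p_pool = (x_h + x_p) / (n_h + n_p)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_h + 1 / n_p))
z = diff / se_pooled
p_value = 1 - NormalDist().cdf(z)

# Two-sided 99% confidence interval for p_h - p_p (unpooled SE).
se_unpooled = sqrt(p_h * (1 - p_h) / n_h + p_p * (1 - p_p) / n_p)
z_star = NormalDist().inv_cdf(0.995)
ci = (diff - z_star * se_unpooled, diff + z_star * se_unpooled)

# Odds of dementia in the placebo group relative to the Prempro group.
odds_ratio = (x_p / (n_p - x_p)) / (x_h / (n_h - x_h))

print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
print(f"99% CI for the difference: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"odds ratio (placebo vs Prempro) = {odds_ratio:.3f}")
```

The test uses the pooled proportion because Ho asserts equal risks, while the interval estimates the two proportions separately.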
12. The management of a small computer chip manufacturing plant wished to study the effects of its supervisors (Factor
A) and three shifts (day, evening, night) (Factor B) on production output per shift. The management observed the three
supervisors on four randomly selected days for each of the three different shifts, and the number of computer chips
produced was recorded. Consider the fixed factor effects Anova Model for two-factor studies:
$Y_{ijk} = \mu_{..} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
Anova Table
Source               Sum Sq     Df      Mean Sq    F value    Pr(>F)
Factor A             118.222    ….      59.111     …….        0.0002
Factor B              73.389    ….      36.694     7.62       0.0024
Factor A*Factor B      1.944    …..      0.486     0.10       0.9812
Residuals            …………       ……      ……….
Total                323.556    …..
a) Fill in the missing values of the Anova table.
b) Test whether or not interaction effects are present; use α = 0.05.
State the null and alternative hypotheses:
What is the value of the test statistic for this test?
What degrees of freedom are used for this test?
What is the p-value for this test?
State your conclusions, including a discussion of the conclusions by the management of the plant
What does it mean to say Factor A and Factor B do not interact?
c) Test whether or not main effects for supervisor are present; use α = 0.05.
State the null and alternative hypotheses:
What is the value of the test statistic for this test?
What degrees of freedom are used for this test?
What is the p-value for this test?
State your conclusions, including a discussion of the conclusions by the management of the plant
d) Test whether or not main effects for shift are present; use α = 0.05.
State the null and alternative hypotheses:
What is the value of the test statistic for this test?
What degrees of freedom are used for this test?
What is the p-value for this test?
State your conclusions, including a discussion of the conclusions by the management of the plant
e) Use the following output. Find the confidence intervals for
= - and = - .
Use the Tukey procedure with a 95% family confidence level. Summarize your findings. Are they
consistent with your findings above from the hypothesis tests for main effects?
Level of i    N     Mean       Std.Dev
1             12    573.333    2.570
2             12    577.000    2.412
3             12    577.333    2.498

Level of j    N     Mean       Std.Dev
1             12    573.917    2.937
2             12    576.500    2.969
3             12    577.250    2.301
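For part a) of question 12, the blank cells of the Anova table follow from the design and from the sums of squares already printed. The Python sketch below assumes a balanced layout with 3 supervisors, 3 shifts, and 4 days per supervisor-shift cell, as the problem describes. The variable names are illustrative.

```python
a, b, n = 3, 3, 4  # supervisors, shifts, days observed per cell

ss = {"A": 118.222, "B": 73.389, "AB": 1.944}  # sums of squares from the table
ss_total = 323.556

df = {
    "A": a - 1,
    "B": b - 1,
    "AB": (a - 1) * (b - 1),
    "Error": a * b * (n - 1),
    "Total": a * b * n - 1,
}

# The residual sum of squares is whatever the three effects leave unexplained.
ss_error = ss_total - sum(ss.values())
ms_error = ss_error / df["Error"]

# Each F value is the effect mean square divided by the residual mean square.
f_values = {k: (ss[k] / df[k]) / ms_error for k in ss}

print("df:", df)
print(f"SSE = {ss_error:.3f}, MSE = {ms_error:.3f}")
print({k: round(v, 2) for k, v in f_values.items()})
```

The F values computed this way for Factor B and for the interaction can be compared with the 7.62 and 0.10 printed in the table as a consistency check.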