Stat 230 Final Exam, Spring 2012

NAME:

1. Suppose that we will do a retrospective case-control study of the relationship between smoking (yes or no) and the risk of heart disease (yes or no). We want to study a sample of 1000 women in the 50 to 70 year-old age group. Briefly describe:
- how you would collect data for this study:
- an advantage of using retrospective sampling:
- a disadvantage of using retrospective sampling:

2. A randomized experiment will be done to compare the effectiveness of three methods for memorizing information. Ninety volunteers will participate. Each will use one of the methods to memorize a list of facts. All participants will then be given a test with questions about the facts.
a) Describe a completely randomized design for this experiment.
b) Suppose there are 60 “young” volunteers and 30 “old” volunteers and we wish to block by age. Describe how you would carry out a randomized block design in which the blocking factor is age group.

3. Data are gathered from students to see if students who procrastinate are more likely to get colds and the flu during the semester. In five different statistics classes, students are asked how often they procrastinate and how many times they had colds or the flu during the previous semester. Pick the type of data collection used for this research from the following list AND explain briefly why you think this is the correct answer.
a) Randomized block design
b) Randomized experiment
c) Case-control study
d) Observational study

4. Research is done to see whether taking oral contraceptives increases women’s blood pressure. The blood pressures of women who take oral contraceptives are compared to the blood pressures of women who don’t take oral contraceptives. A complicating factor is that the women who take oral contraceptives tend to be younger than those who do not. This must be taken into account because blood pressure increases with age.
Using only the information given in this problem, explain whether age is a confounding variable, an interacting variable, or both.

5. What does the phrase “95% confident” mean? (Circle the correct statements; there could be one or more than one.)
a) If we sample 100 times, 95 of the confidence intervals will cover the true population mean μ.
b) There is a 0.95 probability that the true population mean μ will be included in the confidence interval computed above.
c) If we sample repeatedly (if we take all possible samples), about 95% of the confidence intervals will contain the true population mean μ.
d) There is a 0.95 probability that the sample mean x̄ will be included in the confidence interval computed above.

6. Blood samples from 40 patients are taken before and after they drink a liter of a certain orange juice. We are interested in knowing if there was a significant change in the amount of glucose in the blood. The appropriate method to be used would be:
a) Z-test of proportions
b) Two-sample t-test
c) Paired t-test
d) Chi-square test

7. Matching: Each lettered answer is used exactly once.
A. One-way ANOVA
B. Two-way ANOVA
C. Simple linear regression
D. ANCOVA
E. Multiple regression

……….. Quantitative continuous response & one quantitative continuous explanatory variable
………… Quantitative continuous response & one categorical explanatory variable
…………. Quantitative continuous response & one quantitative continuous and one categorical explanatory variable
………….. Quantitative continuous response & at least two quantitative continuous explanatory variables
…………. Quantitative continuous response & two categorical explanatory variables

8. The multiple comparison procedure designed for comparing other groups to a control group is
a) Tukey’s
b) Scheffé’s
c) Dunnett’s
d) Fisher’s

9. Data from n=1594 Penn State students were used to create the following plot of mean hours studied per week (the vertical axis in the plot).
Students are classified by gender and by where they like to sit in a classroom (back, front, middle).
a) On the basis of the graph just given, discuss the nature of the main effects of preferred seat area and gender on mean hours studied per week. That is, why does it look as though there may be significant main effects here?
b) Discuss whether you think the interaction might be significant or not. Refer to the various classifications in this problem; don’t just talk about lines being parallel or not.
c) Recall that n=1594 students. Also remember that there were three seating locations and two genders. Suppose that we were to do a two-way analysis of variance. Give df values for each of the following sources:
Seat Area df =
Gender df =
Interaction df =
Error df =

10. Suppose that we compare the success rates of two treatments for a quite rare, hard-to-treat eye disease. In an experiment, 15 people use treatment A: there are 2 successes and 13 failures. In the same experiment, a different 13 people use treatment B: there are 4 successes and 9 failures. Fisher’s Exact test will be used to analyze the data in order to see if we can decide that treatment B is better than treatment A.
a) Write Ho and Ha.
b) Write how to calculate the p-value in this situation. Give some details: don’t just say “use the hypergeometric”, but actually set up the numbers. You don’t need to calculate probabilities.
c) The p-value = 0.2550725. What is your conclusion?

11. In the last few years, many research studies have shown that the purported benefits of hormone replacement therapy do not exist, and in fact, that hormone replacement therapy actually increases the risk of several serious diseases. A four-year experiment involved 2000 women. Half of the women took a placebo and half took Prempro, a widely prescribed type of hormone replacement therapy. There were 40 cases of dementia in the hormone group and 20 cases in the placebo group.
Is there sufficient evidence to indicate that the risk of dementia is higher for patients using Prempro? Test at the 1% level of significance.
a) State Ho and Ha, check the assumptions, and state the conclusion.
b) Compute a 99% CI for the difference in proportions. Write the interval and write a sentence that interprets the interval.
c) To match the test in part a), you should really compute not a 99% C.I. but ………………
d) Compare the odds of having dementia for females in the placebo group with the odds for females using Prempro. Calculate the odds ratio and write a sentence that interprets the value.

12. The management of a small computer chip manufacturing plant wished to study the effects of its supervisors (Factor A) and three shifts (day, evening, night) (Factor B) on production output per shift. The management observed the three supervisors on four randomly selected days for each of the three different shifts, and the number of computer chips produced was recorded. Consider the fixed factor effects Anova model for two-factor studies:

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$

Anova Table
Source               Sum Sq     Df     Mean Sq    F value   Pr(>F)
Factor A             118.222    ….     59.111     …….       0.0002
Factor B              73.389    ….     36.694     7.62      0.0024
Factor A*Factor B      1.944    …..     0.486     0.10      0.9812
Residuals            …………       ……     ……….
Total                323.556    …..

a) Fill in the missing values of the Anova table.
b) Conduct a test of whether or not interaction effects are present. Use α=0.05.
- State the null and alternative hypotheses:
- What is the value of the test statistic for this test?
- What degrees of freedom are used for this test?
- What is the p-value for this test?
- State your conclusions, including a discussion of what they mean for the management of the plant.
- What does it mean to say Factor A and Factor B do not interact?
c) Conduct a test of whether or not main effects for supervisor are present. Use α=0.05.
- State the null and alternative hypotheses:
- What is the value of the test statistic for this test?
- What degrees of freedom are used for this test?
- What is the p-value for this test?
- State your conclusions, including a discussion of what they mean for the management of the plant.
d) Conduct a test of whether or not main effects for shift are present. Use α=0.05.
- State the null and alternative hypotheses:
- What is the value of the test statistic for this test?
- What degrees of freedom are used for this test?
- What is the p-value for this test?
- State your conclusions, including a discussion of what they mean for the management of the plant.
e) Use the following output. Find the confidence intervals for = - and = - . Use the Tukey procedure with a 95% family confidence coefficient. Summarize your findings. Are they consistent with your findings above from the hypothesis tests for main effects?

Level of i    N     Mean       Std.Dev
1             12    573.333    2.570
2             12    577.000    2.412
3             12    577.333    2.498

Level of j    N     Mean       Std.Dev
1             12    573.917    2.937
2             12    576.500    2.969
3             12    577.250    2.301
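
The arithmetic behind the Anova table in Problem 12 can be checked with a short Python sketch. This is only an illustration under stated assumptions: it takes the design to be balanced with 3 supervisors, 3 shifts and 4 days per cell (36 observations), and the function and variable names (`complete_anova`, `ss_a`, and so on) are hypothetical and are not part of the exam. It uses the sums of squares printed in the table and should reproduce the F values already given there for Factor B and the interaction.

```python
# Sketch: complete a balanced two-factor ANOVA table from the known sums of squares.
# Assumption: a = 3 supervisors, b = 3 shifts, n = 4 days per cell (balanced design).

def complete_anova(ss_a, ss_b, ss_ab, ss_total, a, b, n):
    """Return degrees of freedom, sums of squares, mean squares and F ratios."""
    df = {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": a * b * (n - 1),
        "Total": a * b * n - 1,
    }
    ss = {
        "A": ss_a,
        "B": ss_b,
        "AB": ss_ab,
        # The error sum of squares is what remains of the total.
        "Error": ss_total - ss_a - ss_b - ss_ab,
        "Total": ss_total,
    }
    # Mean square = sum of squares / degrees of freedom.
    ms = {k: ss[k] / df[k] for k in ("A", "B", "AB", "Error")}
    # Each fixed-effect F ratio uses the error mean square as denominator.
    f = {k: ms[k] / ms["Error"] for k in ("A", "B", "AB")}
    return df, ss, ms, f


# Values copied from the table in Problem 12.
df, ss, ms, f = complete_anova(118.222, 73.389, 1.944, 323.556, a=3, b=3, n=4)

for source in ("A", "B", "AB", "Error"):
    f_text = f"{f[source]:.2f}" if source in f else ""
    print(f"{source:6s} SS={ss[source]:9.3f} df={df[source]:2d} MS={ms[source]:8.3f} F={f_text}")
print(f"{'Total':6s} SS={ss['Total']:9.3f} df={df['Total']:2d}")
```

The error mean square computed here is also the variance estimate that the Tukey intervals in part e) rely on; the studentized range critical value would still have to come from a table or statistical software.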