Name: _______________________________

Final Part 2 (Take-home, open everything and everyone)

Throughout the exam, show your work and, unless specified otherwise, round all your final answers to 3 decimal places, e.g. 1.0015 rounds to 1.002. Remember to round only at the very end of your calculations. Points will be deducted for rounding errors.

In a follow-up to my Comcast experiment, I perform a study with the following design. At the start of each day, I roll a die to determine if I will use the new router, old router, or plug in directly with the cord. At the end of each day, I record whether I was satisfied or dissatisfied with my internet service that day.

image used without permission from http://static.bhphoto.com/images/images500x500/440466.jpg

                New   Old†   Cord
Dissatisfied     50    22     25
Satisfied       110    90    100

† Typo in data fixed for part 2; thus a couple of your answers will differ from your part 1 solutions.

*Q1-4pts) In mathematical terms, using Greek letters to represent parameters, what is the null hypothesis for a Chi-Squared test of this data?

Ho: θ_new = θ_old = θ_cord

*Q2-4pts) At a 5% significance level, what would be the rejection region for this test?

Reject Ho if X-squared > qchisq(0.95,2) = 5.991465 ≈ 5.991

*Q3-7pts) Under the null hypothesis, what is the expected number of dissatisfied days for the new router, i.e. when calculating the Chi-square test statistic, what would you use for the expected count "E" for the new router?

p_pooled,dissatisfied = (50+22+25)/(50+22+25+110+90+100) = 0.2443325
E_new,dissatisfied = p_pooled,dissatisfied * n_new = ((50+22+25)/(50+22+25+110+90+100)) * (50+110) = 39.093

The following is output from a quick analysis of question 1's data using R.

    # The data
    > x = matrix( c(50, 110, 22, 90, 25, 100), nrow=2 )
    > x
         [,1] [,2] [,3]
    [1,]   50   22   25
    [2,]  110   90  100

    # Comparing all three internet connection methods to each other
    > chisq.test(x, correct=F)
    Pearson's Chi-squared test
    X-squared = 6.7494, df = 2, p-value = 0.03423

    # Comparing the new router to the old router
    > chisq.test( x[,c(1,2)] , correct=F)
    Pearson's Chi-squared test
    X-squared = 4.5603, df = 1, p-value = 0.03272

    # Comparing the new router to the corded connection
    > chisq.test( x[,c(1,3)] , correct=F)
    Pearson's Chi-squared test
    X-squared = 4.5804, df = 1, p-value = 0.03234

    # Comparing the old router to the corded connection
    > chisq.test( x[,c(2,3)] , correct=F)
    Pearson's Chi-squared test
    X-squared = 0.0047, df = 1, p-value = 0.9451

*Q4-4pts) Using the analyses provided above, write the conclusion for your hypothesis test using a 5% Type I error rate. (For sake of time, do not include any confidence intervals in your conclusion.)
At a 5% significance level, we reject the null hypothesis that there are no differences in the true dissatisfaction rates between the three methods of connecting to the internet. (Note, commenting on the pairwise analyses is unnecessary but okay here because the question refers to "the analyses above".)

*Q5-4pts) The analysis above also tests all of the pairwise comparisons (new vs. old, new vs. cord, old vs. cord). Using a Bonferroni correction to ensure a Family-wise Type I error rate of 5%, write your conclusions from the pairwise analysis. (For sake of time, do not include any confidence intervals in your conclusions.)

At a Family-wise Type I error rate of 5% (that is, interpreting each test at level 0.05/3 ≈ 0.0167), we do not reject the null hypothesis of equal dissatisfaction rates for any of the three pairwise comparisons.

*Q6-5pts) Referring to the data in question 1, calculate the risk ratio for the risk of dissatisfaction using the new router vs. the old router.

RR = (50/(50+110)) / (22/(22+90)) = 1.590909 ≈ 1.591

**Q7-8pts) Calculate a 95% confidence interval for the odds ratio for the odds of dissatisfaction using the new router vs. the old router.
OR = (50*90)/(22*110) = 1.859504
log(OR) = 0.6203099
SE = sqrt( 1/50 + 1/90 + 1/22 + 1/110 ) = 0.2926714
log(LB) = log((50*90)/(22*110)) - 1.96*sqrt( 1/50 + 1/90 + 1/22 + 1/110 ) = 0.04667385
log(UB) = log((50*90)/(22*110)) + 1.96*sqrt( 1/50 + 1/90 + 1/22 + 1/110 ) = 1.193946
LB = exp( log((50*90)/(22*110)) - 1.96*sqrt( 1/50 + 1/90 + 1/22 + 1/110 ) ) = 1.04778
UB = exp( log((50*90)/(22*110)) + 1.96*sqrt( 1/50 + 1/90 + 1/22 + 1/110 ) ) = 3.300077

Using a more precise critical value,
LB = exp( log((50*90)/(22*110)) - qnorm(0.975)*sqrt( 1/50 + 1/90 + 1/22 + 1/110 ) ) = 1.047791
UB = exp( log((50*90)/(22*110)) + qnorm(0.975)*sqrt( 1/50 + 1/90 + 1/22 + 1/110 ) ) = 3.300042

Rounded to three decimals, CI = (1.048, 3.300)

Check Solution in R:

    library(epitools)
    x = matrix( c(50, 110, 22, 90), nrow=2 )
    epitab(x)
    round( epitab(x)$tab[2,6:7], 3 )
    > 1.048 3.300

I repeated my original Comcast experiment that looked at the ping times (ms) for my old and new router connecting to servers in 12 different states. The data are below, followed by a quick analysis in R.

State   New   Old
ME        8     6
NY        7     6
PA        7     5
VA        5     6
TN        6     5
AR        6     5
OK        7     5
TX        8    10
NM       11     8
AZ       12     8
CA       10     7
AK       21    19

    # read in the dataset, call it data
    # Unequal Variance Two-Sample t-test
    > with(data, t.test( New, Old, var.equal=F ) )
    Welch Two Sample t-test
    t = 0.8876, df = 21.812, p-value = 0.3844

    # Two-sample Wilcoxon-Mann-Whitney Rank Sum test
    > with(data, wilcox.test( New, Old, correct=F) )
    Wilcoxon rank sum test
    W = 98, p-value = 0.1282

**Q8-7pts) Calculate the exact two-sided p-value for a sign test on this dataset.

p-value = 2 * (choose(12,10)+choose(12,11)+choose(12,12)) * 0.5^12 = 0.03857422 ≈ 0.039

Check solution in R: binom.test(10,12)$p.value = 0.03857422

**Q9-4pts) Neither the unequal variance two-sample t-test nor the two-sample Wilcoxon rank sum test shown above rejects at a 5% significance level.
Considering just these two tests and the sign test from question 8, which of the three tests would you recommend for this dataset and why?

Right answer: I would recommend the sign test. Distance from TN (as captured by state) clearly has a strong effect on ping time. Neither the unequal variance two-sample t-test nor the two-sample Wilcoxon-Mann-Whitney rank sum test accounts for this variation, whereas the sign test accounts for the state effects by looking at paired differences. The non-paired tests may be conservative here (p-values too high, so the actual Type I error rate falls below the nominal level), and the sign test has the potential to have lower Type II error (be more powerful) than the other two.

Aside: It's not guaranteed the sign test will be more powerful. It depends on how large the state effect on ping time is. If the state effect were smaller, the sign test could be the weakest of the three tests.

Wrong answer: AK is an outlier for ping time, which could suggest non-normal or skewed distributions. This will hurt the t-test's validity, but not the Wilcoxon test's. So this alone will not justify use of the sign test.

I conducted a variation of the experiment in question 8 as follows. Using my old and new routers, I connected to each of my 200 favorite websites. Because I didn't have a way to measure exact connection times, I recorded whether the connection was fast enough to not be annoying to an impatient guy like me. Here are the results.

                        Old Router
                     Annoyed   Happy   Total
New Router  Annoyed     10        8      18
            Happy        1      181     182
            Total       11      189     200

*Q10-3pts) What is the name of the statistical test you would use to analyze this study?

McNemar's test.

**Q11-7pts) Calculate the exact two-sided p-value for the correct answer to question 10.

p-value = 2 * (choose(9,8) + choose(9,9)) * 0.5^9 = 0.0390625 ≈ 0.039

Check Solution in R: binom.test(8,9)$p.value
> 0.0390625

*Q12-3pts) Based just on the results of the experiment in question 10, what advice would you give me regarding which router I should use?
Good Answer (enough for full credit): Use the old router. At a 5% level of significance, the old router significantly outperforms the new router in terms of happiness.

An even better answer would include something like: Both the old and new routers had fairly high happiness rates, 94.5% (189/200) and 91.0% (182/200) respectively. However, a 3.5 percentage point higher happiness rate would, on average, translate to roughly 13 extra days (nearly two weeks) of happiness over the course of a year.

Recall the dataset from question 1 where I recorded my satisfaction with my internet connection method at the end of each day.

image used without permission from http://static.bhphoto.com/images/images500x500/440466.jpg

                New   Old†   Cord
Dissatisfied     50    22     25
Satisfied       110    90    100

† Typo in data fixed for part 2; thus a couple of your answers will differ from your part 1 solutions.

***Q13-8pts) Referring to the data from question 1 (copied above for your convenience), calculate a 1/8th support interval for the risk of my being dissatisfied with the new router on a given day. Round final solution to 2 decimals.

I can solve for this using the bin.lik() function provided in class. Specifically, bin.lik( 50, 160 ) gives a 1/8 support interval of (0.24, 0.39).

[Figure: "Likelihoods: Binomial Model". Likelihood plotted against Probability, annotated with Max at 0.31, 1/8 SI ( 0.24 , 0.39 ), and 1/32 SI ( 0.22 , 0.41 ).]
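As a cross-check on the bin.lik() interval, the same 1/8 support interval can also be found with base R's uniroot(). This is only a sketch under my own setup, not the class function: the helper name log_lr_gap and the bracketing endpoints 1e-6 and 1 - 1e-6 are assumptions chosen for illustration.

```r
# Data from question 1: 50 dissatisfied days out of 160 with the new router
x <- 50
n <- 160
mle <- x / n

# Hypothetical helper: log likelihood ratio L(t)/L(mle) minus log(1/k).
# It is zero exactly where the likelihood falls to 1/k of its maximum.
log_lr_gap <- function(t, k = 8) {
  dbinom(x, n, t, log = TRUE) - dbinom(x, n, mle, log = TRUE) + log(k)
}

# One root lies on each side of the MLE, so bracket each side separately
si_lower <- uniroot(log_lr_gap, c(1e-6, mle), tol = 1e-10)$root
si_upper <- uniroot(log_lr_gap, c(mle, 1 - 1e-6), tol = 1e-10)$root
round(c(si_lower, si_upper), 2)
```

Working on the log scale avoids the very small raw likelihood values (on the order of 1e-43 here) while locating the same two roots.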
I can also solve this by hand using the Newton-Raphson method (see workshop solutions for the set-up).

    f  = function(t, n, x){ t^x * (1-t)^(n-x) - (x/n)^x * (1-x/n)^(n-x) / 8 }
    fp = function(t, n, x){ x * t^(x-1) * (1-t)^(n-x) - (n-x) * t^x * (1-t)^(n-x-1) }

Iterate the process:

    theta = theta - f(theta, 160, 50) / fp(theta, 160, 50); theta;

Using the 95% CI bounds as starting points:

    Lower start: theta = 50/160 - 2*sqrt(50/160 * 110/160 / 160)
    Upper start: theta = 50/160 + 2*sqrt(50/160 * 110/160 / 160)

Step   Lower       Upper
0)     0.2392123   0.3857877
1)     0.241563    0.3896848
2)     0.2414423   0.3900026
3)     0.241442    0.3900046
4)     0.241442    0.3900046

LB) 0.241   UB) 0.390

Rounded to 2 decimals, the 1/8 support interval is (0.24, 0.39).

***Q14-5pts) Based on using the routers prior to the experiment in question 1, I had a fairly strong belief that the new router was performing worse than the old one. I could capture this belief with the following prior distributions for the risks of being dissatisfied.

Prior for new router's risk ~ Beta( 6, 4 ) and Prior for old router's risk ~ Beta( 3, 7 )

What is my prior mean expectation of the risk ratio for the risk of dissatisfaction using the new router vs. the old router? Round final solution to 2 decimals.

I can simulate large samples from each of the prior distributions and, assuming independence between the prior distributions, take the ratio to simulate the distribution of the RR. Specifically,

    NewPrior = rbeta( 10^6, 6, 4 )
    OldPrior = rbeta( 10^6, 3, 7 )
    RRPrior = NewPrior / OldPrior
    mean( RRPrior )
    > 2.701187
    round( mean( RRPrior ), 2 )
    > 2.70

***Q15-8pts) Referring to question 14 and the data from question 1, what is my mean posterior expectation for the risk ratio? In other words, after seeing the data, what is my new expectation of the risk ratio for the risk of dissatisfaction using the new router vs. the old router? Round final solution to 2 decimals.
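The prior mean from question 14 and the posterior mean asked for here also have closed forms, which give a quick check on any simulated answer. The sketch below assumes, as in question 14, that the two risks are independent, so E[New/Old] = E[New] * E[1/Old], and it uses the standard result that for X ~ Beta(a, b) with a > 1, E[1/X] = (a + b - 1)/(a - 1). The posterior parameters come from the usual conjugate Beta update with the counts from question 1. The function name mean_rr is hypothetical.

```r
# Hypothetical helper: E[New / Old] for independent Beta-distributed risks,
# New ~ Beta(a1, b1) and Old ~ Beta(a2, b2). Requires a2 > 1 so E[1/Old] is finite.
mean_rr <- function(a1, b1, a2, b2) {
  stopifnot(a2 > 1)
  mean_new <- a1 / (a1 + b1)                 # E[New]
  mean_inv_old <- (a2 + b2 - 1) / (a2 - 1)   # E[1 / Old]
  mean_new * mean_inv_old
}

# Priors from question 14
prior_rr <- mean_rr(6, 4, 3, 7)

# Posteriors: add dissatisfied counts to the first parameter, satisfied to the second
posterior_rr <- mean_rr(6 + 50, 4 + 110, 3 + 22, 7 + 90)

round(c(prior = prior_rr, posterior = posterior_rr), 2)
```

Simulated means based on 10^6 draws should land close to these exact values, differing only by Monte Carlo error.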
    NewPosterior = rbeta( 10^6, 6+50, 4+110 )
    OldPosterior = rbeta( 10^6, 3+22, 7+90 )
    RRPosterior = NewPosterior / OldPosterior
    mean( RRPosterior )
    > 1.660352
    round( mean( RRPosterior ), 2 )
    > 1.66

Recall the experiment in question 8 where I looked at the ping times (ms) for my old and new routers when connecting to servers in 12 different states. In class, we suggested that the Wilcoxon signed rank test should outperform the sign test in settings like these. But by how much? Define the relative efficiency for a two-sided 5% level test comparing the two statistical tests as the ratio of the tests' power under certain conditions, i.e. RE = Power(Wilcoxon Signed Rank Sum test) / Power(sign test). Calculate the relative efficiency under the following setting. Round final solution to 2 decimals.

***Q16-12pts) Let T_new and T_old equal the ping times in ms.
T_new ~ N(μ=8, σ=1). N_new = 12.
T_old ~ N(μ=7, σ=1). N_old = 12.
T_new and T_old share a mildly strong linear association where Corr[T_new, T_old] = 0.4.

We can simulate a single realization of T_new and T_old as bivariate normal data. We can then apply the Wilcoxon Signed Rank Sum test and the sign test to this data. We can save the p-values from the tests and repeat the process many times. Then we can count the number of p-values < 0.05 for each test. The RE will equal the number of p-values < 0.05 for the Wilcoxon Signed Rank Sum test divided by the number for the sign test. In R, it will look like the following.
    set.seed(17)
    library(MASS)
    loops = 10^5
    pw = rep(NA, loops)   # Wilcoxon signed rank p-values
    pb = rep(NA, loops)   # sign test p-values
    for( i in 1:loops ){
      x = mvrnorm( 12, c(8,7), matrix(c(1,0.4,0.4,1),nrow=2) )
      pw[i] = wilcox.test( x[,1], x[,2], paired=T )$p.value
      pb[i] = binom.test( sum(x[,1]-x[,2]>0), 12 )$p.value   # assuming ties never happen
    }
    alpha = 0.05
    mean( pw<alpha )
    mean( pb<alpha )
    mean( pw<alpha ) / mean( pb<alpha )
    round( mean( pw<alpha ) / mean( pb<alpha ), 2 )

    > mean( pw<alpha )
    [1] 0.78116
    > mean( pb<alpha )
    [1] 0.62756
    > mean( pw<alpha ) / mean( pb<alpha )
    [1] 1.244757
    > round( mean( pw<alpha ) / mean( pb<alpha ), 3 )
    [1] 1.245
    > round( mean( pw<alpha ) / mean( pb<alpha ), 2 )
    [1] 1.24

***Q17-7pts) Referring to question 16, the setting proposed oversimplifies the experiment in question 8 in a subtle, but possibly important way. Ping times are measured in whole milliseconds. We can't observe a ping time of 8.123 ms or 8.467 ms. We can only observe 8 ms. In the setting of question 16, show how this rounding impacts the power of the Wilcoxon signed rank test and the power of the sign test. Explain why the rounding resulted in the impacts you observed. Round final solution to 2 decimals.

The estimation of the RE will follow the same process as that described in Q16. The key difference is that the data will be rounded during the simulation. This will create many ties (T_new = T_old), which are extremely unlikely in the setting of Q16. Because there is no universally accepted method for handling ties, we need to decide how to handle them. We opt to exclude the ties from the dataset. This should noticeably decrease the power of both methods, as the sample size will be decreased whenever there are ties. (The RE will, of course, go to 1 as N goes to infinity. The RE is undefined for N < 5 because the power is 0 for both tests.) (Note the wilcox.test p-values will be approximate in both approaches due to ties in the magnitude of non-zero differences, i.e. for two observations a and b, T_new,a - T_old,a = T_new,b - T_old,b.)

    set.seed(17)
    library(MASS)
    loops = 10^5
    pw = rep(NA, loops)
    pb = rep(NA, loops)
    Ns = rep(NA, loops)   # sample size left after dropping ties
    for( i in 1:loops ){
      x = round( mvrnorm( 12, c(8,7), matrix(c(1,0.4,0.4,1),nrow=2) ) )
      x = x[ x[,1] != x[,2], , drop=FALSE ]   # drop any ties; drop=FALSE keeps x a matrix
      N = nrow(x)
      Ns[i] = N
      pw[i] = wilcox.test( x[,1], x[,2], paired=T )$p.value
      pb[i] = binom.test( sum(x[,1]-x[,2]>0), N )$p.value
    }
    alpha = 0.05
    mean( pw<alpha )
    mean( pb<alpha )
    mean( pw<alpha ) / mean( pb<alpha )
    summary( Ns )
    hist( Ns )

    > mean( pw<alpha )
    [1] 0.71655
    > mean( pb<alpha )
    [1] 0.5776
    > mean( pw<alpha ) / mean( pb<alpha )
    [1] 1.240564
    > summary( Ns )
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      3.000   8.000   9.000   9.162  10.000  12.000

We did observe the expected drop in power.

Power drop for Wilcoxon signed rank test = 0.78116 - 0.71655 = 0.06461 ≈ 0.06
Power drop for sign test = 0.62756 - 0.5776 = 0.04996 ≈ 0.05

After removing ties, the median sample size was 9 and the 75th percentile was 10. As the histogram of the Ns after removing ties shows, relatively few simulations had a sample size of 12.

[Figure: "Histogram of Ns". Frequency plotted against Ns, the sample size after removing ties.]
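To see why losing pairs to ties costs power, the exact power of the two-sided sign test can be computed directly for each possible number of untied pairs. The sketch below rests on an assumption that is not part of the simulation: it holds the chance of a positive difference at its unrounded value from question 16, pnorm(1/sqrt(1.2)), which follows from Var[T_new - T_old] = 1 + 1 - 2(0.4) = 1.2. The function name sign_power is hypothetical.

```r
# Chance of a positive difference in the unrounded setting of question 16:
# T_new - T_old ~ Normal with mean 1 and variance 1 + 1 - 2*0.4 = 1.2
p_pos <- pnorm(1 / sqrt(1.2))

# Hypothetical helper: exact power of the two-sided sign test with N untied pairs.
# Sums the probability of every count of positive differences whose p-value < alpha.
sign_power <- function(N, p = p_pos, alpha = 0.05) {
  k <- 0:N
  pvals <- sapply(k, function(ki) binom.test(ki, N)$p.value)
  sum(dbinom(k, N, p)[pvals < alpha])
}

# Power for each sample size seen in the simulation (3 to 12 untied pairs)
power_by_N <- sapply(3:12, sign_power)
round(setNames(power_by_N, 3:12), 3)
```

Under this assumption, N = 12 gives a power close to the 0.628 simulated in question 16, and the sign test has zero power whenever 5 or fewer pairs remain, because its smallest attainable two-sided p-value, 2 * 0.5^N, is then above 0.05.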