LAB 4 – Inference for 2 Independent Population Means, Chi-Squared Test for Independence, One-Way ANOVA

The ECA 225 lab has open hours if you need to finish LAB 4. The lab is open Monday-Thursday 6:30-10:00pm and Saturday-Sunday 2:00-6:00pm. The last day the labs are open coincides with the last day of ASU's regularly scheduled classes.

To download R onto your own personal computer, go to:
http://cran.r-project.org/bin/windows/base/
Click on the link for R-2.6.1-win32.exe and save the file to your computer. Then click on the file to start the installation.

Your submission for LAB 4 should consist of answers to the numbered questions as you work through the lab.

***AS YOU ARE WORKING THROUGH THE LAB, copy and paste each output into a blank Word file***

You can either print the completed Word file and turn it in, or e-mail the Word file to me for your LAB 4 grade. Everything MUST be done in R and included in your Word file.

***********************************************************************

Access R

On the desktop or through the Programs menu, find the R icon and click on it. You should be brought to a screen with a command prompt.

Two Independent Population Means

The dataset for this example can be found on my website and is saved as wings.txt. This example looks at two subspecies of dark-eyed Juncos. One subspecies migrates each year and the other does not. One of the variables measured was wing length, in millimeters. We are interested in whether there is a difference between the average wing length of migratory and nonmigratory Juncos.

Now conduct a t-test on the independent population means, first downloading the data set.
> site <- "http://math.asu.edu/~coombs/wings.txt"
> wings <- read.table(file=site, header=TRUE)
> attach(wings)
> names(wings)

OUTPUT:
[1] "MIGRA"    "NONMIGRA"

> t.test(MIGRA, NONMIGRA)

OUTPUT:
        Welch Two Sample t-test

data:  MIGRA and NONMIGRA
t = -4.6217, df = 25.614, p-value = 9.422e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.046220 -1.553780
sample estimates:
mean of x mean of y
     82.1      84.9

To interpret the hypothesis test: the test statistic is -4.6217, the Satterthwaite degrees of freedom are 25.614, and the P-value is 0.00009422. The line "alternative hypothesis: true difference in means is not equal to 0" tells you that R used a two-tailed alternative. My interpretation of the non-directional alternative is: At 5% significance, the data provide evidence that there is a difference between the average wing length of migratory and nonmigratory Juncos.

To interpret the confidence interval: R reports limits of -4.046220 and -1.553780 for the difference MIGRA minus NONMIGRA. Because the interval contains only negative numbers, it tells you that the mean of NONMIGRA is greater than the mean of MIGRA. My interpretation of the interval would be: At 95% confidence, the average wing length of nonmigratory Juncos is between 1.5538 and 4.0462 millimeters greater than the average wing length of migratory Juncos.

If you want to do a one-tailed test, change the code as follows.
For a left-tailed test:
> t.test(MIGRA, NONMIGRA, alternative="less")
For a right-tailed test:
> t.test(MIGRA, NONMIGRA, alternative="greater")

If you want a confidence level other than 95%, change the code to:
> t.test(MIGRA, NONMIGRA, conf.level=0.99)
for alpha = 1%. This option changes only the confidence interval; the P-value stays the same, and you compare it to your chosen alpha.

1. Do a hypothesis test for the dataset cloud.txt. The data set consists of results of a study on cloud seeding with silver nitrate. The variable collected is rainfall amounts, in acre-feet, for unseeded and seeded clouds.
At 5% significance, is there evidence that average rainfall for seeded clouds is greater than average rainfall for unseeded clouds? Include your code, output, and interpretation.

2. Do a hypothesis test and confidence interval for the dataset homes.txt. The data set consists of random samples of homes from New York and Los Angeles. The variable collected is home prices in thousands of dollars. At 10% significance, is there evidence that the average home price in New York is different from that in Los Angeles? Include your code, output, and interpretations of both the significance test and the confidence interval.

Chi-Squared Test for Independence among Categorical Variables

This example looks at a sample of 114 people to see if there is evidence that hair color is associated with eye color. The data set is shown in the following table:

              Light Eyes   Dark Eyes
Light Hair        38           11
Dark Hair         14           51

First, you need to get the data into R as a matrix:
> hair <- matrix(c(38,14,11,51), nrow=2)
> hair

OUTPUT:
     [,1] [,2]
[1,]   38   11
[2,]   14   51

Notice that you enter the data column-wise into the matrix.

Now, to do the chi-squared test, enter the code below. We use the option correct=FALSE because we do not want to apply Yates' continuity correction; we just want a chi-squared test like the ones in our homework.
> chisq.test(hair, correct=FALSE)

OUTPUT:
        Pearson's Chi-squared test

data:  hair
X-squared = 35.3338, df = 1, p-value = 2.778e-09

To interpret the hypothesis test: the test statistic is 35.3338, the degrees of freedom are 1, and the P-value is 0.000000002778. Therefore, at 5% significance, there is very strong evidence that hair color is associated with eye color.

3. Conduct a hypothesis test for the following dataset. At 1% significance, is there evidence that being enrolled in a block program at a university is associated with retention? One hundred students were split at random into two groups of 50. Fifty are in a block program and the others are not.
The number of years each student attends the college is then measured. Include your code, output, and interpretation in context.

            Nonblock   Block
1 year         18        10
2 years        15         5
3 years         5         7
4 years         8        18
5+ years        4        10

4. Conduct a hypothesis test for the following dataset. At 1% significance, is there evidence that accident type (none, minor, major) is associated with age of driver? The sample consists of a random sample of drivers who were asked if they had been in an accident in the previous year, and, if so, whether it was a minor or major accident. Include your code, output, and interpretation in context.

         Under 18 years   18-25 years   26-40   40-65   Over 65
None           67              42         75      56       57
Minor          10               6          8       4       15
Major           8               8          7       9        4

One-Way ANOVA (Analysis of Variance)

Analysis of variance is an extension of testing the difference between two independent population means. Instead of just two population means, you can test for a difference among more than two population means. The null and alternative hypotheses of a one-way ANOVA test for "I" different population means are:

$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_I$
$H_a$: not all of the means are the same

The data set for this example is found on my website and is called ozone.txt. The data consist of atmospheric ozone concentrations, measured in parts per hundred million (pphm), in two commercial lettuce-growing gardens (garden A and garden B). We will do an ANOVA test to see if there is a difference in the average ozone concentrations between the two gardens.

> site <- "http://math.asu.edu/~coombs/ozone.txt"
> data <- read.table(file=site, header=TRUE)
> attach(data)
> names(data)

OUTPUT:
[1] "ozone"  "garden"

The code for the ANOVA test is:
> summary(aov(ozone~garden))

OUTPUT:
            Df  Sum Sq Mean Sq F value   Pr(>F)
garden       1 20.0000 20.0000      15 0.001115 **
Residuals   18 24.0000  1.3333

This output is called an ANOVA table. It reports statistics such as SSG, SSE, MSG, and MSE (the Sum Sq and Mean Sq columns). Most importantly, it reports the test statistic of 15 and the P-value of 0.001115.
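As a rough check on how the columns of the ANOVA table fit together, the sketch below recomputes the F statistic and P-value from the sums of squares and degrees of freedom printed in the ozone output. The variable names (ssg, sse, and so on) are my own labels for illustration, not something R produces; pf() is base R's F-distribution function.

```r
# Values copied from the ANOVA table for the ozone example
ssg  <- 20   # Sum Sq for garden (between groups)
sse  <- 24   # Sum Sq for Residuals (within groups)
df_g <- 1    # Df for garden
df_e <- 18   # Df for Residuals

# Mean squares are sums of squares divided by their degrees of freedom
msg <- ssg / df_g
mse <- sse / df_e

# The F statistic is the ratio of the two mean squares
f_stat <- msg / mse

# The upper-tail area of the F(df_g, df_e) distribution is the P-value
p_value <- pf(f_stat, df_g, df_e, lower.tail = FALSE)

print(c(MSG = msg, MSE = mse, F = f_stat, P = p_value))
```

The printed values should agree with the table: MSG = 20, MSE of about 1.3333, F = 15, and a P-value of about 0.001115.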
My interpretation of the ANOVA output: at 5% significance, there is evidence that the average ozone concentration differs between the two gardens.

5. Do an ANOVA test on the dataset earnings.txt. This dataset consists of independent random samples of workers in five service-producing industries (transportation, wholesale trade, retail trade, finance, services). The variable is the weekly earnings of the randomly sampled workers from each industry. At 5% significance, is there evidence that the average weekly earnings differ among the five industries? Include your code, output, and interpretation.

6. Do an ANOVA test on the dataset bone.txt. This data set consists of independent samples of femurs from rats in three groups (three populations): a control group, a group that received a low dose of Kudzu, and a group that received a high dose of Kudzu. Kudzu is a plant from Japan that is believed to have beneficial effects on bones. The variable recorded was the bone mineral density in the femur, in grams per square centimeter. At 1% significance, is there evidence that the different treatments lead to different average bone densities in the rats' femurs? Include your code, output, and interpretation.
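With more than two groups, the code has the same shape as the ozone example: a numeric response on the left of the ~ and a grouping column on the right. The sketch below is only an illustration. It uses PlantGrowth, a small data set that ships with R (plant weights for a control and two treatment groups), as a stand-in, because the column names in earnings.txt and bone.txt are not shown in this handout. Run names() on your own data and substitute your column names.

```r
# PlantGrowth is built into R; it is a stand-in, NOT one of the lab data sets.
# It has a numeric column "weight" and a factor column "group" with 3 levels.
print(names(PlantGrowth))
print(table(PlantGrowth$group))

# One-way ANOVA: response ~ grouping variable.
# Passing data= avoids the need for attach().
fit <- aov(weight ~ group, data = PlantGrowth)
anova_table <- summary(fit)
print(anova_table)

# The first element of the summary is the ANOVA table as a data frame;
# row 1 is the grouping factor and row 2 is Residuals.
tab <- anova_table[[1]]
df_groups    <- tab[["Df"]][1]
df_residuals <- tab[["Df"]][2]
f_stat       <- tab[["F value"]][1]
p_value      <- tab[["Pr(>F)"]][1]
```

With 3 groups and 30 observations, the degrees of freedom are 2 and 27. As in the ozone example, compare p_value to the significance level stated in the question.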