Chi Square Tests

> chisq.test(rbind(res.fair,res.bias))

        Pearson's Chi-squared test

data:  rbind(res.fair, res.bias)
X-squared = 10.7034, df = 5, p-value = 0.05759

Notice the smallish p-value; still, by the usual 0.05 standard we would accept the null hypothesis in this numeric example.

If you wish, you may inspect some of the intermediate steps, as the result of the test contains more information than is printed. As an illustration, if we wanted just the expected counts we can ask for them with the exp component of the test:

> chisq.test(rbind(res.fair,res.bias))[['exp']]
                1  2        3  4        5        6
res.fair 33.33333 20 28.66667 34 32.66667 51.33333
res.bias 16.66667 10 14.33333 17 16.33333 25.66667

Problems

12.1 In an effort to increase student retention, many colleges have tried block programs. Suppose 100 students are broken into two groups of 50 at random. One half are in a block program, the other half are not. The number of years in attendance is then measured. We wish to test if the block program makes a difference in retention. The data are:

             Program
           Non-Block  Block
  1 yr.        18       10
  2 yr.        15        5
  3 yr.         5        7
  4 yr.         8       18
  5+ yrs.       4       10

Do a test of hypothesis to decide if there is a difference between the two types of programs in terms of retention.

12.2 A survey of drivers was taken to see if they had been in an accident during the previous year, and if so, whether it was a minor or major accident. The results are tabulated by age group:

                            Age
  Accident Type  under 18  18-25  26-40  40-65  over 65
  none               67      42     75     56      57
  minor              10       6      8      4      15
  major               5       5      4      6       1

Do a chi-squared hypothesis test of homogeneity to see if there is a difference in distributions based on age.

12.3 A fish survey is done to see if the proportion of fish types is consistent with previous years.
Suppose the 3 types of fish recorded (parrotfish, grouper, and tang) are historically in a 5:3:4 proportion, and in a survey the following counts are found:

  Type of Fish  Parrotfish  Grouper  Tang
  observed          53         22     49

Do a test of hypothesis to see if this survey of fish has the same proportions as historically.

12.4 The R dataset UCBAdmissions contains data on admission to UC Berkeley by gender. We wish to investigate if the distribution of males admitted is similar to that of females. To do so, we first need to do some spade work, as the data set is presented in a complex contingency table. The ftable (flatten table) command is needed. To use it, try

> data(UCBAdmissions)        # read in the dataset
> x = ftable(UCBAdmissions)  # flatten
> x                          # what is there
                Dept   A   B   C   D   E   F
Admit    Gender
Admitted Male        512 353 120 138  53  22
         Female       89  17 202 131  94  24
Rejected Male        313 207 205 279 138 351
         Female       19   8 391 244 299 317

We want to compare rows 1 and 2. Treating x as a matrix, we can access these with x[1:2,]. Do a test for homogeneity between the two rows. What do you conclude? Repeat for the rejected group.

Section 13: Regression Analysis

Regression analysis forms a major part of the statistician's tool box. This section discusses statistical inference for the regression coefficients.

Simple linear regression model

R can be used to study the linear relationship between two numerical variables. Such a study is called linear regression for historical reasons. The basic model for linear regression is that pairs of data, (x_i, y_i), are related through the equation

    y_i = β0 + β1 x_i + ε_i

The values of β0 and β1 are unknown and will be estimated from the data. The value of ε_i is the amount the y observation differs from the straight-line model. In order to estimate β0 and β1, the method of least squares is employed. That is, one finds the values of (b0, b1) which make the differences b0 + b1 x_i − y_i as small as possible (in some sense).
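The least-squares idea just described can be illustrated numerically. The sketch below uses made-up data (the x and y vectors are arbitrary, chosen only for illustration): the line fitted by lm attains a smaller sum of squared differences than any perturbed line.

```r
# minimal sketch with made-up data: the lm() fit minimizes the
# sum of squared differences b0 + b1*x_i - y_i
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
fit <- lm(y ~ x)
rss <- function(b0, b1) sum((b0 + b1 * x - y)^2)   # sum of squared differences
b <- coef(fit)                                      # least-squares (b0, b1)
rss(b[1], b[2])          # value at the least-squares estimates
rss(b[1], b[2] + 0.1)    # any other slope gives a larger value
```

Trying other values of the intercept or slope in rss will likewise give larger values than the least-squares pair.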
To streamline notation, define ŷ_i = b0 + b1 x_i and let

    e_i = ŷ_i − y_i

be the residual amount of difference for the ith observation. Then the method of least squares finds (b0, b1) to make Σ e_i² as small as possible. This mathematical problem can be solved and yields the values

    b1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²,    ȳ = b0 + b1 x̄

Note the latter says the line goes through the point (x̄, ȳ) and has slope b1.

In order to make statistical inference about these values, one needs to make assumptions about the errors ε_i. The standard assumptions are that these errors are independent normals with mean 0 and common variance σ². If these assumptions are valid, then various statistical tests can be made, as will be illustrated below.

Example: Linear Regression with R

The maximum heart rate of a person is often said to be related to age by the equation Max = 220 − Age. Suppose this is to be empirically tested and 15 people of varying ages are tested for their maximum heart rate. The following data^14 are found:

  Age       18  23  25  35  65  54  34  56  72  19  23  42  18  39  37
  Max Rate 202 186 187 180 156 169 174 172 153 199 193 174 198 183 178

In a previous section, it was shown how to use lm to fit a linear model, and the commands plot and abline to plot the data and the regression line. Recall, this could also be done with the simple.lm function. To review, we can plot the regression line as follows:

> x = c(18,23,25,35,65,54,34,56,72,19,23,42,18,39,37)
> y = c(202,186,187,180,156,169,174,172,153,199,193,174,198,183,178)
> plot(x,y)          # make a plot
> abline(lm(y ~ x))  # plot the regression line

^14 This data is simulated; however, the following article suggests a maximum rate of 207 − 0.7(age): "Age-predicted maximal heart rate revisited", Hirofumi Tanaka, Kevin D. Monahan, Douglas R. Seals, Journal of the American College of Cardiology, 37:1:153-156.
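The closed-form formulas for the slope and intercept can be checked directly against lm. This is a minimal sketch reusing the age/heart-rate data from the example above; the hand-computed (b0, b1) should agree with the coefficients lm reports.

```r
# check the least-squares formulas against lm() on the heart-rate data
x <- c(18,23,25,35,65,54,34,56,72,19,23,42,18,39,37)
y <- c(202,186,187,180,156,169,174,172,153,199,193,174,198,183,178)

# b1 = sum((x_i - xbar)(y_i - ybar)) / sum((x_i - xbar)^2)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# the line passes through (xbar, ybar), so b0 = ybar - b1*xbar
b0 <- mean(y) - b1 * mean(x)

c(b0, b1)        # hand-computed intercept and slope
coef(lm(y ~ x))  # should agree with b0 and b1
```

This agreement is a useful sanity check; in practice one simply uses lm, which also provides the inferential quantities discussed in this section.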