The Chi-squared test (χ²)

The Chi-squared test is used to test whether there is a significant difference between the frequencies we observe and the frequencies we would expect to find if there were no real difference. For example, we can use it to test whether the type and amount of vegetation differ with altitude. A common test is to see whether there are significant differences in levels of environmental quality between areas.

The Chi-squared test can only be used on data which have the following characteristics:
1. The data must be in the form of frequencies counted in a number of groups.
2. The data must be on the interval or ratio scale (i.e. have a precise numerical value) and must be able to be grouped into categories.
3. The total number of observations must be greater than twenty.
4. The expected frequency in any one category must be greater than five.

Method
1. State the hypothesis being tested: there is a significant difference between two or more sample groups. By convention we state a null hypothesis (a negative statement), that is, that there is no significant difference between the samples.
2. Tabulate the data as shown in the example below. The data being tested for significance are known as the ‘observed’ frequencies, and the column is headed ‘O’.
3. Calculate the ‘expected’ frequencies, i.e. the frequencies you would expect to find if the null hypothesis were true. These go in column ‘E’.
4. Calculate the Chi-squared statistic using the formula:

   $$\chi^2 = \sum \frac{(O - E)^2}{E}$$

   where χ² is the Chi-squared statistic, Σ means ‘the sum of’, O refers to the observed frequencies and E to the expected frequencies.
5. Calculate the degrees of freedom. This is simply one less than the number of groups or categories (N), i.e. N − 1. It is not based on the total number of individual observations.
6. Compare the calculated value with the critical values in the significance tables, using the appropriate degrees of freedom. Read off the probability that the differences between the observed and expected frequencies could have occurred by chance.

© Pearson Education Ltd 2012. For more information about the Pearson Baccalaureate series please visit www.pearsonbacconline.com.
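Steps 3 to 5 of the method can be sketched in Python as follows. This is a minimal illustration rather than part of the handout: the function names (`expected_equal`, `chi_squared`, `degrees_of_freedom`, `condition_warnings`) are hypothetical, and `expected_equal` assumes the null hypothesis predicts the same frequency in every group, as in the worked example. If your hypothesis predicts different expected frequencies, pass your own list to `chi_squared` instead.

```python
# Hypothetical helpers illustrating steps 3-5 of the Chi-squared method.


def expected_equal(observed):
    """Step 3 (assumption: equal shares): each group expects the mean count."""
    mean = sum(observed) / len(observed)
    return [mean] * len(observed)


def chi_squared(observed, expected):
    """Step 4: sum of (O - E)**2 / E over every group."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(observed):
    """Step 5: one less than the number of groups, not the number of observations."""
    return len(observed) - 1


def condition_warnings(observed, expected):
    """Check the data conditions listed above; returns a list of problems found."""
    warnings = []
    if sum(observed) <= 20:
        warnings.append("total number of observations is not greater than 20")
    if any(e <= 5 for e in expected):
        warnings.append("an expected frequency is not greater than 5")
    return warnings
```

For instance, observed counts of 10, 20 and 30 give an expected value of 20 for each group, so χ² = (100 + 0 + 100) / 20 = 10.0 with 2 degrees of freedom, and `condition_warnings` returns an empty list because the total (60) exceeds 20 and every expected value exceeds 5.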
Example
A survey asked students what they thought was the world’s most serious problem. Fifty students in each of five year groups were asked. The number from each group who stated ‘global warming’ is shown below.

Number stating ‘global warming’ as the most serious problem the world is facing

   Year 13   Year 12   Year 11   Year 10   Year 9   Total
   16        24        40        26        39       145

1. State the hypothesis being tested: there is a significant difference in the number of students in each year group stating that global warming is the most serious problem. By convention we give a null hypothesis (a negative statement), that is, that there is no significant difference in the number of students in each year group stating that global warming is the most serious problem.
2. To work out the expected frequencies, find the mean number of students stating ‘global warming’ across the five year groups. If there is no significant difference between them, they should all have around the same number. In this case the total of all observations is 145, so the mean is 145 ÷ 5 = 29. This becomes the expected value for every group.
3. Tabulate the data:

   Year      Obs. (O)   Exp. (E)   (O−E)   (O−E)²   (O−E)²/E
   Year 13   16         29         −13     169      5.83
   Year 12   24         29         −5      25       0.86
   Year 11   40         29         11      121      4.17
   Year 10   26         29         −3      9        0.31
   Year 9    39         29         10      100      3.45
                                                    Σ = 14.62

4. Degrees of freedom (df) = (N − 1) = (5 − 1) = 4
5. The critical values for 4 df are:

   Significance level   0.05   0.01
   Critical value       9.49   13.28

   The computed value of 14.62 is higher than the critical values, even at the 0.01 (99%) level of significance, so it is statistically significant. We therefore reject the null hypothesis and accept the alternative hypothesis: there is a significant difference in the number of students in different year groups who thought global warming was the world’s most serious problem.

NB The next stage is to offer explanations for the results. Remember that the statistic is only a means of clarification: it is not an end in itself but a tool to help you explain.
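The worked example can be checked with a few lines of Python. This sketch assumes equal expected frequencies (the mean of 29) and hard-codes the two critical values for 4 df quoted in the example; the variable names are illustrative only.

```python
# Re-computes the global warming example: observed counts per year group.
observed = {"Year 13": 16, "Year 12": 24, "Year 11": 40, "Year 10": 26, "Year 9": 39}

total = sum(observed.values())      # 145
expected = total / len(observed)    # 29.0, the same for every group

# One row per group: (group, O, E, O - E, (O - E)^2, (O - E)^2 / E)
rows = []
for group, o in observed.items():
    diff = o - expected
    rows.append((group, o, expected, diff, diff ** 2, diff ** 2 / expected))

statistic = sum(row[5] for row in rows)
df = len(observed) - 1

# Critical values for 4 df, copied from the example above.
critical = {0.05: 9.49, 0.01: 13.28}

for group, o, e, d, d2, contrib in rows:
    print(f"{group:8} {o:3} {e:5.1f} {d:+6.1f} {d2:6.1f} {contrib:5.2f}")
print(f"chi-squared = {statistic:.2f} with {df} df")
for level, value in critical.items():
    verdict = "significant" if statistic > value else "not significant"
    print(f"at the {level} level (critical value {value}): {verdict}")
```

Because the statistic exceeds both critical values, both lines of the final loop report the result as significant, matching the conclusion in step 5.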
The critical values are the values of χ² that would be exceeded by chance alone, if the null hypothesis were true, with a probability of 5% (the 95% column) or 1% (the 99% column). The larger the value of χ², the smaller the probability that the differences between the observed and expected frequencies are due to chance, and so the stronger the evidence against the null hypothesis.

   df   95% (0.05)   99% (0.01)
   1    3.84         6.63
   2    5.99         9.21
   3    7.81         11.34
   4    9.49         13.28
   5    11.07        15.09
   6    12.59        16.81
   7    14.07        18.48
   8    15.51        20.09
   9    16.92        21.67
   10   18.31        23.21
   11   19.68        24.72
   12   21.03        26.22
   13   22.36        27.69
   14   23.68        29.14
   15   25.00        30.58
   16   26.30        32.00
   17   27.59        33.41
   18   28.87        34.81
   19   30.14        36.19
   20   31.41        37.57

Exercise
Some students completed an environmental quality index (EQI) for eight parts of an urban area. Their results are shown below.

   Site          EQI (Obs)
   Wolvercote    42
   Cornmarket    36
   St Ebbe's     25
   Summertown    40
   Cowley        28
   Botley        36
   St Clements   21
   Osney         36

1. State the hypothesis being tested.
2. Work out the χ² statistic.
3. Assess the level of statistical significance from the data.

Answers
1. There is a significant difference in the environmental quality index in the selected areas. By convention we give a null hypothesis (a negative statement), that is, that there is no significant difference in the environmental quality index in the selected areas.
2. To work out the expected values, find the mean environmental quality index for the eight areas. If there is no significant difference between them, they should all have around the same value. In this case the total of all observations is 264, so the mean is 264 ÷ 8 = 33. This becomes the expected value.
3. Tabulate the data:

   Site          EQI (O)   Exp. (E)   (O−E)   (O−E)²   (O−E)²/E
   Wolvercote    42        33         9       81       2.45
   Cornmarket    36        33         3       9        0.27
   St Ebbe's     25        33         −8      64       1.94
   Summertown    40        33         7       49       1.48
   Cowley        28        33         −5      25       0.76
   Botley        36        33         3       9        0.27
   St Clements   21        33         −12     144      4.36
   Osney         36        33         3       9        0.27
                                                       Σ = 11.82

4. Degrees of freedom (df) = (N − 1) = (8 − 1) = 7
5. The critical values for 7 df are:

   Significance level   0.05    0.01
   Critical value       14.07   18.48

   The computed value of 11.82 is lower than the critical values at both levels, including the less demanding 0.05 (95%) level. This means that the computed value is not statistically significant, even though the environmental quality index varies between the eight locations. We therefore cannot reject the null hypothesis, nor can we accept the alternative hypothesis: there is no significant difference in the environmental quality index between the eight locations.
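Instead of reading a printed table, the probability in step 6 of the method can also be computed directly. The sketch below is a swapped-in technique, not the handout’s method: the hypothetical helper `chi_squared_upper_tail` uses the standard closed-form series for the χ² upper-tail probability when the degrees of freedom are a whole number, relying only on `math.erfc` and `math.gamma` from Python’s standard library. It then re-checks the exercise answer.

```python
import math


def chi_squared_upper_tail(x, df):
    """Probability of a chi-squared value of at least x, for whole-number df.

    Hypothetical helper: Python's math module has no chi-squared function,
    so this uses the closed-form series available for integer df.
    """
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive whole number")
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        # Even df: exp(-y) * sum of y**i / i! for i = 0 .. df/2 - 1
        terms = (y ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-y) * sum(terms)
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum of y**(i - 0.5) / Gamma(i + 0.5)
    # for i = 1 .. (df - 1) / 2
    terms = (y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(terms)


# Exercise data (EQI per site), as given in the handout.
eqi = {
    "Wolvercote": 42, "Cornmarket": 36, "St Ebbe's": 25, "Summertown": 40,
    "Cowley": 28, "Botley": 36, "St Clements": 21, "Osney": 36,
}

expected = sum(eqi.values()) / len(eqi)  # 264 / 8 = 33.0
statistic = sum((o - expected) ** 2 / expected for o in eqi.values())
df = len(eqi) - 1
p = chi_squared_upper_tail(statistic, df)

print(f"chi-squared = {statistic:.2f}, df = {df}, p = {p:.3f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

At the printed critical values the function returns probabilities close to 0.05 and 0.01 (for example, close to 0.05 for 14.07 with 7 df), which is a quick way to sanity-check a table entry. For the exercise it gives a probability of about 0.11, well above 0.05, agreeing with the conclusion that the result is not significant.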