Practical Solutions: Comparing Proportions & Analysing Categorical Data

Practical Solutions 1.

Treatment group (GROUP) and the status at time 2 (CAT2) are nominal categorical variables, and as they are not repeated measures of the same outcome we can use the Chi-square test or Fisher's exact test (Analyze > Descriptive Statistics > Crosstabs…). The step-by-step instructions for doing this are contained in the notes, but the syntax is included below:

    * Chi-square / Fisher's test.
    CROSSTABS
      /TABLES=GROUP BY CAT2
      /FORMAT= AVALUE TABLES
      /STATISTIC=CHISQ
      /CELLS= COUNT ROW
      /COUNT ROUND CELL
      /METHOD=EXACT TIMER(5).

The SPSS output is included below (the key sections are highlighted):

Treatment group * CAT2 Crosstabulation

                                              CAT2
    Treatment group                        .00      1.00     Total
    Active A   Count                        65        18        83
               % within Treatment group   78.3%     21.7%    100.0%
    Active B   Count                        58        22        80
               % within Treatment group   72.5%     27.5%    100.0%
    Placebo    Count                        53        29        82
               % within Treatment group   64.6%     35.4%    100.0%
    Total      Count                       176        69       245
               % within Treatment group   71.8%     28.2%    100.0%

Chi-Square Tests

                                   Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.   Point
                                                   (2-sided)     (2-sided)    (1-sided)    Probability
    Pearson Chi-Square             3.841(a)    2      .147          .156
    Likelihood Ratio               3.841       2      .147          .156
    Fisher's Exact Test            3.796                            .153
    Linear-by-Linear Association   3.797(b)    1      .051          .057         .031         .010
    N of Valid Cases               245

    a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 22.53.
    b. The standardized statistic is 1.949.

The presentation for this type of analysis was mentioned in the notes, but you should include the cross-tabulation alongside only the most appropriate p value. As there are no cells with an expected count below 5 here (as shown by the highlighted footer in the SPSS output), we should use the Pearson Chi-square p value.

There was found to be no significant association (Pearson Chi-square: p = 0.147) between HbA1c status at follow-up and treatment group.
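As a cross-check on the question 1 figures, the Pearson Chi-square statistic can be recomputed by hand from the cross-tabulation counts. The sketch below is illustrative only and is not part of the SPSS workflow; the function name `pearson_chi_square` is made up for this example, and the p value uses the closed form for the Chi-square upper tail that holds only when df = 2.

```python
import math

def pearson_chi_square(table):
    """Return (statistic, df, minimum expected count) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, min_expected

# Counts from the Treatment group * CAT2 table (rows: Active A, Active B, Placebo).
observed = [[65, 18], [58, 22], [53, 29]]
stat, df, min_expected = pearson_chi_square(observed)

# For df = 2 only, the Chi-square upper tail probability is exactly exp(-x / 2).
p_value = math.exp(-stat / 2)

print(round(stat, 3), df, round(p_value, 3), round(min_expected, 2))
# 3.841 2 0.147 22.53
```

These values agree with the Pearson Chi-square row and footnote a of the SPSS output.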
A difference in proportions and an association are assessed by the same test, and the names are often used interchangeably. Therefore, as there was no significant association, there is no significant difference in proportions.

Practical Solutions 2.

The selection of specific cases was covered earlier (Data > Select Cases…). One way of writing the syntax to select just 2 of the 3 treatment groups is given below:

    * Filtering the data to select only 2 treatment groups.
    USE ALL.
    COMPUTE filter_$=(GROUP=1 OR GROUP=3).
    VARIABLE LABEL filter_$ 'GROUP=1 OR GROUP=3 (FILTER)'.
    VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
    FORMAT filter_$ (f1.0).
    FILTER BY filter_$.
    EXECUTE.

    * Re-running the Chi-square analysis.
    CROSSTABS
      /TABLES=GROUP BY CAT2
      /FORMAT= AVALUE TABLES
      /STATISTIC=CHISQ
      /CELLS= COUNT ROW
      /COUNT ROUND CELL.

    * Turning off the filter.
    FILTER OFF.
    USE ALL.
    EXECUTE.

The SPSS output is included below (the key sections are highlighted):

Treatment group * CAT2 Crosstabulation

                                              CAT2
    Treatment group                        .00      1.00     Total
    Active A   Count                        65        18        83
               % within Treatment group   78.3%     21.7%    100.0%
    Placebo    Count                        53        29        82
               % within Treatment group   64.6%     35.4%    100.0%
    Total      Count                       118        47       165
               % within Treatment group   71.5%     28.5%    100.0%

Chi-Square Tests

                                   Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                   (2-sided)     (2-sided)    (1-sided)
    Pearson Chi-Square             3.789(b)    1      .052
    Continuity Correction(a)       3.147       1      .076
    Likelihood Ratio               3.815       1      .051
    Fisher's Exact Test                                             .059         .038
    Linear-by-Linear Association   3.766       1      .052
    N of Valid Cases               165

    a. Computed only for a 2x2 table.
    b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 23.36.

The Chi-square test is still the most appropriate, and although the p value is now very close to 0.05 we still reach the same conclusion.

When we produce the 95% confidence interval for the difference in proportions of patients with HbA1c >= 7, you can see that the CI just includes zero at the lower end.
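The interval described here can be approximated outside CIA. The sketch below assumes that CIA reports Newcombe's Wilson-score interval for the difference between two independent proportions; that is an assumption, not something stated in these solutions, and the function names `wilson_interval` and `newcombe_difference` are illustrative.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a single proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def newcombe_difference(x1, n1, x2, n2, z=1.96):
    """Difference p1 - p2 with a Newcombe (Wilson-based) interval.

    Assumption: this is the method behind the CIA output; it is used here
    only as a plausible reconstruction.
    """
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_interval(x1, n1, z)
    l2, u2 = wilson_interval(x2, n2, z)
    diff = p1 - p2
    lower = diff - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return diff, lower, upper

# Patients with HbA1c >= 7 at time 2: Placebo 29 of 82, Active A 18 of 83.
diff, lower, upper = newcombe_difference(29, 82, 18, 83)
print(round(diff, 3), round(lower, 3), round(upper, 3))
# 0.137 -0.001 0.268
```

Under that assumption the lower limit falls just below zero and the upper limit is a little under 27%.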
This agrees with our borderline significant p value from SPSS earlier. The CI shows that differences in either direction are just about possible (hence so is no difference), but that the difference could be as large as almost 27%. (You should present these results in the same way as shown earlier.)

Practical Solutions 3.

We use the same analysis as was used for question 1. The syntax for this is included below:

    * Computing the Chi-square / Fisher's exact values.
    CROSSTABS
      /TABLES=GROUP BY SevCAT2
      /FORMAT= AVALUE TABLES
      /STATISTIC=CHISQ
      /CELLS= COUNT ROW
      /COUNT ROUND CELL
      /METHOD=EXACT TIMER(5).

The SPSS output is included below (the key sections are highlighted):

Treatment group * SevCAT2 Crosstabulation

                                             SevCAT2
    Treatment group                        .00      1.00     Total
    Active A   Count                        79         4        83
               % within Treatment group   95.2%      4.8%    100.0%
    Active B   Count                        77         3        80
               % within Treatment group   96.3%      3.8%    100.0%
    Placebo    Count                        76         6        82
               % within Treatment group   92.7%      7.3%    100.0%
    Total      Count                       232        13       245
               % within Treatment group   94.7%      5.3%    100.0%

Chi-Square Tests

                                   Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.   Point
                                                   (2-sided)     (2-sided)    (1-sided)    Probability
    Pearson Chi-Square             1.085(a)    2      .581          .601
    Likelihood Ratio               1.061       2      .588          .641
    Fisher's Exact Test            1.051                            .641
    Linear-by-Linear Association    .506(b)    1      .477          .496         .297         .107
    N of Valid Cases               245

    a. 3 cells (50.0%) have expected count less than 5. The minimum expected count is 4.24.
    b. The standardized statistic is .712.

Again, the presentation for this type of analysis was mentioned in the notes, but you should include the cross-tabulation alongside only the most appropriate p value. This time there are 3 cells with an expected count below 5 (as shown by the highlighted footer in the SPSS output). In this case we should use the Fisher's exact test p value.

There was found to be no significant association (Fisher's exact test: p = 0.641) between severe HbA1c status at follow-up and treatment group.
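The expected-count check that drives the choice between the Pearson and Fisher's exact p values can be sketched as follows. This is an illustration, not SPSS's own code: the helper names are made up, and the exact test shown is the usual Fisher-Freeman-Halton approach for an r x 2 table (summing the probabilities of all tables with the same margins that are no more likely than the observed one), which is assumed to correspond to what SPSS reports.

```python
from math import comb

def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def fisher_freeman_halton_rx2(table):
    """Exact two-sided p value for an r x 2 table with fixed margins."""
    row_totals = [sum(row) for row in table]
    col2_total = sum(row[1] for row in table)
    n = sum(row_totals)

    def weight(col2_counts):
        # Integer numerator of the multivariate hypergeometric probability.
        w = 1
        for total, k in zip(row_totals, col2_counts):
            w *= comb(total, k)
        return w

    def tables(index, remaining):
        # Enumerate every way to split the second-column total across the rows.
        if index == len(row_totals) - 1:
            if remaining <= row_totals[index]:
                yield [remaining]
            return
        for k in range(min(remaining, row_totals[index]) + 1):
            for rest in tables(index + 1, remaining - k):
                yield [k] + rest

    observed_weight = weight([row[1] for row in table])
    tail = sum(w for w in map(weight, tables(0, col2_total)) if w <= observed_weight)
    return tail / comb(n, col2_total)

# Counts from the Treatment group * SevCAT2 table (rows: Active A, Active B, Placebo).
observed = [[79, 4], [77, 3], [76, 6]]
cells = [e for row in expected_counts(observed) for e in row]
n_small = sum(1 for e in cells if e < 5)
print(n_small, len(cells), round(min(cells), 2))
# 3 6 4.24

exact_p = fisher_freeman_halton_rx2(observed)
print(round(exact_p, 3))
```

The first line of output reproduces footnote a of the SPSS table; the exact p value can be compared with the Fisher's Exact Test row.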
Both tests indicate no significant difference, but we should report the Fisher's exact test result here. This is because at least one expected count is below 5, and hence the Pearson Chi-square assumption is not met.

Practical Solutions 4.

This time we are testing one variable against an expected set of proportions. To do this we need to use the one-variable Chi-square test (Analyze > Nonparametric Tests > Chi-square…). The key element here is realising that category 0 is < 7 and so is expected to be 60%, and category 1 is >= 7 and is expected to be 40%; hence we need to enter 0.6 first and then 0.4. Again, a step-by-step guide for this is included in the notes, but the syntax is included below:

    * Calculating the one-variable Chi-square.
    NPAR TEST
      /CHISQUARE=CAT1
      /EXPECTED= 0.6 0.4
      /MISSING ANALYSIS
      /METHOD=EXACT TIMER(5).

The SPSS output is included below (the key sections are highlighted):

CAT1

             Observed N   Expected N   Residual
    .00         110          147.0       -37.0
    1.00        135           98.0        37.0
    Total       245

Test Statistics

                           CAT1
    Chi-Square(a)        23.282
    df                        1
    Asymp. Sig.            .000
    Exact Sig.             .000
    Point Probability      .000

    a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 98.0.

There is highly significant evidence that the sample does not have 60% of patients with an HbA1c < 7 and 40% >= 7. We can use the Chi-square (Asymp. Sig.) value here as the expected count assumption is met.

From the CIA confidence interval we can see that (with >= 7 as the 'feature') the confidence interval for the 'feature' excludes the test value of 0.4, hence agreeing with our SPSS finding. The CI also indicates that the proportion is higher than 0.4, with the minimum likely value being 48.8% (0.488).

Practical Solutions 5.

The status variables at time 1 and time 2 (CAT1 & CAT2) are nominal categorical variables that are repeated measures of the same outcome.
Because of this we need to use McNemar's test to assess whether there was a significant change in response (Analyze > Descriptive Statistics > Crosstabs…). The step-by-step instructions for doing this are contained in the notes, but the syntax is included below:

    * McNemar test.
    CROSSTABS
      /TABLES=CAT1 BY CAT2
      /FORMAT= AVALUE TABLES
      /STATISTIC=MCNEMAR
      /CELLS= COUNT TOTAL
      /COUNT ROUND CELL.

The SPSS output is included below (the key sections are highlighted):

CAT1 * CAT2 Crosstabulation

                                  CAT2
    CAT1                       .00      1.00     Total
    .00     Count              109         1       110
            % of Total        44.5%      .4%     44.9%
    1.00    Count               67        68       135
            % of Total        27.3%    27.8%     55.1%
    Total   Count              176        69       245
            % of Total        71.8%    28.2%    100.0%

Chi-Square Tests

                        Value   Exact Sig. (2-sided)
    McNemar Test                     .000(a)
    N of Valid Cases     245

    a. Binomial distribution used.

It can be seen that the percentages changing in each direction are quite different (27.3% and 0.4%), so it is no surprise to see a highly significant McNemar p value (p < 0.001).

When we produce the 95% confidence interval for the difference in proportions of patients changing HbA1c status in each direction, it can be seen that the CI is quite a long way from zero. This agrees with our highly significant p value from SPSS earlier. The CI shows that there are quite large differences in favour of a reduction in HbA1c levels, with at least 21.1% more patients improving than getting worse. (You should present these results in the same way as shown earlier.)

Yet again, the presentation for this type of analysis was mentioned in the notes, but it should include the cross-tabulation alongside the McNemar p value and a confidence interval for the difference in proportions changing in each direction.
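The McNemar result depends only on the two discordant cells of the cross-tabulation (67 patients moving from 1.00 to .00 and 1 patient moving from .00 to 1.00). A minimal sketch of the exact binomial version, which is what the SPSS footnote says was used, is given below; the function name `mcnemar_exact` is illustrative, and the confidence interval from CIA is not reproduced here.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the two discordant counts.

    Under the null hypothesis each discordant pair is equally likely to
    fall in either direction, so the smaller count is Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

improved, worsened, total = 67, 1, 245   # discordant cells and N from the table
p_value = mcnemar_exact(improved, worsened)

# Difference between the proportions changing in each direction.
difference = (improved - worsened) / total

print(p_value < 0.001, round(difference, 3))
# True 0.269
```

The p value is far below 0.001, and the point estimate of 26.9% is the difference between the 27.3% who improved and the 0.4% who got worse.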
There was found to be a highly significant change in HbA1c control status (McNemar test: p < 0.001) between the two measurements, in favour of improving control or a lowering of HbA1c levels (Difference 26.9%, 95% CI: 21.1% to 32.5%).

Practical Solutions 6.

We need to enter the data as a summary table in SPSS, with one row for each combination of Rater1 and Rater2 and a Count variable holding the frequency for that combination.

Remember that we also need to weight the cases by the count variable:

    * Weighting the cases.
    WEIGHT BY Count.

    * Producing the Kappa.
    CROSSTABS
      /TABLES=Rater1 BY Rater2
      /FORMAT= AVALUE TABLES
      /STATISTIC=KAPPA
      /CELLS= COUNT TOTAL
      /COUNT ROUND CELL.

Having applied the weights, we can move on to assess the agreement between the raters. We should use the Kappa technique to do this, and step-by-step instructions were included in the session notes. Syntax for both of these steps is included above.

The SPSS output is included below (the key sections are highlighted):

Rater1 * Rater2 Crosstabulation

                                        Rater2
    Rater1                      Mild   Moderate   Severe    Total
    Mild       Count              10        3         0        13
               % of Total       20.0%     6.0%       .0%     26.0%
    Moderate   Count               2       12         5        19
               % of Total        4.0%    24.0%     10.0%     38.0%
    Severe     Count               0        1        17        18
               % of Total         .0%     2.0%     34.0%     36.0%
    Total      Count              12       16        22        50
               % of Total       24.0%    32.0%     44.0%    100.0%

Symmetric Measures

                                   Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
    Measure of Agreement   Kappa    .665          .088                6.649           .000
    N of Valid Cases                  50

    a. Not assuming the null hypothesis.
    b. Using the asymptotic standard error assuming the null hypothesis.

The outlined cells (the diagonal of the table) indicate agreement between the two raters. The absolute agreement is 78% (20 + 24 + 34 = 78). The Kappa statistic of 0.665 indicates that there is a good level of agreement.

Using CIA we can get a 95% CI for the Kappa statistic. The CI shows that in the worst case the agreement between the raters may only be 0.491 (or of moderate level).
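The absolute agreement and the Kappa statistic can both be recomputed from the Rater1 by Rater2 counts. The sketch below uses the standard unweighted Cohen's kappa formula, (observed agreement - chance agreement) / (1 - chance agreement); the function name `cohens_kappa` is illustrative, and the confidence interval from CIA is not reproduced.

```python
def cohens_kappa(table):
    """Return (observed agreement, unweighted Cohen's kappa) for a square table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: sum over categories of (row total * column total) / N^2.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return observed, (observed - chance) / (1 - chance)

# Rows are Rater1 (Mild, Moderate, Severe); columns are Rater2 in the same order.
ratings = [[10, 3, 0],
           [2, 12, 5],
           [0, 1, 17]]
agreement, kappa = cohens_kappa(ratings)
print(round(agreement, 2), round(kappa, 3))
# 0.78 0.665
```

Both figures match the SPSS output: 78% absolute agreement and a Kappa value of .665.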