Practical Solutions
Comparing Proportions & Analysing Categorical Data
1. Treatment group (GROUP) and the status at time 2 (CAT2) are
nominal categorical variables and, as they are not repeated
measures of the same outcome, we can use the Chi-square test or
Fisher’s exact test (Analyze > Descriptive Statistics >
Crosstabs…). The step-by-step instructions for doing this are
contained in the notes, but the syntax is included below:
* Chi-square / Fisher’s exact test.
CROSSTABS
  /TABLES=GROUP BY CAT2
  /FORMAT=AVALUE TABLES
  /STATISTIC=CHISQ
  /CELLS=COUNT ROW
  /COUNT ROUND CELL
  /METHOD=EXACT TIMER(5).
1. The SPSS output is included below (the key sections
are highlighted):
Treatment group * CAT2 Crosstabulation

                                         CAT2
Treatment group                           .00    1.00   Total
Active A   Count                           65      18      83
           % within Treatment group     78.3%   21.7%  100.0%
Active B   Count                           58      22      80
           % within Treatment group     72.5%   27.5%  100.0%
Placebo    Count                           53      29      82
           % within Treatment group     64.6%   35.4%  100.0%
Total      Count                          176      69     245
           % within Treatment group     71.8%   28.2%  100.0%

Chi-Square Tests

                                            Asymp. Sig.  Exact Sig.  Exact Sig.  Point
                              Value     df  (2-sided)    (2-sided)   (1-sided)   Probability
Pearson Chi-Square            3.841 a   2   .147         .156
Likelihood Ratio              3.841     2   .147         .156
Fisher's Exact Test           3.796                      .153
Linear-by-Linear Association  3.797 b   1   .051         .057        .031        .010
N of Valid Cases              245

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 22.53.
b. The standardized statistic is 1.949.
1. The presentation for this type of analysis was mentioned in the
notes but you should include the cross-tabulation alongside
only the most appropriate p value. As there are no cells with an
expected count below 5 here (as shown by the highlighted
footer in the SPSS output) we should use the Pearson Chi-square
p value.
There was no significant association (Pearson Chi-square: p = 0.147)
between HbA1c status at follow-up and treatment group.
A test of a difference in proportions and a test of association are
the same test, and the names are often used interchangeably.
Therefore, as there was no significant association, there is also no
significant difference in proportions.
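As a check on the arithmetic behind the Pearson Chi-square value, the statistic can be rebuilt by hand from the counts in the cross-tabulation. The sketch below is only an illustration in Python (it is not part of the SPSS workflow, and the names `observed` and `pearson_chi_square` are made up for this example). It relies on the fact that, for 2 degrees of freedom, the upper tail of the Chi-square distribution is exp(-x/2).

```python
import math

# Observed counts from the cross-tabulation:
# rows = Active A, Active B, Placebo; columns = CAT2 .00, CAT2 1.00.
observed = [[65, 18], [58, 22], [53, 29]]


def pearson_chi_square(table):
    """Return (statistic, df, minimum expected count) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under no association.
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            statistic += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, min_expected


statistic, df, min_expected = pearson_chi_square(observed)

# Closed form for the upper tail, valid only when df = 2.
p_value = math.exp(-statistic / 2)

print(round(statistic, 3), df, round(p_value, 3), round(min_expected, 2))
```

The printed values should agree with the Pearson Chi-Square row and footnote a of the SPSS output (3.841, df 2, p = .147, minimum expected count 22.53).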
2. The selection of specific cases was covered earlier
(Data > Select Cases…). One way of writing the
syntax to select just 2 of the 3 treatment groups is
given below:
* Filtering the data to select only 2 treatment groups.
USE ALL.
COMPUTE filter_$=(GROUP=1 OR GROUP=3).
VARIABLE LABEL filter_$ 'GROUP=1 OR GROUP=3 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
* Re-running the Chi-square analysis.
CROSSTABS
  /TABLES=GROUP BY CAT2
  /FORMAT=AVALUE TABLES
  /STATISTIC=CHISQ
  /CELLS=COUNT ROW
  /COUNT ROUND CELL.
* Turning off the filter.
FILTER OFF.
USE ALL.
EXECUTE.
2. The SPSS output is included below (the key sections
are highlighted):
Treatment group * CAT2 Crosstabulation

                                         CAT2
Treatment group                           .00    1.00   Total
Active A   Count                           65      18      83
           % within Treatment group     78.3%   21.7%  100.0%
Placebo    Count                           53      29      82
           % within Treatment group     64.6%   35.4%  100.0%
Total      Count                          118      47     165
           % within Treatment group     71.5%   28.5%  100.0%

Chi-Square Tests

                                            Asymp. Sig.  Exact Sig.  Exact Sig.
                              Value     df  (2-sided)    (2-sided)   (1-sided)
Pearson Chi-Square            3.789 b   1   .052
Continuity Correction a       3.147     1   .076
Likelihood Ratio              3.815     1   .051
Fisher's Exact Test                                      .059        .038
Linear-by-Linear Association  3.766     1   .052
N of Valid Cases              165

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 23.36.
The Chi-square test is still the most appropriate and, although the
p value (p = 0.052) is now very close to 0.05, we still reach the
same conclusion of no significant association.
2.
When we produce the 95% confidence interval for the difference in proportions
of patients with HbA1c >= 7 you can see that the CI just includes zero at the
lower end. This agrees with our borderline significant p value from SPSS earlier.
The CI shows that differences in either direction are just about
possible (hence so is no difference), but that the difference could
be as large as almost 27%.
(You should present these results in the same way as shown earlier.)
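The slides do not state which method was used to produce this interval, so the following is a sketch under an assumption: that the interval is Newcombe's score-based interval for the difference between two independent proportions, built from a Wilson interval for each group. The function names are illustrative only. The counts are the HbA1c >= 7 cells of the cross-tabulation (Placebo 29 of 82, Active A 18 of 83).

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a single proportion (95% by default)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


def newcombe_difference(x1, n1, x2, n2):
    """Newcombe score-based CI for p1 - p2 in two independent groups."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_interval(x1, n1)
    l2, u2 = wilson_interval(x2, n2)
    diff = p1 - p2
    lower = diff - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return diff, lower, upper


# HbA1c >= 7 at follow-up: Placebo 29 of 82, Active A 18 of 83.
diff, lower, upper = newcombe_difference(29, 82, 18, 83)
print(f"difference {diff:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

Under this assumption the lower limit falls just below zero and the upper limit is a little under 27%, which is consistent with the description of the interval above.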
3. We use the same analysis as was used for question 1.
The syntax for this is included below:
* Computing the Chi-square / Fisher’s exact values.
CROSSTABS
  /TABLES=GROUP BY SevCAT2
  /FORMAT=AVALUE TABLES
  /STATISTIC=CHISQ
  /CELLS=COUNT ROW
  /COUNT ROUND CELL
  /METHOD=EXACT TIMER(5).
3. The SPSS output is included below (the key sections
are highlighted):
Treatment group * SevCAT2 Crosstabulation

                                        SevCAT2
Treatment group                           .00    1.00   Total
Active A   Count                           79       4      83
           % within Treatment group     95.2%    4.8%  100.0%
Active B   Count                           77       3      80
           % within Treatment group     96.3%    3.8%  100.0%
Placebo    Count                           76       6      82
           % within Treatment group     92.7%    7.3%  100.0%
Total      Count                          232      13     245
           % within Treatment group     94.7%    5.3%  100.0%

Chi-Square Tests

                                            Asymp. Sig.  Exact Sig.  Exact Sig.  Point
                              Value     df  (2-sided)    (2-sided)   (1-sided)   Probability
Pearson Chi-Square            1.085 a   2   .581         .601
Likelihood Ratio              1.061     2   .588         .641
Fisher's Exact Test           1.051                      .641
Linear-by-Linear Association  .506 b    1   .477         .496        .297        .107
N of Valid Cases              245

a. 3 cells (50.0%) have expected count less than 5. The minimum expected count is 4.24.
b. The standardized statistic is .712.
3. Again, the presentation for this type of analysis was mentioned
in the notes but you should include the cross-tabulation
alongside only the most appropriate p value. This time there are
3 cells with an expected count below 5 here (as shown by the
highlighted footer in the SPSS output). In this case we should
use Fisher’s exact test p value.
There was found to be no significant association (Fisher’s exact
test: p = 0.641) between severe HbA1c status at follow-up and
treatment group.
Both tests indicate no significant difference but we should
report the Fisher’s exact test result here. This is because at least
one expected count is below 5 and hence the Pearson Chi-square
assumption is not met.
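The expected counts that drive the choice between the Pearson Chi-square and Fisher’s exact test can be checked directly from the observed table. The short Python sketch below is illustrative only (the names are invented for the example); each expected count is row total x column total / grand total.

```python
# Observed counts: rows = Active A, Active B, Placebo;
# columns = SevCAT2 .00, SevCAT2 1.00.
observed = [[79, 4], [77, 3], [76, 6]]


def expected_counts(table):
    """Expected cell counts under no association."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


cells = [e for row in expected_counts(observed) for e in row]
small_cells = [e for e in cells if e < 5]

# Number of cells below 5, total number of cells, smallest expected count.
print(len(small_cells), len(cells), round(min(cells), 2))
```

This should reproduce footnote a of the SPSS output: 3 of the 6 cells have an expected count below 5, with a minimum of 4.24.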
4. This time we are testing one variable against an expected set
of proportions. To do this we need to use the one-variable
Chi-square (Analyze > Nonparametric Tests > Chi-square…). The key
element here is realising that category 0 is < 7 and so is
expected to be 60%, and category 1 is >= 7 and is expected to be
40%, hence we need to enter 0.6 first and then 0.4. Again, a
step-by-step guide for this is included in the notes, but the
syntax has been included below:
* Calculating the one-variable Chi-square.
NPAR TEST
  /CHISQUARE=CAT1
  /EXPECTED=0.6 0.4
  /MISSING ANALYSIS
  /METHOD=EXACT TIMER(5).
4. The SPSS output is included below (the key sections
are highlighted):
CAT1

         Observed N   Expected N   Residual
.00             110        147.0      -37.0
1.00            135         98.0       37.0
Total           245

Test Statistics

                       CAT1
Chi-Square a         23.282
df                        1
Asymp. Sig.            .000
Exact Sig.             .000
Point Probability      .000

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 98.0.
There is highly significant evidence (p < 0.001) that the
proportions are not 60% of patients with a HbA1c < 7 and 40%
with >= 7. We can use the Chi-square (Asymp. Sig.) value here
as the expected count assumption is met.
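The one-variable Chi-square is simple enough to verify by hand: the expected counts are 60% and 40% of 245, and the statistic is the sum of (observed - expected)^2 / expected. A minimal Python sketch follows; the variable names are illustrative, and the p value uses the closed form for 1 degree of freedom, erfc(sqrt(x/2)).

```python
import math

observed = [110, 135]          # CAT1 .00 (< 7) and 1.00 (>= 7)
expected_props = [0.6, 0.4]    # hypothesised proportions, in the same order

n = sum(observed)
expected = [n * p for p in expected_props]
residuals = [o - e for o, e in zip(observed, expected)]
statistic = sum(r ** 2 / e for r, e in zip(residuals, expected))

# Upper tail of the Chi-square distribution, valid only when df = 1.
p_value = math.erfc(math.sqrt(statistic / 2))

print([round(e, 1) for e in expected], round(statistic, 3), p_value < 0.001)
```

The expected counts (147.0 and 98.0) and the statistic (23.282) should match the SPSS output, and the p value is far below 0.001.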
4. From looking at the CIA confidence interval we can see that
(with >=7 as the ‘feature’) the confidence interval for the
‘feature’ excludes the test value of 0.4, hence agreeing with our
SPSS finding.
The CI also indicates that the proportion is higher than 0.4,
with the minimum likely value being 48.8% (0.488).
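The method behind the CIA interval is not stated here, so the sketch below rests on an assumption: that it is the Wilson score interval for a single proportion. With 135 of 245 patients at >= 7, this can be computed in a few lines of Python (the function name is illustrative).

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a single proportion (95% by default)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# 135 of 245 patients had HbA1c >= 7 at time 1 (the 'feature').
lower, upper = wilson_interval(135, 245)
print(f"{135 / 245:.3f} ({lower:.3f} to {upper:.3f})")
```

Under this assumption the lower limit comes out at about 0.488, in line with the minimum likely value quoted above, and the whole interval lies above the test value of 0.4.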
5. The status variables at time 1 and time 2 (CAT1 & CAT2) are
nominal categorical variables that are repeated measures of the
same outcome. Because of this we need to use McNemar’s test to
assess whether there was a significant change in response
(Analyze > Descriptive Statistics > Crosstabs…). The step-by-step
instructions for doing this are contained in the notes, but the
syntax is included below:
* McNemar test.
CROSSTABS
  /TABLES=CAT1 BY CAT2
  /FORMAT=AVALUE TABLES
  /STATISTIC=MCNEMAR
  /CELLS=COUNT TOTAL
  /COUNT ROUND CELL.
5. The SPSS output is included below (the key sections
are highlighted):
CAT1 * CAT2 Crosstabulation

                           CAT2
CAT1                     .00    1.00   Total
.00     Count            109       1     110
        % of Total     44.5%     .4%   44.9%
1.00    Count             67      68     135
        % of Total     27.3%   27.8%   55.1%
Total   Count            176      69     245
        % of Total     71.8%   28.2%  100.0%

Chi-Square Tests

                    Value   Exact Sig. (2-sided)
McNemar Test                .000 a
N of Valid Cases    245

a. Binomial distribution used.
It can be seen that the percentages that
are changing in each direction are quite
different (27.3% and 0.4%), so it is no
surprise to see a highly significant
McNemar p value (p < 0.001).
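The SPSS footnote says the binomial distribution was used, meaning that only the 68 patients who changed status enter the test and, under the null hypothesis, each change is equally likely to go in either direction. A minimal Python sketch of that exact calculation is given below; the function name is illustrative and this is not SPSS's own code.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the two discordant counts."""
    n = b + c
    k = min(b, c)
    # Binomial(n, 0.5) probability of k or fewer changes in the rarer direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Discordant pairs: 67 patients moved from 1.00 to .00, 1 moved from .00 to 1.00.
p_value = mcnemar_exact(67, 1)
print(p_value < 0.001)
```

With 67 changes in one direction and only 1 in the other, the exact p value is vanishingly small, which SPSS displays as .000.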
5.
When we produce the 95% confidence interval for the difference in
proportions of patients changing HbA1c status in each direction it can
be seen that the CI is quite a long way from zero. This agrees with our
highly significant p value from SPSS earlier.
The CI shows that there are quite large differences in favour of a
reduction in HbA1c levels, with at least 21.1% more patients
improving than getting worse.
(You should present
these results in the
same way as shown
earlier)
5. Yet again the presentation for this type of analysis
was mentioned in the notes but it should include the
cross-tabulation, alongside the McNemar p value and
a confidence interval for the difference in
proportions changing in each direction.
There was found to be a highly significant change in
HbA1c control status (McNemar test: p < 0.001)
between the two measurements in favour of
improving control or a lowering of HbA1c levels
(Difference 26.9%, 95% CI: 21.1% to 32.5%).
6. We need to enter the data as a summary table in SPSS
in the following fashion:
6.
Remember that we also need to weight the cases by the count variable:
* Weighting the cases.
WEIGHT BY Count.
* Producing the Kappa.
CROSSTABS
  /TABLES=Rater1 BY Rater2
  /FORMAT=AVALUE TABLES
  /STATISTIC=KAPPA
  /CELLS=COUNT TOTAL
  /COUNT ROUND CELL.
Having applied the weights we can move on to assess the agreement
between the raters. We should use the Kappa technique to do this and
step by step instructions were included in the session notes. Syntax for
both of these steps is included above.
6. The SPSS output is included below (the key sections
are highlighted):
Rater1 * Rater2 Crosstabulation

                                  Rater2
Rater1                     Mild  Moderate   Severe    Total
Mild       Count             10         3        0       13
           % of Total     20.0%      6.0%      .0%    26.0%
Moderate   Count              2        12        5       19
           % of Total      4.0%     24.0%    10.0%    38.0%
Severe     Count              0         1       17       18
           % of Total       .0%      2.0%    34.0%    36.0%
Total      Count             12        16       22       50
           % of Total     24.0%     32.0%    44.0%   100.0%

Symmetric Measures

                                       Asymp.
                              Value    Std. Error a   Approx. T b   Approx. Sig.
Measure of Agreement  Kappa   .665     .088           6.649         .000
N of Valid Cases              50

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

The outlined cells (the diagonal of the table) indicate agreement
between the two raters. The absolute agreement is 78%
(20+24+34=78). The Kappa statistic of 0.665 indicates that there is
a good level of agreement.
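The Kappa value can be reproduced from the counts in the cross-tabulation: it compares the observed agreement (the diagonal) with the agreement expected by chance from the row and column totals. The Python sketch below is only an illustration with made-up variable names, using the unweighted Kappa.

```python
# Counts from the cross-tabulation: rows = Rater1, columns = Rater2,
# both in the order Mild, Moderate, Severe.
table = [[10, 3, 0], [2, 12, 5], [0, 1, 17]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Proportion of cases on the diagonal (both raters agree).
observed_agreement = sum(table[i][i] for i in range(len(table))) / n

# Agreement expected by chance if the raters were independent.
chance_agreement = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)
print(round(observed_agreement, 2), round(chance_agreement, 4), round(kappa, 3))
```

This gives an observed agreement of 0.78, a chance agreement of 0.3424 and a Kappa of 0.665, matching the Value column of the Symmetric Measures table.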
6. Using CIA we can get a 95% CI for the Kappa statistic:
The CI shows that in the worst case the agreement between the
raters may only be 0.491 (or of moderate level).