C8552 Discovering Statistics 2014-15 Sample Paper

Candidate Number
C8552
THE UNIVERSITY OF SUSSEX
BSc Second Year Examination
DISCOVERING STATISTICS
SAMPLE PAPER
DO NOT TURN OVER UNTIL INSTRUCTED
TO BY THE CHIEF INVIGILATOR
INSTRUCTIONS
Do not, under any circumstances, remove the question paper, used or unused,
from the examination room; it will be collected before you may leave.
Time allowed: 2 Hours
Answer ALL questions in the answer book provided
Please note: The only approved calculators for use in University examinations
are the Casio fx82, fx83, fx85, fx115, fx570 and fx991 (all with any suffix).
Students are not permitted to take instruction notes or booklets relating to their
calculator into an examination.
C8552 Discovering Statistics
1. A record company executive was interested in the effects of subliminal messages
on records, having had many of his artists sued for allegedly having evil messages
on their records (e.g. Ozzy Osbourne) that incited daft people to do stupid things
like kill themselves. So, he took a record by Westlife and inserted different types
of subliminal message onto different versions: (1) a control record didn’t have any
message (No Message); (2) a second record had a friendly message that said
‘Be happy and at one with your being’ (Friendly); (3) the third had the satanic
message ‘Surrender your soul to Beelzebub’ (Satan); (4) the fourth had another
satanic message that instructed them to do a violent act, ‘Surrender your soul to
the dark lord and sacrifice some goats while you’re at it’ (Goats); and (5) a final
record had the same satanic message about goats but it was played backwards
(Backward). He played the different types of record to different groups of
teenagers. The outcome that the executive measured was the number of goats
that each listener sacrificed. The SPSS Output is reproduced after the question.
(a) What does Cohen’s d represent? Compute and interpret Cohen’s d for the
difference in the number of goats sacrificed in the Satan group and the No
Message group. [5 marks]
(b) There are some numbers missing from the ANOVA summary table.
Calculate these three values (the residual sum of squares, the residual mean
square and the F-ratio). [3 marks]
(c) Is the assumption of homogeneity of variance met? [3 marks]
(d) What conclusions could we make about the effects of subliminal messages
on records? [2 marks]
(e) The executive made 3 predictions: (1) having no message, or a friendly
message, would have less effect than having some kind of satanic
message; (2) the backward satanic message would have more impact than
the two non-backward messages; (3) the satanic message that specifically
told people to kill goats would have more effect than the satanic message
that did not. Suggest some planned contrasts (with the appropriate group
codings) that could be done to test these hypotheses. [4 marks]
(f) What do you understand by the term ‘mean squares’ (i.e. conceptually
speaking what does the ‘mean squares’ in an ANOVA table represent)? [3
marks]
                                                          95% Confidence Interval for Mean
              N     Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound
No Message    7     15.71    2.690            1.017        13.23         18.20
Friendly      8     12.13    2.588            .915         9.96          14.29
Satan         6     9.33     1.033            .422         8.25          10.42
Goats         8     8.13     3.441            1.217        5.25          11.00
Backward      7     11.86    3.237            1.223        8.86          14.85
Total         36    11.42    3.737            .623         10.15         12.68
Number of Goats Sacrificed
Levene Statistic   df1   df2   Sig.
1.801              4     31    .154
ANOVA
Number of Goats Sacrificed
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   247.381          4     61.845                .000
Within Groups                     31
Total            488.750          35
Robust Tests of Equality of Means
Number of Goats Sacrificed
                 Statistic (a)   df1   df2      Sig.
Welch            9.289           4     14.917   .001
Brown-Forsythe   8.364           4     25.969   .000
a. Asymptotically F distributed.
2. An experiment was done to look at the positive arousing effects of imagery on
different people. A sample of statistics lecturers was compared against a group of
students. Both groups received presentations of positive images (e.g. cats and
bunnies), neutral images (e.g. duvets and lightbulbs), and negative images (e.g.
corpses and vivisection photographs). Positive arousal was measured
physiologically (high values indicate positive arousal) both before and after each
batch of images. The order in which participants saw the batches of positive,
neutral and negative images was randomised to avoid order effects. It was
hypothesised that positive images would increase positive arousal, negative
images would reduce positive arousal and that neutral images would have no
effect. Differences between the participant groups (lecturers and students) were
not expected. The SPSS Output is reproduced after the question.
(a) What type of analysis has been carried out (briefly describe the design in
answering this question)? [2 marks]
(b) With reference to the current experiment, what are the relative pros and
cons of repeated measures experimental designs compared to
independent (aka between-group) ones? [5 marks]
(c) Are any assumptions broken and if so what impact does that have? [3
marks]
(d) Interpret the output in full: do students and statistics lecturers differ in the
type of stimuli that arouse them? Are statistics lecturers more aroused than
students in general? Do the images vary in the degree to which they affect
physiological arousal? [10 marks]
Descriptive Statistics
                                   Group                  Mean      Std. Deviation   N
Arousal Before Positive Imagery    Statistics Lecturers   3.6406    5.95578          10
                                   Students               1.9049    7.19965          10
                                   Total                  2.7728    6.49218          20
Arousal Before Neutral Imagery     Statistics Lecturers   4.7012    8.53801          10
                                   Students               1.9563    7.88769          10
                                   Total                  3.3288    8.12304          20
Arousal Before Negative Imagery    Statistics Lecturers   4.3077    2.19003          10
                                   Students               1.9226    9.39252          10
                                   Total                  3.1151    6.74959          20
Arousal After Positive Imagery     Statistics Lecturers   14.5436   6.48193          10
                                   Students               13.7746   6.94780          10
                                   Total                  14.1591   6.55159          20
Arousal After Neutral Imagery      Statistics Lecturers   3.8363    7.32800          10
                                   Students               2.5668    7.29281          10
                                   Total                  3.2015    7.14519          20
Arousal After Negative Imagery     Statistics Lecturers   12.7004   6.19516          10
                                   Students               -9.7949   6.63192          10
                                   Total                  1.4528    13.12183         20
Mauchly's Test of Sphericity (b)
Measure: MEASURE_1
                                                                        Epsilon (a)
Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df   Sig.   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
TIME                     1.000         .000                 0    .      1.000                1.000         1.000
IMAGERY                  .985          .260                 2    .878   .985                 1.000         .500
TIME * IMAGERY           .885          2.076                2    .354   .897                 1.000         .500
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is
proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the
Tests of Within-Subjects Effects table.
b. Design: Intercept+GROUP
Within Subjects Design: TIME+IMAGERY+TIME*IMAGERY
Tests of Within-Subjects Effects
Measure: MEASURE_1
Source                                        Type III Sum of Squares   df       Mean Square   F        Sig.
TIME                   Sphericity Assumed     306.989                   1        306.989       7.579    .013
                       Greenhouse-Geisser     306.989                   1.000    306.989       7.579    .013
                       Huynh-Feldt            306.989                   1.000    306.989       7.579    .013
                       Lower-bound            306.989                   1.000    306.989       7.579    .013
TIME * GROUP           Sphericity Assumed     260.139                   1        260.139       6.423    .021
                       Greenhouse-Geisser     260.139                   1.000    260.139       6.423    .021
                       Huynh-Feldt            260.139                   1.000    260.139       6.423    .021
                       Lower-bound            260.139                   1.000    260.139       6.423    .021
Error(TIME)            Sphericity Assumed     729.057                   18       40.503
                       Greenhouse-Geisser     729.057                   18.000   40.503
                       Huynh-Feldt            729.057                   18.000   40.503
                       Lower-bound            729.057                   18.000   40.503
IMAGERY                Sphericity Assumed     883.031                   2        441.515       7.477    .002
                       Greenhouse-Geisser     883.031                   1.970    448.220       7.477    .002
                       Huynh-Feldt            883.031                   2.000    441.515       7.477    .002
                       Lower-bound            883.031                   1.000    883.031       7.477    .014
IMAGERY * GROUP        Sphericity Assumed     781.951                   2        390.975       6.621    .004
                       Greenhouse-Geisser     781.951                   1.970    396.912       6.621    .004
                       Huynh-Feldt            781.951                   2.000    390.975       6.621    .004
                       Lower-bound            781.951                   1.000    781.951       6.621    .019
Error(IMAGERY)         Sphericity Assumed     2125.687                  36       59.047
                       Greenhouse-Geisser     2125.687                  35.462   59.943
                       Huynh-Feldt            2125.687                  36.000   59.047
                       Lower-bound            2125.687                  18.000   118.094
TIME * IMAGERY         Sphericity Assumed     1017.291                  2        508.646       10.786   .000
                       Greenhouse-Geisser     1017.291                  1.794    567.112       10.786   .000
                       Huynh-Feldt            1017.291                  2.000    508.646       10.786   .000
                       Lower-bound            1017.291                  1.000    1017.291      10.786   .004
TIME * IMAGERY *       Sphericity Assumed     758.698                   2        379.349       8.044    .001
GROUP                  Greenhouse-Geisser     758.698                   1.794    422.953       8.044    .002
                       Huynh-Feldt            758.698                   2.000    379.349       8.044    .001
                       Lower-bound            758.698                   1.000    758.698       8.044    .011
Error(TIME*IMAGERY)    Sphericity Assumed     1697.758                  36       47.160
                       Greenhouse-Geisser     1697.758                  32.289   52.581
                       Huynh-Feldt            1697.758                  36.000   47.160
                       Lower-bound            1697.758                  18.000   94.320
Levene's Test of Equality of Error Variances (a)
                                   F        df1   df2   Sig.
Arousal Before Positive Imagery    .597     1     18    .450
Arousal Before Neutral Imagery     .140     1     18    .713
Arousal Before Negative Imagery    10.670   1     18    .004
Arousal After Positive Imagery     .169     1     18    .686
Arousal After Neutral Imagery      .017     1     18    .898
Arousal After Negative Imagery     .003     1     18    .954
Tests the null hypothesis that the error variance of the dependent variable is equal across
groups.
a. Design: Intercept+GROUP
Within Subjects Design: TIME+IMAGERY+TIME*IMAGERY
Tests of Between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average
Source      Type III Sum of Squares   df   Mean Square   F        Sig.
Intercept   2618.947                  1    2618.947      58.763   .000
GROUP       821.611                   1    821.611       18.435   .000
Error       802.225                   18   44.568
[Figure: y-axis Mean Arousal; x-axis Time (Before Imagery, After Imagery)]
Figure 1: Mean arousal before and after imagery
[Figure: y-axis Mean Arousal; x-axis Group (Statistics Lecturers, Students)]
Figure 2: Mean arousal for statistics lecturers and students
[Figure: y-axis Mean Arousal; x-axis Type of Imagery (Negative, Neutral, Positive)]
Figure 3: Mean arousal for different types of imagery
[Figure: y-axis Mean Arousal; x-axis Time (Before Imagery, After Imagery); separate lines for Group (Statistics Lecturers, Students)]
Figure 4: Mean arousal before and after imagery in statistics lecturers and students
[Figure: y-axis Mean Arousal; x-axis categories Negative, Neutral, Positive; separate lines for Group (Statistics Lecturers, Students)]
Figure 5: Mean arousal after different types of imagery in statistics lecturers and students
[Figure: y-axis Mean Arousal; x-axis Time (Before Imagery, After Imagery); separate lines for Imagery (Negative, Neutral, Positive)]
Figure 6: Mean arousal before and after different types of imagery
[Figure: two panels (Statistics Lecturers, Students); y-axis Mean Arousal; x-axis Time (Before Imagery, After Imagery); separate lines for Imagery (Negative, Neutral, Positive)]
Figure 7: Mean arousal before and after different types of imagery in statistics lecturers and students
3. A study was carried out to explore the relationship between aggression and
several potential predicting factors in 300 children who had an older sibling.
Variables measured were Parenting Style (high score = strict, low score =
liberal), Computer Games (high score = more time spent playing computer
games), Television (high score = more time spent watching television),
E-Numbers (high score = more e-numbers in the child’s diet), and Sibling
Aggression (high score = more aggression seen in their older sibling). The SPSS
Output is reproduced after the question.
(a) What is a bootstrap confidence interval and when would you use one? [3
marks]
(b) What factors predict aggression and which do not (quote the relevant
statistics)? Which is the most substantial predictor? [6 marks]
(c) The R² statistic is the squared correlation coefficient between which two
variables? How would you interpret the four values of R² in this output? [4
marks]
(d) What assumption does the Durbin-Watson statistic help us to assess?
Describe what you understand the assumption to mean and whether it has
been met in these data. [3 marks]
(e) What assumption does the scatterplot in this output assess? Describe what
you understand the assumption to mean and whether it has been met in
these data. [4 marks]
Variables Entered/Removed (a)
Model   Variables Entered                         Variables Removed   Method
1       Sibling Aggression, Parenting Style (b)   .                   Enter
2       Computer Games (b)                        .                   Enter
3       E-Numbers (b)                             .                   Enter
4       Television (b)                            .                   Enter
a. Dependent Variable: Aggression
b. All requested variables entered.
Model Summary (e)
                                                                  Change Statistics
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .667 (a)   .444       .441                8.57290                      .444              118.815    2     297   .000
2       .906 (b)   .820       .818                4.88508                      .376              618.680    1     296   .000
3       .971 (c)   .944       .943                2.73580                      .124              648.769    1     295   .000
4       .971 (d)   .944       .943                2.74042                      .000              .005       1     294   .941            1.981
a. Predictors: (Constant), Sibling Aggression, Parenting Style
b. Predictors: (Constant), Sibling Aggression, Parenting Style, Computer Games
c. Predictors: (Constant), Sibling Aggression, Parenting Style, Computer Games, E-Numbers
d. Predictors: (Constant), Sibling Aggression, Parenting Style, Computer Games, E-Numbers, Television
e. Dependent Variable: Aggression
ANOVA (a)
Model                Sum of Squares   df    Mean Square   F          Sig.
1       Regression   17464.455        2     8732.227      118.815    .000 (b)
        Residual     21827.892        297   73.495
        Total        39292.347        299
2       Regression   32228.612        3     10742.871     450.171    .000 (c)
        Residual     7063.734         296   23.864
        Total        39292.347        299
3       Regression   37084.389        4     9271.097      1238.689   .000 (d)
        Residual     2207.958         295   7.485
        Total        39292.347        299
4       Regression   37084.430        5     7416.886      987.612    .000 (e)
        Residual     2207.917         294   7.510
        Total        39292.347        299
a. Dependent Variable: Aggression
b. Predictors: (Constant), Parenting Style, Sibling Aggression
c. Predictors: (Constant), Parenting Style, Sibling Aggression, Computer Games
d. Predictors: (Constant), Parenting Style, Sibling Aggression, Computer Games, E-Numbers
e. Predictors: (Constant), Parenting Style, Sibling Aggression, Computer Games, E-Numbers, Television
Coefficients (a)
                             Unstandardized Coefficients   Standardized Coefficients                      95.0% Confidence Interval for B   Collinearity Statistics
Model                        B         Std. Error          Beta                        t         Sig.     Lower Bound   Upper Bound         Tolerance   VIF
1       (Constant)           44.809    3.353                                           13.362    .000     38.210        51.408
        Sibling Aggression   .278      .034                .408                        8.263     .000     .211          .344                .766        1.305
        Parenting Style      -1.137    .154                -.365                       -7.395    .000     -1.440        -.834               .766        1.305
2       (Constant)           61.898    2.031                                           30.482    .000     57.902        65.894
        Sibling Aggression   -.165     .026                -.243                       -6.323    .000     -.217         -.114               .411        2.434
        Parenting Style      -3.792    .138                -1.218                      -27.460   .000     -4.063        -3.520              .308        3.242
        Computer Games       2.173     .087                .995                        24.873    .000     2.001         2.345               .379        2.636
3       (Constant)           64.246    1.141                                           56.310    .000     62.001        66.491
        Sibling Aggression   -.298     .016                -.439                       -19.197   .000     -.329         -.268               .364        2.745
        Parenting Style      -4.346    .080                -1.397                      -54.098   .000     -4.504        -4.188              .286        3.499
        Computer Games       2.489     .050                1.140                       49.313    .000     2.390         2.588               .356        2.805
        E-Numbers            .151      .006                .373                        25.471    .000     .139          .162                .887        1.128
4       (Constant)           64.029    3.148                                           20.342    .000     57.835        70.224
        Sibling Aggression   -.299     .016                -.439                       -18.698   .000     -.330         -.267               .346        2.888
        Parenting Style      -4.347    .082                -1.397                      -52.769   .000     -4.509        -4.185              .273        3.667
        Computer Games       2.488     .053                1.139                       46.817    .000     2.383         2.593               .323        3.099
        E-Numbers            .151      .006                .373                        25.353    .000     .139          .162                .882        1.134
        Television           .011      .151                .001                        .074      .941     -.286         .308                .552        1.813
a. Dependent Variable: Aggression
Bootstrap for Coefficients
                                       Bootstrap (a)
                                                                                      BCa 95% Confidence Interval
Model                        B         Bias          Std. Error   Sig. (2-tailed)   Lower     Upper
1       (Constant)           44.809    -.022         2.993        .001              38.584    50.515
        Sibling Aggression   .278      -.001         .030         .001              .219      .337
        Parenting Style      -1.137    .002          .141         .001              -1.416    -.844
2       (Constant)           61.898    .008          1.973        .001              58.366    65.620
        Sibling Aggression   -.165     -.001         .024         .001              -.213     -.122
        Parenting Style      -3.792    .004          .125         .001              -4.044    -3.531
        Computer Games       2.173     -.002         .073         .001              2.041     2.309
3       (Constant)           64.246    .036          1.118        .001              62.063    66.639
        Sibling Aggression   -.298     .000          .015         .001              -.328     -.269
        Parenting Style      -4.346    .000          .083         .001              -4.515    -4.175
        Computer Games       2.489     -.001         .051         .001              2.392     2.585
        E-Numbers            .151      .000          .006         .001              .138      .162
4       (Constant)           64.029    .353          2.965        .001              58.652    71.907
        Sibling Aggression   -.299     -3.506E-005   .015         .001              -.331     -.268
        Parenting Style      -4.347    .001          .086         .001              -4.521    -4.169
        Computer Games       2.488     .001          .053         .001              2.384     2.595
        E-Numbers            .151      -1.668E-006   .006         .001              .138      .162
        Television           .011      -.016         .146         .933              -.287     .228
a. Unless otherwise noted, bootstrap results are based on 1000 bootstrap samples
Casewise Diagnostics (a)
Case Number   Std. Residual   Aggression   Predicted Value   Residual
13            -2.848          55.00        62.8040           -7.80401
14            -2.008          30.00        35.5032           -5.50324
20            2.586           55.00        47.9134           7.08661
71            -2.202          26.00        32.0345           -6.03450
79            2.038           68.00        62.4151           5.58493
128           -2.708          49.00        56.4198           -7.41976
131           2.272           39.00        32.7742           6.22575
171           -2.104          21.00        26.7668           -5.76676
205           2.327           42.00        35.6236           6.37638
208           -2.244          30.00        36.1485           -6.14850
222           -2.328          43.00        49.3796           -6.37960
249           2.075           44.00        38.3148           5.68517
279           2.295           32.00        25.7114           6.28862
298           2.148           46.00        40.1137           5.88631
a. Dependent Variable: Aggression
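FORMULAE
$\hat{d} = \dfrac{\bar{X}_{\text{Experimental}} - \bar{X}_{\text{Control}}}{s_{\text{Control}}}$
$\hat{d} = \dfrac{\bar{X}_1 - \bar{X}_2}{s_p}$
$s_p = \sqrt{\dfrac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}}$
$MS = \dfrac{SS}{df}$
$F = \dfrac{MS_M}{MS_R}$
$R^2 = \dfrac{SS_M}{SS_T}$
$r = \sqrt{\dfrac{t^2}{t^2 + df}}$
$r = \sqrt{\dfrac{F(1, x)}{F(1, x) + df_R}}$
$df_T = N - 1$
$df_M = k - 1$
$df_R = df_T - df_M$
$z = \dfrac{X - \bar{X}}{s}$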
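As an illustration only, the formulae on this sheet could be sketched in Python as below. The function names and all input numbers are hypothetical and are not taken from the paper's output; the step SS_R = SS_T − SS_M is an added assumption (the usual partition of the total sum of squares) and is not printed on the formula sheet.

```python
import math


def cohens_d_control(mean_experimental, mean_control, sd_control):
    """d-hat standardised by the control group's standard deviation."""
    return (mean_experimental - mean_control) / sd_control


def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation s_p of two groups."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def cohens_d_pooled(mean1, mean2, n1, s1, n2, s2):
    """d-hat standardised by the pooled standard deviation."""
    return (mean1 - mean2) / pooled_sd(n1, s1, n2, s2)


def anova_summary(ss_model, ss_total, n, k):
    """Fill in an ANOVA summary for n scores in k groups."""
    df_total = n - 1
    df_model = k - 1
    df_residual = df_total - df_model
    # Assumption (not on the formula sheet): SS_T = SS_M + SS_R.
    ss_residual = ss_total - ss_model
    ms_model = ss_model / df_model          # MS = SS / df
    ms_residual = ss_residual / df_residual
    return {
        "df_model": df_model,
        "df_residual": df_residual,
        "ss_residual": ss_residual,
        "ms_model": ms_model,
        "ms_residual": ms_residual,
        "F": ms_model / ms_residual,        # F = MS_M / MS_R
        "R2": ss_model / ss_total,          # R^2 = SS_M / SS_T
    }


def r_from_t(t, df):
    """Effect size r from a t-statistic and its degrees of freedom."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


def r_from_f(f, df_residual):
    """Effect size r from an F-ratio with 1 model degree of freedom."""
    return math.sqrt(f / (f + df_residual))


def z_score(x, mean, sd):
    """Standardise a raw score."""
    return (x - mean) / sd


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    print(cohens_d_control(12.0, 10.0, 4.0))
    print(cohens_d_pooled(12.0, 10.0, 10, 3.0, 10, 4.0))
    print(anova_summary(ss_model=120.0, ss_total=300.0, n=33, k=3))
    print(r_from_t(2.0, 12), r_from_f(10.0, 30), z_score(13.0, 10.0, 2.0))
```

Keeping each formula as its own small function mirrors the layout of the sheet, so a hand calculation can be checked one step at a time.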
END OF PAPER