Statistics Comprehensive Examination Questions

CRJ Doctoral Comprehensive Exam
Statistics
Friday August 23, 2013 2:00pm – 5:30pm
Instructions: (Answer all questions below)
Question I: Data Collection and Bivariate Hypothesis Testing
1. Answer the following questions as they pertain to bivariate statistical approaches to
testing for group differences and variable association.
a) The t-test, ANOVA, and chi-square test are all ways of detecting variable associations
via examinations of group differences and associations. In what instance would you
expect each of the three tests to be used?
b) Pertaining to the first two tests listed above, how are the formal null hypotheses stated?
What are the meanings of these formal statements?
c) What is sampling theory? How is sampling theory linked to probability, and how
does this underlie our ability to produce reliable statistics within reasonable levels of
confidence?
d) Suppose you must choose between the one- and two-tailed versions of certain tests
mentioned above. In what cases would a one-tailed test be appropriate? In what cases would a
two-tailed test be appropriate? Why?
Question II: Multivariate Regression Analysis
OLS (see attached output)
Familial disruption has been linked to higher levels of social disorganization and crime rates in
research in the area of ecological criminology. However, levels of familial disruption have also
been shown to be significantly related to regional differences in crime rates. Using county level
data, the attached output has been compiled to test for the potential effects of being a Southern
County (“south”) and the county level percent divorced (“pctdiv”) on the index crime rate of the
county (“indexrt”).
Interpret the output by detailing the results of the analysis and referring to the appropriate tables
in your attempt to answer this question. Be sure to properly, and formally, interpret all
appropriate statistics from the output.
In doing so, focus on three basic research questions:
1) What are the basic assumptions of the OLS regression approach? How is each tested in
this case? Do these data violate any of these assumptions?
2) Interpret all useful statistical output.
3) If we wanted to test whether the relationship between familial disruption and crime rates at
the county level is related to the region of the country in which the county is
located, how would we do that in both mediation and moderation form?
Logistic (see attached output)
Using survey data on neighborhood conditions, fear, and demographics, suppose an
analysis of one’s fear of their neighborhood was conducted. In the dataset there are a series of
variables, including a binary indicator of neighborhood fear (1 = “ever” feeling unsafe in one’s
neighborhood in reference to 0 = never feeling unsafe). For this question then, we are predicting
ever feeling unsafe in one’s neighborhood by race (being white), gender (being male) and by age.
Interpret the output by detailing the results of the analysis and referring to the appropriate tables
in your attempt to answer this question. Be sure to properly, and formally, interpret all
appropriate statistics from the output.
In doing so, focus on three basic research questions/directives:
1) What are the basic assumptions of the logistic regression approach? How does this
differ from the OLS approach, and what inherent violations of the OLS approach make
using the logistic regression approach necessary (hint: refer to violations of OLS
assumptions)?
2) What is the nature of the Block 0 and Block 1 portions of the output? What does each
section represent?
3) Interpret all useful statistical output.
Question 2 Part 1
Regression
Variables Entered/Removed(b)
Model   Variables Entered                Variables Removed   Method
1       % of the population divorced,    .                   Enter
        Southern County Indicator
b. Dependent Variable: County Crime Rate per 100,000
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .306(a)   .093       .092                29.59338                     1.479
a. Predictors: (Constant), % of the population divorced, Southern County Indicator
b. Dependent Variable: County Crime Rate per 100,000
ANOVA(b)
Model 1      Sum of Squares   df     Mean Square   F        Sig.
Regression   122035.452       2      61017.726     69.673   .000(a)
Residual     1184914.246      1353   875.768
Total        1306949.698      1355
a. Predictors: (Constant), % of the population divorced, Southern County Indicator
b. Dependent Variable: County Crime Rate per 100,000
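The Model Summary and ANOVA figures above are arithmetically linked, and the link can be checked directly. The following is a minimal Python sketch, not part of the original SPSS output: the variable names are illustrative, and the inputs are the rounded values printed in the ANOVA table, so the results should agree with the output only up to rounding.

```python
# Inputs copied from the ANOVA table (rounded as printed in the output).
ss_regression = 122035.452
ss_residual = 1184914.246
df_regression = 2
df_residual = 1353

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual   # omnibus F test for the model
r_squared = ss_regression / ss_total        # share of variance explained
adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual
std_error_estimate = ms_residual ** 0.5     # root of the residual mean square

print(f"F = {f_statistic:.3f}")
print(f"R Square = {r_squared:.3f}")
print(f"Adjusted R Square = {adjusted_r_squared:.3f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.5f}")
```

Under these inputs the four quantities should land on the values shown in the Model Summary and ANOVA tables to the printed precision.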
Coefficients(a)
                               Unstandardized Coefficients   Standardized Coefficients                     Collinearity Statistics
Model 1                        B        Std. Error           Beta                        t        Sig.    Tolerance   VIF
(Constant)                     32.689   1.221                                            26.769   .000
Southern County Indicator      21.101   1.953                .297                        10.805   .000    .886        1.129
% of the population divorced   .139     .168                 .023                        .826     .409    .886        1.129
a. Dependent Variable: County Crime Rate per 100,000
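The columns of the Coefficients table are related by simple formulas: each t statistic is B divided by its standard error, and VIF is the reciprocal of Tolerance. The sketch below, in Python, illustrates these relationships and shows how the fitted equation produces a predicted value. It is not part of the original output. The function name `predict_index_rate` and the example county (Southern, 10 percent divorced) are hypothetical, and because the inputs are rounded, the recomputed t values differ slightly from the printed ones.

```python
# (B, Std. Error) pairs copied from the Coefficients table.
coefficients = {
    "(Constant)": (32.689, 1.221),
    "Southern County Indicator": (21.101, 1.953),
    "% of the population divorced": (0.139, 0.168),
}

# t statistic for each term: coefficient divided by its standard error.
t_values = {name: b / se for name, (b, se) in coefficients.items()}

# Variance inflation factor is the reciprocal of tolerance.
tolerance = 0.886
vif = 1 / tolerance


def predict_index_rate(south, pctdiv):
    """Predicted county crime rate from the fitted equation (hypothetical helper)."""
    return (
        coefficients["(Constant)"][0]
        + coefficients["Southern County Indicator"][0] * south
        + coefficients["% of the population divorced"][0] * pctdiv
    )


for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
print(f"VIF = {vif:.3f}")

# Hypothetical county: Southern (south = 1) with 10 percent divorced.
print(f"Predicted rate = {predict_index_rate(1, 10.0):.2f}")
```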
Collinearity Diagnostics(a)
                                                   Variance Proportions
Model   Dimension   Eigenvalue   Condition Index   (Constant)   Southern County Indicator   % of the population divorced
1       1           2.223        1.000             .07          .08                         .07
        2           .529         2.050             .18          .88                         .06
        3           .248         2.997             .75          .04                         .88
a. Dependent Variable: County Crime Rate per 100,000
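Each condition index in the table above is the square root of the ratio of the largest eigenvalue to the eigenvalue for that dimension. A minimal Python sketch of that calculation follows. It is an illustration rather than part of the SPSS output, and since it starts from the rounded eigenvalues, the third index comes out slightly below the printed 2.997.

```python
import math

# Eigenvalues copied from the Collinearity Diagnostics table (rounded).
eigenvalues = [2.223, 0.529, 0.248]

largest = max(eigenvalues)

# Condition index for each dimension: sqrt(largest eigenvalue / eigenvalue).
condition_indices = [math.sqrt(largest / ev) for ev in eigenvalues]

for dimension, index in enumerate(condition_indices, start=1):
    print(f"Dimension {dimension}: condition index = {index:.3f}")
```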
Casewise Diagnostics(a)
Case Number   Std. Residual   County Crime Rate per 100,000   Predicted Value   Residual
163           3.005           123.89                          34.9662           88.92382
238           3.645           142.55                          34.6885           107.86155
292           3.131           148.13                          55.4705           92.65954
359           5.756           227.27                          56.9285           170.34148
563           10.121          332.27                          32.7583           299.51174
620           6.420           247.12                          57.1229           189.99707
1094          3.624           140.15                          32.9110           107.23899
1101          3.027           122.96                          33.3693           89.59074
1103          3.396           154.88                          54.3734           100.50655
1104          3.952           152.34                          35.3967           116.94335
1112          3.832           146.33                          32.9388           113.39121
1115          3.344           133.20                          34.2441           98.95591
1208          3.499           159.66                          56.1231           103.53688
a. Dependent Variable: County Crime Rate per 100,000
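The casewise figures follow from two steps: the residual is the observed crime rate minus the predicted value, and the standardized residual is that residual divided by the standard error of the estimate (29.59338 in the Model Summary). The Python sketch below reproduces three of the flagged cases under those definitions. It is illustrative only, and the variable names are not taken from the output.

```python
# Standard error of the estimate, from the Model Summary table.
std_error_estimate = 29.59338

# (case number, observed crime rate, predicted value) for three flagged cases.
cases = [
    (163, 123.89, 34.9662),
    (359, 227.27, 56.9285),
    (563, 332.27, 32.7583),
]

results = {}
for case_number, observed, predicted in cases:
    residual = observed - predicted                 # raw residual
    std_residual = residual / std_error_estimate    # standardized residual
    results[case_number] = (residual, std_residual)
    print(f"Case {case_number}: residual = {residual:.4f}, "
          f"std. residual = {std_residual:.3f}")
```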
Residuals Statistics(a)
                       Minimum     Maximum     Mean      Std. Deviation   N
Predicted Value        32.6888     58.1505     38.9111   9.49016          1356
Residual               -57.31899   299.51175   .00000    29.57153         1356
Std. Predicted Value   -.656       2.027       .000      1.000            1356
Std. Residual          -1.937      10.121      .000      .999             1356
a. Dependent Variable: County Crime Rate per 100,000
Charts
[Figure: Histogram of the Regression Standardized Residual (x-axis) against Frequency (y-axis). Dependent Variable: County Crime Rate per 100,000. Mean = 2.79E-15, Std. Dev. = 0.999, N = 1,356.]
[Figure: Normal P-P Plot of Regression Standardized Residual. Dependent Variable: County Crime Rate per 100,000. x-axis: Observed Cum Prob; y-axis: Expected Cum Prob.]
[Figure: Scatterplot. Dependent Variable: County Crime Rate per 100,000. x-axis: Regression Standardized Predicted Value; y-axis: Regression Standardized Residual.]
Question 2 Part 2
Logistic Regression
Case Processing Summary
Unweighted Cases(a)                       N      Percent
Selected Cases     Included in Analysis   1542   100.0
                   Missing Cases          0      .0
                   Total                  1542   100.0
Unselected Cases                          0      .0
Total                                     1542   100.0
a. If weight is in effect, see classification table for the total number of cases.
Dependent Variable Encoding
Original Value     Internal Value
never unsafe       0
have felt unsafe   1
Block 0: Beginning Block
Classification Table(a,b)
                                                Predicted: Ever felt unsafe in your neighborhood
Step 0   Observed: Ever felt unsafe in your neighborhood   never unsafe   have felt unsafe   Percentage Correct
         never unsafe                           939            0                  100.0
         have felt unsafe                       603            0                  .0
         Overall Percentage                                                       60.9
a. Constant is included in the model.
b. The cut value is .500
Variables in the Equation
                    B       S.E.   Wald     df   Sig.   Exp(B)
Step 0   Constant   -.443   .052   72.029   1    .000   .642
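With no predictors in the model, the Step 0 constant is the log of the observed odds of having felt unsafe, so the Block 0 figures can be recovered from the two outcome counts alone (939 never unsafe, 603 have felt unsafe). The Python sketch below illustrates this. It is not part of the SPSS output, the variable names are illustrative, and it relies on the standard result that the variance of a sample log-odds is the sum of the reciprocals of the two counts.

```python
import math

# Observed outcome counts from the Block 0 classification table.
n_never_unsafe = 939
n_felt_unsafe = 603
n_total = n_never_unsafe + n_felt_unsafe

# Exp(B) for the constant is the observed odds of the outcome coded 1.
odds = n_felt_unsafe / n_never_unsafe
b_constant = math.log(odds)

# Standard error of a sample log-odds, and the Wald chi-square.
se_constant = math.sqrt(1 / n_felt_unsafe + 1 / n_never_unsafe)
wald = (b_constant / se_constant) ** 2

# The null model predicts the modal category for every case.
baseline_percent_correct = 100 * max(n_never_unsafe, n_felt_unsafe) / n_total

print(f"B = {b_constant:.3f}, S.E. = {se_constant:.3f}, Wald = {wald:.3f}")
print(f"Exp(B) = {odds:.3f}")
print(f"Overall percentage correct = {baseline_percent_correct:.1f}")
```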
Variables not in the Equation
                              Score    df   Sig.
Step 0   Variables   black    4.249    1    .039
                     gender   26.510   1    .000
                     age      17.907   1    .000
                     emp_ft   1.251    1    .263
         Overall Statistics   50.865   4    .000
Block 1: Method = Enter
Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step    51.583       4    .000
         Block   51.583       4    .000
         Model   51.583       4    .000
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      2012.278(a)         .033                   .045
a. Estimation terminated at iteration number 3 because parameter estimates changed by less than .001.
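The two pseudo R-square values can be related to the omnibus chi-square and the -2 log likelihood. The following Python sketch applies the usual Cox & Snell and Nagelkerke definitions to the printed figures. It is an illustration, not part of the output, and it assumes N = 1542 (from the Case Processing Summary) and that the null-model -2 log likelihood equals the Step 1 value plus the model chi-square.

```python
import math

n_cases = 1542                 # from the Case Processing Summary
neg2ll_model = 2012.278        # -2 log likelihood at Step 1
model_chi_square = 51.583      # omnibus test, Model row

# The model chi-square is the drop in -2LL from the constant-only model.
neg2ll_null = neg2ll_model + model_chi_square

# Cox & Snell: 1 - (L_null / L_model)^(2/n), written in terms of -2LL.
cox_snell = 1 - math.exp(-model_chi_square / n_cases)

# Nagelkerke rescales Cox & Snell by its maximum attainable value.
cox_snell_max = 1 - math.exp(-neg2ll_null / n_cases)
nagelkerke = cox_snell / cox_snell_max

print(f"Null -2LL = {neg2ll_null:.3f}")
print(f"Cox & Snell R Square = {cox_snell:.3f}")
print(f"Nagelkerke R Square = {nagelkerke:.3f}")
```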
Classification Table(a)
                                                Predicted: Ever felt unsafe in your neighborhood
Step 1   Observed: Ever felt unsafe in your neighborhood   never unsafe   have felt unsafe   Percentage Correct
         never unsafe                           854            85                 90.9
         have felt unsafe                       496            107                17.7
         Overall Percentage                                                       62.3
a. The cut value is .500
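The percentages in the Step 1 classification table are row-wise hit rates: correct predictions for each observed group divided by that group's total, plus an overall rate across both groups. A short Python sketch of that arithmetic, with illustrative names that are not taken from the output, is given here.

```python
# Step 1 counts, indexed as counts[observed][predicted].
counts = {
    "never unsafe": {"never unsafe": 854, "have felt unsafe": 85},
    "have felt unsafe": {"never unsafe": 496, "have felt unsafe": 107},
}


def percent_correct(observed_label):
    """Share of one observed group that the model classified correctly."""
    row = counts[observed_label]
    return 100 * row[observed_label] / sum(row.values())


total_cases = sum(sum(row.values()) for row in counts.values())
correct_cases = sum(counts[label][label] for label in counts)
overall_percent_correct = 100 * correct_cases / total_cases

for label in counts:
    print(f"{label}: {percent_correct(label):.1f}% correct")
print(f"Overall: {overall_percent_correct:.1f}% correct")
```

Comparing the overall figure with the 60.9 percent reported in Block 0 shows how much classification accuracy the four predictors add over the constant-only model.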
Variables in the Equation
                       B       S.E.   Wald     df   Sig.   Exp(B)
Step 1(a)   black      .142    .114   1.533    1    .216   1.152
            gender     -.558   .111   25.453   1    .000   .573
            age        -.015   .003   20.666   1    .000   .985
            emp_ft     -.180   .111   2.621    1    .105   .835
            Constant   .520    .196   7.039    1    .008   1.682
a. Variable(s) entered on step 1: black, gender, age, emp_ft.
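Exp(B) in the table above is the exponentiated coefficient (the odds ratio), and the fitted equation converts a respondent's characteristics into a predicted probability through the logistic function. The Python sketch below illustrates both steps. It is not part of the output. The respondent profile is hypothetical, and the codings gender = 1 for male and emp_ft = 1 for full-time employment are assumptions based on the variable names and the question text. Because the printed B values are rounded, the recomputed odds ratios can differ from the table in the third decimal.

```python
import math

# B coefficients copied from the Step 1 Variables in the Equation table.
coefficients = {
    "black": 0.142,
    "gender": -0.558,
    "age": -0.015,
    "emp_ft": -0.180,
    "Constant": 0.520,
}

# Odds ratio for each term, and the implied percent change in the odds.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
percent_change_in_odds = {
    name: 100 * (ratio - 1)
    for name, ratio in odds_ratios.items()
    if name != "Constant"
}


def predicted_probability(black, gender, age, emp_ft):
    """Probability of ever feeling unsafe under the fitted equation."""
    logit = (
        coefficients["Constant"]
        + coefficients["black"] * black
        + coefficients["gender"] * gender
        + coefficients["age"] * age
        + coefficients["emp_ft"] * emp_ft
    )
    return 1 / (1 + math.exp(-logit))


for name, ratio in odds_ratios.items():
    print(f"{name}: Exp(B) = {ratio:.3f}")

# Hypothetical respondent: not black, gender = 1, age 40, emp_ft = 1.
example_probability = predicted_probability(black=0, gender=1, age=40, emp_ft=1)
print(f"Predicted probability = {example_probability:.3f}")
```

A predicted probability below the .500 cut value would place such a respondent in the "never unsafe" category of the classification table.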