CRJ Doctoral Comprehensive Exam
Statistics
Friday, August 23, 2013
2:00pm – 5:30pm

Instructions: Answer all questions below.

Question I: Data Collection and Bivariate Hypothesis Testing

1. Answer the following questions as they pertain to bivariate statistical approaches to testing for group differences and variable association.

a) The t-test, ANOVA, and chi-square test are all ways of detecting associations between variables by examining group differences and associations. In what instance would you expect each of the three tests to be used?
b) Pertaining to the first two tests listed above, how are the formal null hypotheses stated? What do these formal statements mean?
c) What is sampling theory? How is sampling theory linked to probability, and how does this underlie our ability to produce reliable statistics within reasonable levels of confidence?
d) Suppose you must choose between the one-tailed and two-tailed versions of certain tests mentioned above. In what cases would a one-tailed test be appropriate? In what cases would a two-tailed test be appropriate? Why?

Question II: Multivariate Regression Analysis

OLS (see attached output)

Familial disruption has been linked to higher levels of social disorganization and higher crime rates in ecological criminology research. However, levels of familial disruption have also been shown to be significantly related to regional differences in crime rates. The attached output, based on county-level data, was compiled to test for the potential effects of being a Southern county (“south”) and of the county-level percent divorced (“pctdiv”) on the index crime rate of the county (“indexrt”). Interpret the output by detailing the results of the analysis and referring to the appropriate tables in your attempt to answer this question. Be sure to properly, and formally, interpret all appropriate statistics from the output.
In doing so, focus on three basic research questions:

1) What are the basic assumptions of the OLS regression approach? How is each tested in this case? Do these data violate any of these assumptions?
2) Interpret all useful statistical output.
3) If we wanted to test whether the relationship between familial disruption and crime rates at the county level is related to the region of the country in which the county is located, how would we do that in both mediation and moderation form?

Logistic (see attached output)

Suppose that, using survey data on neighborhood conditions, fear, and demographics, an analysis of one’s fear of one’s neighborhood was conducted. The dataset contains a series of variables, including a binary indicator of neighborhood fear (1 = “ever” feeling unsafe in one’s neighborhood, with 0 = never feeling unsafe as the reference category). For this question, then, we are predicting ever feeling unsafe in one’s neighborhood by race (being white), gender (being male), and age. Interpret the output by detailing the results of the analysis and referring to the appropriate tables in your attempt to answer this question. Be sure to properly, and formally, interpret all appropriate statistics from the output. In doing so, focus on three basic research questions/directives:

1) What are the basic assumptions of the logistic regression approach? How do these differ from those of the OLS approach, and what inherent violations of the OLS approach make using the logistic regression approach necessary (hint: refer to violations of OLS assumptions)?
2) What is the nature of the Block 0 and Block 1 portions of the output? What does each section represent?
3) Interpret all useful statistical output.

Question 2 Part 1

Regression

Variables Entered/Removed (b)

Model  Variables Entered                                        Variables Removed  Method
1      % of the population divorced, Southern County Indicator  .                  Enter

b. Dependent Variable: County Crime Rate per 100,000

Model Summary (b)

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .306 (a)  .093      .092               29.59338                    1.479

a. Predictors: (Constant), % of the population divorced, Southern County Indicator
b. Dependent Variable: County Crime Rate per 100,000

ANOVA (b)

Model 1     Sum of Squares  df    Mean Square  F       Sig.
Regression  122035.452      2     61017.726    69.673  .000 (a)
Residual    1184914.246     1353  875.768
Total       1306949.698     1355

a. Predictors: (Constant), % of the population divorced, Southern County Indicator
b. Dependent Variable: County Crime Rate per 100,000

Coefficients (a)

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are the collinearity statistics.

Model 1                       B       Std. Error  Beta  t       Sig.  Tolerance  VIF
(Constant)                    32.689  1.221             26.769  .000
Southern County Indicator     21.101  1.953       .297  10.805  .000  .886       1.129
% of the population divorced  .139    .168        .023  .826    .409  .886       1.129

a. Dependent Variable: County Crime Rate per 100,000

Collinearity Diagnostics (a)

The last three columns are variance proportions.

Model 1                                  Variance Proportions
Dimension  Eigenvalue  Condition Index  (Constant)  Southern County Indicator  % of the population divorced
1          2.223       1.000            .07         .08                        .07
2          .529        2.050            .18         .88                        .06
3          .248        2.997            .75         .04                        .88

a. Dependent Variable: County Crime Rate per 100,000

Casewise Diagnostics (a)

Case Number  Std. Residual  County Crime Rate per 100,000  Predicted Value  Residual
163          3.005          123.89                         34.9662          88.92382
238          3.645          142.55                         34.6885          107.86155
292          3.131          148.13                         55.4705          92.65954
359          5.756          227.27                         56.9285          170.34148
563          10.121         332.27                         32.7583          299.51174
620          6.420          247.12                         57.1229          189.99707
1094         3.624          140.15                         32.9110          107.23899
1101         3.027          122.96                         33.3693          89.59074
1103         3.396          154.88                         54.3734          100.50655
1104         3.952          152.34                         35.3967          116.94335
1112         3.832          146.33                         32.9388          113.39121
1115         3.344          133.20                         34.2441          98.95591
1208         3.499          159.66                         56.1231          103.53688

a. Dependent Variable: County Crime Rate per 100,000

Residuals Statistics (a)

                      Minimum    Maximum    Mean     Std. Deviation  N
Predicted Value       32.6888    58.1505    38.9111  9.49016         1356
Residual              -57.31899  299.51175  .00000   29.57153        1356
Std. Predicted Value  -.656      2.027      .000     1.000           1356
Std. Residual         -1.937     10.121     .000     .999            1356

a. Dependent Variable: County Crime Rate per 100,000

Charts

[Figure: Histogram of the regression standardized residuals. Dependent Variable: County Crime Rate per 100,000. Horizontal axis: Regression Standardized Residual; vertical axis: Frequency. Mean = 2.79E-15, Std. Dev. = 0.999, N = 1,356.]

[Figure: Normal P-P Plot of Regression Standardized Residual. Dependent Variable: County Crime Rate per 100,000. Horizontal axis: Observed Cum Prob; vertical axis: Expected Cum Prob.]

[Figure: Scatterplot. Dependent Variable: County Crime Rate per 100,000. Horizontal axis: Regression Standardized Predicted Value; vertical axis: Regression Standardized Residual.]

Question 2 Part 2

Logistic Regression

Case Processing Summary

Unweighted Cases (a)                    N     Percent
Selected Cases    Included in Analysis  1542  100.0
                  Missing Cases         0     .0
                  Total                 1542  100.0
Unselected Cases                        0     .0
Total                                   1542  100.0

a. If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding

Original Value    Internal Value
never unsafe      0
have felt unsafe  1

Block 0: Beginning Block

Classification Table (a, b)

Observed and predicted variable: Ever felt unsafe in your neighborhood

                              Predicted
Step 0  Observed              never unsafe  have felt unsafe  Percentage Correct
        never unsafe          939           0                 100.0
        have felt unsafe      603           0                 .0
        Overall Percentage                                    60.9

a. Constant is included in the model.
b. The cut value is .500

Variables in the Equation

Step 0    B      S.E.  Wald    df  Sig.  Exp(B)
Constant  -.443  .052  72.029  1   .000  .642

Variables not in the Equation

Step 0              Score   df  Sig.
black               4.249   1   .039
gender              26.510  1   .000
age                 17.907  1   .000
emp_ft              1.251   1   .263
Overall Statistics  50.865  4   .000

Block 1: Method = Enter

Omnibus Tests of Model Coefficients

Step 1  Chi-square  df  Sig.
Step    51.583      4   .000
Block   51.583      4   .000
Model   51.583      4   .000

Model Summary

Step  -2 Log likelihood  Cox & Snell R Square  Nagelkerke R Square
1     2012.278 (a)       .033                  .045

a. Estimation terminated at iteration number 3 because parameter estimates changed by less than .001.

Classification Table (a)

Observed and predicted variable: Ever felt unsafe in your neighborhood

                              Predicted
Step 1  Observed              never unsafe  have felt unsafe  Percentage Correct
        never unsafe          854           85                90.9
        have felt unsafe      496           107               17.7
        Overall Percentage                                    62.3

a. The cut value is .500

Variables in the Equation

Step 1 (a)  B      S.E.  Wald    df  Sig.  Exp(B)
black       .142   .114  1.533   1   .216  1.152
gender      -.558  .111  25.453  1   .000  .573
age         -.015  .003  20.666  1   .000  .985
emp_ft      -.180  .111  2.621   1   .105  .835
Constant    .520   .196  7.039   1   .008  1.682

a. Variable(s) entered on step 1: black, gender, age, emp_ft.
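Several of the summary statistics in the attached output follow from other entries in the same tables by simple arithmetic. The sketch below, in Python, recomputes a few of them from the printed values. The variable names are illustrative and do not appear in the original output, and because the inputs are already rounded, agreement with the printed figures should be expected only to about three decimal places.

```python
import math

# OLS output: sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression = 122035.452
ss_residual = 1184914.246
ss_total = 1306949.698
df_regression = 2
df_residual = 1353
df_total = 1355

# R Square is the share of total variation attributed to the regression.
r_square = ss_regression / ss_total
# Adjusted R Square penalizes for the number of predictors.
adj_r_square = 1 - (1 - r_square) * df_total / df_residual
# F is the ratio of the regression and residual mean squares.
ms_residual = ss_residual / df_residual
f_stat = (ss_regression / df_regression) / ms_residual
# The standard error of the estimate is the square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

# Logistic output: B values copied from the Step 1 "Variables in the Equation" table.
step1_b = {
    "black": 0.142,
    "gender": -0.558,
    "age": -0.015,
    "emp_ft": -0.180,
    "Constant": 0.520,
}
# Exp(B) is the exponentiated coefficient (an odds ratio for each predictor).
exp_b = {name: math.exp(b) for name, b in step1_b.items()}

# Overall percentage correct, from the counts in the two classification tables.
n_cases = 1542
block0_pct_correct = 100 * 939 / n_cases          # everyone predicted "never unsafe"
step1_pct_correct = 100 * (854 + 107) / n_cases   # correct predictions on the diagonal

print(f"R Square = {r_square:.3f}, Adjusted R Square = {adj_r_square:.3f}")
print(f"F = {f_stat:.3f}, Std. Error of the Estimate = {std_error_estimate:.3f}")
for name, value in exp_b.items():
    print(f"Exp(B) for {name} = {value:.3f}")
print(f"Block 0 correct = {block0_pct_correct:.1f}%, Step 1 correct = {step1_pct_correct:.1f}%")
```

Small differences in the last digit, such as Exp(B) for gender, come from exponentiating a coefficient that the output has already rounded to three decimals.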