Statistical Tools for Research: SPSS (1)

Topic: Introductory Quantitative Data Collection and Analysis
Date: March 24 & 25, 2003
Time: 6:00-8:30 pm
Venue: B0416 & B0412
Facilitators: Dr. Zhang Wei-yuan & Ms. Elaine Kwok (CRIDAL, OUHK)

The workshop is intended for beginners. Its emphasis is on data analysis using the computer software Statistical Package for the Social Sciences (SPSS) for Windows. It covers simple statistical concepts and techniques applicable to exploratory and confirmatory data analysis.

Recommended reading:
Norusis, M. J. (2000). SPSS 10.0: Guide to Data Analysis. New Jersey: Prentice Hall.
Ferguson, G. A. & Takane, Y. (1989). Statistical Analysis in Psychology and Education, 6th ed. New York: McGraw-Hill Publishing Company.

Literature Review: Conducting a Computer Search

1. English literature searching (journal articles)

Related literature for most educational research topics can be found through the database providers listed under Electronic Resources at the OUHK library.

Database providers:

EBSCO: EBSCOhost
(1) Academic Search Elite OR (2) ERIC
- keywords (for example: learning style; online learning)
- get results
- abstract and/or full text
- print

Gale: InfoTrac Web
Expanded Academic ASAP
- keywords (for example: learning style; online learning)
- get results
- full text
- print

2. Chinese literature searching

The Chinese Educational Resources Information Centre (Chinese ERIC), CUHK
Website: <http://www.fed.cuhk.edu.hk/ceric/>

Popular journals in open and distance education in China:
i. 中國遠程教育 (Distance Education in China) <http://www.chinadisedu.com>
ii. 開放教育研究 (Open Education Research), published by Shanghai TV University <www.shtvu.edu.cn/kfyj>
iii. 現代遠距離教育 (Modern Distance Education), published by Heilongjiang RTV University
iv. 遠程教育雜誌 (Journal of Distance Education), published by Zhejiang RTV University <www.distance-edu.net>
v. 隔空教育論壇年刊 (Distance Education Forum), published by National Open University in Taiwan <http://www.nou.edu.tw/%7Eresearch/fram_01/data_tatol_01.htm>

SPSS (Statistical Package for the Social Sciences) for Windows

The two most commonly used statistical packages are:
Statistical Package for the Social Sciences (SPSS)
Statistical Analysis System (SAS)

Lesson 1: Data Editor

1. Starting SPSS for Windows

Start - Programs - SPSS for Windows - SPSS 10.1 for Windows
- Choose “Type in data” to get the Untitled SPSS Data Editor window, OR
- Choose “Open an existing data source” to get the SPSS Data Editor window.

2. Data Editor

Cases are represented in rows; variables are represented in columns; the intersection of a row and a column is called a cell.

Defining a variable

Click “Variable View” at the bottom left of the Data Editor.
“Name”: e.g. Gender (no more than 8 letters/numbers; the first character must be a letter)
“Type”: numeric (numbers) or string (characters, no more than 250)
“Width”: set 8
“Decimals”: e.g. 0, 2
“Label”: identifies a particular question (used as the title of the output table), e.g. Gender of respondents
“Values”:
(1) Value: e.g. 1
(2) Value label: e.g. male
(3) Click “Add”
(4) Click “OK”
Note:
8, 88, 888: not applicable (how many digits to use depends on the range of valid values)
9, 99, 999: no answer (how many digits to use depends on the range of valid values)
“Missing”: discrete missing values, e.g. 8, 9
“Columns”: set 8
“Align”: left, center, right
“Measure”: nominal, ordinal, or scale (see below)

Nominal: categorization without order, e.g.:
gender (1=male; 2=female);
age range (1=17 or below; 2=18-29; 3=30-39; 4=40-49; 5=50 and above);
nationality (1=Chinese; 2=British; 3=American; 4=others).

Ordinal: 5-point Likert scales, e.g.:
very good - good - no opinion - poor - very poor;
very satisfactory - satisfactory - undecided - unsatisfactory - very unsatisfactory;
most important - important - neutral - not important - least important;
strongly agree - agree - undecided - disagree - strongly disagree;
highly favorable - favorable - no opinion - unfavorable - highly unfavorable;
highly appropriate - appropriate - neutral - inappropriate - highly inappropriate;
very supportive - supportive - neutral - unsupportive - very unsupportive;
definitely yes - probably yes - uncertain - probably no - definitely no.

Ordinal: 3- or 4-point scales, e.g.:
agree - undecided - disagree;
very satisfied - moderately satisfied - a little satisfied - very dissatisfied.

Scale: The scale contains a true zero point that indicates a total absence of whatever is being measured, e.g. height; working hours per week.

Entering data

Click “Data View” at the bottom left of the Data Editor.
To move to the next cell down (the next case in the same column), press the “Down” arrow key OR “Enter”.
To move to the next cell across (the next variable in the same row), press the “Right” arrow key OR “Tab”.

Inserting variables or cases

Highlight the variable column where you want a new variable to be inserted, go to the tool bar, and select “Data” - “Insert Variable”.
Highlight the row where you want a new row (case) to be inserted, go to the tool bar, and select “Data” - “Insert Row”.

Deleting variables or cases

Highlight the variable, go to the tool bar, and select “Edit” - “Cut”.
Highlight the row (case), go to the tool bar, and select “Edit” - “Cut”.

Go to case

Go to the tool bar and select “Data” - “Go to Case”.

Select cases

Go to the tool bar and select “Data” - “Select Cases”.
1. All cases
2. If condition is satisfied - If - select variable, e.g. gender=1 - Continue - OK
3. Based on time or case range - Range - Observation: first case to last case, e.g. 26 to 79

Sample questionnaire for Exercise 1:

Gender: Male □ Female □
Age Group: 17-19 □ 20-24 □ 25-30 □ 31-40 □ 41-50 □ 50+ □
Your height: ________ m
Number of online courses you are currently teaching: _____
Job Title: Professor □ Associate Professor □ Assistant Professor □ Lecturer/Tutor □ Teaching Assistant □ Research Associate □ Research Assistant □ Dean □ Director □ President □ Others (please specify): ___________________
How often do you use the learning center in your teaching? Always □ Often □ Sometimes □ Rarely □ Never □

Exercise 1

1. Define the variables and enter the data as in the following tables:

Variable name   Description and codes
Gender          Gender of respondent. 1=male; 2=female; 9=no answer
Age             Age range of respondent. 1=17-19; 2=20-24; 3=25-30; 4=31-40; 5=41-50; 6=50+; 9=no answer
Height          Height of respondent, e.g. 1.69; 1.78. 9=no answer
Course          Number of courses with partial online WBI. 88=not applicable; 99=no answer
Jobtit          Job title of respondent. 1=professor; 2=associate professor; 3=assistant professor; 4=lecturer/tutor; 5=teaching assistant; 6=research associate; 7=research assistant; 8=dean; 9=director; 10=president; 88=not applicable; 99=no answer
Venue           The educator's use of the learning center. 5=always; 4=often; 3=sometimes; 2=rarely; 1=never; 8=not applicable; 9=no answer

Code  Gender  Height  Age  Course  Jobtit  Venue
1     1       1.69    2    1       5       4
2     2       1.57    2    5       3       2
3     1       1.80    4    2       8       5
4     2       1.50    3    0       6       4
5     2       1.65    5    1       9       0
6     1       1.80    9    0       5       3
7     9       1.75    1    0       5       1
8     2       1.55    2    3       4       5
9     1       1.90    3    4       6       0
10    2       1.77    5    1       1       9

2. Change the name of the variable from Jobtit to job.

3. Change gender in case 7 from 9 to 1.

4. Insert the case below as case 3:

Gender  Height  Age  Course  Jobtit  Venue
1       2.0     4    2       2       5

5. Insert a variable “program” between “Age” and “Course”:

Program
B343C Managing in Organizations
B343C Marketing Research
Company Law
B370 Electronic Financial Services
Language and Literacy in Social Context
ES850C Introduction to the Internet
Emerging Technologies

6. Select cases
(1) Gender = 1
(2) Range: cases 3 to 8

Lesson 2: To Run Frequencies

2.1 Some basic concepts of statistics

Median: The point in a distribution below which 50 percent of the scores lie. If the number of scores is even, use the midpoint between the two middle scores as the median.
Examples:
(i) 3, 5, 8, 10 & 11. The median is 8.
(ii) 4, 5, 7, 9, 11 & 12. The median is (7+9)/2 = 8.

Mode: The point or score of the greatest frequency in a distribution.
Example: 2, 3, 4, 4, 4, 4, 5, 7, 7 & 9. The mode is 4.

Mean: The sum of the scores in a distribution divided by the number of scores in the distribution.
Example: 16, 10, 5, 6, 8, 15, 20, 14, 16 & 10. Add the scores and the total is 120. There are 10 scores, so divide 120 by 10.
(16+10+5+6+8+15+20+14+16+10)/10 = 12
The mean is 12.

Standard Deviation (Sd): The standard deviation is the most frequently used measure of variability. It is “standard” in the sense that it reflects the average variability of all the scores around the mean. The larger the standard deviation, the more the scores vary from the central point of the distribution.
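The definitions in 2.1 can be checked outside SPSS. The following is a minimal sketch in Python, using only the standard `statistics` module and the example scores given above. The variable names are chosen here for illustration. It assumes the sample standard deviation (n - 1 in the denominator), which is the convention SPSS uses in its Descriptives output.

```python
import statistics

# Median: odd number of scores, so the middle score is the median.
median_odd = statistics.median([3, 5, 8, 10, 11])        # 8

# Median: even number of scores, so take the midpoint of 7 and 9.
median_even = statistics.median([4, 5, 7, 9, 11, 12])    # 8.0

# Mode: the score that occurs most often.
mode_value = statistics.mode([2, 3, 4, 4, 4, 4, 5, 7, 7, 9])  # 4

# Mean: the sum of the scores (120) divided by the number of scores (10).
scores = [16, 10, 5, 6, 8, 15, 20, 14, 16, 10]
mean_value = statistics.mean(scores)                     # 12

# Sample standard deviation: square root of the summed squared
# deviations from the mean (218) divided by n - 1 (9).
sd_sample = statistics.stdev(scores)                     # about 4.92

print(median_odd, median_even, mode_value, mean_value, round(sd_sample, 2))
```

If the population formula (dividing by n) is wanted instead, `statistics.pstdev` gives about 4.67 for the same scores.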
Example of standard deviation: Language aptitude scores of four classes

Class  Mean  Sd
1      55.6  23.4
2      56.4  4.8
3      39.1  5.3
4      38.1  18.7

Classes 1 and 2 have almost the same mean, but the scores in class 1 are far more spread out; the same holds for classes 4 and 3.

2.2 To run Frequencies

(1) Open a data file
> In the Untitled SPSS Data Editor, click Open - Floppy A, then select the GSS.sav file (data from the General Social Survey)
> Go to Analyze - Descriptive Statistics - Frequencies - move the variable(s) into the “Variable(s)” box
> Select “sex”; “satjob” - click OK
> Select “marital”; “macolleg”; “pacolleg” - click OK
> Select “classicl”; “opera” - click OK

(2) Charts:
> Go to Graphs - Bar - Define - move “sex” into the “Category axis” box
> Go to Graphs - Pie - Define - move “degree” into the “Category axis” box
> Go to Graphs - Histogram - move “age” into the “Variable” box

2.3 To run Descriptives (Mean; Sd; etc.)

> Go to Analyze - Descriptive Statistics - Descriptives - move the variable(s) into the “Variable(s)” box
> Select “age”; “edu” & “degree”
> Click Options: Mean; Sd; Minimum; Maximum.
> Select “satjob”; “impjob” & “income91”
> Click Options: Mean; Sd; Minimum; Maximum.

Exercise 2: Repeat 2.2 and 2.3 with other variables.

Lesson 3: To Run Crosstabs

Chi-square test: helps us to answer questions such as: Are the two variables independent? OR Is there any relationship between the two variables?

For example, you would like to know whether there is a significant difference between male and female students in their preferred learning method. Each student may choose only one learning method, so that every student is counted in exactly one cell. Most of the expected counts must be greater than 5, and none may be less than 1, e.g.:

          Male  Female
Method 1  >5    >5
Method 2  >5    >5
Method 3  >5    >5
Method 4  >5    >5

For example: A study was conducted to test a possible relationship between first language background and desire for a student-centered classroom in an adult ESL class.

          For  Against  Undecided
Chinese   11   30       25
Spanish   45   12       7
French    16   8        10

Pearson Chi-square: Crosstabs

Go to Analyze - Descriptive Statistics - Crosstabs
Select the row and column variables, e.g. Row: “sex”; Column: “degree”
Click Statistics - Chi-square - Continue
Click Cells (for percentages) - tick Row, Column, and Total - Continue
Click OK

For example:
> Go to the “GSS.sav” file
“In spite of what some people say, the lot of the average man is getting worse, not better.” Let's consider whether education is related to the likelihood of agreeing with this statement.

          No college degree  College degree
Agree
Disagree

Select “anomia5” as Row (1=agree; 2=disagree)
Select “degree2” as Column (1=degree; 0=no degree)

If P-value < 0.01, there is a highly significant relationship (99% confidence);
If P-value < 0.05 but > 0.01, there is a significant relationship (95% confidence);
If P-value > 0.05, there is no significant relationship.

If more than 20% of the cells have an expected count less than 5, or the minimum expected count is less than 1, the chi-square test should not be used.

Exercise 3:
1. “anomia5” with “sex”
2. “sex” with “life”
3. “degree2” with “satjob”

Lesson 4: To Run Paired-Samples T-Test

The paired-samples t-test compares the means of two variables. It is used in two situations.

Comparing the means of two variables for a single group (before and after). The study design for this test involves measuring each subject twice: before and after some kind of treatment or intervention.
e.g.: In a study on high blood pressure, all patients are measured at the beginning of the study, given treatment, and measured again. Thus, all patients have two measures, often called before and after measures.

Comparing the means of two variables for matched pairs.
Example 1: matched experimental and control groups. Experimental group: new teaching method. Control group: traditional teaching method. Both groups then take the same test.
Example 2: fathers' education and mothers' education.

Example:
> File - Open - Data - “endorph” (beta-endorphin levels before and after a half-marathon run for 11 men)
> Click Analyze - Compare Means - Paired-Samples T-Test
> Select the before and after variables
> Options - Confidence Interval. Specify a value (such as 90 or 95) in this box.
> OK

Exercise 4:
1. Open file “GSS.sav”. Now consider differences between the parents' years of education (variables “paeduc” and “maedu”). Is there a statistically significant average difference between fathers' and mothers' years of education?
2. Open file “Country.sav”. Is there a statistically significant difference between the average life expectancy of males and females (variables “lifeexpm” and “lifeexpf”)?
3. Repeat Lesson 4 using other variables.

Lesson 5: To Run One-Way ANOVA (Analysis of Variance)

One-way ANOVA compares more than two population means.
Example: if you are studying four methods for teaching English, you want to compare average test scores for all four groups.

Dependent variable and independent variable

Dependent variable: A variable being affected, or assumed to be affected, by the independent variable.
Independent variable: A variable that affects (or is assumed to affect) the dependent variable under study and is included in the research design so that its effect can be determined.

Example 1: The effect of four teaching methods on students' reading scores.
Dependent variable: reading scores
Independent variable: teaching methods

Example 2: People's average number of working hours is affected by their educational level.
Dependent variable: the average number of hours worked in a week
Independent variable: educational level (less than high school; high school; junior college; bachelor; and graduate).

To obtain a one-way analysis of variance (ANOVA):
Indicate the variable whose means you want to compare, and move it into the “Dependent List” box.
Select the variable whose values define the groups, and move it into the “Factor” box.
Click OK.

> Open file “gssft”
> Click Analyze - Compare Means - One-Way ANOVA
> Select the variable “hrs1” and move it into the “Dependent List” box
> Select the variable “degree” and move it into the “Factor” box

Bonferroni multiple comparison test

Many multiple comparison procedures are available. One of the simplest is the Bonferroni procedure.

> Click Analyze - Compare Means - One-Way ANOVA
> Select the variable “hrs1” and move it into the “Dependent List” box
> Select the variable “degree” and move it into the “Factor” box
> Click “Post Hoc” and tick Bonferroni
> Set the significance level at 0.05 or 0.01

The difference in hours worked between each pair of groups is shown in the column labeled Mean Difference. Pairs of means that are significantly different from each other are marked with an asterisk.

Results:
People with a graduate degree work significantly longer than people with less than a high school education;
People with a graduate degree work significantly longer than people with just a high school education;
No two other groups are significantly different from one another.

Exercise 5:
1. Repeat the example above.
2. Use the “gss” data file: Is there a relationship between highest degree earned and number of hours of television viewed a day (variables “degree” & “tvhours”)?
Dependent variable: the average number of hours of TV viewed a day
Independent variable: educational level (less than high school; high school; junior college; bachelor; & graduate).
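For readers who want to see what the One-Way ANOVA procedure in Lesson 5 computes, the following is a minimal sketch in Python using only the standard library. The weekly-hours figures are hypothetical values invented for illustration and are not taken from the “gssft” file; the function name `one_way_anova_f` is likewise an assumption of this sketch. It computes only the F statistic and the Bonferroni-adjusted significance level; it does not compute p-values, which SPSS reports in its output.

```python
from itertools import combinations
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group -> scores."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: how far each group mean lies
    # from the grand mean, weighted by group size.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2
                     for s in groups.values())
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical hours worked per week for three educational levels.
hours = {
    "high school": [40, 42, 38],
    "bachelor": [44, 46, 45],
    "graduate": [50, 52, 48],
}

f_stat, df_between, df_within = one_way_anova_f(hours)

# Bonferroni: divide the overall significance level by the number
# of pairwise comparisons, so each pair is tested more strictly.
pairs = list(combinations(hours, 2))
bonferroni_alpha = 0.05 / len(pairs)

print(f"F({df_between}, {df_within}) = {f_stat:.1f}")
print(f"{len(pairs)} pairwise comparisons, each tested at {bonferroni_alpha:.4f}")
```

With the five “degree” groups used in the lesson there are 10 pairwise comparisons, so an overall level of 0.05 corresponds to 0.005 for each pair.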