Session One

Statistical Tools for Research ---SPSS (1)
Topic: Introductory Quantitative Data Collection and Analysis
Date: March 24 & 25, 2003
Time: 6:00-8:30pm
Venue: B0416 & B0412
Facilitators: Dr. Zhang Wei-yuan & Ms. Elaine Kwok (CRIDAL, OUHK)
The workshop is intended for beginners and its emphasis is on data analysis using the
computer software Statistical Package for Social Sciences (SPSS) for Windows. It
will cover simple statistical concepts and techniques applicable to exploratory and
confirmatory data analysis.
Recommended reading:
Norusis, M. J. (2000). SPSS 10.0 Guide to Data Analysis. New Jersey: Prentice Hall.
Ferguson, G. A. & Takane, Y. (1989). Statistical Analysis in Psychology and Education, 6th ed. New York: McGraw-Hill Publishing Company.
Literature Review
Conducting a Computer Search
1. English literature searching (journal articles)
Related literature for most educational research topics can be found through the database providers listed under “Electronic resources” on the OUHK library website.

Database provider
EBSCO: EBSCOhost
(1) Academic Search Elite OR (2) ERIC
---keywords (for example: learning style; online learning)
---get results
---abstract and/or full text
---print

Gale: InfoTrac Web
Expanded Academic ASAP
---keywords (for example: learning style; online learning)
---get results
---full text
---print
2. Chinese literature searching
The Chinese Educational Resources Information Centre (Chinese ERIC), CUHK
Website: <http://www.fed.cuhk.edu.hk/ceric/>
The popular journals in Open and Distance Education in China:
i. 中國遠程教育 (Distance Education in China) <http://www.chinadisedu.com>
ii. 開放教育研究 (Open Education Research) published by Shanghai TV University <www.shtvu.edu.cn/kfyj>
iii. 現代遠距離教育 (Modern Distance Education) published by Heilongjiang RTV University
iv. 遠程教育雜誌 (Journal of Distance Education) published by Zhejiang RTV University <www.distance-edu.net>
v. 隔空教育論壇年刊 (Distance Education Forum) published by National Open University in Taiwan <http://www.nou.edu.tw/%7Eresearch/fram_01/data_tatol_01.htm>
SPSS (Statistical Package for the Social Sciences) for Windows
Two of the most commonly used statistical packages are:
 Statistical Package for the Social Sciences (SPSS)
 Statistical Analysis System (SAS)
Lesson 1: Data Editor
1. Starting SPSS for Windows:
Start – Programs – SPSS for Windows – SPSS 10.1 for Windows – choose “Type in data” OR “Open an existing data source” – the Untitled SPSS Data Editor window (for new data) OR the SPSS Data Editor window (for an existing file) opens
2. Data Editor
Cases are represented in rows;
Variables are represented in columns;
The intersection of a row and a column is called a cell.
Defining a variable
Click “Variable View” at the bottom left of the Data Editor
“Name”: e.g. Gender (no more than 8 letters/numbers; the first character must be a letter)
“Type”: numeric (numbers) or string (characters, no more than 250)
“Width”: set 8
“Decimals”: e.g. 0, 2
“Label”: identifies a particular question (it appears as the title of the output table), e.g. Gender of respondents
“Values”:
(1.) Value: e.g. 1
(2.) Value label: e.g. male
(3.) Click ‘Add’
(4.) Click ‘OK’
Note:
8, 88, 888: not applicable (which code to use depends on how many values the variable has)
9, 99, 999: no answer (which code to use depends on how many values the variable has)
“Missing”: discrete missing values, e.g. 8, 9
“Columns”: set 8
“Align”: left, center, right
“Measure”:
Nominal: This gives categorization without order:
e.g. : gender (1=male; 2=female);
Age range (1=17 or below; 2=18-29; 3=30-39; 4=40-49; 5=50 and above)
Nationality (1=Chinese; 2=British; 3=American; 4=others)
Ordinal: 5-point Likert scales
e.g.: very good – good – no opinion – poor - very poor;
Very satisfactory–satisfactory–undecided–unsatisfactory–very unsatisfactory;
Most important – important- neutral – not important – least important;
Strongly agree – agree – undecided – disagree – strongly disagree;
Highly favorable– favorable–no opinion–unfavorable– highly unfavorable;
Highly appropriate– appropriate–neutral–inappropriate–highly inappropriate;
Very supportive–supportive–neutral–unsupportive–very unsupportive;
Definitely yes–probably yes–uncertain–probably no– definitely no.
Ordinal: 3 or 4 point scales
e.g.: agree-undecided-disagree
Very satisfied – moderately satisfied – a little satisfied – very dissatisfied.
Scale: The scale contains a true zero point that indicates a total absence of
whatever is being measured. E.g. height; working hours per week.
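The idea of value labels and discrete missing values can also be shown outside SPSS. The Python sketch below is only an illustration: the names `GENDER_LABELS`, `MISSING_CODES` and `decode` are hypothetical and are not part of SPSS, and the responses are made up.

```python
# Hypothetical illustration of value labels and discrete missing values.
GENDER_LABELS = {1: "male", 2: "female"}   # value -> value label
MISSING_CODES = {8, 9}                     # 8 = not applicable, 9 = no answer


def decode(value):
    """Return the value label, or None when the code is declared missing."""
    if value in MISSING_CODES:
        return None
    return GENDER_LABELS[value]


raw = [1, 2, 9, 1, 2]                      # made-up coded responses
labels = [decode(v) for v in raw]
valid = [x for x in labels if x is not None]
print(labels)
print(len(valid), "valid cases out of", len(raw))
```

Declaring 8 and 9 as missing keeps them out of later calculations, which is what the “Missing” column does in the Data Editor.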
Entering data: click “Data View” at the bottom left of the Data Editor

To move to the next cell down (the next case in the same column), press the “Down” arrow key OR “Enter”;

To move to the next cell across (the next variable in the same row), press the “Right” arrow key OR “Tab”.
Inserting variables or cases

Highlight the variable column where you want a new variable to be inserted, go to the menu bar, and select “Data” – “Insert Variable”.

Highlight the row where you want a new row (case) to be inserted, go to the menu bar, and select “Data” – “Insert Case”.
Deleting variables or cases

Highlight the variable, go to the menu bar, and select “Edit” – “Cut”.

Highlight the row (case), go to the menu bar, and select “Edit” – “Cut”.
Go to case

Go to the menu bar and select “Data” – “Go to Case”.
Select cases

Go to the menu bar and select “Data” – “Select Cases”
1. All cases
2. If condition is satisfied – click “If” – select the variable and enter the condition, e.g. gender=1 – Continue – OK
3. Based on time or case range – click “Range” – under “Observation”, enter the first case and the last case, e.g. 26 to 79
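The two selection rules above (a condition and a case range) amount to filtering rows. A minimal Python sketch with made-up cases follows; the field names `case` and `gender` are assumptions chosen for the illustration.

```python
# Made-up cases; each dictionary is one row of the Data Editor.
cases = [
    {"case": 1, "gender": 1},
    {"case": 2, "gender": 2},
    {"case": 3, "gender": 1},
    {"case": 4, "gender": 2},
    {"case": 5, "gender": 1},
]

# "If condition is satisfied": keep only the cases with gender = 1.
males = [c for c in cases if c["gender"] == 1]

# "Based on time or case range": keep cases 2 to 4 (inclusive).
first, last = 2, 4
in_range = [c for c in cases if first <= c["case"] <= last]

print([c["case"] for c in males])
print([c["case"] for c in in_range])
```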
Sample of questionnaire for Exercise 1:
Gender: Male □  Female □
Age Group: 17-19 □  20-24 □  25-30 □  31-40 □  41-50 □  50+ □
Your height: ________ m
Number of online courses you are currently teaching: _____
Job Title:
Professor □  Associate Professor □  Assistant Professor □
Lecturer/Tutor □  Teaching Assistant □  Research Associate □
Research Assistant □  Dean □  Director □  President □
Others (please specify): ___________________
How often do you use the learning center in your teaching?
Always □  Often □  Sometimes □  Rarely □  Never □
Exercise 1
1. Define the variables and enter the data according to the following tables:

Variable name   Description
Gender          Gender of respondent
                1=Male; 2=Female; 9=no answer
Age             Age range of respondent
                1=17-19; 2=20-24; 3=25-30; 4=31-40; 5=41-50; 6=50+; 9=no answer
Height          Height of respondent, e.g. 1.69; 1.78
                9=no answer
Course          Number of courses with partial online WBI
                88=not applicable; 99=no answer
Jobtit          Job title of respondent
                1=professor; 2=associate professor; 3=assistant professor;
                4=lecturer/tutor; 5=teaching assistant; 6=research associate;
                7=research assistant; 8=dean; 9=director; 10=president;
                88=not applicable; 99=no answer
Venue           The educator's use of the learning center
                5=always; 4=often; 3=sometimes; 2=rarely; 1=never;
                8=not applicable; 9=no answer

Code  Gender  Height  Age  Course  Jobtit  Venue
1     1       1.69    2    1       5       4
2     2       1.57    2    5       3       2
3     1       1.80    4    2       8       5
4     2       1.50    3    0       6       4
5     2       1.65    5    1       9       0
6     1       1.80    9    0       5       3
7     9       1.75    1    0       5       1
8     2       1.55    2    3       4       5
9     1       1.90    3    4       6       0
10    2       1.77    5    1       1       9
2. Change the name of the variable from Jobtit to job
3. Change gender in case 7 from 9 to 1
4. Insert the case below as case 3
Gender  Height  Age  Course  Jobtit  Venue
1       2.0     4    2       2       5
5. Insert a variable “program” between “Age” and “Course”, with the values below
Program
B343C
Managing in Organizations
B343C
Marketing Research
Company Law
B370
Electronic Financial Services
Language and Literacy in Social Context
ES850C
Introduction to the Internet
Emerging Technologies
6. Select cases
(1) Gender = 1
(2) Range: cases 3 to 8
Lesson 2: To run Frequencies
2.1 Some basic concepts of statistics
Median: The point in a distribution below which 50 percent of the scores lie. If the number of scores is even, use the midpoint between the two middle scores as the median.
Examples: (i) 3, 5, 8, 10 & 11. The median is 8
(ii) 4, 5, 7, 9, 11 & 12. The median is (7+9)/2=8
Mode: The point or score of the greatest frequency in a distribution.
Example: 2,3,4,4,4,4,5,7,7 & 9. The mode is 4
Mean: The sum of the scores in a distribution divided by the number of scores in the
distribution.
Example: 16,10,5,6,8,15,20,14,16&10. Add the scores and the total is 120. There are
10 scores; so divide 120 by 10.
(16+10+5+6+8+15+20+14+16+10)/10=12 The mean is 12
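The median, mode and mean examples above can be cross-checked outside SPSS. The short Python sketch below uses only the standard `statistics` module and the same numbers as the examples; it is a checking aid, not part of the SPSS procedure.

```python
import statistics

# Median: odd and even numbers of scores.
median_odd = statistics.median([3, 5, 8, 10, 11])
median_even = statistics.median([4, 5, 7, 9, 11, 12])

# Mode: the score with the greatest frequency.
mode = statistics.mode([2, 3, 4, 4, 4, 4, 5, 7, 7, 9])

# Mean: the sum of the scores divided by the number of scores.
scores = [16, 10, 5, 6, 8, 15, 20, 14, 16, 10]
total = sum(scores)
mean = statistics.mean(scores)

print(median_odd, median_even, mode, total, mean)
```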
Standard Deviation (Sd): The standard deviation is the most frequently used measure of variability. It is “standard” in the sense that it reflects the average variability of all the scores around the mean. The larger the standard deviation, the more the scores vary around the central point of the distribution.
Example: Language aptitude scores of four classes
Class  Mean  Sd
1      55.6  23.4
2      56.4  4.8
3      39.1  5.3
4      38.1  18.7
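To see how two groups can share a mean yet differ in spread, here is a small Python sketch. The two classes and their scores are made up for illustration and are not the classes in the table; `statistics.stdev` computes the sample standard deviation (dividing by n - 1).

```python
import statistics

class_a = [50, 55, 60]   # made-up scores, tightly clustered
class_b = [30, 55, 80]   # made-up scores, widely spread

mean_a, sd_a = statistics.mean(class_a), statistics.stdev(class_a)
mean_b, sd_b = statistics.mean(class_b), statistics.stdev(class_b)

# Same mean, very different variability around it.
print("A", mean_a, sd_a)
print("B", mean_b, sd_b)
```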
2.2 To run Frequencies
(1) Open a data file
>In the Untitled SPSS Data Editor, click Open – Floppy A and select the GSS.sav file (data from the General Social Survey)
>Go to Analyze – Descriptive Statistics – Frequencies – move the variable(s) into the “Variable(s)” box
>Select “sex”; “satjob” – click OK
>Select “marital”; “macolleg”; “pacolleg” – click OK
>Select “classicl”; “opera” – click OK
(2) Charts:
>Go to Graphs – Bar – Define – move “sex” into the “Category axis”
>Go to Graphs – Pie – Define – move “degree” into the “Category axis”
>Go to Graphs – Histogram – move “age” into the “Variable” box
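A frequency table simply counts how often each value occurs and expresses the count as a percentage. The Python sketch below shows the idea with made-up coded answers (not taken from GSS.sav); the variable names are assumptions for the illustration.

```python
from collections import Counter

# Made-up coded answers (1 = male, 2 = female); not taken from GSS.sav.
sex = [1, 2, 2, 1, 2, 2, 1, 2]

counts = Counter(sex)
total = len(sex)

# value -> (frequency, percent)
table = {value: (n, 100 * n / total) for value, n in sorted(counts.items())}
for value, (n, percent) in table.items():
    print(value, n, f"{percent:.1f}%")
```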
2.3 To run Descriptives (Mean; Sd; etc.)
>Go to Analyze – Descriptive Statistics – Descriptives – move the variable(s) into the “Variable(s)” box
>Select “age”; “educ” & “degree”
>Click Options: Mean; Sd; Minimum; Maximum.
>Select “satjob”; “impjob” & “income91”
>Click Options: Mean; Sd; Minimum; Maximum.
Exercise 2:

Repeat 2.2 and 2.3 with other variables
Lesson 3: To run Crosstabs
Chi-square test:
Helps us to answer such questions as:
Are the two variables independent? OR
Is there any relationship between the two variables?

For example, you would like to know whether there is a significant difference between male and female students in their preferred learning method. Each student may choose only one preferred learning method, so that every student is counted in exactly one cell.
Most of the expected counts must be greater than 5, and none may be less than 1.
e.g.
          Male  Female
Method 1  >5    >5
Method 2  >5    >5
Method 3  >5    >5
Method 4  >5    >5
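The expected count of a cell is its row total multiplied by its column total, divided by the grand total. The Python sketch below computes expected counts for a made-up 4 x 2 table and checks the usual rule of thumb (no expected count below 1, and no more than 20% of cells below 5). The function names and the counts are assumptions for the illustration.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square_usable(expected):
    """Rule of thumb: none below 1, and at most 20% of cells below 5."""
    cells = [e for row in expected for e in row]
    share_small = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_small <= 0.20


# Made-up counts: rows are Methods 1-4, columns are Male, Female.
observed = [[12, 8], [10, 10], [9, 11], [9, 11]]
expected = expected_counts(observed)
print(expected)
print(chi_square_usable(expected))
```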
For example: A study was conducted to test a possible relationship between first
language background and desire for a student-centered classroom in an adult
ESL class.
For
Against
Undecided

Chinese
Spanish
French
11
45
16
30
12
8
25
7
10
Pearson Chi-square – Crosstabs
>Go to Analyze – Descriptive Statistics – Crosstabs
>Select the row and column variables, e.g. Row: “sex”; Column: “degree”
>Click Statistics – Chi-square – Continue
>Click Cells (for percentages) – tick Row, Column, and Total – Continue
>OK
For example: >Open the file “GSS.sav”
“In spite of what some people say, the lot of the average man is getting worse, not better.” Let’s consider whether education is related to the likelihood of agreeing with this statement.
          No college degree  College degree
Agree
Disagree
Select “anomia5” as Row (1=agree 2=disagree)
Select “degree2” as Column (1=degree 0=no degree)
If the P-value < 0.01, there is a highly significant relationship (99% confidence level);
If the P-value < 0.05 but > 0.01, there is a significant relationship (95% confidence level);
If the P-value > 0.05, there is no significant relationship.
If more than 20% of the cells have an expected count of less than 5, or the minimum expected count is less than 1, the Chi-square test should not be used.
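For readers who want to see what lies behind the P-value, here is a minimal Python sketch of the Pearson chi-square statistic for a 2 x 2 table. The counts are made up (they are not the GSS results), the function names are assumptions, and the sketch omits the continuity correction that SPSS also prints for 2 x 2 tables. With one degree of freedom the P-value can be obtained from the complementary error function.

```python
import math


def chi_square_2x2(observed):
    """Pearson chi-square statistic and P-value for a 2 x 2 table (df = 1)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand
            chi2 += (obs - exp) ** 2 / exp
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def interpret(p_value):
    """Apply the decision rules stated in the text."""
    if p_value < 0.01:
        return "highly significant"
    if p_value < 0.05:
        return "significant"
    return "not significant"


# Made-up counts: rows = agree / disagree, columns = no degree / degree.
observed = [[30, 10], [20, 40]]
chi2, p = chi_square_2x2(observed)
print(round(chi2, 3), p, interpret(p))
```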
Exercise 3:
1. “anomia5” with “sex”;
2. “sex” with “life”;
3. “degree2” with “satjob”.
Lesson 4: To Run Paired-Samples T-Test
For the paired T test, the means of two variables are compared.

Comparing the means of two variables for a single group (before and after).
The study design for this test involves measuring each subject twice: Before
and After some kind of treatment or intervention.
e.g.: In a study on high blood pressure, all patients are measured at the
beginning of the study, given treatment, and measured again. Thus, all patients
have two measures, often called before and after measures.

Comparing the means of two variables for matched pairs.
Example 1: experimental group and control group.
Experimental group: new teaching method.
Control group: traditional teaching method.
Test.
Example 2: fathers’ education and mothers’ education
Example:
> File – Open – Data – “endorph” (beta-endorphin levels before and after a half marathon run for 11 men)
> click Analyze - Compare means - Paired-Samples T-Test
> Select before and after variables
> Options - Confidence Interval. Specify a value (such as 90/95) in this box.
> OK
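The paired-samples t statistic is the mean of the before/after differences divided by its standard error. The Python sketch below works through the arithmetic with made-up scores for five subjects (not the “endorph” data); the critical value 2.776 is the two-tailed 5% value for 4 degrees of freedom taken from a standard t table.

```python
import math
import statistics

before = [10, 12, 9, 11, 13]   # made-up "before" scores
after = [12, 15, 10, 14, 14]   # made-up "after" scores for the same subjects

differences = [a - b for a, b in zip(after, before)]
n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)        # sample SD of the differences
t = mean_diff / (sd_diff / math.sqrt(n))       # paired-samples t statistic
df = n - 1

CRITICAL_T = 2.776   # two-tailed 5% critical value for df = 4, from a t table
print(round(t, 3), df, abs(t) > CRITICAL_T)
```

SPSS reports the exact significance level, so looking up a critical value is only needed when working by hand.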
Exercise 4:
1. Open file “Gss.sav”
Now consider differences between the parents’ years of education (variables “paeduc” and “maeduc”). Is there a statistically significant average difference between fathers’ and mothers’ years of education?
2. Open file “Country.sav”
Is there a statistically significant difference in average life expectancy between males and females (variables “lifeexpm” and “lifeexpf”)?
3. Repeat Lesson 4 using other variables.
Lesson 5: To Run One-Way ANOVA (Analysis of Variance)

Comparing more than two population means.
Example: if we are studying four methods for teaching English, we want to compare the average test scores of all four groups.

Dependent variable and independent variable
 Dependent variable: A variable being affected or assumed to be affected by
the independent variable.
 Independent variable: A variable that affects ( or is assumed to affect) the
dependent variable under study and is included in the research design so
that its effect can be determined.
Example 1: The effect of four teaching methods on the reading scores of students.
 Dependent variable: reading scores
 Independent variable: teaching methods
Example 2: People’s average number of working hours is affected by their educational level
 Dependent variable: the average number of hours worked in a week
 Independent variable: educational levels (less than high school; high
school; junior college; bachelor; and graduate).
To obtain a one-way analysis of variance (ANOVA):
>Indicate the variable whose means you want to compare, and move it into the “Dependent List”
>Select the variable whose values define the groups and move it into the “Factor” box
>Click OK
>Open file “gssft”
>Click Analyze – Compare Means – One-Way ANOVA
>Select the variable “hrs1” and move it into the “Dependent List”
>Select the variable “degree” and move it into the “Factor” box.
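The F statistic reported by a one-way ANOVA compares the variability between group means with the variability within groups. The Python sketch below shows the arithmetic for three made-up groups (not the “gssft” data); the function name `one_way_anova` is an assumption for the illustration.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    # Between-groups sum of squares: group size * squared distance of each
    # group mean from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared distance of each score from its
    # own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # made-up scores for three groups
f, df_b, df_w = one_way_anova(groups)
print(f, df_b, df_w)
```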
Bonferroni multiple comparison test
Many multiple comparison procedures are available. One of the simplest is the
Bonferroni procedure.
>Click Analyze – Compare Means – One-Way ANOVA
>Select the variable “hrs1” and move it into “Dependent List”
>Select the variable “degree” and move it into “Factor box”
>click “Post Hoc” and tick Bonferroni
>Set significance level at 0.05 or 0.01
The difference in hours worked between two groups is shown in the column labeled Mean Difference. Pairs of means that are significantly different from each other are marked with an asterisk.
Results:
People with a graduate degree work significantly longer than people with less than a high school education;
People with a graduate degree work significantly longer than people with just a high school education;
No two other groups are significantly different from one another.
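The idea behind the Bonferroni procedure is to divide the overall significance level by the number of pairwise comparisons, so that each pair is tested more strictly. The Python sketch below counts the pairs for the five educational levels and shows the adjusted level; it illustrates the principle only, and SPSS may present the adjustment differently in its output.

```python
from itertools import combinations

groups = ["less than high school", "high school", "junior college",
          "bachelor", "graduate"]
alpha = 0.05

pairs = list(combinations(groups, 2))     # every pair of groups
adjusted_alpha = alpha / len(pairs)       # Bonferroni-adjusted level per pair

print(len(pairs), adjusted_alpha)
```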
Exercise:
1. Repeat the example above.
2. Use the “gss” data file:
 Is there a relationship between highest degree earned and number of hours of
television viewed a day (variable “degree” & “tvhours”)?
 Dependent variable: the average number of hours of TV viewed a day
 Independent variable: educational levels (less than high school; high school;
junior college; bachelor; & graduate).