FDTL DATA Project: Quantitative Research Methods
Chapter Five
Examining Relationships between Variables
Aims of the Chapter
In this chapter you are introduced to new statistical techniques. The chapter aims to:
• explore the nature of relationships between two sets of measures
• demonstrate how the relationship between two sets of measures (e.g. height and
weight) can be represented visually
• show how such relationships can also, in parallel, be measured by a more
mathematical method, the correlation coefficient
• discuss the meaning and interpretation of correlation coefficients, and relate this
measure to the nature of causality
The chapter builds on the dataset introduced in Chapter Four, and also
presupposes data handling techniques, e.g. selecting cases, introduced in that
chapter. If you haven’t read Chapter Four, it may be necessary to do so now, before
you work through the materials in this chapter.
Introduction
In most of the previous statistical procedures, we have looked at variables one at a
time. For example, in Chapter Four, we looked at the mean score for complexity on
the decision making task, or the mean score for fluency on the narrative, and so on.
Very often, though, we want to look at the relationship between two sets of scores. In
such cases:
• each person (case) has a score (or measure) on each of the two variables
• we want to know if there is a tendency for scores on one variable to relate to scores
on the other variable, e.g.
• for high scores on one to be associated with high scores on the other
• for low scores to be associated with low scores
• for average scores to be associated with average scores
Consider, in this respect, scores for pausing, for the joint dataset of Studies One and
Two, for both the decision making tasks and the narrative tasks.
We can rephrase the “relationships” questions for this specific case as:
• do frequent pausers on the decision making tasks also tend to be frequent pausers
on the narrative tasks?
• do infrequent pausers on the decision making tasks also tend to be infrequent
pausers on the narrative tasks?
• do average pausers on the decision making tasks also tend to be average pausers on
the narrative tasks?
© Peter Skehan 2003
More broadly, we are asking here whether there is a characteristic pausing pattern
which prevails over the two tasks. (Notice that we are ignoring other factors here,
such as planning and post-task conditions. In other words, we are assuming that
consistency in pausing overrides any effect of these other conditions.) If there is a
characteristic pausing pattern, it suggests that it is the individual who determines how
much pausing takes place (for reasons of conversational style, or previous teaching, or
whatever: we don’t know what the actual cause might be), rather than the specific
task leading to characteristic pausing patterns.
We can address this issue in two ways. We can look at a visual representation of the
relationship involved. Alternatively we can look at a mathematical version of the
strength of the relationship between the two sets of scores.
Scattergrams
We will start with a visual representation, based on Display1.sav, shown below as
Figure 5.1.
Figure 5.1: Scattergram of Decision-making and Narrative Pausing
[Scattergram: vertical axis DPAUSE, horizontal axis NPAUSE. One data point is arrowed and labelled “Score of about 27 for decision making and about 32 for narrative”.]
The vertical axis shows the scores for pauses on the decision-making tasks, and the
horizontal axis shows the scores for pausing on the narrative tasks. So, as you go up
the vertical axis, we have the possible measures which could occur for the number of
pauses on the decision-making tasks, ranging from zero to a maximum possible value,
given here as 60. Any individual’s decision-making pausing score can be located
somewhere on this scale. Exactly the same should then apply to the horizontal axis,
which shows a range of possible scores from zero to sixty also. Well, actually, it
shows -10 to 60. SPSS has adjusted the scale to start at the impossible value of -10
because one person got a score of zero. It then “makes space” to the left of this, and
rather stupidly, pretends that a minus score is legitimate, which is nonsense. No actual
minus scores do occur, so ignore this piece of automatic adjustment, and focus on the
part of the diagram between 0 and 60.
Given these two axes, it is possible to think of the pair of scores an individual might
have, one located for the decision-making pauses on the vertical axis, and one for the
narrative pauses, on the horizontal axis. Indeed, this allows us to imagine drawing
two lines: one horizontal line from the point on the vertical axis corresponding to an
individual’s score for decision-making pauses, and a vertical line, going upwards,
from the point on the horizontal axis corresponding to an individual’s score for
narrative pausing. The two drawn lines would then intersect at a point which would
“capture” where we could locate an individual in the two-dimensional figure.
The arrowed score shows such a particular case - someone who got a pausing score of
about 27 for the decision making task and about 32 for the narrative. In other words,
each person can be “defined” as a point in this two dimensional space, representing
the intersection of the pausing score on the decision making task and the pausing
score on the narrative.
The first thing to do is to decide what pattern, if any, is shown by the scores. My
interpretation of the shape in Figure 5.1 would be of a very badly drawn, very rough
kite shape (as shown below, in Figure 5.2). This suggests that low scores for decision making
pausing tend to be associated with low scores for narrative pausing. Then the
relationship weakens a little, although there is still a slight tendency for high scores to
be associated with high scores. In other words, knowing that someone got a low
score on the decision-making tasks is a good predictor that they got a low score on the
narrative tasks, but knowing that someone got a high score on the decision-making
tasks doesn’t provide such a clear basis for predicting that they got a high
pausing score for the narrative tasks. There is a tendency, but it is nothing like as
close. As another aspect of capturing this relationship, it is striking that there are
relatively few scores in the top left or bottom right areas of the diagram.
In other words, you don’t tend to get people who have a low pausing score
on the decision making tasks but a high pausing score on the narrative tasks,
or vice versa - high decision-making pausing scores linked to low narrative pausing
scores.
The conclusion to draw here seems to be that visually, there is a relationship between
the two sets of scores, and that the diagram used, which is called a scattergram, helps
bring this out clearly.
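The observation about the nearly empty top-left and bottom-right areas can also be checked by simple counting. The following is only a sketch in Python, not something SPSS does for you: the score pairs are invented for illustration (they are not taken from Display1.sav), and the function name quadrant_counts and the choice of the median as the dividing line on each axis are assumptions made here.

```python
from statistics import median

# Hypothetical (narrative pauses, decision-making pauses) pairs, invented
# purely for illustration; they are NOT the values in Display1.sav.
pairs = [(5, 4), (8, 10), (12, 9), (20, 18), (25, 30),
         (32, 27), (40, 22), (45, 41), (50, 35), (15, 33)]

def quadrant_counts(points):
    """Count the points in each quarter of the plot, splitting both
    axes at the median score (an assumed, arbitrary dividing line)."""
    x_mid = median(x for x, _ in points)
    y_mid = median(y for _, y in points)
    counts = {"bottom-left": 0, "top-right": 0,
              "top-left": 0, "bottom-right": 0}
    for x, y in points:
        vertical = "top" if y > y_mid else "bottom"
        horizontal = "right" if x > x_mid else "left"
        counts[f"{vertical}-{horizontal}"] += 1
    return counts

print(quadrant_counts(pairs))
```

With a positive relationship, most cases fall in the bottom-left and top-right quarters, which is the same thing the eye picks up from the scattergram.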
Figure 5.2: Scattergram of Decision-making and Narrative Pausing
[Scattergram: vertical axis DPAUSE, horizontal axis NPAUSE.]
Task 5.1
Using the file Display1.sav, select cases from the dataset from Studies One and Two
only, and then create scattergrams which show the relationship between:
• decision making complexity and narrative complexity
• decision making accuracy and narrative accuracy
Part One
To select cases from Studies One and Two only, follow the sequence:
Data
Select Cases
Click the radio button for If condition is satisfied
Click the If button which becomes undimmed
Click on Study in the left-hand box and move it into
the right hand box
Click successively on “<” and “3” on the
calculator, and move them across to
the right hand box
Click Continue
Click OK
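For readers who also work outside SPSS, the logic of this selection can be sketched in a few lines of Python. This is an illustration only: the case records and the lower-case field names are invented, and the one thing taken from the steps above is the condition itself (Study < 3).

```python
# Hypothetical case records, invented for illustration; only the
# condition "study < 3" comes from the SPSS steps above.
cases = [
    {"id": 1, "study": 1, "dpause": 12},
    {"id": 2, "study": 2, "dpause": 30},
    {"id": 3, "study": 3, "dpause": 18},
    {"id": 4, "study": 4, "dpause": 25},
    {"id": 5, "study": 2, "dpause": 7},
]

# Keep a case only if it satisfies the condition, as "Select Cases" does.
selected = [case for case in cases if case["study"] < 3]

print([case["id"] for case in selected])
```

Cases that fail the condition are simply left out of any later analysis, which is what SPSS does when it filters unselected cases.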
Part Two
Now that you are only dealing with cases from Studies One and Two, you can
produce the actual scattergrams.
To do this, follow these steps:
• select the Graphs menu
• select Scatter
• from the resulting new screen, choose Simple
• choose a variable to go into the (horizontal) X axis, e.g. decision making
complexity
• choose a variable to go into the (vertical) Y axis, e.g. narrative complexity
• click OK
• after you have generated relevant output for the first scattergram, e.g. complexity,
then repeat this set of procedures to produce the scattergram for accuracy.
As a focussing question, consider the claim that one of the two scattergrams produced
for the exercise shows a relationship between the two variables concerned and the
other does not. Which is which?
Feedback on Task 5.1
We will look at the scattergram for accuracy first, and then for complexity.
Scattergram of Decision-making and Narrative Accuracy
[Scattergram: vertical axis DACCURAC, horizontal axis NACCURAC.]
The scattergram for accuracy does take a little bit more interpreting. My claim
would be that a relationship can be seen here, in that there is a trend for the
data points to lie in a bottom-left to top-right shape. There are some data
points which are top-left and some which are bottom-right, but there are not
many of these, and they are somewhat distributed in this space, without any
“density”. In contrast, the general drift, so to speak, suggests that there is
greater density of the data points (remembering that each data point
represents a person’s scores on the two measures concerned) in this main
pattern of bottom-left to top-right. This indicates a positive relationship, in that
there is a tendency for someone who is accurate on the decision-making task
to be accurate on the narrative task. There is scope, though, for the
relationship to be much stronger. This would be shown if the distribution of
scores had been “tighter” around an imaginary line running bottom-left to top-right, with fewer scores drifting off such a line. As it is, we can guess that the
relationship in question is clearly detectable, but moderate in nature.
Scattergram of Decision-making and Narrative Complexity
[Scattergram: vertical axis NCOMPLEX, horizontal axis DCOMPLEX.]
The relationship for the complexity scores is somewhat different. In fact, it is
better here to look at the scores for the two tasks separately before attempting
to relate them. The major contrast, in this respect, is between the wide range
of complexity scores for the decision-making task and relatively narrower
range for the narrative task. The decision-making scores “occupy space” fairly
(but not totally) evenly between just over 1.0 and something like 2.3. In
contrast, for the narrative task there is a band of complexity scores between 1.0
and around 1.45 which accounts for most of the scores. Then there are some in the
range 1.45 to 2.2, but there are, in truth, not very many of these. In other
words, the amount of variation in complexity scores is in itself a point of contrast between the two tasks. The
decision-making task seems to have provoked much more variation than the
narrative task.
If, next, we turn to exploring what the visual pattern says about the
relationship between these two sets of scores, it basically looks as if there is
hardly any. There are a lot of scores in the bottom-left section of the
scattergram, and then some in the top-right. But there are also some scores in
the top-left and some in the bottom-right. So we cannot use what is in the
diagram very effectively to predict someone’s complexity score on one task on
the basis of knowing their score on the other.
To generalise just a little here, it appears that there is some consistency of
accuracy in performance across the two tasks, but there is little consistency in
complexity. Accuracy, in other words, functions in a manner similar to fluency,
while complexity seems to be influenced by other things.
We will return to these relationships when we look at the correlation coefficient.
So far, you have used the Simple option with Scattergram, and used it as
straightforwardly as it can be used. But there is an interesting option within the
Simple procedure, and there are also other Scattergram types, one of which we will
now explore.
In the previous task, for each use of the Scattergrams procedure, you generated a
graph showing the visual relationship between pairs of measures. You did this twice,
for accuracy and complexity, just as the materials, earlier, had done it once, for
pausing. Scattergrams (and Correlation Coefficients, later in this chapter) always
work in this way, focussing on pairs of measures. But it is possible to display more
than one pairwise relationship on screen at the same time, and there are occasions
when it is useful to do this. The next task requires you to explore this.
Task 5.2
So far, you have seen that there is visual evidence of a relationship between
the two fluency measures and between the two accuracy measures, taken separately. It might
be interesting to see how all four of these measures relate to one another, as the whole
set of pairs of measures:
• decision making fluency with narrative fluency
• decision making accuracy with narrative accuracy
• decision making fluency with decision making accuracy
• narrative fluency with narrative accuracy
• decision making fluency with narrative accuracy
• decision making accuracy with narrative fluency
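The six pairings listed above are simply every way of choosing two measures from the four. As a small sketch in Python (not part of SPSS), the list can be generated mechanically. The variable names are the ones shown on the SPSS charts; the rest is an illustration only.

```python
from itertools import combinations

# The four measures, using the variable names that appear on the charts.
variables = ["DPAUSE", "NPAUSE", "DACCURAC", "NACCURAC"]

# Every unordered pair of different measures.
pairings = list(combinations(variables, 2))

for first, second in pairings:
    print(f"{first} with {second}")

# A 4 x 4 matrix has 16 cells: 4 on the diagonal (the variable names)
# and 12 scattergrams, i.e. each of the 6 pairings shown twice.
print(len(pairings), len(variables) ** 2 - len(variables))
```

This is why the matrix display grows so quickly: a fifth measure would already give ten pairings.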
To explore this, repeat the above instructions, by choosing Graphs and then Scatter…
But then:
Select Matrix
Click Define
Choose both pausing measures and both accuracy measures, one by
one, moving each in turn from the left-hand box to the right-hand box, labelled Matrix Variables
Give the chart an appropriate title
Click OK
Focussing Questions
a) Identify the scattergrams in the output diagram that you have seen before, from
previous work in this chapter.
b) Identify the scattergrams in the output diagram that you haven’t seen before.
c) Identify the scattergrams in the output diagram that are simply mirror images of
themselves!
d) Interpret the scattergrams you haven’t seen before. Try and compare the
relationships they represent with the relationships shown by the scattergrams you
have seen before.
Feedback on Task 5.2
The first impression here, obviously, is how much data is involved, and how,
as a result, it can be overwhelming. For this very reason, you won’t use the
Matrix facility very much, but it can be useful occasionally to see general
trends. It is nice to know it is there, in other words.
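Accuracy-Fluency Scatterplots
[Scatterplot matrix of DACCURAC, NACCURAC, DPAUSE and NPAUSE, in that order, with the variable names in the cells of the main diagonal.]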
We will go through the questions one-by-one. But first, it is useful to have a
system of referring to the individual scattergrams within the matrix. We will
simply talk about, e.g. Row1:Col1, in this case referring to the top left cell, i.e.
the text cell which says “DACCURAC”.
a) Scattergrams you have seen before:
These are Row1:Col2 (and Row2:Col1) and then Row3:Col4 (and
Row4:Col3)
b) Scattergrams you haven’t seen before:
All the other actual scattergrams, i.e. the block of four in the top-right
section (and in the bottom left section).
c) Mirror images
The diagonal elements in the matrix of scattergrams are the variable
names, and they divide the arrangement of cells into upper right and
lower left triangles. The lower left triangle is simply a reflection of the
upper right triangle, and is, basically, the same information. It can be
ignored. This is a quirk of SPSS output. Actually, though, there is one
difference. If you look closely at a “mirror” pair of scatterplots, you will
see that the one in the lower left is “reflected”, or “reversed” around the
main diagonal. This is not important really, but is nice to know.
d) Interpretation
The cells in the matrix containing scatterplots you have seen before will
obviously not be interpreted. We will focus on the others. All of these
relate an accuracy measure to a fluency measure, in various
combinations. Notice first that the relationship (a) shows a top-left to
bottom-right tendency, reflecting high scores on one variable being
associated with low scores on the other variable, (b) tends not to
contain many data points in the top-right and bottom-left quarters of the
scatterplot, and (c) is not a very strong relationship. In other words, there is
some relationship between fluency measures and accuracy measures,
but this relationship is not that strong. (For it to be strong, the data
points would need to be arranged much more tightly around an
imaginary line running top-left to bottom-right.)
The interpretation, therefore, is that higher accuracy tends to be
associated with fewer pauses, and lower accuracy with more pauses.
To put this another way, since fewer pauses means more fluency,
accuracy and fluency do, in fact, relate to one another. The more
accurate people do tend to be more fluent, with this being indexed in
fewer pauses.
We turn finally in this section to one more variation in Scattergrams that is worth
exploring. We started the section by looking at the scattergram between the fluency
measures for the decision-making and narrative tasks, and this was shown in Figures
5.1 and 5.2. The next task leads you to look at this data in a more effective way. The
starting point is to recall that Studies One and Two (still the data that should be selected),
in addition to comparing the decision-making and narrative tasks, also explored the
influence of planning. This raises the possibility that the relationship between the
fluency measures for the two tasks might be different for those participants who had
planning time and those who didn’t. The next task shows how the Scattergram
procedure can produce data relevant to this.
Task 5.3
Initially, in this task, consider the following possibilities:
• the relationship between decision making fluency and narrative fluency is
unaffected by whether the participant has planned or not
• the relationship between decision making fluency and narrative fluency is stronger
amongst those who planned compared to those who didn’t have planning time
• the relationship between decision making fluency and narrative fluency is weaker
for those who planned compared to those who didn’t have planning time
Before following the procedures outlined next, make a choice between the three
above possibilities.
After you have made your choice, follow this sequence:
Graphs
Scatter…
Simple
Define
Put the Decision Pausing (dpause) measure into the Y (vertical)
axis and the Narrative Pausing into the X (horizontal) axis
Select Planning condition (plancond) and move it into
the Set Markers by box
Give the output a title
Click OK
Feedback on Task 5.3
At the outset, we need to consider colour, which is the key element in this
feedback. If you are dealing with the output on screen, in an SPSS output file,
you will have glowing colour, and this mode of working is recommended.
Similarly, if you are reading a PDF file, colour should be shown. But if you are
dealing with a printed version of these materials, produced on a black and
white printer, things will not be so attractive or clear. The moral is to work with
one of the forms of output which shows data points in colour!
Scatterplot of Decision-making and Narrative Pausing: Coded for Planning
[Scatterplot: vertical axis DPAUSE, horizontal axis NPAUSE; data points coloured by PLANCOND (legend values 1 and 2).]
The important thing here is to focus separately on the scatterplot for the
planners and the scatterplot for the non-planners. It is possible to do this by
looking at just one colour - green for the planners and red for the non-planners. If you do this, a strikingly different pattern is apparent. The non-planners, in red, show essentially a zero relationship. The red indicators
(more to the top right of the plot, representing more pausing, and therefore
lower fluency) are evenly distributed within their own area. In other words,
how much a non-planner pauses on the decision-making task does not allow
you to predict how much that person would pause on the narrative task. In
contrast, if you attempt to “filter out” the red data points, and focus only on the
green, a quite different, and much more regular pattern emerges. In this case,
there is quite a strong bottom-left to top-right shape, indicating that where
there is opportunity to plan, there is consistency in pausing behaviour. If
someone pauses a lot on the decision-making task, it is fairly predictable that
they will pause quite a lot on the narrative task. Something about planning, that is,
causes fluency to be a consistent form of behaviour. So, of the three
possibilities mentioned in the earlier task box, the second seems to be the most
satisfying one to account for this data - the relationship between the two
fluency scores is stronger when there is opportunity to plan.
Reflection
The broad issue we have been concerned with here is the representation of
relationships between two sets of measures. So far, we have explored this
representation visually, and have seen that it can be very revealing. A picture can
convey a lot of information succinctly, and it is not too difficult to look at a
scattergram and simultaneously (a) see the general shape and pattern and (b) not lose
track of the individual data points and the detail. A scattergram is therefore a very
powerful technique, and should always be used when it is appropriate. This
combination of general and particular is extremely valuable.
It is also clear that SPSS is very helpful in the different choices it provides to enable
salient aspects of relationships to emerge clearly. One aspect of this is the use of the
matrix form of data display, which enables sets of scattergrams to be examined
simultaneously. (Even so, it is difficult, sometimes, to cope with the amount of detail
involved, not least because each of the detailed scatterplots is actually quite small.)
More relevant, perhaps, is the facility to Set Markers by. As we have seen in the last
task, this can enable important patterns to emerge which were previously invisible.
Use this facility whenever you can. If there are variables such as Planning, which
might lead to different sorts of performances, the Set Markers option is a key
procedure to use when exploring your data before going on to more condensed
mathematical treatments. We will be moving on next to one such mathematical
treatment - the Correlation Coefficient. The important point is that such a procedure
should always be used in conjunction with visual techniques such as Scatterplots.
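To make the point about Set Markers by concrete, here is a minimal sketch in Python of what “looking at one colour at a time” amounts to numerically. It anticipates the Pearson correlation coefficient introduced in the next section. Everything specific in it is an assumption made for illustration: the scores are invented (they are not the Display1.sav values), the labels “planner” and “non-planner” stand in for the PLANCOND codes, and the helper name pearson is hypothetical.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented cases: (planning condition, narrative pauses, decision pauses).
cases = [
    ("planner", 5, 6), ("planner", 10, 9), ("planner", 15, 16),
    ("planner", 20, 19), ("planner", 25, 27),
    ("non-planner", 30, 40), ("non-planner", 35, 30),
    ("non-planner", 40, 45), ("non-planner", 45, 30),
    ("non-planner", 50, 40),
]

# Split the cases by condition, as the coloured markers do visually.
groups = defaultdict(list)
for condition, npause, dpause in cases:
    groups[condition].append((npause, dpause))

# One correlation per group rather than one for everybody together.
r_by_group = {
    condition: pearson([x for x, _ in points], [y for _, y in points])
    for condition, points in groups.items()
}

for condition, r in r_by_group.items():
    print(f"{condition}: r = {r:.2f}")
```

In this made-up example the planners show a strong relationship and the non-planners none at all, which is the kind of contrast a single scattergram without markers can hide.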
Correlation Coefficients
Scattergrams are excellent for giving a picture of the relationship between two sets of
scores. They should always be produced since one can often see aspects of the data in
a picture which are not accessible by any other means. But they do have the problem
that they still require some subjective interpretation - two people looking at a picture
can always come up with different interpretations, after all.
Correlation coefficients, in contrast, are the mathematical means of capturing the
degree of association between two sets of measures. The most commonly used
correlation formula, the Pearson correlation, is based on a formula that uses all the
data, and then provides one number to represent the degree of relationship involved.
The correlation figure ranges from -1.00 through 0 to +1.00. A correlation of 1.00
would represent a perfect relationship between two sets of scores, such that as one
score increased, the other would also increase. Here are some small datasets that
illustrate such relationships:
Scores A    Scores B    Scores C    Scores D
6           16          4           3
8           18          16          52
10          20          18          53
12          22          19          54
14          24          21          60
16          26          24          63
18          28          85          64
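Scores A and Scores B would correlate at 1.00: every increase on A is matched by an
exactly corresponding increase on B. Scores C and Scores D are interesting in a
different way. Here too, the higher the score on the first set, the higher the score on
the second, so the two sets agree perfectly in rank order, and a correlation based on
ranks (such as the Spearman coefficient) would be 1.00. But the increases in C and D
are not regular, and the Pearson formula, which is sensitive to the size of the
increases as well as to their order, gives a much lower value for this pair (about 0.51).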
At the other extreme, one can get scores which correlate negatively. For example, the
data:
Scores G    Scores H
6           28
8           26
10          24
12          20
14          18
16          16
18          14
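would give a correlation of very nearly -1.00 (about -0.99 on the Pearson formula,
since the step from 24 to 20 in H is slightly out of line with the others), showing that
there is an almost perfect inverse relationship between the two sets of measures - the
higher the scores on G the lower the scores on H.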
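It can be instructive to check these small datasets by hand rather than take the values on trust. The sketch below, in Python, computes the Pearson coefficient directly from its definition (the sum of the products of the deviations from each mean, divided by the square root of the product of the two sums of squared deviations), and also a rank-order (Spearman) coefficient, obtained here by applying the same formula to ranks. The function names are my own, and the ranking helper assumes there are no tied scores, which holds for these sets.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank each score from 1 (lowest); assumes no tied scores."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Rank-order correlation: the Pearson formula applied to ranks."""
    return pearson(ranks(xs), ranks(ys))

# The score sets exactly as printed in the text.
scores = {
    "A": [6, 8, 10, 12, 14, 16, 18],
    "B": [16, 18, 20, 22, 24, 26, 28],
    "C": [4, 16, 18, 19, 21, 24, 85],
    "D": [3, 52, 53, 54, 60, 63, 64],
    "G": [6, 8, 10, 12, 14, 16, 18],
    "H": [28, 26, 24, 20, 18, 16, 14],
}

for first, second in [("A", "B"), ("C", "D"), ("G", "H")]:
    r = pearson(scores[first], scores[second])
    rho = spearman(scores[first], scores[second])
    print(f"{first} and {second}: Pearson {r:.2f}, rank-order {rho:.2f}")
```

Run as written, this should show that the rank-order coefficient is 1.00 for A with B and for C with D, and -1.00 for G with H, while the Pearson value is 1.00 only for A with B: it is roughly 0.51 for C with D, where the spacing of the scores is very uneven, and roughly -0.99 for G with H.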
In the middle, you have paired sets of scores which have no relationship to one
another, in that a higher score on one measure is not associated with any particular
score on the other.
In fact, extreme scores of +1.00 or -1.00 are virtually non-existent in applied
linguistics and teacher education. Most of the time we deal with scores which are
well short of the maximum values. (There is one exception to this, which we will
soon consider.)
Task 5.4
To help you get a feel for how correlations do capture degree of relationship between
two sets of scores, look at the following problem. It shows several pairings of
measures. Most of these are drawn from the applied linguistics/ teacher education
areas. One or two, for reasons which will become apparent later, are not. You have
to do two things:
a) identify the direction of the correlation, i.e. decide whether it is positive or
negative
b) estimate the correlation between the two sets of measures in each case. To help,
you need to use the following values, using each, once, somewhere in the table:
-0.80; -0.40; -0.30; 0.00; 0.00; 0.25; 0.40; 0.40; 0.60; 0.90
Pairing of measures                                              Estimated Correlation
level of motivation and language learning success                ……….
size of big toe and language learning ability                    ……….
two tests of listening designed to be as comparable as possible  ……….
level of classroom anxiety and amount of learning                ……….
level of proficiency and length of time to complete a            ……….
  multiple-choice language test
length of time spent learning a language and proficiency         ……….
height and life expectancy (for men or for women, i.e. not both  ……….
  at the same time)
a test of speaking and a test of academic reading                ……….
age someone starts learning a language and eventual              ……….
  proficiency (age range for starting learning: 2-17)
level of motivation and language learning aptitude               ……….
Feedback for Task 5.4 (Part One)
Some of the above correlational puzzles can be answered with reference to
the applied linguistics literature. Others can be answered from outside our
area. And others were simply made up (the big toe example is the only one of
these). Your problem was to estimate direction of relationship and size
(strength) of relationship. The “correct” values are given below:
Pairing of measures                                              Estimated Correlation
level of motivation and language learning success                 0.40
size of big toe and language learning ability                     0.00
two tests of listening designed to be as comparable as possible   0.90
level of classroom anxiety and amount of learning                -0.30
level of proficiency and length of time to complete a            -0.40
  multiple-choice language test
length of time spent learning a language and proficiency          0.40
height and life expectancy for men (or women)                     0.25
a test of speaking and a test of academic reading                 0.60
age someone starts learning a language and eventual              -0.80
  proficiency (age range for starting: 2-17)
level of motivation and language learning aptitude                0.00
The feedback which follows does make references to the applied linguistics
literature. These references are given at the end of the chapter.
These correlations range from strong negative to strong positive. We will
cover them from extreme negative to extreme positive. For the moment, we
will downplay slightly the magnitude of the correlations, concentrating instead
simply on the direction. First of all, the age someone starts learning and
eventual proficiency. This correlation is based on work reported by Johnson
and Newport (1989) and DeKeyser (2000) where these researchers
measured proficiency in a language and related it to the age people had
started learning a language. Basically, Johnson and Newport, and also
DeKeyser, are arguing for a critical period for language learning, i.e. a period
early in life during which we learn languages with special facility. They argue
that this special capacity diminishes gradually, and has disappeared completely by the age of
17 or so. Hence the importance of the correlation that they found - the lower
the starting age for language learning, the higher the eventual level of
proficiency. This is powerful evidence in support of their claim. (Incidentally,
Johnson and Newport also calculated a correlation between age of initial
language learning and eventual proficiency for learners all above the age of
17. For this range, the correlation they report is close to zero - strong
evidence that after the age of 17, whatever determines language learning
success, it is no longer age.)
There are two other negative correlations. The first, -0.40 between language
proficiency and length of time to complete a language test, is, I have to
confess, made up but, I feel, plausible. If it were found, what it would be
saying is that those people who have higher proficiency in a language take
less time to complete a language test (assuming it is the sort of test in which
the test taker works through the items sequentially and then leaves the
test room). In other words, it is suggesting that those who know a language
better need less time to complete the requirements of the test. The other
negative correlation is between level of anxiety and language learning
success. It is based on work by Robert Gardner, who consistently reports
correlations at about this level. They suggest that there is a tendency for
those students who are more anxious in class to do less well, suggesting that
anxiety gets in the way of effective learning.
We next have two zero correlations, size of big toe and language learning
ability, and then level of motivation and language learning aptitude. The first
suggests that there is no relationship at all between how big someone’s big
toe is, and how well they do at language learning. I made this up, but it is
plausible, in that there is no logical connection between these two areas. In a
way, therefore, this (non) correlation is a sort of marker of when
independence between two sets of measures is likely. The other zero
correlation, between motivation and aptitude, is both surprising and empirical.
It also comes from work reported by Gardner, who administered both
motivational schedules and language aptitude tests to groups of high school
students. Essentially zero correlations resulted, suggesting that students with
higher language aptitude, despite the success they may achieve, are not any
more motivated.
Next we have a low correlation between height and life expectancy. (For
men, incidentally, this correlation prevails up to 6’5”, after which it reverses
very slightly.) So, the correlation implies that the taller you are, other things
being equal, the longer you can expect to live, but with this relationship being
a very slight one.
Task 5.5
At this point, take a moment to consider why you think there is such a correlation
between height and life expectancy. What could cause such a relationship to appear?
(Feedback comes later).
This is a tough question, and if you do not work out the answer straightaway, keep it
in the back of your mind as you work through the next few pages. The answer might
just pop into your head!
© Peter Skehan 2003
Feedback for Task 5.4: Part Two
Two slightly higher correlations are between motivation and language learning
success (again with this value coming from work conducted by Robert
Gardner), and length of time spent learning linked to language learning
achievement (based on work by John Carroll). In the first case, it appears
that those students who are more motivated are more likely to achieve more
highly in foreign language learning. The level of relationship here would be
described as moderate. So would the relationship between length of time
spent learning and language learning achievement. In this case the greater
amount of time which is spent language learning is associated with a greater
degree of success.
In some ways, the motivation and time correlations (as positive) and the
anxiety correlation (as negative) are the typical levels of correlation that we
tend to find in applied linguistics. They are clear, but also a long way away
from the perfect correlation of 1.00 (or -1.00). In thinking about this, it is
useful to consider what it would mean if the correlation between motivation
and language learning success had been as high as 0.80 or 0.90. This would,
in effect, mean that there is a very strong relationship between these two
variables. This would imply that once one knew someone’s motivation score,
one would be able to predict quite closely their language achievement score.
Or to put this another way, it would be as if motivation was the factor which
accounted for the main part of the language achievement scores. Or to put
that yet another way, it would imply that there isn’t room for any other
variables to have an impact on language learning success - motivation would
be the single and dominant factor associated with language learning
achievement.
In practice, everyone would say that accounting for language learning
success is much more complex than this, and probably dependent on a range
of factors, e.g. language learning aptitude, language learning style, the
context in which teaching occurs, the quality of instruction and materials, and
so on and so on. In other words, to expect correlations in our field much
above 0.40 is unrealistic, because they would effectively start to suggest that
accounting for language learning is much simpler than we mostly expect it to
be. We finally come to the conclusion therefore that correlations around the
levels reported above (0.40) are quite respectable and very interesting for our
field.
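One common way of making this argument concrete, although it is not a step taken in the text above, is to square the correlation: r squared is conventionally read as the proportion of the variation in one measure that is shared with the other. The short Python sketch below applies this to values discussed in this feedback; the function name is purely illustrative.

```python
def shared_variance(r):
    """Proportion of variance shared by two measures that correlate at r."""
    return r ** 2

# Correlations taken from the discussion above (0.90 is the hypothetical case).
examples = [
    ("motivation and success", 0.40),
    ("hypothetical very strong relationship", 0.90),
    ("anxiety and learning", -0.30),
]

for label, r in examples:
    print(f"{label}: r = {r:+.2f}, shared variance = {shared_variance(r):.0%}")
```

On this reading, a correlation of 0.40 leaves well over four-fifths of the variation in achievement to be associated with other factors, whereas a correlation of 0.90 would leave less than a fifth.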
At this point, there are two correlations left to explore. Let’s take the larger
one first, that between two listening tests, at 0.90. A correlation of this level is
very high and indicates massive agreement between the two sets of scores.
In fact, we need to step back and consider why the two tests were developed
in the first place. It is likely that the second test was developed precisely in
order to provide an alternative measure of the very same thing as the first
test. So in that respect the high correlation is a success - it strongly suggests
that the two tests are measuring the same thing, and are doing so with great
consistency. In fact, the correlation coefficient here is being used as an index
of the reliability of the two tests, in the sense that it shows their consistency of
measurement: the two tests are correlating so highly because they are
measuring the same thing, and measuring it well. It may not be quite the
same as correlating the temperature in Centigrade and Fahrenheit, but it is a
high value precisely because it reflects two windows on the same basic
attribute, rather than the measurement of two attributes, e.g. motivation and
second language learning.
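The Centigrade and Fahrenheit comparison can be checked directly. The sketch below is a minimal illustration in Python rather than SPSS: the temperatures are invented, and `pearson_r` is a small helper written for this example. Because one scale is an exact straight-line conversion of the other, the correlation should come out at 1 (allowing for rounding inside the computer).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

celsius = [-5.0, 0.0, 12.5, 21.0, 30.0, 37.5]    # invented temperatures
fahrenheit = [c * 9 / 5 + 32 for c in celsius]   # exact conversion of the same readings

r = pearson_r(celsius, fahrenheit)
print(round(r, 6))
```

Two listening tests can never be exact conversions of one another in this way, which is why even a very successful pair of parallel tests falls a little short of 1.00.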
This discussion makes more sense of the remaining correlation - that between
a test of speaking and a test of academic reading (at 0.60). Clearly this
relationship has an intermediate position between the correlation of the two
listening tests and the correlation between motivation and language learning
achievement. With the two tests of listening, we can claim that the two tests
are targeting precisely the same area (that is the intention, after all). With the
speaking and (academic) reading tests, in contrast, there is a difference in
area, since speaking and academic reading are hardly the same thing. On
the other hand, it may be proposed that they do both draw on common
abilities, even if the skill areas concerned are very different from one another.
In other words, speaking and reading do draw on a generalised knowledge of
the underlying language system, and so, while the differences between
speaking and reading can easily account for the correlation not being higher
than 0.60, it is likely that underlying knowledge of the target language system
will account for a considerable degree of overlap. The result is the sort of
correlation that we have obtained.
Reflection
Reflecting on this discussion, we see that correlations:
• range in numerical value
• differ in direction
The former of these, range, captures the strength of relationship involved, and, if we
relate this discussion to the previous section on scattergrams, it reflects the tightness
of the crosses made in the two dimensional space for each scattergram. A very high
positive correlation would be associated with a very clear and tightly organised
bottom left to top right pattern of crosses. The more diffuse the distribution of
crosses, in contrast, the lower the correlation. But the latter factor, direction, also
shows that relationships are not exclusively of the “higher the X, the higher the Y”
sort. It is also possible to have strong relationships of an inverse nature, as the
correlation between age of initial learning (up to 17) and eventual success attests.
The different possibilities in strength and direction simply mirror the range of
relationships that are possible in the world between two sets of scores.
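The link between the tightness of the crosses and the size of the correlation can be explored with simulated scores. The following Python sketch is an illustration under assumed conditions only: the scores are randomly generated, and the amounts of scatter (0.3 and 2.0) and the helper names are choices made for this example, not values from the dataset.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(5)  # fixed seed so that the run is repeatable
x = [rng.gauss(0, 1) for _ in range(200)]

def noisy(values, slope, noise_sd):
    """Return slope * value plus random scatter for each value."""
    return [slope * v + rng.gauss(0, noise_sd) for v in values]

r_tight = pearson_r(x, noisy(x, 1, 0.3))     # tight pattern, bottom left to top right
r_diffuse = pearson_r(x, noisy(x, 1, 2.0))   # diffuse pattern, same direction
r_inverse = pearson_r(x, noisy(x, -1, 0.3))  # tight pattern, top left to bottom right

print(round(r_tight, 2), round(r_diffuse, 2), round(r_inverse, 2))
```

The first and third values should be similar in size but opposite in sign, while the second should be positive but clearly smaller, mirroring the distinction between strength and direction.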
Correlation and Causation
The final major issue that we need to discuss in relation to correlations is that of
causation. A fundamental tenet of correlations is that they do not, in themselves,
clarify what is causing what. In principle, if two sets of scores (let’s call them A and
B) are correlated, the following possibilities exist:
• A causes B
• B causes A
• Something else, e.g. C, causes the relationship between A and B
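The third possibility is the least intuitive, so a small simulation may help. In the Python sketch below, which uses invented data, a hidden variable C feeds into both A and B, while A and B never influence one another. The variable names and the amount of random scatter are assumptions made purely for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(11)  # fixed seed so that the run is repeatable
n = 2000
c = [rng.gauss(0, 1) for _ in range(n)]   # hidden common cause C
a = [ci + rng.gauss(0, 1) for ci in c]    # A depends on C only
b = [ci + rng.gauss(0, 1) for ci in c]    # B depends on C only

r_ab = pearson_r(a, b)

# Remove C's contribution from each measure and correlate what is left.
a_rest = [ai - ci for ai, ci in zip(a, c)]
b_rest = [bi - ci for bi, ci in zip(b, c)]
r_rest = pearson_r(a_rest, b_rest)

print(round(r_ab, 2), round(r_rest, 2))
```

A and B should correlate clearly even though neither causes the other, and the relationship should all but vanish once C's contribution is taken out. In real research, of course, C is rarely measured so conveniently.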
Task 5.6: Hypothesising Causation
Here are the correlational figures from the earlier work in this chapter, this time
arranged in order from positive to negative. In addition, you have an extra column in
which you should write what you consider the possible causation explanation. Do
this first by indicating whether it is A causes B, or B causes A, or C causes the A-B
relationship. Then write a few words to justify this interpretation.
Two sets of measures                        Estimated      Hypothesised
                                            Correlation    Explanation

two tests of listening designed to be        0.90          ……………………………
as comparable as possible

a test of speaking and a test of             0.60          ……………………………
academic reading

length of time spent learning a              0.40          ……………………………
language and proficiency

level of motivation and language             0.40          ……………………………
learning success

height and life expectancy for men           0.25          ……………………………
(or women)

level of motivation and language             0.00          ……………………………
learning aptitude

size of big toe and language                 0.00          ……………………………
learning ability

level of classroom anxiety and              -0.30          ……………………………
amount of learning

level of proficiency and time to            -0.40          ……………………………
complete a multiple-choice language test

age someone starts learning a               -0.80          ……………………………
language and eventual proficiency
(age range for starting: 2-17)
DON’T TURN THE PAGE UNTIL YOU HAVE FINISHED THE EXERCISE
Feedback on Task 5.6
In each case below, possible causation patterns are proposed. Essentially
these are based on:
• knowledge of the relevant literature
• interpretation based on theory
• speculation!
Reflect on any differences between your proposals and those which are
shown here.
Two sets of measures: two tests of listening designed to be as comparable as possible
Estimated correlation: 0.90
Hypothesised explanation: Remembering that testing is a special case for
correlation, what is happening here is that an underlying factor (cf. C,
above), in the shape of underlying listening ability, is accounting for the
similar level of performance on each test. The tests themselves measure this
underlying factor in the same way, and so do not interfere with the very high
correlation emerging.
Verdict: C>A,B

Two sets of measures: a test of speaking and a test of academic reading
Estimated correlation: 0.60
Hypothesised explanation: As in the previous example, an underlying ability,
general language proficiency, functioning as a C type variable, strongly
influences performance on each of the tests. Other C type factors which are
likely to keep the correlation down below 1.00 are:
• differences in underlying abilities for speaking and reading
• differences in the format of the two tests which might lead to a format
effect and lowered correlations.
Verdict: C>A,B

Two sets of measures: length of time spent learning a language and proficiency
Estimated correlation: 0.40
Hypothesised explanation: This is the first relatively straightforward
correlation. The most plausible interpretation here is that more time spent
learning produces (i.e. causes) more learning. Note though that there must be
other factors which also have a causative role since the correlation is well
below 1.00.
Verdict: A>B

Two sets of measures: level of motivation and language learning success
Estimated correlation: 0.40
Hypothesised explanation: This is a particularly interesting correlation. Here
are three interpretations:
• an A-causes-B scenario. In other words, higher motivation brings about a
higher level of language learning proficiency, either because motivated people
approach learning with more intensity or because they simply spend more time
learning (or, of course, both)
• a B-causes-A scenario. In this case, those learners who are successful feel
more satisfaction, and as a result are more motivated. In other words, higher
motivation is the consequence of greater success, and the positive feelings
that this brings about.
• a C-causes-A-and-B scenario. Ambitious parents send their children off to
school with the message that they should do well. They also provide them with
resources to help them achieve. In addition, they indicate to their children
that engaging with the schooling system is important, with the result that
their children also have educational aspirations and motivation.
Verdict: All causal paths are possible!

Two sets of measures: height and life expectancy for men (or women)
Estimated correlation: 0.25
Hypothesised explanation: Note: This is the feedback for Task 5.5. This is a
“C” causes A and B situation. In this case, the “C” is likely to be health and
quality of life at the prenatal and early developmental stage. It has been
shown that better health conditions at these phases of life lead to greater
life expectancy and, other things being equal, people who are taller.
Verdict: C>A,B

Two sets of measures: level of motivation and language learning aptitude
Estimated correlation: 0.00
Hypothesised explanation: It might be supposed that greater aptitude would
cause greater success which would cause higher levels of motivation. In fact,
literature on this issue reports a zero correlation, implying that this chain
of reasoning does not apply.
Verdict: No causal paths

Two sets of measures: size of big toe and language learning ability
Estimated correlation: 0.00
Hypothesised explanation: We assume here that these two factors have absolutely
nothing to do with one another, with the result that a zero correlation is
found, and causation is irrelevant.
Verdict: No causal paths

Two sets of measures: level of classroom anxiety and amount of learning
Estimated correlation: -0.30
Hypothesised explanation: Anxiety is theorised to have complex relationships
with achievement. A small amount of anxiety is thought to be beneficial. But
more than a small amount is thought to be harmful. So we assume here that
anxiety (A) causes learners not to attend to what they are trying to learn as
effectively as they might, with the result that language achievement (B)
suffers. Since it is the higher level of anxiety which produces this lowered
achievement, it is a negative correlation which results.
Verdict: A>B

Two sets of measures: level of proficiency and length of time to complete a
multiple-choice language test
Estimated correlation: -0.40
Hypothesised explanation: It is assumed here that higher proficiency (A) causes
learners who are completing a test to know what they know and know what they
don’t know to a greater extent. It is assumed therefore that while doing the
test better learners do not need to spend so much time thinking about possible
answers since they can access them more readily from their more extensive
second language knowledge. Less proficient learners spend more time trying to
fill in the gaps in their knowledge, thereby taking longer over each test item.
Verdict: A>B

Two sets of measures: age someone starts learning a language and eventual
proficiency (age range for starting: 2-17)
Estimated correlation: -0.80
Hypothesised explanation: It is theorised (a) that humans are prewired for
language and (b) that the special capacity specifically for language learning
only remains available until a certain threshold age is reached. Changed
neurological conditions are generally postulated to underlie this constraint.
If this is accepted, then the very strong inverse correlation follows, in that
as people reach the end of the critical period, they become progressively less
able to learn languages in any sort of special way, instead having to learn
them as they would any other sort of cognitive area. They consequently do not
reach the same levels of achievement since cognitive mechanisms are less
powerful than specialist ones.
Verdict: A>B, where A is not simply age, but the maturational changes which
accompany age.
Read Howell, pp231-241, if you want to develop the discussion in this section.
Read Miller et al (2002), pp 155-165 for consolidation and preparation for the next
section.
Continuation of the Main Story
Now that you have looked at issues connected with correlation in general, we can
return to the dataset we have been working with. Recreate, for Studies One and Two
only, the scattergrams (the visual representations of relationships) between:
• decision and narrative pausing
• decision and narrative accuracy
• decision and narrative complexity
Note that this simply re-does an earlier Task from the beginning of this chapter, and if
you saved the scattergrams from the task in question, you could simply return to this
saved output.
(Remember that these are based on Studies One and Two only.) The first two of these
do show a positive relationship between the sets of scores concerned. The third does
not particularly.
Task 5.7
Part One
Remembering the discussion above of the different levels of correlation in Task 5.4,
take a moment and choose from the three possibilities offered in each case what you
think the correlations are in these three cases. Remember that correlations range from
–1 through 0 to +1.
Variables concerned                    Estimated relationship
decision and narrative pausing         a) 0.66   b) 0.56   c) 0.46
decision and narrative accuracy        a) 0.55   b) 0.48   c) 0.41
decision and narrative complexity      a) 0.24   b) 0.18   c) 0.12
Part Two
Now use SPSS to calculate these correlations. You should still have selected cases
from Studies One and Two only, using the Data, Select Cases sequence. Then go to
the Analyse menu, and choose Correlate and then Bivariate. Then simply choose the
two variables that you want, and click O.K. This gives you the first of the correlations.
Repeat all of these steps to give you the three correlations that you have been asked to
compute.
(Note, in passing, that there are some other boxes on the Correlation screen. Simply
accept the choices which have been made, such as Pearson’s as the correlation type,
and two tailed significance as the type of significance to be used. Pearson’s is the
appropriate correlation here, and we will return to the issue of significance below.)
Feedback on Task 5.7
First of all, the output you should have generated should be like the following
table, which is for the correlation between decision making and narrative
pausing:
Correlations
                                   DPAUSE     NPAUSE
Pearson Correlation    DPAUSE      1.000      .661**
                       NPAUSE      .661**     1.000
Sig. (2-tailed)        DPAUSE      .          .000
                       NPAUSE      .000       .
N                      DPAUSE      71         66
                       NPAUSE      66         67
**. Correlation is significant at the 0.01 level (2-tailed).
There are three important rows in this output. The first gives the actual
correlation, (twice!) and so this is, far and away, the most important row of the
table. As with the matrix of scattergrams, the lower-left section is always a
mirror image of the upper-right, so get used to seeing correlations presented
twice in this way.
The next row focusses on the significance of this correlation. We will return to
significance in much greater depth below. For now, note that (a) the
correlation of 0.661 has the superscript of two asterisks (**), meaning that this
is a significant correlation, and (b) the significance row itself gives the value
“.000”, indicating a high level of significance. This, too, is something we will
return to.
Finally, the third row simply records the number of people who entered into
each correlation. This will vary, correlation to correlation, since there may be a
few different missing values in each case.
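The way the number of cases changes from correlation to correlation can be sketched in a few lines. The Python example below is an illustration only: the scores are invented, `None` stands for a missing value, and the approach shown (using only those people who have both scores) is an assumption about how the differing values of N arise, consistent with the table above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented pausing scores for eight people; None marks a missing value.
dpause = [12, 15, None, 9, 20, 14, 11, 18]
npause = [10, 14, 8, None, 19, None, 9, 17]

n_dpause = sum(v is not None for v in dpause)   # people with a decision score
n_npause = sum(v is not None for v in npause)   # people with a narrative score

# Only people with BOTH scores can enter the correlation.
pairs = [(d, n) for d, n in zip(dpause, npause) if d is not None and n is not None]
r = pearson_r([d for d, _ in pairs], [n for _, n in pairs])

print(n_dpause, n_npause, len(pairs))
```

Here seven people have a decision score and six a narrative score, but only five have both, so the correlation rests on five cases. The same logic explains why the table shows 71, 67 and 66.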
To recap here, notice that part of the skill in interpreting output is knowing
what to ignore, or not attend to in any detail, and what, correspondingly,
should take up most of your attention. In the present case, it is the first line
which is most important.
Next we turn to the actual correlations which were found. These are as
follows:
Variables concerned                    Correlation
decision and narrative pausing         +0.66**
decision and narrative accuracy        +0.48**
decision and narrative complexity      +0.24
Clearly there is a strong degree of agreement for pausing. The relationship
between these two measures suggests that someone who pauses often on
one task is likely to pause often on another. There is also a fairly strong
relationship between the two accuracy scores, at 0.48. This is a weaker
relationship than that for pausing, but it does suggest that there is a tendency
for those who are more accurate on one task to be accurate on the other.
Note that each of these correlations is statistically significant. Finally, the
lowest (and non-significant) correlation, although still positive, is between the
two complexity measures. The weaker relationship here implies that
complexity is, to some extent, dependent on other things than the task style of
the individual. At the moment we do not know what this is, but it seems as if
knowing that someone produces complex language on one task doesn’t allow
us to predict with confidence that they will produce complex language on
another task.
Part One of the task you were given, to estimate the correlation on the basis
of the scattergram, was difficult, but it is useful to develop an ability to look at
a scattergram and make some sort of estimate of the correlation in question. If
you got this wrong, don’t worry. It is something that only comes with practice.
The more important part of the task was Part Two, actually computing
the correlations, since being able to obtain
this precise mathematical measure of relationship is the key skill here.
Statistical Significance: A First Glimpse
This, the last section of the chapter, is a tough read, which does not
really fit into the flow of this chapter. But it does pick up on material
we have just covered. It introduces a topic which will be fundamental
in the next chapter - the notion of statistical significance. If you don’t
want to engage with these ideas here, you can leave out this section
on first reading, (i.e. finishing the chapter at this point), but plan on
returning to it later, perhaps after you have completed Chapter Six.
As we have just seen, there is more material in the correlation output than simply the
correlations themselves, including the significance level for each correlation and the
number of cases. This latter is fairly self-evident - it reflects the number of subjects
whose data underlies the correlation itself. But the significance value is a little more
complex. We will, in fact, be looking at significance in much greater detail in
Chapter Six. For now, a minimal explanation should suffice, one which capitalises upon the
output data we have just explored.
Step back for a moment from the correlational values that you have found. They are
specific correlations based on one particular set of data. But it is possible to imagine
many other correlations computed between the same two measures (the two accuracy
scores, say) with other groups of subjects and even with the same subjects using the
same tasks, but on different occasions. If we did such a “thought” experiment,
imagining that the same subjects were required to redo the task and be measured in
the same way, we would obviously get lots of correlations, one for each
“experiment” in our thought “experiment”. Now these correlations would not all be
the same. Some would clearly be higher than others, and some lower. There would
be all sorts of chance factors which influenced people’s actual scores on the two
measures, e.g. fatigue, inattention. To turn this around a little, when we do an actual
study we only have one correlation, but we have to think of this correlation as being
part of a range of possible correlations from a range of possible studies, with some of
this range of hypothetical correlations being higher than others.
Now we can complexify things for one final step. Imagine that we somehow know
that there is no relationship between two variables. (Don’t ask how: just imagine that
we do!) Let’s imagine that nonetheless we collected data on each of the variables,
and correlated this data. We would get a correlation coefficient, and it is highly likely
that while this correlation would be close to zero, it wouldn’t be exactly zero. Let’s
imagine that we gathered another set of actual measurements, and that we correlated
these. The result this time might well be slightly different from zero and different
from the first correlation that we obtained. Now imagine that we collected lots of
samples of data and so computed lots of correlation coefficients. The important thing
here is that these correlations would vary in size (and direction). It is possible that
through chance factors alone, some of these correlations would, while not far from
zero, fall outside a narrow range such as 0.00 to 0.06 (say). In other words, it would be possible
that we would find particular correlations which might look interesting, (e.g. 0.20),
but which in fact would be solely due to chance factors in the sampling we had done.
Human behaviour is variable, and we couldn’t guard against this.
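This thought experiment is easy to act out on a computer. The Python sketch below is a minimal illustration under assumed conditions: two sets of scores are generated entirely independently (so the true relationship is zero), the sample size of 30 and the 1000 repetitions are arbitrary choices, and the helper names are invented for the example.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2003)  # fixed seed so that the run is repeatable

def null_correlation(n):
    """Correlate two sets of n scores that are unrelated by construction."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    return pearson_r(xs, ys)

rs = [null_correlation(30) for _ in range(1000)]

mean_r = sum(rs) / len(rs)
share_large = sum(1 for r in rs if abs(r) >= 0.20) / len(rs)

print(f"average correlation: {mean_r:+.3f}")
print(f"lowest: {min(rs):+.2f}, highest: {max(rs):+.2f}")
print(f"share of samples with a correlation of 0.20 or more (either sign): {share_large:.0%}")
```

The average should sit very close to zero, but individual samples should stray a good deal further, and a sizeable minority should reach the “interesting looking” 0.20 mark through chance alone.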
Now the key point is that when, in the real world, we compute only one correlation,
we have to distinguish between the situation where this may look interesting but only
in reality be the result of chance factors, and the situation when the correlation does
reflect a genuine sort of relationship. This brings us to a key point. In the SPSS
printout, what we are given information about is the likelihood that a particular
correlation could have arisen purely by chance, assuming no real relationship between
the two measures concerned. In this way, SPSS is telling us that while a correlation
may look interesting, we have to avoid taking it too seriously if it is within the range
that could have arisen by chance. So, when examining a table of correlations, the first
hurdle we have to overcome is to have some confidence that a particular reported
correlation is not there simply by chance. We need to establish whether there is a real
relationship to be taken seriously.
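One way to see what such a likelihood means is to estimate it by brute force. The Python sketch below uses a shuffling (permutation) approach: it repeatedly scrambles one set of scores, so that any real relationship is destroyed, and counts how often the scrambled data produce a correlation at least as large as the one actually observed. This is not the method SPSS uses, which relies on a formula, and the data, function names and number of shuffles are all assumptions made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p(xs, ys, n_perm=2000, seed=0):
    """Estimate how often chance alone gives a correlation this large (two-tailed)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between the two measures
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Adding 1 to both counts avoids ever reporting a probability of exactly zero.
    return (hits + 1) / (n_perm + 1)

rng = random.Random(42)
xs = list(range(20))
related = [2 * x + rng.gauss(0, 3) for x in xs]   # invented scores tied to xs
unrelated = [rng.gauss(0, 3) for _ in xs]         # invented scores with no tie to xs

p_related = permutation_p(xs, related)
p_unrelated = permutation_p(xs, unrelated)
print(p_related, p_unrelated)
```

For the related scores the estimated probability should be tiny, whereas for the unrelated scores it could fall anywhere between 0 and 1, which is exactly the distinction the SPSS printout is designed to draw.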
But how do we decide? The output table from SPSS, in its second “block” or “row”
(as shown in the beginning of the feedback section to Task 5.7) shows the likelihood
that any correlation could have arisen by chance. Here are the three correlations that
were computed as part of Task 5.7, together with the probability that they could have
occurred by chance (with this taken from the SPSS output):
Variables concerned                    Correlation    Probability of
                                                      Occurrence by Chance
decision and narrative pausing         +0.66          .000
decision and narrative accuracy        +0.48          .000
decision and narrative complexity      +0.24          .058
The first two correlations (for pausing and for accuracy) are higher correlations and
give figures (slightly misleadingly because of a quirk of SPSS printout conventions)
of .000. Accept, for now, that it would have been more helpful, and equally
meaningful, if they had given the figure .001. This would have meant that a
correlation as large as the one that was found (e.g. 0.66) would only have arisen
purely by chance (assuming no underlying relationship) 1 time in 1000 (i.e. put a “1”
in front of the three zeros to make “1000”). If the figure had been .01, then the
probability of chance occurrence would have been 1 in 100. Somewhat perversely,
SPSS (a) treats three decimal places as being the “extreme” of its probability
(well, improbability!) scale, and (b) registers that the two correlations we are so far
looking at are even more improbable than this. Accordingly, it reports a figure of
.000, meaning that the degree of improbability of finding this correlation by chance is
even more remote than the limit that its scale allows. Or, to put it another way, very,
very, very improbable!
The remaining correlation, between the two complexity figures, is interesting and
tantalising in equal measure. Let’s step back a moment, before making judgement
about it. (We will return to this judgement making on the next page.) SPSS is, in
effect, using a scale of probability-improbability, and expressing the likelihood of any
particular correlation in terms of this scale. The scale runs from complete probability,
at one end, to infinitely improbable, at the other. The table below simply gives a
number of steps along this scale.
Probability-Improbability    Alternative Expression
.50                          One in two
.20                          One in five
.10                          One in ten
.05                          One in twenty
.02                          One in fifty
.01                          One in a hundred
.005                         One in two hundred
.002                         One in five hundred
.001                         One in a thousand
.0001                        One in ten thousand
You can imagine other steps along this scale, such as .75 (three in four), or .025 (one
in forty).
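The conversion between the two columns is simply one divided by the probability. As a small illustrative sketch in Python (the function name is invented for the example), this can be applied to any value, including the .058 reported earlier for the complexity correlation:

```python
def one_in(p):
    """Express a probability p as 'one in N' by returning N = 1 / p."""
    return 1 / p

for p in [0.50, 0.05, 0.025, 0.01, 0.001, 0.058]:
    print(f"{p:g} is about one in {one_in(p):.1f}")
```

This only gives a tidy answer for probabilities that are one over a whole number; a value such as .75 is better left as “three in four”.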
The major point here is that there is no pre-ordained place in this sequence which
separates probable from improbable. Instead what we have really is a continuum of
improbability. Think, for a moment, about the .05, or one-in-twenty point. Imagine
that we get a correlation that SPSS calculates would occur by chance only one time in
twenty. Should we take this seriously? Should we say: if this could only occur by
chance one time in twenty, it is unlikely that chance would account for this finding,
and so we should conclude there is a real relationship? Or should we say that one
chance in twenty is not that far-fetched, and that it would be better to conclude that
chance couldn’t be ruled out?
Let’s take the extremes (since these are fairly easy). If we get a correlation which
could only occur by chance one time in ten thousand, we may fairly conclude that it is
highly unlikely that such a freak occurrence accounts for the result we might so
obtain. Instead, it seems straightforward to conclude that the result reflects a real
relationship, and we can designate the correlation as statistically significant. In
contrast, if we get a correlation which SPSS tells us could occur by chance one time
in two, we can conclude that chance could easily come into play in this case, and so
we cannot rule out its effects. In other words, we wouldn’t dare take this correlation
as statistically significant, and would not want to rule out a chance explanation.
Task 5.8
Choosing the points in the above probability-improbability scale which are important
is not obvious. The basic task is to pick points which allow effective guesses that a
result didn’t occur by chance. If you choose one in two (wrongly!) this would imply
you are too generous in accepting that something significant has happened when chance alone
could explain everything. Of course, you could go to the other extreme (wrongly
again!) and choose the “one in ten thousand” criterion. But then a lot of the time
correlations which reflected a real relationship would be rejected as not being
improbable enough. The task is to choose the level of chance which works best most
of the time.
In that respect, you get three attempts! Pick three points in the above scale which you
think are particularly useful in separating probable from improbable results.
Bear in mind that the basis you have for making this judgement is very slim. But you
will see that there is a consensus amongst social scientists about the three particularly
important points, and it is useful, for an exercise only, to see if you can second guess
such people!
DO NOT READ ON UNTIL YOU HAVE DONE THE EXERCISE!!!!!
Feedback on Task 5.8
By convention, the social sciences generally take three points as having
special significance. These are .05 (one in twenty), .01 (one in a hundred),
and .001 (one in a thousand). The choice of these particular points may be
arbitrary, (why not one in twenty five, for example?) but once chosen these
points become the standard by which we judge our results.
Note again that underlying this approach is the assumption that we cannot
simply look at a set of results and say definitely “That result is important”.
What we have to do is couch our assessment in terms of probability, and then
we have to adhere to the decision points in this improbability scale that have
emerged through the consensus of social scientists. We will return to this in
Chapter Six, where we will explore the connection between this method of
thinking statistically, and a Popperian approach to falsification in science.
Now we can return to the information we were looking at earlier. We can now see
that the correlations between decision pausing and narrative pausing (0.66) and
between decision accuracy and narrative accuracy (0.48) comfortably meet our .001
criterion. We can conclude, therefore, (using the “official” statistical terminology)
that these correlations are significant, i.e. that we accept that the degree of
improbability is so great that they didn’t occur by chance. (The concept of
significance is one we return to in the next chapter, since it is fundamental to all
serious statistics.)
The correlation between decision complexity and narrative complexity (0.235) is
more problematic. The associated (im)probability is .058. Now with both of these
figures it is appropriate to round to two decimal places, giving 0.24 and .06.
(Measures in the social sciences, in general, are not so precise that they warrant more
than two significant figures. Mostly, if you see someone quoting three significant
figures (or more), it is a strong indicator that they don’t know what they are doing!)
Earlier we said that the most lenient level of improbability that we will accept that
allows the claim of a significant result is .05 (one in twenty). Quite clearly, .06 is
larger (i.e. more probable) than this, at something like one in sixteen or one in
seventeen, so we have missed the critical value. Only by a hairsbreadth, but we have
missed nonetheless. The consequence is that we cannot claim that this correlation is
significant. (You have to say to yourself here that some sort of decision-making
standard is necessary, and that if such a standard is to work, we have to adhere to it.
Even when, as here, the results are very close.) It follows that we have to accept that
this correlation could have arisen by chance. What this means is that while for
pausing and for accuracy we are entitled to start discussing and interpreting the
relationships involved, we are not entitled to do this for the complexity correlation.
Hard lines, maybe, but it is the only course of action available!
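The decision rule just described can be written down as a minimal sketch. It
assumes the conventional cut-off points of .05, .01 and .001 (the .01 level is
taken from general convention) and the usual "less than" comparison. The
function name significance_level is purely illustrative and is not part of
SPSS.

```python
def significance_level(p, levels=(0.001, 0.01, 0.05)):
    """Return the strictest conventional level that p falls below, or None.

    The levels are checked from strictest to most lenient, so a very small
    p is reported against .001 rather than .05.
    """
    for level in sorted(levels):
        if p < level:
            return level
    return None  # p misses even the most lenient level: not significant


# The complexity correlation from the text has an associated probability of .058
print(significance_level(0.058))
```

However close .058 is to .05, the function reports no level at all, which is
exactly the "adhere to the standard" point made above.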
One final note on correlations and significance. As with most statistical procedures,
underlying the decision-making about significance is a formula which allows us to
calculate the size of correlation that would be significant for different sample sizes.
We are not going to look at this formula here, but it is relevant to say that the formula
is dependent on two things:
• the strength of the relationship involved
• the number of cases the correlation is based on
Some values of the minimum size of correlation needed to attain the 0.05 significance
level for different sample sizes are as follows:
Sample size     Significant correlation
     10                 0.65
     20                 0.45
     30                 0.36
     40                 0.31
     50                 0.28
    100                 0.20
    200                 0.14
    500                 0.09
The higher the correlation (fairly obviously) the more likely it is to be significant.
But it is also the case that the more cases the correlation is based on, the more likely it
is to be significant, as the table above makes abundantly clear. So if we want to talk
about “maybes”, we can say that the correlation for decision-making and narrative
complexity would have been significant if, for the number of cases involved here, it
had been any higher than the figure of 0.24 (i.e. 0.25 would have done!). But it
would also have reached significance if it had been based on just two more cases:
with 72 cases instead of 70, 0.24 would have reached significance. Obvious moral:
try to gather data on as many cases as possible!
Notice also that as the sample size becomes seriously large, attaining a significant
correlation is much easier, and so to claim significance with a sample size of (say)
500 is less impressive. Particularly clearly in this case, it is the strength of the
correlation that counts, rather than simply having attained significance. (Another issue
we will return to in Chapter Six.)
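For readers who would like to see how the minimum significant correlation
shrinks as the sample grows, here is a rough sketch. It is not the formula
referred to above. It uses a standard approximation (the Fisher z
transformation with a two-tailed normal cut-off), and the function name
approx_critical_r is an illustrative choice.

```python
import math
from statistics import NormalDist


def approx_critical_r(n, alpha=0.05):
    """Approximate the smallest |r| that is significant for n cases.

    Uses the Fisher z approximation: z = atanh(r) has a standard error of
    roughly 1 / sqrt(n - 3), so the critical r is tanh(z_crit / sqrt(n - 3)).
    This is an approximation, and it is least accurate for small samples.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cut-off
    return math.tanh(z_crit / math.sqrt(n - 3))


for n in (10, 20, 30, 40, 50, 100, 200, 500):
    print(f"{n:>4}  {approx_critical_r(n):.2f}")
```

The values printed should lie close to those in the table of sample sizes,
drifting a little for the smallest samples, where the approximation is
weakest.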
Tying it all together: Generating and Interpreting Correlation
Coefficients
The remainder of this chapter is concerned with a number of tasks which are meant to
integrate what you have covered about correlation coefficients. These include:
• producing (more) correlation coefficients
• relating correlations to the notion of statistical significance
• generating blocks of correlations more easily
Task 5.9
Compute the correlations between the scores for decision-making and narrative
accuracy, on the one hand, and the scores for decision-making and narrative
complexity, on the other. Interpret the results, drawing on what you know about
statistical significance, as appropriate.
You can use either of the following methods:
Method One: Using Analyse, Correlation, Bivariate as your sequence of actions,
choose and move to the right-hand box the Decision-making complexity score and
then the Decision-making accuracy score. Record the result, and then repeat with the
corresponding actions to obtain the three other correlations you have been asked to
obtain. Finally, with your table of four correlations, make your interpretations.
Method Two: As above, but move all four measures across to the right hand box at
the same time. Then, from the larger output table that you produce, identify the
correlations you have been asked to obtain, and ignore the rest. Interpret the specific
correlations involved in the task only.
Feedback on Task 5.9
The correlations are as follows:
                      Decision Complexity    Narrative Complexity
Decision Accuracy            .08                    .10
Narrative Accuracy           .26*                   .02
Although one of these correlations reaches the one-in-twenty (.05) level of
significance, it only just reaches this level, and the other three correlations do
not reach significance at all. We can conclude, therefore, that performance
on these tasks in terms of accuracy is largely independent of (i.e. does not
correlate with) the complexity dimension of performance. To put this another
way, it appears that knowing that someone produces accurate language does
not enable us to predict that they will use more complex language - they might
but they might not.
Reflection
Although the advantages and disadvantages may have been apparent in earlier tasks,
it is worth giving a little attention to the two methods you could have used to obtain
the results. The first is more laborious but more selective, and only generates the
specific correlations you want. The cost is that you have to use four “runs” to get
these results. The alternative method, choosing all measures that are involved,
generates more correlations than you need, but it has the virtue that only one “run”
is needed. The disadvantage is that you need to know how to “read” the larger table
of correlations that is produced, and to “screen out” those that don’t apply to the
question you are pursuing.
In practice, of course, because it takes less effort, most people lean towards the
second method, and so need to learn how to ignore effectively! As you have already
seen, SPSS errs on the side of easily giving lots of output. Not all of it needs to be
focussed on in great detail, and the skill of working with output quickly by focussing
on what is important is a good one to learn.
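The "generate everything, then screen" approach of Method Two can also be
sketched outside SPSS. The example below is an illustration only: it assumes
the Pearson coefficient, the scores are invented toy values for five
imaginary learners (NOT the chapter's dataset), and the variable and function
names are made up for the purpose.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dev_x, dev_y))
    denominator = math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    return numerator / denominator


# Invented toy scores, for illustration only
scores = {
    "decision_accuracy": [0.61, 0.72, 0.55, 0.80, 0.67],
    "narrative_accuracy": [0.58, 0.70, 0.60, 0.75, 0.62],
    "decision_complexity": [1.4, 1.2, 1.6, 1.3, 1.5],
    "narrative_complexity": [1.3, 1.5, 1.2, 1.6, 1.4],
}

# Method Two: one "run" produces every pairwise correlation ...
matrix = {(a, b): pearson_r(scores[a], scores[b])
          for a, b in combinations(scores, 2)}

# ... and we then screen out everything except the pairs the task asks for.
wanted = [
    ("decision_accuracy", "decision_complexity"),
    ("decision_accuracy", "narrative_complexity"),
    ("narrative_accuracy", "decision_complexity"),
    ("narrative_accuracy", "narrative_complexity"),
]
for pair in wanted:
    print(pair, round(matrix[pair], 2))
```

Four variables yield six correlations, of which only four answer the question
posed in the task. The other two (accuracy with accuracy, complexity with
complexity) are the ones to ignore here.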
Task 5.10
Do the same for the relationship between the various accuracy and fluency scores.
Then do the same for the relationship between the different complexity and fluency
scores.
It would be worth trying here to generate the large correlation matrix, and then extract
for attention those correlations which are specifically required by the above
instruction. You might then try to assemble the specific correlations in two separate
Word tables, one for the accuracy-fluency connection, and one for the
complexity-fluency connection.
Feedback on Task 5.10
You should obtain the following results:
                        Decision Fluency    Narrative Fluency
Decision Accuracy           -0.31*              -0.22
Narrative Accuracy          -0.38**             -0.36**

                        Decision Fluency    Narrative Fluency
Decision Complexity         -0.54**             -0.39**
Narrative Complexity        -0.31*              -0.23
These figures suggest that there is a negative relationship between
complexity and accuracy, on the one hand, and the number of pauses, on the
other. In other words, the higher the accuracy (or complexity), the fewer the
pauses. In fact, although the relationship between the measures is an inverse
one, it is easier to think of the relationship between the “constructs” as
positive. Fewer pauses in fact means higher fluency, i.e. absence of
disfluency. So the relationship is essentially a positive one - the more
complex and accurate the language, the more fluent the performance. Here,
at least, it looks as if those who can achieve either accuracy or complexity
(remember that these are independent of one another) tend to be more fluent.
In other words, they do not achieve their accuracy or complexity at the
expense of fluency.
Reflection
There is one last bit of reflection with correlations. The last task has shown how (a) it
is easiest to generate whole swathes of correlations at one go, and (b) it makes more
sense to work one’s way through such blocks of correlations in a systematic manner.
SPSS is just as happy to compute large numbers of correlations as it is to do just one.
Obviously this is often very convenient. But don’t be too seduced by this - there is no
substitute for carefully and selectively working your way through a whole matrix of
correlations where your work is motivated by principle, not simply trawling through
the data for something interesting. By all means, in other words, exploit SPSS’s
power. But be sensible as well, and try to follow predictions and principle, rather
than aimless explorations.