FDTL DATA Project: Quantitative Research Methods

Chapter Five
Examining Relationships between Variables

Aims of the Chapter

In this chapter you are introduced to new statistical techniques. The chapter aims to:
• explore the nature of relationships between two sets of measures
• demonstrate how the relationship between two sets of measures (e.g. height and weight) can be represented visually
• show how such relationships can also, in parallel, be measured by a more mathematical method, the correlation coefficient
• discuss the meaning and interpretation of correlation coefficients, and relate this measure to the nature of causality

The chapter builds on the dataset introduced in Chapter Four, and also presupposes data handling techniques, e.g. selecting cases, introduced in that chapter. If you haven’t read Chapter Four, it may be necessary to do so now, before you work through the materials in this chapter.

Introduction

In most of the previous statistical procedures, we have looked at variables one at a time. For example, in Chapter Four, we looked at the mean score for complexity on the decision making task, or the mean score for fluency on the narrative, and so on. Very often, though, we want to look at the relationship between two sets of scores. In such cases:
• each person (case) has a score (or measure) on each of the two variables
• we want to know if there is a tendency for scores on one variable to relate to scores on the other variable, e.g.
    • for high scores on one to be associated with high scores on the other
    • for low scores to be associated with low scores
    • for average scores to be associated with average scores

Consider, in this respect, scores for pausing, for the joint dataset of Studies One and Two, for both the decision making tasks and the narrative tasks.
We can rephrase the “relationships” questions for this specific case as:
• do frequent pausers on the decision making tasks also tend to be frequent pausers on the narrative tasks
• do infrequent pausers on the decision making tasks also tend to be infrequent pausers on the narrative tasks
• do average pausers on the decision making tasks also tend to be average pausers on the narrative tasks

© Peter Skehan 2003

More broadly, we are asking here whether there is a characteristic pausing pattern which prevails over the two tasks. (Notice that we are ignoring other factors here, such as planning and post-task conditions. In other words, we are assuming that consistency in pausing overrides any effect of these other conditions.) If there is a characteristic pausing pattern, it suggests that it is the individual who determines how much pausing takes place (for reasons of conversational style, or previous teaching, or whatever: we don’t know what the actual cause might be), rather than the specific task leading to characteristic pausing patterns.

We can address this issue in two ways. We can look at a visual representation of the relationship involved. Alternatively we can look at a mathematical version of the strength of the relationship between the two sets of scores.

Scattergrams

We will start with a visual representation, based on Display1.sav, shown below as Figure 5.1.

Figure 5.1 Scattergram of Decision-making and Narrative Pausing
[Scattergram: DPAUSE on the vertical axis (0 to 60), NPAUSE on the horizontal axis (-10 to 60). An arrow marks one case: a score of about 27 for decision making and about 32 for narrative.]

The vertical axis shows the scores for pauses on the decision-making tasks, and the horizontal axis shows the scores for pausing on the narrative tasks. So, as you go up the vertical axis, we have the possible measures which could occur for the number of pauses on the decision-making tasks, ranging from zero to a maximum possible value, given here as 60.
Any individual’s decision-making pausing score can be located somewhere on this scale. Exactly the same should then apply to the horizontal axis, which shows a range of possible scores from zero to sixty also. Well, actually, it shows -10 to 60. SPSS has adjusted the scale to start at the impossible value of -10 because one person got a score of zero. It then “makes space” to the left of this, and rather stupidly, pretends that a minus score is legitimate, which is nonsense. No actual minus scores do occur, so ignore this piece of automatic adjustment, and focus on the part of the diagram between 0 and 60.

Given these two axes, it is possible to think of the pair of scores an individual might have, one located for the decision-making pauses on the vertical axis, and one for the narrative pauses, on the horizontal axis. Indeed, this allows us to imagine drawing two lines: one horizontal line from the point on the vertical axis corresponding to an individual’s score for decision-making pauses, and a vertical line, going upwards, from the point on the horizontal axis corresponding to an individual’s score for narrative pausing. The two drawn lines would then intersect at a point which would “capture” where we could locate an individual in the two-dimensional figure. The arrowed score shows such a particular case - someone who got a pausing score of about 27 for the decision making task and about 32 for the narrative. In other words, each person can be “defined” as a point in this two dimensional space, representing the intersection of the pausing score on the decision making task and the pausing score on the narrative.

The first thing to do is to decide what pattern, if any, is shown by the scores. My interpretation of the shape in Figure 5.1, would be of a very badly drawn very rough kite shape (as shown below).
This suggests that low scores for decision making pausing tend to be associated with low scores for narrative pausing. Then the relationship weakens a little, although there is still a slight tendency for high scores to be associated with high scores. In other words, knowing that someone got a low score on the decision-making tasks is a good predictor that they got a low score on the narrative tasks, but knowing that someone got a high score on the decision-making tasks doesn’t provide such a clear basis for predicting that they got a high pausing score for the narrative tasks. There is a tendency, but it is nothing like so close.

As another aspect of capturing this relationship, it is striking that there are relatively few scores in the top left or bottom right areas of the diagram. In other words, you don’t tend to get people who have a low pausing score on the decision making tasks but a high pausing score on the narrative tasks. Or vice versa - high decision-making pausing scores linked to low narrative pausing scores. The conclusion to draw here seems to be that visually, there is a relationship between the two sets of scores, and that the diagram used, which is called a scattergram, helps bring this out clearly.
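One rough way to check the point about the sparsely filled corners is to divide a scattergram into four quarters and count the cases in each. The sketch below does this in Python rather than SPSS, which is an assumption made here and not part of the course materials. The scores are invented purely for illustration (they are not the values in Display1.sav), the function name `quadrant_counts` is hypothetical, and splitting each axis at the median is just one convenient choice.

```python
from statistics import median

# Invented pausing scores for twelve imaginary participants.
# They are NOT the values in Display1.sav.
dpause = [2, 5, 8, 10, 12, 15, 18, 22, 27, 30, 35, 41]   # decision-making task
npause = [4, 3, 9, 12, 10, 18, 25, 15, 32, 28, 45, 30]   # narrative task


def quadrant_counts(x, y):
    """Count cases in each quarter of a scattergram.

    x is the horizontal variable, y the vertical one; each axis is
    split at its median (a case exactly on the median counts as low).
    """
    mid_x, mid_y = median(x), median(y)
    counts = {"bottom-left": 0, "top-right": 0, "top-left": 0, "bottom-right": 0}
    for xi, yi in zip(x, y):
        vertical = "top" if yi > mid_y else "bottom"
        horizontal = "right" if xi > mid_x else "left"
        counts[f"{vertical}-{horizontal}"] += 1
    return counts


# NPAUSE goes on the horizontal axis and DPAUSE on the vertical, as in Figure 5.1.
counts = quadrant_counts(npause, dpause)
for quarter, n in counts.items():
    print(f"{quarter:>12}: {n}")
```

With these invented scores, most cases fall in the bottom-left and top-right quarters, which is what a positive relationship looks like. The counts are only a crude summary and do not replace looking at the scattergram itself.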
Figure 5.2 Scattergram of Decision-making and Narrative Pausing
[Scattergram: DPAUSE on the vertical axis (0 to 60), NPAUSE on the horizontal axis (-10 to 60).]

Task 5.1

Using the file Display1.sav, select cases from the dataset from Studies One and Two only, and then create scattergrams which show the relationship between:
• decision making complexity and narrative complexity
• decision making accuracy and narrative accuracy

Part One

To select cases from Studies One and Two only, follow the sequence:
    Data
    Select Cases
    Click the radio button for If condition is satisfied
    Click the If button, which becomes undimmed
    Click on Study in the left-hand box and move it into the right-hand box
    Click successively on “<” and “3” on the calculator, and move them across to the right-hand box
    Click Continue
    Click OK

Part Two

Now that you are only dealing with cases from Studies One and Two, you can produce the actual scattergrams. To do this, follow these steps:
• select the Graphs menu
• select Scatter
• from the resulting new screen, choose Simple
• choose a variable to go into the (horizontal) X axis, e.g. decision making complexity
• choose a variable to go into the (vertical) Y axis, e.g. narrative complexity
• click OK
• after you have generated relevant output for the first scattergram, e.g. complexity, repeat this set of procedures to produce the scattergram for accuracy.

As a focussing question, consider the claim that one of the two scattergrams produced for the exercise shows a relationship between the two variables concerned and the other does not. Which is which?

Feedback on Task 5.1

We will look at the scattergram for accuracy first, and then for complexity.
Scattergram of Decision-making and Narrative Accuracy
[Scattergram: DACCURAC on the vertical axis, NACCURAC on the horizontal axis, with axis values between about .3 and .9.]

The scattergram for accuracy does take a little bit more interpreting. My claim would be that a relationship can be seen here, in that there is a trend for the data points to lie in a bottom-left to top-right shape. There are some data points which are top-left and some which are bottom-right, but there are not many of these, and they are somewhat distributed in this space, without any “density”. In contrast, the general drift, so to speak, suggests that there is greater density of the data points (remembering that each data point represents a person’s scores on the two measures concerned) in this main pattern of bottom-left to top-right. This indicates a positive relationship, in that there is a tendency for someone who is accurate on the decision-making task to be accurate on the narrative task. There is scope, though, for the relationship to be much stronger. This would be shown if the distribution of scores had been “tighter” around an imaginary line running bottom-left to top-right, with fewer scores drifting off such a line. As it is, we can guess that the relationship in question is clearly detectable, but moderate in nature.

Scattergram of Decision-making and Narrative Complexity
[Scattergram: NCOMPLEX on the vertical axis (up to 2.2), DCOMPLEX on the horizontal axis (up to 2.6).]

The relationship for the complexity scores is somewhat different. In fact, it is better here to look at the scores for the two tasks separately before attempting to relate them. The major contrast, in this respect, is between the wide range of complexity scores for the decision-making task and the relatively narrower range for the narrative task. The decision-making scores “occupy space” fairly (but not totally) evenly between just over 1.0 and something like 2.3.
In contrast, there is a band of complexity scores between 1.0 and around 1.45 which accounts for most scores on the narrative task. Then there are some in the range 1.45 to 2.2, but there are, in truth, not very many of these. In other words, the variation in complexity scores contrasts the two tasks in itself. The decision-making task seems to have provoked much more variation than the narrative task.

If, next, we turn to exploring what the visual pattern says about the relationship between these two sets of scores, it basically looks as if there is hardly any. There are a lot of scores in the bottom-left section of the scattergram, and then some in the top-right. But there are also some scores in the top-left and some in the bottom-right. So we cannot use what is in the diagram very effectively to predict someone’s complexity score on one task on the basis of knowing their score on the other.

To generalise just a little here, it appears that there is some consistency of accuracy in performance across the two tasks, but there is little consistency in complexity. Accuracy, in other words, functions in a manner similar to fluency, while complexity seems to be influenced by other things. We will return to these relationships when we look at correlation coefficients.

So far, you have used the Simple option with Scattergram, and used it as straightforwardly as it can be used. But there is an interesting option within the Simple procedure, and there are also other Scattergram types, one of which we will now explore.

In the previous task, for each use of the Scattergrams procedure, you generated a graph showing the visual relationship between pairs of measures. You did this twice, for accuracy and complexity, just as the materials, earlier, had done it once, for pausing. Scattergrams (and Correlation Coefficients, later in this chapter) always work in this way, focussing on pairs of measures.
But it is possible to display more than one pairwise relationship on screen at the same time, and there are occasions when it is useful to do this. The next task requires you to explore this.

Task 5.2

You have now seen visual evidence of a relationship between the two fluency measures and between the two accuracy measures, taken separately. It might be interesting to see how all four of these measures relate to one another, as the whole set of pairs of measures:
• decision making fluency with narrative fluency
• decision making accuracy with narrative accuracy
• decision making fluency with decision making accuracy
• narrative fluency with narrative accuracy
• decision making fluency with narrative accuracy
• decision making accuracy with narrative fluency

To explore this, repeat the above instructions, by choosing Graphs and then Scatter… But then:
    Select Matrix
    Click Define
    Choose both pausing measures and both accuracy measures, one by one, moving each in turn from the left-hand box to the right-hand box, labelled Matrix Variables
    Give the chart an appropriate title
    Click OK

Focussing Questions
a) Identify the scattergrams in the output diagram that you have seen before, from previous work in this chapter.
b) Identify the scattergrams in the output diagram that you haven’t seen before.
c) Identify the scattergrams in the output diagram that are simply mirror images of one another!
d) Interpret the scattergrams you haven’t seen before. Try to compare the relationships they represent with the relationships shown by the scattergrams you have seen before.

Feedback on Task 5.2

The first impression here, obviously, is how much data is involved, and how, as a result, it can be overwhelming. For this very reason, you won’t use the Matrix facility very much, but it can be useful occasionally to see general trends. It is nice to know it is there, in other words.
Accuracy-Fluency Scatterplots
[Scatterplot matrix of four variables, in this order: DACCURAC, NACCURAC, DPAUSE, NPAUSE. The variable names occupy the diagonal cells.]

We will go through the questions one-by-one. But first, it is useful to have a system of referring to the individual scattergrams within the matrix. We will simply talk about, e.g. Row1:Col1, in this case referring to the top left cell, i.e. the text cell which says “DACCURAC”.

a) Scattergrams you have seen before
These are Row1:Col2 (and Row2:Col1) and then Row3:Col4 (and Row4:Col3).

b) Scattergrams you haven’t seen before
All the other actual scattergrams, i.e. the block of four in the top-right section (and in the bottom-left section).

c) Mirror images
The diagonal elements in the matrix of scattergrams are the variable names, and they divide the arrangement of cells into upper right and lower left triangles. The lower left triangle is simply a reflection of the upper right triangle, and is, basically, the same information. It can be ignored. This is a quirk of SPSS output. Actually, though, there is one difference. If you look closely at a “mirror” pair of scatterplots, you will see that the one in the lower left is “reflected”, or “reversed”, around the main diagonal. This is not important really, but is nice to know.

d) Interpretation
The cells in the matrix containing scatterplots you have seen before will obviously not be interpreted. We will focus on the others. All of these relate an accuracy measure to a fluency measure, in various combinations. Notice first that the relationship (a) shows a top-left to bottom-right tendency, reflecting high scores on one variable being associated with low scores on the other variable, (b) tends not to contain many data points in the top-right and bottom-left quarters of the scatterplot, and (c) is not a very strong relationship. In other words, there is some relationship between fluency measures and accuracy measures, but this relationship is not that strong.
(For it to be strong, the data points would need to be arranged much more tightly around an imaginary line running top-left to bottom-right.) The interpretation, therefore, is that higher accuracy tends to be associated with fewer pauses, and lower accuracy with more pauses. To put this another way, since fewer pauses means more fluency, accuracy and fluency do, in fact, relate to one another. The more accurate people do tend to be more fluent, with this being indexed in fewer pauses.

We turn finally in this section to one more variation in Scattergrams that is worth exploring. We started the section by looking at the scattergram between the fluency measures for the decision-making and narrative tasks, and this was shown in Figures 5.1 and 5.2. The next task leads you to look at this data in a more effective way. Basic to this is to recall that Studies One and Two (still the data that should be selected), in addition to comparing the decision-making and narrative tasks, also explored the influence of planning. So this raises the possibility that the relationship between the fluency measures for each task might be different for those participants who had planning time and those who didn’t. The next task shows how the Scattergram procedure can produce data relevant to this.

Task 5.3

Initially, in this task, consider the following possibilities:
• the relationship between decision making fluency and narrative fluency is unaffected by whether the participant has planned or not
• the relationship between decision making fluency and narrative fluency is stronger amongst those who planned compared to those who didn’t have planning time
• the relationship between decision making fluency and narrative fluency is weaker for those who planned compared to those who didn’t have planning time

Before following the procedures outlined next, make a choice between the three above possibilities.
After you have made your choice, follow this sequence:
    Graphs
    Scatter…
    Simple
    Define
    Put the Decision Pausing (dpause) measure into the Y (vertical) axis and the Narrative Pausing into the X (horizontal) axis
    Select Planning condition (plancond) and move it into the Set Markers by box
    Give the output a title
    Click OK

Feedback on Task 5.3

At the outset, we need to consider colour, which is the key element in this feedback. If you are dealing with the output on screen, in an SPSS output file, you will have glowing colour, and this mode of working is recommended. Similarly, if you are reading a PDF file, colour should be shown. But if you are dealing with a printed version of these materials, produced on a black and white printer, things will not be so attractive or clear. The moral is to work with one of the forms of output which shows data points in colour!

Scatterplot of Decision-making and Narrative Pausing: Coded for Planning
[Scatterplot: DPAUSE on the vertical axis (0 to 60), NPAUSE on the horizontal axis (-10 to 60), with markers set by PLANCOND (values 1 and 2).]

The important thing here is to focus separately on the scatterplot for the planners and the scatterplot for the non-planners. It is possible to do this by looking at just one colour - green for the planners and red for the non-planners. If you do this, a strikingly different pattern is apparent. The non-planners, in red, show essentially a zero relationship. The red indicators (more to the top right of the plot, representing more pausing, and therefore lower fluency) are evenly distributed within their own area. In other words, how much a non-planner pauses on the decision-making task does not allow you to predict how much that person would pause on the narrative task. In contrast, if you attempt to “filter out” the red data points, and focus only on the green, a quite different, and much more regular pattern emerges.
In this case, there is quite a strong bottom-left to top-right shape, indicating that where there is opportunity to plan, there is consistency in pausing behaviour. If someone pauses a lot on the decision-making task, it is fairly predictable that they will pause quite a lot on the narrative task. Something about planning causes fluency, that is, to be a consistent form of behaviour. So, of the three possibilities mentioned in the earlier task box, the second seems to be the most satisfying one to account for this data - the relationship between the two fluency scores is stronger when there is opportunity to plan.

Reflection

The broad issue we have been concerned with here is the representation of relationships between two sets of measures. So far, we have explored this representation visually, and have seen that it can be very revealing. A picture can convey a lot of information succinctly, and it is not too difficult to look at a scattergram and simultaneously (a) see the general shape and pattern and (b) not lose track of the individual data points and the detail. This combination of general and particular is extremely valuable. A scattergram is therefore a very powerful technique, and should always be used when it is appropriate.

It is also clear that SPSS is very helpful in the different choices it provides to enable salient aspects of relationships to emerge clearly. One aspect of this is the use of the matrix form of data display, which enables sets of scattergrams to be examined simultaneously. (Even so, it is difficult, sometimes, to cope with the amount of detail involved, not least because each of the detailed scatterplots is actually quite small.) More relevant, perhaps, is the facility to Set Markers by. As we have seen in the last task, this can enable important patterns to emerge which were previously invisible. Use this facility whenever you can.
If there are variables such as Planning, which might lead to different sorts of performances, the Set Markers option is a key procedure to use when exploring your data before going on to more condensed mathematical treatments. We will be moving on next to one such mathematical treatment - the Correlation Coefficient. The important point is that such a procedure should always be used in conjunction with visual techniques such as Scatterplots.

Correlation Coefficients

Scattergrams are excellent for giving a picture of the relationship between two sets of scores. They should always be produced since one can often see aspects of the data in a picture which are not accessible by any other means. But they do have the problem that they still require some subjective interpretation - two people looking at a picture can always come up with different interpretations, after all.

Correlation coefficients, in contrast, are the mathematical means of capturing the degree of association between two sets of measures. The most commonly used correlation formula, the Pearson correlation, is based on a formula that uses all the data, and then provides one number to represent the degree of relationship involved. The correlation figure ranges from -1.00 through 0 to +1.00. A correlation of 1.00 would represent a perfect relationship between two sets of scores, such that as one score increased, the other would also increase. Here are some small datasets that illustrate this:

Scores A    6    8   10   12   14   16   18
Scores B   16   18   20   22   24   26   28
Scores C    4   16   18   19   21   24   85
Scores D    3   52   53   54   60   63   64

Scores A and Scores B would correlate at 1.00. Interestingly, Scores C and Scores D are also in exactly the same order as one another, so a rank-order correlation (such as Spearman’s) between them would also be 1.00. In each case, the higher the score on the first set, the higher the score on the second. The increase in A and B seems particularly predictable, whereas that with C and D is not quite so regular. For that reason the Pearson correlation between C and D, which is sensitive to the size of the steps as well as to their order, comes out well below 1.00 (about 0.51), even though the ordering of the two sets is perfectly consistent.

At the other extreme, one can get scores which correlate negatively. For example, the data:

Scores G    6    8   10   12   14   16   18
Scores H   28   26   24   20   18   16   14

would give a correlation of very nearly -1.00 (about -0.99; it would be exactly -1.00 if the scores on H fell in perfectly equal steps), showing that there is an all but perfect inverse relationship between the two sets of measures - the higher the scores on G the lower the scores on H.

In the middle, you have paired sets of scores which have no relationship to one another, in that a higher score on one measure is not associated with any particular score on the other. In fact, extreme scores of +1.00 or -1.00 are virtually non-existent in applied linguistics and teacher education. Most of the time we deal with scores which are well short of the maximum values. (There is one exception to this, which we will soon consider.)

Task 5.4

To help you get a feel for how correlations do capture degree of relationship between two sets of scores, look at the following problem. It shows several pairings of measures. Most of these are drawn from the applied linguistics/teacher education areas. One or two, for reasons which will become apparent later, are not. You have to do two things:
a) identify the direction of the correlation, i.e. decide whether it is positive or negative
b) estimate the correlation between the two sets of measures in each case.

To help, you need to use the following values, using each, once, somewhere in the table:
-0.80; -0.40; -0.30; 0.00; 0.00; 0.25; 0.40; 0.40; 0.60; 0.90

• level of motivation and language learning success
• size of big toe and language learning ability
• two tests of listening designed to be as comparable as possible
• level of classroom anxiety and amount of learning
• level of proficiency and length of time to complete a multiple-choice language test
• length of time spent learning a language and proficiency
• height and life expectancy (for men or for women, i.e.
not both at the same time)
• a test of speaking and a test of academic reading
• age someone starts learning a language and eventual proficiency (age range for starting learning: 2-17)
• level of motivation and language learning aptitude

Estimated Correlation (one value for each pair): ……….

Feedback for Task 5.4 (Part One)

Some of the above correlational puzzles can be answered with reference to the applied linguistics literature. Others can be answered from outside our area. And others were simply made up (the big toe example is the only one of these). Your problem was to estimate direction of relationship and size (strength) of relationship. The “correct” values are given below:

Pair of measures: Estimated Correlation
level of motivation and language learning success: 0.40
size of big toe and language learning ability: 0.00
two tests of listening designed to be as comparable as possible: 0.90
level of classroom anxiety and amount of learning: -0.30
level of proficiency and length of time to complete a multiple-choice language test: -0.40
length of time spent learning a language and proficiency: 0.40
height and life expectancy for men (or women): 0.25
a test of speaking and a test of academic reading: 0.60
age someone starts learning a language and eventual proficiency (age range for starting: 2-17): -0.80
level of motivation and language learning aptitude: 0.00

The feedback which follows does make references to the applied linguistics literature. These references are given at the end of the chapter.

These correlations range from strong negative to strong positive. We will cover them from extreme negative to extreme positive. For the moment, we will downplay slightly the magnitude of the correlations, concentrating instead simply on the direction.

First of all, the age someone starts learning and eventual proficiency.
This correlation is based on work reported by Johnson and Newport (1989) and DeKeyser (2000) where these researchers measured proficiency in a language and related it to the age people had started learning a language. Basically, Johnson and Newport, and also DeKeyser, are arguing for a critical period for language learning, i.e. a period early in life during which we learn languages with special facility. They argue that this special capacity diminishes gradually, and has gone completely by the age of 17 or so. Hence the importance of the correlation that they found - the lower the starting age for language learning, the higher the eventual level of proficiency. This is powerful evidence in support of their claim. (Incidentally, Johnson and Newport also calculated a correlation between age of initial language learning and eventual proficiency for learners all above the age of 17. For this range, the correlation they report is close to zero - strong evidence that after the age of 17, whatever determines language learning success, it is no longer age.)

There are two other negative correlations. The first, -0.40 between language proficiency and length of time to complete a language test, is, I have to confess, made up, but, I feel, plausible. If it were found, what it would be saying is that those people who have higher proficiency in a language take less time to complete a language test (assuming it is the sort of test which has items which the test taker works through sequentially (and then leaves the test room)). In other words, it is suggesting that those who know a language better need less time to complete the requirements of the test.

The other negative correlation is between level of anxiety and language learning success. It is based on work by Robert Gardner, who consistently reports correlations at about this level.
They suggest that there is a tendency for those students who are more anxious in class to do less well, suggesting that anxiety gets in the way of effective learning.

We next have two zero correlations, size of big toe and language learning ability, and then level of motivation and language learning aptitude. The first suggests that there is no relationship at all between how big someone’s big toe is, and how well they do at language learning. I made this up, but it is plausible, in that there is no logical connection between these two areas. In a way, therefore, this (non) correlation is a sort of marker of when independence between two sets of measures is likely. The other zero correlation, between motivation and aptitude, is both surprising and empirical. It also comes from work reported by Gardner, who administered both motivational schedules and language aptitude tests to groups of high school students. Essentially zero correlations resulted, suggesting that students with higher language aptitude, despite the success they may achieve, are not any more motivated.

Next we have a low correlation between height and life expectancy. (For men, incidentally, this correlation prevails up to 6’5”, after which it reverses very slightly.) So, the correlation implies that the taller you are, other things being equal, the longer you can expect to live, but with this relationship being a very slight one.

Task 5.5

At this point, take a moment to consider why you think there is such a correlation between height and life expectancy. What could cause such a relationship to appear? (Feedback comes later.) This is a tough question, and if you do not work out the answer straightaway, keep it in the back of your mind as you work through the next few pages. The answer might just pop into your head!
Feedback for Task 5.4: Part Two

Two slightly higher correlations are between motivation and language learning success (again with this value coming from work conducted by Robert Gardner), and length of time spent learning linked to language learning achievement (based on work by John Carroll). In the first case, it appears that those students who are more motivated are more likely to achieve more highly in foreign language learning. The level of relationship here would be described as moderate. So would the relationship between length of time spent learning and language learning achievement. In this case the greater amount of time which is spent language learning is associated with a greater degree of success.

In some ways, the motivation and time correlations (as positive) and the anxiety correlation (as negative) are the typical levels of correlation that we tend to find in applied linguistics. They are clear, but also a long way away from the perfect correlation of 1.00 (or -1.00). In thinking about this, it is useful to consider what it would mean if the correlation between motivation and language learning success had been as high as 0.80 or 0.90. This would, in effect, mean that there is a very strong relationship between these two variables. This would imply that once one knew someone’s motivation score, one would be able to predict quite closely their language achievement score. Or to put this another way, it would be as if motivation was the factor which accounted for the main part of the language achievement scores. Or to put that yet another way, it would imply that there isn’t room for any other variables to have an impact on language learning success - motivation would be the single and dominant factor associated with language learning achievement.
In practice, everyone would say that accounting for language learning success is much more complex than this, and probably dependent on a range of factors, e.g. language learning aptitude, language learning style, the context in which teaching occurs, the quality of instruction and materials, and so on and so on. In other words, to expect correlations in our field much above 0.40 is unrealistic, because they would effectively start to suggest that accounting for language learning is much simpler than we mostly expect it to be. We finally come to the conclusion therefore that correlations around the levels reported above (0.40) are quite respectable and very interesting for our field.

At this point, there are two correlations left to explore. Let’s take the larger one first, that between two listening tests, at 0.90. A correlation of this level is very high and indicates massive agreement between the two sets of scores. In fact, we need to step back and consider why the two tests were developed in the first place. It is likely that the second test was developed precisely in order to provide an alternative measure of the very same thing as the first test. So in that respect the high correlation is a success - it strongly suggests that the two tests are measuring the same thing, and are doing so with great consistency. In fact, the correlation coefficient here is being used as an index of the reliability of the two tests, in the sense that it shows their consistency of measurement: the two tests are correlating so highly because they are measuring the same thing, and measuring it well. It may not be quite the same as correlating the temperature in Centigrade and Fahrenheit, but it is a high value precisely because it reflects two windows on the same basic attribute, rather than the measurement of two attributes, e.g. motivation and second language learning.
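The idea of two tests acting as “two windows on the same basic attribute” can be illustrated with a small simulation. The sketch below is not part of the original materials. It assumes, purely for illustration, that each learner has an underlying listening ability and that each test score is that ability plus some random measurement error. The function names and all the numbers (200 learners, ability with mean 50 and standard deviation 10, error standard deviations of 3 and 10) are invented.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_parallel_tests(n_learners=200, error_sd=3.0, seed=1):
    """Correlate two simulated tests that measure the same hypothetical ability."""
    rng = random.Random(seed)
    # Hypothetical underlying listening ability for each learner.
    ability = [rng.gauss(50, 10) for _ in range(n_learners)]
    # Each test = the same ability + its own independent measurement error.
    test_a = [a + rng.gauss(0, error_sd) for a in ability]
    test_b = [a + rng.gauss(0, error_sd) for a in ability]
    return pearson_r(test_a, test_b)


r_precise = simulate_parallel_tests(error_sd=3.0)   # tests that measure well
r_noisy = simulate_parallel_tests(error_sd=10.0)    # tests that measure poorly
print(f"small measurement error: r = {r_precise:.2f}")
print(f"large measurement error: r = {r_noisy:.2f}")
```

Under these assumptions, the two precise tests should correlate at around the 0.90 level discussed above, while the two noisy tests should correlate noticeably lower, even though both pairs are targeting exactly the same ability.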
This discussion makes more sense of the remaining correlation - that between a test of speaking and a test of academic reading (at 0.60). Clearly this relationship has an intermediate position between the correlation of the two listening tests and the correlation between motivation and language learning achievement. With the two tests of listening, we can claim that the two tests are targeting precisely the same area (that is the intention, after all). With the speaking and (academic) reading tests, in contrast, there is a difference in area, since speaking and academic reading are hardly the same thing. On the other hand, it may be proposed that they do both draw on common abilities, even if the skill areas concerned are very different from one another. In other words, speaking and reading do draw on a generalised knowledge of the underlying language system, and so, while the differences between speaking and reading can easily account for the correlation not being higher than 0.60, it is likely that underlying knowledge of the target language system will account for a considerable degree of overlap. The result is the sort of correlation that we have obtained.

Reflection

Reflecting on this discussion, we see that correlations:
• range in numerical value
• differ in direction

The former of these, range, captures the strength of relationship involved, and, if we relate this discussion to the previous section on scattergrams, it reflects the tightness of the crosses made in the two dimensional space for each scattergram. A very high positive correlation would be associated with a very clear and tightly organised bottom left to top right pattern of crosses. The more diffuse the distribution of crosses, in contrast, the lower the correlation. But the latter factor, direction, also shows that relationships are not exclusively of the “higher the X, the higher the Y” sort.
It is also possible to have strong relationships of an inverse nature, as the correlation between age of initial learning (up to 17) and eventual success attests. The different possibilities in strength and direction simply mirror the range of relationships that are possible in the world between two sets of scores.

Correlation and Causation

The final major issue that we need to discuss in relation to correlations is that of causation. A fundamental tenet of correlations is that they do not, in themselves, clarify what is causing what. In principle, if two sets of scores (let’s call them A and B) are correlated, the following possibilities exist:
• A causes B
• B causes A
• Something else, e.g. C, causes the relationship between A and B

Task 5.6: Hypothesising Causation
Here are the correlational figures from the earlier work in this chapter, this time arranged in order from positive to negative. In addition, you have an extra column in which you should write what you consider to be the possible causal explanation. Do this first by indicating whether it is A causes B, or B causes A, or C causes the A-B relationship. Then write a few words to justify this interpretation.
Estimated correlation, two sets of measures, and hypothesised explanation (to be filled in):

 0.90   two tests of listening designed to be as comparable as possible
        Hypothesised explanation: ……………………………
 0.60   a test of speaking and a test of academic reading
        Hypothesised explanation: ……………………………
 0.40   length of time spent learning a language and proficiency
        Hypothesised explanation: ……………………………
 0.40   level of motivation and language learning success
        Hypothesised explanation: ……………………………
 0.25   height and life expectancy for men (or women)
        Hypothesised explanation: ……………………………
 0.00   level of motivation and language learning aptitude
        Hypothesised explanation: ……………………………
 0.00   size of big toe and language learning ability
        Hypothesised explanation: ……………………………
-0.30   level of classroom anxiety and amount of learning
        Hypothesised explanation: ……………………………
-0.40   level of proficiency and time to complete a multiple-choice language test
        Hypothesised explanation: ……………………………
-0.80   age someone starts learning a language and eventual proficiency (age range for starting: 2-17)
        Hypothesised explanation: ……………………………

DON’T TURN THE PAGE UNTIL YOU HAVE FINISHED THE EXERCISE

Feedback on Task 5.6

In each case below, possible causation patterns are proposed. Essentially these are based on:
• knowledge of the relevant literature
• interpretation based on theory
• speculation!

Reflect on any differences between your proposals and those which are shown here.

 0.90   two tests of listening designed to be as comparable as possible
Remembering that testing is a special case for correlation, what is happening here is that an underlying factor (cf. C, above), in the shape of underlying listening ability, is accounting for the similar level of performance on each test. The tests themselves measure this underlying factor in the same way, and so do not interfere with the very high correlation emerging.
Verdict: C>A,B

 0.60   a test of speaking and a test of academic reading
As in the previous example, an underlying ability, general language proficiency, functioning as a C type variable, strongly influences performance on each of the tests. Other C type factors which are likely to keep the correlation down below 1.00 are:
· differences in underlying abilities for speaking and reading
· differences in the format of the two tests which might lead to a format effect and lowered correlations.
Verdict: C>A,B

 0.40   length of time spent learning a language and proficiency
This is the first relatively straightforward correlation. The most plausible interpretation here is that more time spent learning produces (i.e. causes) more learning. Note though that there must be other factors which also have a causative role since the correlation is well below 1.00.
Verdict: A>B

 0.40   level of motivation and language learning success
This is a particularly interesting correlation. Here are three interpretations:
· an A-causes-B scenario. In other words, higher motivation brings about a higher level of language learning proficiency, either because motivated people approach learning with more intensity or because they simply spend more time learning (or, of course, both)
· a B-causes-A scenario. In this case, those learners who are successful feel more satisfaction, and as a result are more motivated. In other words, higher motivation is the consequence of greater success, and the positive feelings that this brings about.
· a C-causes-B-and-A scenario. Ambitious parents send their children off to school with the message that they should do well. They also provide them with resources to help them achieve. In addition, they indicate to their children that engaging with the schooling system is important, with the result that their children also have educational aspirations and motivation.
Verdict: All causal paths are possible!

 0.25   height and life expectancy for men (or women)
Note: This is the feedback for Task 5.5. This is a “C” causes A and B situation. In this case, the “C” is likely to be health and quality of life at the prenatal and early developmental stage. It has been shown that better health conditions at these phases of life lead to greater life expectancy and, other things being equal, people who are taller.
Verdict: C>A,B

 0.00   level of motivation and language learning aptitude
It might be supposed that greater aptitude would cause greater success which would cause higher levels of motivation. In fact, literature on this issue reports a zero correlation, implying that this chain of reasoning does not apply.
Verdict: No causal paths

 0.00   size of big toe and language learning ability
We assume here that these two factors have absolutely nothing to do with one another, with the result that a zero correlation is found, and causation is irrelevant.
Verdict: No causal paths

-0.30   level of classroom anxiety and amount of learning
Anxiety is theorised to have complex relationships with achievement. A small amount of anxiety is thought to be beneficial. But more than a small amount is thought to be harmful. So we assume here that anxiety (A) causes learners not to attend to what they are trying to learn as effectively as they might, with the result that language achievement (B) suffers. Since it is the higher level of anxiety which produces this lowered achievement, it is a negative correlation which results.
Verdict: A>B

-0.40   level of proficiency and length of time to complete a multiple-choice language test
It is assumed here that higher proficiency (A) causes learners who are completing a test to know what they know and know what they don’t know to a greater extent. It is assumed therefore that while doing the test better learners do not need to spend so much time thinking about possible answers since they can access them more readily from their more extensive second language knowledge. Less proficient learners spend more time trying to fill in the gaps in their knowledge, thereby taking longer over each test item.
Verdict: A>B

-0.80   age someone starts learning a language and eventual proficiency (age range for starting: 2-17)
It is theorised (a) that humans are prewired for language and (b) that the special capacity specifically for language learning only remains available until a certain threshold age is reached. Changed neurological conditions are generally postulated to underlie this constraint. If this is accepted, then the very strong inverse correlation follows, in that as people reach the end of the critical period, they become progressively less able to learn languages in any sort of special way, instead having to learn them as they would any other sort of cognitive area. They consequently do not reach the same levels of achievement since cognitive mechanisms are less powerful than specialist ones.
Verdict: A>B, where A is not simply age, but the maturational changes which accompany age.

Read Howell, pp231-241, if you want to develop the discussion in this section.
Read Miller et al (2002), pp 155-165 for consolidation and preparation for the next section.
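The “C causes both A and B” pattern from the feedback on Task 5.6 can be made concrete with simulated data. The following is a minimal sketch under invented assumptions, not an analysis of any real dataset: a hidden factor C is generated first, and A and B each depend on C plus their own independent noise, with no direct link between A and B at all. The variable names and the sample size of 2000 are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
n_people = 2000  # invented sample size

# C is the hidden common cause (e.g. early health conditions, by analogy only).
c = [rng.gauss(0, 1) for _ in range(n_people)]
# A and B each depend on C plus independent noise; neither influences the other.
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

r_ab = pearson_r(a, b)

# Take C's contribution out of each measure and correlate what is left.
a_rest = [ai - ci for ai, ci in zip(a, c)]
b_rest = [bi - ci for bi, ci in zip(b, c)]
r_rest = pearson_r(a_rest, b_rest)

print(f"correlation between A and B:          {r_ab:.2f}")
print(f"correlation once C has been removed:  {r_rest:.2f}")
```

In this set-up A and B should show a clear positive correlation even though neither causes the other, and that correlation should all but vanish once the contribution of C is removed. In real research, of course, C is usually not measured so conveniently, which is why the verdicts above rest on theory and the literature.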
Continuation of the Main Story

Now that you have looked at issues connected with correlation in general, we can return to the dataset we have been working with. Recreate, for Studies One and Two only, the scattergrams (the visual representations of relationships) between:
• decision and narrative pausing
• decision and narrative accuracy
• decision and narrative complexity

Note that this simply re-does an earlier Task from the beginning of this chapter, and if you saved the scattergrams from the task in question, you could simply return to this saved output. (Remember that these are based on Studies One and Two only.) The first two of these do show a positive relationship between the sets of scores concerned. The third does not particularly.

Task 5.7

Part One
Remembering the discussion above of the different levels of correlation in Task 5.4, take a moment and choose from the three possibilities offered in each case what you think the correlations are in these three cases. Remember that correlations range from -1 through 0 to +1.

Variables concerned                  Estimated relationship
decision and narrative pausing       a) 0.66   b) 0.56   c) 0.46
decision and narrative accuracy      a) 0.55   b) 0.48   c) 0.41
decision and narrative complexity    a) 0.24   b) 0.18   c) 0.12

Part Two
Now use SPSS to calculate these correlations. You should still have selected cases from Studies One and Two only, using the Data, Select Cases sequence. Then go to the Analyse menu, and choose Correlate and then Bivariate. Then simply choose the two variables that you want, and click OK. This gives you the first of the correlations. Repeat all of these steps to give you the three correlations that you have been asked to compute. (Note, in passing, that there are some other boxes on the Correlation screen.
Simply accept the choices which have been made, such as Pearson’s as the correlation type, and two tailed significance as the type of significance to be used. Pearson’s is the appropriate correlation here, and we will return to the issue of significance below.)

Feedback on Task 5.7

First of all, the output you should have generated should be like the following table, which is for the correlation between decision making and narrative pausing:

Correlations
                                DPAUSE     NPAUSE
Pearson Correlation   DPAUSE    1.000      .661**
                      NPAUSE    .661**     1.000
Sig. (2-tailed)       DPAUSE    .          .000
                      NPAUSE    .000       .
N                     DPAUSE    71         66
                      NPAUSE    66         67
**. Correlation is significant at the 0.01 level (2-tailed).

There are three important rows in this output. The first gives the actual correlation, (twice!) and so this is, far and away, the most important row of the table. As with the matrix of scattergrams, the lower-left section is always a mirror image of the upper-right, so get used to seeing correlations presented twice in this way. The next row focusses on the significance of this correlation. We will return to significance in much greater depth below. For now, note that (a) the correlation of 0.661 has the superscript of two asterisks (**), meaning that this is a significant correlation, and (b) the significance row itself gives the value “.000”, indicating a high level of significance. This, too, is something we will return to. Finally, the third row simply records the number of people who entered into each correlation. This will vary, correlation to correlation, since there may be a few different missing values in each case.

To recap here, notice that part of the skill in interpreting output is knowing what to ignore, or not attend to in any detail, and what, correspondingly, should take up most of your attention. In the present case, it is the first line which is most important.

Next we turn to the actual correlations which were found.
These are as follows:

Variables concerned                  Correlation
decision and narrative pausing       +0.66**
decision and narrative accuracy      +0.48**
decision and narrative complexity    +0.24

Clearly there is a strong degree of agreement for pausing. The relationship between these two measures suggests that someone who pauses often on one task is likely to pause often on another. There is also a fairly strong relationship between the two accuracy scores, at 0.48. This is a weaker relationship than that for pausing, but it does suggest that there is a tendency for those who are more accurate on one task to be accurate on the other. Note that each of these correlations is statistically significant. Finally, the lowest (and non-significant) correlation, although still positive, is between the two complexity measures. The weaker relationship here implies that complexity is, to some extent, dependent on things other than the task style of the individual. At the moment we do not know what these are, but it seems as if knowing that someone produces complex language on one task doesn’t allow us to predict with confidence that they will produce complex language on another task.

Part One of the task you were given, to estimate the correlation on the basis of the scattergram, was difficult, but it is useful to develop an ability to look at a scattergram and make some sort of estimate of the correlation in question. If you got this wrong, don’t worry. It is something that only comes with practice. The more important part of the task, Part Two, was then to actually compute the correlations, since being able to obtain this precise mathematical measure of relationship is the key skill here.

Statistical Significance: A First Glimpse

This, the last section of the chapter, is a tough read, which does not really fit into the flow of this chapter. But it does pick up on material we have just covered.
It introduces a topic which will be fundamental in the next chapter - the notion of statistical significance. If you don’t want to engage with these ideas here, you can leave out this section on first reading (i.e. finishing the chapter at this point), but plan on returning to it later, perhaps after you have completed Chapter Six.

As we have just seen, there is more material in the correlation output than simply the correlations themselves, including the significance level for each correlation and the number of cases. This latter is fairly self-evident - it reflects the number of subjects whose data underlie the correlation itself. But the significance value is a little more complex. We will, in fact, be looking at significance in much greater detail in Chapter Six. For now, a minimal explanation should suffice, one which capitalises upon the output data we have just explored.

Step back for a moment from the correlational values that you have found. They are specific correlations based on one particular set of data. But it is possible to imagine many other correlations computed between the same two measures (the two accuracy scores, say) with other groups of subjects and even with the same subjects using the same tasks, but on different occasions. If we did such a “thought” experiment, imagining that the same subjects were required to redo the task and be measured in the same way, we would obviously get lots of correlations, one for each “experiment” in our thought “experiment”. Now these correlations would not all be the same. Some would clearly be higher than others, and some lower. There would be all sorts of chance factors which influenced people’s actual scores on the two measures, e.g. fatigue, inattention.
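The thought experiment above can be run on a computer. The sketch below is an illustration only and makes several assumptions that are not in the chapter: that there is a fixed “true” relationship of 0.5 between the two measures, that scores are normally distributed, and that each imagined study has 66 subjects (the N shown for the pausing correlation). It then repeats the “study” 1000 times and looks at how much the correlation moves around.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def one_study(rng, n_subjects, true_r):
    """Simulate one study: scores on two measures related at strength true_r."""
    xs, ys = [], []
    for _ in range(n_subjects):
        x = rng.gauss(0, 1)
        # y shares part of x (the real relationship) plus chance variation.
        y = true_r * x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return pearson_r(xs, ys)


rng = random.Random(7)
results = [one_study(rng, n_subjects=66, true_r=0.5) for _ in range(1000)]
mean_r = sum(results) / len(results)

print(f"lowest correlation found:  {min(results):.2f}")
print(f"highest correlation found: {max(results):.2f}")
print(f"average correlation:       {mean_r:.2f}")
```

Even though the underlying relationship never changes in this simulation, individual “studies” should produce correlations some way above and below it, which is exactly the variability the thought experiment asks you to imagine.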
To turn this around a little, when we do an actual study we only have one correlation, but we have to think of this correlation as being part of a range of possible correlations from a range of possible studies, with some of this range of hypothetical correlations being higher than others.

Now we can complicate things with one final step. Imagine that we somehow know that there is no relationship between two variables. (Don’t ask how: just imagine that we do!) Let’s imagine that nonetheless we collected data on each of the variables, and correlated these data. We would get a correlation coefficient, and it is highly likely that while this correlation would be close to zero, it wouldn’t be exactly zero. Let’s imagine that we gathered another set of actual measurements, and that we correlated these. The result this time might well be slightly different from zero and different from the first correlation that we obtained. Now imagine that we collected lots of samples of data and so computed lots of correlation coefficients. The important thing here is that these correlations would vary in size (and direction). It is possible that through chance factors alone, some of these correlations, while not far from zero, would fall outside the range 0.00 to 0.06 (say). In other words, it would be possible that we would find particular correlations which might look interesting (e.g. 0.20), but which in fact would be solely due to chance factors in the sampling we had done. Human behaviour is variable, and we couldn’t guard against this.

Now the key point is that when, in the real world, we compute only one correlation, we have to distinguish between the situation where this may look interesting but in reality be only the result of chance factors, and the situation where the correlation does reflect a genuine sort of relationship. This brings us to a key point.
What the SPSS printout gives us is the likelihood that a particular correlation could have arisen purely by chance, assuming no real relationship between the two measures concerned. In this way, SPSS is telling us that while a correlation may look interesting, we have to avoid taking it too seriously if it is within the range that could have arisen by chance.

So, when examining a table of correlations, the first hurdle we have to overcome is to have some confidence that a particular reported correlation is not there simply by chance. We need to establish whether there is a real relationship to be taken seriously. But how do we decide? The output table from SPSS, in its second “block” or “row” (as shown in the beginning of the feedback section to Task 5.7) shows the likelihood that any correlation could have arisen by chance. Here are the three correlations that were computed as part of Task 5.7, together with the probability that they could have occurred by chance (with this taken from the SPSS output):

Variables concerned                  Correlation   Probability of Occurrence by Chance
decision and narrative pausing       +0.66         .000
decision and narrative accuracy      +0.48         .000
decision and narrative complexity    +0.24         .058

The first two correlations (for pausing and for accuracy) are higher correlations and give figures (slightly misleadingly because of a quirk of SPSS printout conventions) of .000. Accept, for now, that it would have been more helpful, and equally meaningful, if they had given the figure .001. This would have meant that a correlation as large as the one that was found (e.g. 0.66) would only have arisen purely by chance (assuming no underlying relationship) 1 time in 1000 (i.e. put a “1” in front of the three zeros to make “1000”). If the figure had been .01, then the probability of chance occurrence would have been 1 in 100.
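One way to see what “probability of occurrence by chance” means is to simulate it directly. The sketch below is not how SPSS arrives at its figures (SPSS uses a formula rather than simulation); it is simply a brute-force illustration under stated assumptions: the two measures are generated independently, so there is truly no relationship, scores are normally distributed, and each imagined study has 66 cases (the N for the pausing correlation; the N for the other correlations is assumed to be similar). The function name `chance_probability` and the choice of 2000 simulated studies are invented for this example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chance_probability(observed_r, n_cases, n_studies=2000, seed=3):
    """Share of no-relationship studies whose r is at least as far from
    zero (in either direction, i.e. two-tailed) as observed_r."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        xs = [rng.gauss(0, 1) for _ in range(n_cases)]
        ys = [rng.gauss(0, 1) for _ in range(n_cases)]  # independent of xs
        if abs(pearson_r(xs, ys)) >= abs(observed_r):
            hits += 1
    return hits / n_studies


p_complexity = chance_probability(0.24, n_cases=66)
p_pausing = chance_probability(0.66, n_cases=66)
print(f"r = 0.24: arose by chance in {p_complexity:.3f} of simulated studies")
print(f"r = 0.66: arose by chance in {p_pausing:.3f} of simulated studies")
```

If the assumptions are reasonable, the proportion for 0.24 should land in the same region as the .058 in the table, while a correlation as large as 0.66 should essentially never turn up by chance in a couple of thousand simulated studies.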
Somewhat perversely, SPSS (a) treats three decimal places as being the “extreme” of its probability (well, improbability!) scale, and (b) registers that the two correlations we are so far looking at are even more improbable than this. Accordingly, it reports a figure of .000, meaning that the degree of improbability of finding this correlation by chance is even more remote than the limit that its scale allows. Or, to put it another way, very, very, very improbable!

The remaining correlation, between the two complexity figures, is interesting and tantalising in equal measure. Let’s step back a moment, before making judgement about it. (We will return to this judgement making on the next page.) SPSS is, in effect, using a scale of probability-improbability, and expressing the likelihood of any particular correlation in terms of this scale. The scale runs from complete probability, at one end, to infinitely improbable, at the other. The table below simply gives a number of steps along this scale.

Probability-Improbability   Alternative Expression
.50                         One in two
.20                         One in five
.10                         One in ten
.05                         One in twenty
.02                         One in fifty
.01                         One in a hundred
.005                        One in two hundred
.002                        One in five hundred
.001                        One in a thousand
.0001                       One in ten thousand

You can imagine other steps along this scale, such as .75 (three in four), or .025 (one in forty). The major point here is that there is no pre-ordained place in this sequence which separates probable from improbable. Instead what we have really is a continuum of improbability.

Think, for a moment, about the .05, or one-in-twenty point. Imagine that we get a correlation that SPSS calculates would occur by chance only one time in twenty. Should we take this seriously? Should we say: if this could only occur by chance one time in twenty, it is unlikely that chance would account for this finding, and so we should conclude there is a real relationship?
Or should we say that one chance in twenty is not that far-fetched, and that it would be better to conclude that chance couldn’t be ruled out?

Let’s take the extremes (since these are fairly easy). If we get a correlation which could only occur by chance one time in ten thousand, we may fairly conclude that it is highly unlikely that such a freak occurrence accounts for the result we might so obtain. Instead, it seems straightforward to conclude that the result reflects a real relationship, and we can designate the correlation as statistically significant. In contrast, if we get a correlation which SPSS tells us could occur by chance one time in two, we can conclude that chance could easily come into play in this case, and so we cannot rule out its effects. In other words, we wouldn’t dare take this correlation as statistically significant, and would not want to rule out a chance explanation.

Task 5.8
Choosing the points in the above probability-improbability scale which are important is not obvious. The basic task is to pick points which allow effective guesses that a result didn’t occur by chance. If you choose one in two (wrongly!) this would imply you are too generous in accepting that something significant has happened when chance alone could explain everything. Of course, you could go to the other extreme (wrongly again!) and choose the “one in ten thousand” criterion. But then a lot of the time correlations which meant a real relationship would be rejected as not being improbable enough. The task is to choose the level of chance which works best most of the time. In that respect, you get three attempts! Pick three points in the above scale which you think are particularly useful in separating probable from improbable results. Bear in mind that the basis you have for making this judgement is very slim.
But you will see that there is a consensus amongst social scientists about the three particularly important points, and it is useful, for an exercise only, to see if you can second guess such people!

DO NOT READ ON UNTIL YOU HAVE DONE THE EXERCISE!!!!!

Feedback on Task 5.8

By convention, the social sciences generally take three points as having special significance. These are .05 (one in twenty), .01 (one in a hundred), and .001 (one in a thousand). The choice of these particular points may be arbitrary (why not one in twenty five, for example?), but once chosen these points become the standard by which we judge our results. Note again that underlying this approach is the assumption that we cannot simply look at a set of results and say definitely “That result is important”. What we have to do is couch our assessment in terms of probability, and then we have to adhere to the decision points in this improbability scale that have emerged through the consensus of social scientists. We will return to this in Chapter Six, where we will explore the connection between this method of thinking statistically, and a Popperian approach to falsification in science.

Now we can return to the information we were looking at earlier. We can now see that the correlations between decision pausing and narrative pausing (0.66) and between decision accuracy and narrative accuracy (0.48) comfortably meet our .001 criterion. We can conclude, therefore, (using the “official” statistical terminology) that these correlations are significant, i.e. that we accept that the degree of improbability is so great that they didn’t occur by chance. (The concept of significance is one we return to in the next chapter, since it is fundamental to all serious statistics.)

The correlation between decision complexity and narrative complexity (0.235) is more problematic. The associated (im)probability is .058.
Now with both of these figures it is appropriate to round to two decimal places, giving 0.24 and .06. (Measures in the social sciences, in general, are not so precise that they warrant more than two significant figures. Mostly, if you see someone quoting three significant figures (or more), it is a strong indicator that they don’t know what they are doing!) Earlier we said that the most lenient level of improbability that we will accept that allows the claim of a significant result is .05 (one in twenty). Quite clearly, .06 is larger (i.e. more probable) than this, at something like one in sixteen or one in seventeen, so we have missed the critical value. Only by a hairsbreadth, but we have missed nonetheless. The consequence is that we cannot claim that this correlation is significant. (You have to say to yourself here that some sort of decision-making standard is necessary, and that if such a standard is to work, we have to adhere to it. Even when, as here, the results are very close.) It follows that we have to accept that this correlation could have arisen by chance.

What this means is that while for pausing and for accuracy we are entitled to start discussing and interpreting the relationships involved, we are not entitled to do this for the complexity correlation. Hard lines, maybe, but it is the only course of action available!

One final note on correlations and significance. As with most statistical procedures, underlying the decision-making about significance is a formula which allows us to calculate the size of correlation that would be significant for different sample sizes.
We are not going to look at this formula here, but it is relevant to say that the formula is dependent on two things:
• the strength of the relationship involved
• the number of cases the correlation is based on

Some values of the minimum size of correlation needed to attain the 0.05 significance level for different sample sizes are as follows:

Sample size    Significant correlation
10             0.65
20             0.45
30             0.36
40             0.31
50             0.28
100            0.20
200            0.14
500            0.09

The higher the correlation (fairly obviously) the more likely it is to be significant. But it is also the case that the more cases the correlation is based on, the more likely it is to be significant, as the table above makes abundantly clear. So if we want to talk about “maybes”, we can say that the correlation for decision-making and narrative complexity would have been significant if, for the number of cases involved here, it had been any higher than the figure of 0.24 (i.e. 0.25 would have done!). But it would also have reached significance if it had been based on just two more cases, i.e. 72 instead of 70, i.e. 0.24 would have reached significance with this larger number of cases. Obvious moral: try to gather data on as many cases as possible!! Notice also that as the sample size becomes seriously large, attaining a significant correlation is much easier, and so to claim significance with a sample size of (say) 500 is less impressive. Particularly clearly in this case, it is the strength of the correlation that counts, rather than simply having attained significance. (Another issue we will return to in Chapter Six.)

Tying it all together: Generating and Interpreting Correlation Coefficients

The remainder of this chapter is concerned with a number of tasks which are meant to integrate what you have covered about correlation coefficients.
These include:
• producing (more) correlation coefficients
• relating correlations to the notion of statistical significance
• generating blocks of correlations more easily

Task 5.9

Compute the correlations between the scores for decision-making and narrative accuracy, on the one hand, and the scores for decision-making and narrative complexity, on the other. Interpret the results, drawing on what you know about statistical significance, as appropriate. You can use either of the following methods:

Method One: Using Analyse, Correlation, Bivariate as your sequence of actions, choose and move to the right-hand box the Decision-making complexity score and then the Decision-making accuracy score. Record the result, and then repeat with the corresponding actions to obtain the three other correlations you have been asked to obtain. Finally, with your table of four correlations, make your interpretations.

Method Two: As above, but move all four measures across to the right-hand box at the same time. Then, from the larger output table that you produce, identify the correlations you have been asked to obtain, and ignore the rest. Interpret the specific correlations involved in the task only.

Feedback on Task 5.9

The correlations are as follows:

Decision Accuracy Decision Complexity Narrative Complexity .08 .10 Narrative Accuracy .26* .02

Although one of these correlations reaches the one-in-twenty (.05) level of significance, it only just reaches this level, and the other three correlations do not reach significance at all. We can conclude, therefore, that performance on these tasks in terms of accuracy is largely independent of (i.e. does not correlate with) the complexity dimension of performance. To put this another way, it appears that knowing that someone produces accurate language does not enable us to predict that they will use more complex language - they might but they might not.
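For readers who would like to see the logic of Method Two outside SPSS, the following is a minimal sketch in Python. It is an illustration only, not part of the course materials: the variable names and the six made-up scores per measure are hypothetical, and the helper functions `pearson_r` and `cross_block` are our own. The idea is simply to hold all the measures together and then pull out only the accuracy-by-complexity correlations, ignoring everything else in the matrix.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_block(data, rows, cols):
    """Return only the row-by-column correlations, skipping the rest of the matrix."""
    return {(r, c): pearson_r(data[r], data[c]) for r in rows for c in cols}


# Made-up scores for six learners; the variable names are hypothetical.
data = {
    "d_accuracy": [0.61, 0.72, 0.55, 0.80, 0.67, 0.74],
    "n_accuracy": [0.58, 0.70, 0.60, 0.77, 0.62, 0.69],
    "d_complexity": [1.4, 1.2, 1.6, 1.3, 1.5, 1.1],
    "n_complexity": [1.3, 1.5, 1.2, 1.6, 1.4, 1.3],
}

block = cross_block(
    data,
    rows=["d_complexity", "n_complexity"],
    cols=["d_accuracy", "n_accuracy"],
)
for (row, col), r in block.items():
    print(f"{row} x {col}: r = {r:.2f}")
```

With only six invented cases the values printed mean nothing in themselves; the point is the selection step, which does in code what you do by eye when you pick the relevant cells out of a large SPSS output table.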
Reflection

Although the advantages and disadvantages may have been apparent in earlier tasks, it is worth giving a little attention to the two methods you could have used to obtain the results. The first is more laborious but more selective, and only generates the specific correlations you want. The cost is that you have to use four “runs” to get these results. The alternative method, choosing all measures that are involved, generates more correlations than you need, but it has the virtue that only one “run” is needed. The disadvantage is that you need to know how to “read” the larger table of correlations that is produced, and to “screen out” those that don’t apply to the question you are pursuing. In practice, of course, for reasons of less effort, most people lean towards the second method, and the need to learn how to ignore effectively! As you have already seen, SPSS errs on the side of easily giving lots of output. Not all of it needs to be focussed on in great detail, and the skill to work with output quickly by focussing on what is important is a good one to learn.

Task 5.10

Do the same for the relationship between the various accuracy and fluency scores. Then do the same for the relationship between the different complexity and fluency scores. It would be worth trying here to generate the large correlation matrix, and then extract for attention those correlations which are specifically required by the above instruction. You might then try to assemble the specific correlations in two separate Word tables, one for the accuracy-fluency connection, and one for the complexity-fluency connection.
Feedback on Task 5.10

You should obtain the following results:

                      Decision Fluency    Narrative Fluency
Decision Accuracy     -0.31*              -0.22
Narrative Accuracy    -0.38**             -0.36**

                        Decision Fluency    Narrative Fluency
Decision Complexity     -0.54**             -0.39**
Narrative Complexity    -0.31*              -0.23

These figures suggest that there is a negative relationship between complexity and accuracy, on the one hand, and the number of pauses, on the other. In other words, the higher the accuracy (or complexity), the fewer the pauses. In fact, although the relationship between the measures is an inverse one, it is easier to think of the relationship between the “constructs” as positive. Fewer pauses in fact means higher fluency, i.e. absence of disfluency. So the relationship is essentially a positive one - the more complex and accurate the language, the more fluent the performance. Here, at least, it looks as if those who can achieve either accuracy or complexity (remember that these are independent of one another) tend to be more fluent. In other words, they do not achieve their accuracy or complexity at the expense of fluency.

Reflection

There is one last bit of reflection with correlations. The last task has shown how (a) it is easiest to generate whole swathes of correlations at one go, and (b) it makes more sense to work one’s way through such blocks of correlations in a systematic manner. SPSS is just as happy to compute large numbers of correlations as it is to compute just one. Obviously this is often very convenient. But don’t be too seduced by this - there is no substitute for carefully and selectively working your way through a whole matrix of correlations where your work is motivated by principle, rather than simply trawling through the data for something interesting. By all means, in other words, exploit SPSS’s power. But be sensible as well, and try to follow predictions and principle, rather than aimless explorations.
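A closing illustration, for the curious only. Earlier in the chapter we gave a table of the minimum correlation needed to reach the .05 level at different sample sizes, without showing the formula behind it. The sketch below does not reproduce that formula; it uses a well-known approximation (the Fisher z transformation, with a two-tailed test assumed), and the function name `approx_critical_r` is our own invention. It is offered simply to make concrete the point that the critical value shrinks as the number of cases grows.

```python
import math
from statistics import NormalDist


def approx_critical_r(n, alpha=0.05):
    """Approximate smallest |r| that is significant at level alpha (two-tailed).

    Uses the Fisher z approximation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3). This is an approximation, not the
    exact t-based calculation that statistical packages normally use.
    """
    if n <= 3:
        raise ValueError("the approximation needs more than three cases")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    return math.tanh(z / math.sqrt(n - 3))


# Same sample sizes as in the table given earlier in the chapter
for n in (10, 20, 30, 40, 50, 100, 200, 500):
    print(f"{n:>4}  {approx_critical_r(n):.2f}")
```

The values this produces should agree with the earlier table to within roughly 0.02, with the largest discrepancy at the smallest sample size, where the approximation is at its weakest. Passing `alpha=0.01` or `alpha=0.001` gives the corresponding thresholds for the two stricter significance levels.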