CHAPTER 4: DIRECTION OF CAUSATION

In Chapter 3, we learned how to regress the output of wheat against the amount of fertilizer per acre. Does it make any difference whether we regress wheat against fertilizer or fertilizer against wheat? If so, how do we decide which variable goes on the left-hand side of the equation and which goes on the right-hand side? And if we cannot determine which variable belongs on which side, what do we do then?

A. INDEPENDENT vs. DEPENDENT VARIABLES

In ordinary mathematics the following equations are identical:

1. Y = A + BX
2. (Y − A)/B = X
3. X = (Y − A)/B = Y/B − A/B = −A/B + (1/B)Y
4. X = A′ + B′Y, where A′ = −A/B and B′ = 1/B

However, this is not generally true in regression, because the least squares technique gives different answers depending on whether we estimate Ŷ = A + BX or X̂ = A′ + B′Y. When we regress Y against X, we minimize ∑(Yi − Ŷi)²; that is, we minimize the sum of the squared vertical distances between the line and the observations. When we regress X against Y, we minimize ∑(Xi − X̂i)², which is equivalent to minimizing the sum of the squared horizontal distances between the line and the observations. All of this can be understood in terms of the least-squares formulae. To hammer the idea home, consider the following example:

Xi         Yi          (Xi − X̄)(Yi − Ȳ)          (Xi − X̄)²         (Yi − Ȳ)²
1          6           (1 − 2)(6 − 4) = −2        (1 − 2)² = 1       (6 − 4)² = 4
2          2           (2 − 2)(2 − 4) = 0         (2 − 2)² = 0       (2 − 4)² = 4
3          4           (3 − 2)(4 − 4) = 0         (3 − 2)² = 1       (4 − 4)² = 0
∑Xi = 6    ∑Yi = 12    ∑(Xi − X̄)(Yi − Ȳ) = −2    ∑(Xi − X̄)² = 2    ∑(Yi − Ȳ)² = 8

Here X̄ = 2 and Ȳ = 4, so

B = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)² = −2/2 = −1
A = Ȳ − BX̄ = 4 − (−1)(2) = 6
Ŷ = 6 − X

This regression can be seen in Figure 4:1A. If we regress X against Y, we substitute X for Y and Y for X in the least squares formulae, as X and Y "change places":

X̂ = A′ + B′Y
B′ = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Yi − Ȳ)² = −2/8 = −1/4
A′ = X̄ − B′Ȳ = 2 − (−1/4)(4) = 3
X̂ = 3 − (1/4)Y, or Y = 12 − 4X̂

This can be seen in Figure 4:1B.

[Figure 4:1. Panel A plots the regression of Y on X, Ŷ = 6 − X; Panel B plots the regression of X on Y, rewritten as Y = 12 − 4X̂. Both panels plot Y on the vertical axis against X on the horizontal axis.]

Hence, when we regress Y against X we get a different line than when we regress X against Y. When Y is regressed against X, the least squares line minimizes the sum of squared errors between Yi and Ŷi. In contrast, when X is regressed against Y, the least squares line minimizes the sum of squared errors between Xi and X̂i. Except when all the observations fall on the same line, it does make a difference whether we regress Y against X or X against Y.
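To make the asymmetry concrete, here is a minimal Python sketch (standard library only) that fits both regressions to the three observations in the table above. The function name ols_line is ours, chosen for illustration; it is not from any particular econometrics package.

```python
# A small check that regressing Y on X and X on Y give different least-squares
# lines for the data above: (1, 6), (2, 2), (3, 4).

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y-hat = A + B*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return a, b

X = [1, 2, 3]
Y = [6, 2, 4]

A, B = ols_line(X, Y)    # regress Y on X
Ap, Bp = ols_line(Y, X)  # regress X on Y (the variables "change places")

print(A, B)          # 6.0 -1.0   ->  Y-hat = 6 - X
print(Ap, Bp)        # 3.0 -0.25  ->  X-hat = 3 - Y/4, i.e. Y = 12 - 4*X-hat
print(B == 1 / Bp)   # False: B is not 1/B' unless all points lie on one line
```

The output matches the hand calculation: B = −1 and B′ = −1/4, so the two fitted lines are genuinely different.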
How, then, does one choose the correct regression? We regress Y on X when X is the independent (exogenous) variable and Y the dependent (endogenous) variable, and X on Y when the reverse is the case. In econometrics, X typically stands for the independent variable and Y for the dependent one, unless otherwise specified. The independent variable is plotted on the horizontal axis and the dependent variable on the vertical axis.

It is not always easy to know which is the independent and which is the dependent variable. The determination often involves considerable knowledge on the part of the researcher and an understanding of certain philosophical issues. A good rule of thumb is the following: if the researcher has a prior belief that one variable depends on the other but not vice versa, then the former is the dependent variable and the latter is the independent variable.

The determination of which variable is the independent variable and which is the dependent variable is based on the theoretical relationship to be analyzed. If the theory suggests a causal relationship, then the regression model treats the dependent variable as being caused by the independent variable(s). For example, our rudimentary knowledge of meteorology suggests that the quantity of wheat grown is the dependent variable and the amount of sunshine is the independent variable, as sunshine causes wheat to grow and wheat growth does not cause the sun to shine. In the example presented above, the researcher determined the amount of fertilizer to apply to each plot and then observed the bushels of wheat per acre. So the amount of fertilizer applied determined the bushels per acre: fertilizer per acre is the independent variable and bushels per acre is the dependent variable. The bushels of wheat per acre did not determine the amount of fertilizer that the researcher applied. Here is one more example to drive the point home: income is the independent variable and the price of the car purchased is the dependent variable. Making more money provides the wherewithal to buy an expensive car; buying an expensive car will not make you rich (if anything, it will make you poorer).

A more nuanced understanding of regression is gained by considering situations in which there are several variables. Going back to the wheat example, the number of bushels of wheat produced per acre may depend on both the amount of rainfall that falls on a plot of land and the amount of fertilizer that is applied. Assuming that the fertilizer choice is not affected by the amount of rainfall, both rainfall and fertilizer are independent (right-side) variables, because each is independent of the other, while bushels per acre is the dependent (left-side) variable, because it is not independent of the other two variables. Now it is true that the amount of fertilizer is likely to depend on other variables (for example, whether the fertilizer spreader is accurate), but as long as these other variables affect wheat production only through the amount of fertilizer, fertilizer can be treated as the independent variable in finding the relationship between fertilizer and wheat production.

In other cases, both variables are a priori felt to be dependent on each other or on a third variable. As an example, except for Rastafarians, there appears to be a negative or inverse relationship between going to church and smoking marijuana. It is not clear whether going to church discourages one from smoking marijuana, smoking marijuana discourages one from going to church, or both depend on a third variable such as parental attitudes. In this case we would use correlation (to be explained below) instead of regression to find the relationship between churchgoing and smoking marijuana (or, when appropriate, use simultaneous equation techniques to disentangle the interrelationships; see Chapter _). As another example, the number of cell phones and the number of plasma screen televisions have both increased over the last 30 months, but neither has caused the other. Instead, both are due to other factors (technological change, increases in income, etc.).

It should be noted that I have stressed the notion of a priori belief concerning dependent and independent variables. The data cannot tell us which is the dependent variable or the direction of causation.
This is clear from the example of first regressing Y against X and then regressing X against Y: in either case we get a regression line. Therefore, in the absence of some subsidiary assumption (such as that causation does not work backward in time), how could the data tell us the direction of causation?

B. THE ARROW OF CAUSATION

Once we view the right-hand variable as having an effect on the left-hand variable, we need to be very careful that this is indeed the case. The mistake is much more serious than just getting an incorrect coefficient; it means making a mistake about the arrow of causation. If one regresses the amount of sunshine on the amount of wheat produced, it is incorrect to assert that growing more wheat will increase the amount of sunshine.

Even if they have not taken a course in statistics, most people have a basic understanding of cause and effect. Nevertheless, researchers are sometimes confused about the direction of causation. For example, consider a recent study by Stanley Kurtz, a social anthropologist (http://judiciary.house.gov/legacy/kurtz042204.htm), that was reported in a number of newspapers. Kurtz argued that allowing homosexuals to marry increased the number of out-of-wedlock births by heterosexuals. He had looked at data from the Netherlands and observed that out-of-wedlock births increased dramatically starting in 1996, when a law allowing marriage between homosexuals was introduced and debated, and continued to increase after 2000, when the law was fully implemented. He therefore concluded that the law caused heterosexuals to have children outside of marriage. There are two problems with his analysis. First, he based his conclusion on a limited sample (one country and one change in the law over a small number of years). Second, and more relevant to the discussion in this section, the more plausible explanation is that the change in the law and the increase in out-of-wedlock births are both due to changes in attitudes about marriage, not that allowing homosexual marriage makes heterosexuals want to have children outside of marriage. Even if you don't undertake any econometric research, you should always be prepared to question newspaper reports of research findings.

Let us consider one final example. To sharpen your wits, try to provide a counter-explanation before you read mine. Does watching "The Daily Show" with Jon Stewart make the viewer more cynical? Suppose that the following were offered as evidence: people who watch this cynical show are more cynical (in responding to a questionnaire) than those who don't watch the show. The problem with this simple test is that cynical people are more likely to watch the show in the first place; that is, the arrow of causation may run in the opposite direction. A researcher did in fact ask this question (Diana Mutz, 2004, "Comedy or News? Viewer Processing of Political News," paper presented at the 3rd Annual Pre-APSA Conference on Political Communication, Chicago, September 1, 2004), but she got around the problem of two-way causation by first dividing students into two groups: those who had watched the "Daily Show" in the past and those who had not. She then further divided each of these groups into two, making one group watch "The Daily Show" and the other group watch a regular news show.
In this way, watching the TV show during the experiment was the independent variable, since the researcher chose which subjects would watch which show, while cynicism was the dependent variable. She then asked all of the students to fill out a questionnaire. Those asked to watch "The Daily Show" were more cynical. In the "Daily Show" example, the experimenter caused the students to watch that show or a regular news program; the students' cynicism did not make the researcher choose which show a student would watch.

In a nutshell, make sure that you know the direction of causation. That is, make sure that the variable on the left-hand side of the equation is the dependent variable and the variable on the right-hand side is the independent variable, and not the reverse. Switching not only gives different coefficients but, more important, may lead to incorrect conclusions about which variable is the cause and which is the effect.

C. SAMPLE CORRELATION

If we do not know which is the independent and which is the dependent variable, or if we believe that they are jointly dependent on each other or on a third variable, then we use the sample correlation coefficient, denoted by the symbol R.

4:1 DEFINITION:

R = ∑(Yi − Ȳ)(Xi − X̄) / √[ ∑(Xi − X̄)² · ∑(Yi − Ȳ)² ]

is the sample correlation coefficient, where each sum runs over the N observations (i = 1, …, N).

R is the geometric average of the slopes found in regressing Y on X (Ŷ = A + BX) and X on Y (X̂ = A′ + B′Y):

BB′ = [∑(Yi − Ȳ)(Xi − X̄) / ∑(Xi − X̄)²] · [∑(Yi − Ȳ)(Xi − X̄) / ∑(Yi − Ȳ)²]
    = [∑(Yi − Ȳ)(Xi − X̄)]² / [∑(Xi − X̄)² · ∑(Yi − Ȳ)²] = R²

Thus the formula for R reflects the fact that we do not know which variable is the independent one; we therefore treat the variables symmetrically and take the (geometric) average slope.

Sample correlation is a measure of the linear relationship between the two variables in the sample. If there is no linear relationship, the sample correlation coefficient R is equal to 0. If the linear relationship in the sample is perfect, R is equal to 1 or −1. The intuition is best understood by again looking at the formula and then at some examples:

4:2   R = [∑(Yi − Ȳ)(Xi − X̄)/N] / √[ (∑(Xi − X̄)²/N) · (∑(Yi − Ȳ)²/N) ]
        = SAMPLE COV(X, Y) / √[ SAMPLE VARIANCE OF X · SAMPLE VARIANCE OF Y ]

The sample correlation, R, is the sample covariance, or joint linear variability, standardized (i.e., divided) by the square root of the product of the individual variances. Ignoring the minus sign, R tells us what proportion of the total variability of X and Y is linear covariability between X and Y. If the sample covariability between X and Y is zero (i.e., sample Cov(X, Y) = 0), then knowing that X is greater than X̄ tells us nothing about whether Y is greater or less than Ȳ; thus the sample correlation is zero.

What if there were a perfect linear relationship between X and Y (i.e., all of the observations fell on a line)? Then each Yi = Ŷi = A + BXi, and since Ȳ = A + BX̄, each deviation Yi − Ȳ equals B(Xi − X̄). R would then equal

R = ∑(A + BXi − A − BX̄)(Xi − X̄) / [ ∑(Xi − X̄)² · ∑(A + BXi − A − BX̄)² ]^(1/2)
  = B∑(Xi − X̄)(Xi − X̄) / [ ∑(Xi − X̄)² · B²∑(Xi − X̄)² ]^(1/2)
  = B∑(Xi − X̄)² / |B|∑(Xi − X̄)²,

which equals 1 when B is positive and −1 when B is negative. Of course, if Yi = Xi, a glance at the formula for R shows that R = 1.
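The claim that R is the (signed) geometric average of the two slopes can be checked numerically. The sketch below, again plain Python with helper names of our own choosing, uses the three observations from the start of the chapter and confirms that B·B′ = R².

```python
import math

def slope(x, y):
    """Slope B of the least-squares regression of y on x."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))

def corr(x, y):
    """Sample correlation coefficient R."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

X, Y = [1, 2, 3], [6, 2, 4]
B = slope(X, Y)   # -1.0   (Y regressed on X)
Bp = slope(Y, X)  # -0.25  (X regressed on Y)
R = corr(X, Y)    # -0.5

print(B * Bp, R ** 2)  # 0.25 0.25 -- B * B' equals R squared
```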
EXAMPLE 1:

Xi          Yi           (Xi − X̄)       (Yi − Ȳ)            (Xi − X̄)²          (Yi − Ȳ)²          (Xi − X̄)(Yi − Ȳ)
1           −3           (1 − 2) = −1    (−3 − (−5)) = 2      1                   4                   −2
2           −5           (2 − 2) = 0     (−5 − (−5)) = 0      0                   0                   0
3           −7           (3 − 2) = 1     (−7 − (−5)) = −2     1                   4                   −2
∑Xi = 6     ∑Yi = −15                                         ∑(Xi − X̄)² = 2     ∑(Yi − Ȳ)² = 8     ∑(Xi − X̄)(Yi − Ȳ) = −4
X̄ = 2       Ȳ = −5

R = ∑(Xi − X̄)(Yi − Ȳ) / √[ ∑(Xi − X̄)² · ∑(Yi − Ȳ)² ] = −4/√(2 · 8) = −4/√16 = −4/4 = −1

This can be seen in Figure 4:2A, where all the observations lie on a straight line with negative slope and therefore R = −1.

[Figure 4:2. Panels A and B each plot Y against X. In both panels the observations lie exactly on a line with negative slope, so the correlation is perfect: R = −1.]

In both A and B the correlation is perfect: R = −1, as the observations all lie on a line of negative slope. If Xi = −Yi (equivalently, Yi = −Xi), then R would again equal −1 (see Figure 4:2B). Therefore R cannot be used to predict Y given X or vice versa. It just tells us whether the sample relationship is positive or negative and how closely the relationship approximates a linear one, but not the slope. R does not tell us as much as the regression equation Ŷ = A + BX. This should be expected: in regression we bring in more information (which variable is the independent variable), and therefore we can get more out of it. This is an example of an important rule in econometrics: the more prior information we bring to bear, the more precise our results are likely to be.

EXAMPLE 2:

Xi           Yi          (Xi − X̄)        (Yi − Ȳ)        (Xi − X̄)²          (Yi − Ȳ)²           (Xi − X̄)(Yi − Ȳ)
−1           6           (−1 − 0) = −1    (6 − 4) = 2      1                   4                    −2
0            0           (0 − 0) = 0      (0 − 4) = −4     0                   16                   0
1            6           (1 − 0) = 1      (6 − 4) = 2      1                   4                    2
∑Xi = 0      ∑Yi = 12                                      ∑(Xi − X̄)² = 2     ∑(Yi − Ȳ)² = 24     ∑(Xi − X̄)(Yi − Ȳ) = 0
X̄ = 0        Ȳ = 4

R = ∑(Xi − X̄)(Yi − Ȳ) / √[ ∑(Xi − X̄)² · ∑(Yi − Ȳ)² ] = 0/√48 = 0

Notice that the sample correlation R and the sample covariance are both 0 even though Y = 6X². While there is a perfect nonlinear relationship between X and Y, there is no linear relationship between X and Y. This example emphasizes the fact that regression and correlation are concerned with linear relationships. The fact that R = 0 does not mean that there is no relationship between X and Y, only that there is no linear relationship.

As already stated, the sample correlation between X and Y is

R = [∑(Yi − Ȳ)(Xi − X̄)/N] / √[ (∑(Xi − X̄)²/N) · (∑(Yi − Ȳ)²/N) ] = SAMPLE COV(X, Y) / √[ SAMPLE VARIANCE OF X · SAMPLE VARIANCE OF Y ].

It is therefore not surprising that R has many of the characteristics of the sample covariance. In particular, if the sample covariance between X and Y is positive (negative), so is the sample correlation between X and Y. If, for example, Y tends to be below its average Ȳ when X is above its average X̄, then both the sample covariance and the sample correlation are negative. Insight into the characteristics of R can be obtained by reviewing Chapter 2, Section D, on covariance.

An important difference between correlation and covariance is that correlation is independent of scale. Suppose that we are finding the correlation between husbands' heights and wives' heights. If men of above-average height marry women of above-average height and men of below-average height marry women of below-average height, then the sample correlation is positive. If we first measure height in feet and then in inches, the measure of correlation does not change. Looking at the numerator in our definition of R, each deviation from Ȳ is multiplied by 12 and each deviation from X̄ is also multiplied by 12, so the numerator is 144 times as large when the measurement changes from feet to inches. Looking at the denominator, each deviation from Ȳ is again multiplied by 12; these deviations are then squared, so the squared deviations of Y are 144 times as large when the measurements are in inches instead of feet. Similarly, the squared deviations from X are 144 times as large. So in the denominator we have the square root of (144 × 144) = 144. Since both the numerator and the denominator are multiplied by 144, the correlation does not change when the units of measurement change. The same is not true for the sample covariance, which is multiplied by 144.
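Two of the points above can be verified with a short Python sketch: that R = 0 for the Example 2 data, where the relationship Y = 6X² is perfect but nonlinear, and that changing units from feet to inches leaves the correlation unchanged while multiplying the covariance by 144. The cov and corr helpers, and the height figures, are ours for illustration only.

```python
import math

def cov(x, y):
    """Sample covariance of x and y."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / len(x)

def corr(x, y):
    """Sample correlation coefficient R."""
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))

# Example 2: a perfect nonlinear relationship, Y = 6 * X**2, still gives R = 0.
X2 = [-1, 0, 1]
Y2 = [6 * x ** 2 for x in X2]           # [6, 0, 6]
print(corr(X2, Y2))                      # 0.0 -- no *linear* relationship

# Hypothetical heights, first in feet, then converted to inches (x 12).
husbands_ft = [5.5, 6.0, 6.2, 5.8]
wives_ft    = [5.2, 5.7, 5.9, 5.4]
husbands_in = [12 * h for h in husbands_ft]
wives_in    = [12 * w for w in wives_ft]

print(corr(husbands_ft, wives_ft), corr(husbands_in, wives_in))  # the same (up to rounding)
print(cov(husbands_ft, wives_ft),  cov(husbands_in, wives_in))   # the second is 144 times the first
```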
D. CONCLUDING REMARKS

It is important to know the arrow of causation; that is, it is important to know which variable is the independent variable and which is the dependent variable, or whether they both depend on each other or on a third variable. If one is mistaken about the direction of causation, not only will one get a different least squares line, but one will also make mistaken inferences. If gay men wear their hair shorter, one should not infer that having your hair cut short will make you gay. This is obvious to everyone, but other cases are not so obvious, particularly when the arrow of causation may run in both directions. In such cases, people who believe that the causation runs in one direction may marshal evidence that is perfectly consistent with the causation running in the opposite direction.

Let us illustrate by considering the following example. If we observe that children who watch more television tend to be more violent, does that mean that watching television makes children more violent, or does it mean that those who are more violent have a preference for watching television when they are not engaging in violence? Both are plausible hypotheses, so one cannot come to a definitive conclusion regarding the direction of causation by finding a positive correlation between the two.

Sometimes there are natural experiments, where nature inadvertently creates an experiment so that causation can run in only one direction. To illustrate, we will look at work by Matthew Gentzkow and Jesse Shapiro ("Preschool Television Viewing and Adolescent Test Scores: Historical Evidence from the Coleman Study," Quarterly Journal of Economics, 2008) as reported by Austan Goolsbee (http://www.slate.com/articles/business/the_dismal_science/2006/02/the_benefits_of_bozo.html):

According to most experts, TV for kids is basically a no-no. The American Academy of Pediatrics recommends no TV at all for children under the age of 2, and for older children, one to two hours a day of educational programming at most. Various studies have linked greater amounts of television viewing to all sorts of problems, among them attention deficit disorder, violent behavior, obesity, and poor performance in school and on standardized tests. Given that kids watch an average of around four hours of TV a day, the risks would seem to be awfully high.

Most studies of the impact of television, however, are seriously flawed. They compare kids who watch TV and kids who don't, when kids in those two groups live in very different environments. Kids who watch no TV, or only a small amount of educational programming, as a group are from much wealthier families than those who watch hours and hours. Because of their income advantage, the less-TV kids have all sorts of things going for them that have nothing to do with the impact of television.
The problem with comparing them to kids who watch a lot of TV is like the problem with a study that compared, say, kids who ride to school in a Mercedes with kids who ride the bus. The data would no doubt show that Mercedes kids are more likely to score high on their SATs, go to college, and go on to high-paying jobs. None of that has anything to do with the car, but the comparison would make it look as if it did. The only way to really know the long-term effect of TV on kids would be to run an experiment over time. But no one is going to barrage kids with TV for five years and then see if their test scores go down (though I know plenty of kids who would volunteer).

Gentzkow and Shapiro came up with a different way to test the long-run impact of television on kids—by going back in time. When Americans first started getting television in the 1940s, the availability of the medium spread across the country unevenly. Some cities, like New York, had television by 1940. Others, like Denver and Honolulu, didn't get their first broadcasts until the early 1950s. Whenever television appeared, kids became immediate junkies.

The key point for Gentzkow and Shapiro's study is that depending on where you lived and when you were born, the total amount of TV you watched in your childhood could differ vastly. A kid born in 1947 who grew up in Denver, where the first TV station didn't get under way until 1952, would probably not have watched much TV at all until the age of 5. But a kid born the same year in Seattle, where TV began broadcasting in 1948, could watch from the age of 1. If TV-watching during the early years damages kids' brains, then the test scores of Denver high-school seniors in 1965 (the kids born in 1947) should be better than those of 1965 high-school seniors in Seattle.

What if you're concerned about differences between the populations of the two cities that could affect the results? Then you compare test scores within the same city for kids born at different times. Denver kids who were in sixth grade in 1965 would have spent their whole lives with television; their 12th-grade counterparts wouldn't have. If TV matters, the test scores of these two groups should differ, too.

Think analogously about lead poisoning. Lead has been scientifically proven to damage kids' brains. If, hypothetically, Seattle added lead to its water in 1948 and Denver did so in 1952, you would see a difference in the test-score data when the kids got to high school—the Seattle kids would score lower than the Denver kids, and the younger Denver kids would score lower than the older Denver ones, because they would have started ingesting lead at a younger age.

Gentzkow and Shapiro got 1965 test-score data for almost 300,000 kids. They looked for evidence that greater exposure to television lowered test scores. They found none. After controlling for socioeconomic status, there were no significant test-score differences between kids who lived in cities that got TV earlier as opposed to later, or between kids of pre- and post-TV-age cohorts. Nor did the kids differ significantly in the amount of homework they did, dropout rates, or the wages they eventually made. If anything, the data revealed a small positive uptick in test scores for kids who got to watch more television when they were young.
For kids living in households in which English was a second language, or with a mother who had less than a high-school education, the study found that TV had a more sizable positive impact on test scores in reading and general knowledge.

The most innovative part of the above study is that it is able to determine the arrow of causation. Clearly, the availability of television broadcasting is the independent variable and watching TV is the dependent variable. This gets around the problem in other studies, where the arrow of causation is likely to go in both directions: watching more TV may result in lower academic abilities, and lower academic abilities may encourage a person to watch TV rather than read a book or study. The study is also important because it controls for other variables (such as the level of the parents' education), a point we will come to in a later chapter. It is also important because other studies that are based on experiments can only determine short-term effects. If kids watch a violent TV program, they may act more aggressively for the next hour or two than if they had watched a ballet, but we do not know how this affects their lives over the long run.

PROBLEMS

1. Why is the B in the linear regression Ŷ = A + BX not in general equal to 1/B′, where B′ is the slope in the linear regression X̂ = A′ + B′Y? (4) Under what circumstances are they identical? (2)

2. Why can't we use empirical data to tell us which variable is the independent variable and which is the dependent variable? (2)

3. What is the difference between correlation and linear regression? (4) When is each appropriate? (4)

4. Prove that a perfect linear relationship results in an R² of 1. (6)

5. In the equation Y = A + BX, which is the independent variable and which is the dependent variable? (2) What are the parameters and what are the variables? (4)

6. Explain in your own words the problem of disentangling the effect of TV watching on school performance. (6) Explain in your own words how the Gentzkow and Shapiro study resolved the problem of two-way causation. (4)

7. When do we use sample correlation? (4) What is the formula for R? (2) How does it differ from the formula for B? (4) Provide an intuitive explanation for the differences between the two formulae. (2) What is the relationship between R and the B in Y = A + BX and the B′ in X = A′ + B′Y? (4) What is the intuitive explanation for this? (4) What does R measure? (2) If R is 1, what does it tell you about the slope of the line? (2) Why does regression tell you more? (2)