March 20, 2013

Correlation and Causation

We must be very careful in interpreting correlation coefficients. Just because two variables are highly correlated does not mean that one causes the other. In statistical terms, we say that correlation does not imply causation. There are many good examples of correlations which are nonsensical when interpreted in terms of causation.

How are the following causal conclusions flawed? (Or are they flawed?)

Example 1: Ice cream sales and the number of shark attacks on swimmers are positively correlated. Can I conclude that a rise in ice cream sales is going to cause more shark attacks?
Of course ice cream does not cause shark attacks! Ice cream sales and shark attacks both increase during the summer. So the two variables are positively correlated, but there is no causal relationship between the two!

Example 2: The number of cavities an elementary school child has had and the child's vocabulary size have a strong positive correlation. Can I conclude that if my niece gets a cavity, the cavity will cause her to increase her vocabulary?
The number of cavities a child has had and the size of a child's vocabulary are both related to the child's age ... as a child gets older, the child is both more likely to get cavities and more likely to increase her vocabulary. Thus the two variables are positively correlated. Again, there is no causal relationship.

Example 3: Children with bigger feet spell better.
Again, a child's shoe size and her ability to spell are both related to the child's age ... Children with bigger feet spell better because they are older, their greater age bringing about bigger feet and, not quite so certainly, better spelling. Thus the two variables are positively correlated, and there is no causal relationship.

Example 4: In areas of the South, those counties with higher divorce rates generally have lower death rates. My sister lives in Tennessee. Can I conclude that if she gets divorced, her divorce will cause her to live longer?
No. Again this example has to do with age and demographics. Older couples are less likely to get divorced and more likely to die than younger couples, so counties with older populations have lower divorce rates and higher death rates than counties with younger demographic profiles. Thus the two variables are negatively correlated, and there is no causal relationship.

Example 5: Joey did exceptionally poorly last semester, so I punished him. He did much better this semester. Clearly, punishment is effective in improving students' grades.
I highly doubt it! Often exceptional performances (exceptionally bad or exceptionally good) are followed by more "normal" performances, so the change in performance might better be explained by regression towards the mean.

Example 6: Nations that add fluoride to their water have a higher cancer rate than those that do not. Can I conclude that fluoride causes cancer?
I don't think so! Nations that add fluoride to their water are generally wealthier and more health-conscious, and thus a greater percentage of their citizens live long enough to develop cancer, which is, to a large extent, a disease of old age. Thus the two variables are positively correlated, and there is no causal relationship.

Example 7: The more firemen fighting a fire, the more damage there is going to be. Therefore firemen cause damage.
Firemen don't cause damage. Firemen are sent according to the severity of the fire ... the larger the fire, the more firemen are sent. Large fires demand more firemen, and large fires cause more damage.

Example 8: The frequency of accidents on a road fell after a speed camera was installed. Therefore, the speed camera has improved road safety.
Probably not ... Speed cameras are often installed after a road has had an exceptionally high number of accidents, and this number usually falls immediately afterwards whether or not a camera is installed (regression to the mean).
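Regression towards the mean (Examples 5 and 8) can be seen in a small simulation. The sketch below is only an illustration under made-up assumptions: each student has a fixed underlying ability, each semester's grade is that ability plus independent random luck, and nobody is punished or helped between semesters. The function name and all of the numbers (mean 70, spreads of 8 and 10, the worst 5%) are hypothetical choices, not data from the examples.

```python
import random


def simulate_regression_to_mean(n_students=10_000, seed=0):
    """Return the average grade of the worst first-semester students
    in semester 1 and in semester 2, with no intervention in between."""
    rng = random.Random(seed)

    # Hypothetical model: grade = stable ability + semester-specific luck.
    ability = [rng.gauss(70, 8) for _ in range(n_students)]
    semester1 = [a + rng.gauss(0, 10) for a in ability]
    semester2 = [a + rng.gauss(0, 10) for a in ability]

    # Pick the students whose semester-1 grade falls in roughly the bottom 5%.
    cutoff = sorted(semester1)[n_students // 20]
    worst = [i for i in range(n_students) if semester1[i] <= cutoff]

    mean1 = sum(semester1[i] for i in worst) / len(worst)
    mean2 = sum(semester2[i] for i in worst) / len(worst)
    return mean1, mean2


if __name__ == "__main__":
    before, after = simulate_regression_to_mean()
    print(f"Worst group, semester 1 average: {before:.1f}")
    print(f"Same group, semester 2 average:  {after:.1f}")
```

Under these assumptions the worst group improves on average in the second semester even though nothing was done to them, because part of their poor first result was bad luck that does not repeat. Any punishment (or speed camera) applied in between would appear to "work".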
Example 9: Since the 1950s, both the atmospheric CO2 level and crime levels have increased sharply. Hence, atmospheric CO2 causes crime.
Atmospheric CO2 does not cause crime. In fact, both the increase in atmospheric CO2 levels and the increase in crime levels are more likely to have been caused by an increase in population since the 1950s.

Let's examine the possible relationships between two correlated variables: direct causation, common response, and confounding. These are the three relationships which can be taken (or mistaken) for causation!

Direct Causation: changes in X cause changes in Y. For example, football weekends cause heavier traffic, more food sales, etc.

Confounding: the effect of X on Y is hopelessly mixed up with the effects of other explanatory variables on Y. For example, if we are studying the effects of Tylenol on reducing pain, and we give a group of pain-sufferers Tylenol and record how much their pain is reduced, we are confounding the effect of giving them Tylenol with the effect of giving them any pill. Many people report a reduction in pain after simply being given a sugar pill with no medication in it at all; this is called the placebo effect. To establish causation, a designed experiment must be run.

Common Response: both X and Y respond to changes in some unobserved variable. Most of the examples we just looked at are examples of common response.
· Ice cream sales and shark attacks both increase during summer.
· The number of cavities and children's vocabulary are both related to a child's age.
· Skirt lengths and stock prices are both controlled by the general attitude of the country, liberal or conservative.

Statistics and Causation
1. A strong relationship between two variables does not always mean that changes in one variable cause changes in the other.
2. The relationship between two variables is often influenced by other variables lurking in the background.
3. The best evidence for causation comes from randomized comparative experiments.

Statistics Section 6.2 - Correlation does NOT imply Causation

1. When I’m stressed, I get muscle cramps. However, when I’m stressed, I also drink lots of coffee and lose sleep. So it’s hard to tell whether my cramps are actually caused by coffee, lack of sleep, stress, or some combination of the above. “Lots of coffee” and “lack of sleep” are examples of:
a) direct causation
b) common response
c) confounding
d) regression towards the mean

2. A pro baseball player has an exceptionally poor batting performance during a night game. The very next day the batting coach spends several hours working with the player. In the next game, the same player has a much better batting performance. The change in performance can most likely be explained by:
a) direct causation
b) common response
c) confounding
d) regression towards the mean

3. Chris runs home from school every day. Chris finds that when he runs faster, he gets home sooner. The change in travel time can most likely be explained by:
a) direct causation
b) common response
c) confounding
d) regression towards the mean

4. Surfboard sales rise when lemonade sales rise. Concluding that one causes the other is flawed because the relationship is most likely a result of:
a) direct causation
b) common response
c) confounding
d) regression towards the mean
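The common response pattern described in this section (for instance, shoe size and spelling in Example 3) can also be sketched in a short simulation. Everything here is a hypothetical illustration: the function names, the age range of 6 to 12, and the formulas linking age to shoe size and spelling score are invented so that age drives both variables while neither affects the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_common_response(n_children=5000, seed=1):
    """Return (correlation across all children, correlation among 9-year-olds)."""
    rng = random.Random(seed)

    # Hypothetical model: age is the lurking variable behind both measurements.
    ages = [rng.randint(6, 12) for _ in range(n_children)]
    shoe_size = [age + rng.gauss(0, 1) for age in ages]
    spelling = [8 * age + rng.gauss(0, 8) for age in ages]

    overall = pearson(shoe_size, spelling)

    # Hold the lurking variable fixed by looking only at children of one age.
    same_age = [(s, p) for a, s, p in zip(ages, shoe_size, spelling) if a == 9]
    shoes_9, spelling_9 = zip(*same_age)
    within_age = pearson(shoes_9, spelling_9)
    return overall, within_age


if __name__ == "__main__":
    overall, within_age = simulate_common_response()
    print(f"All children:      r = {overall:.2f}")
    print(f"9-year-olds only:  r = {within_age:.2f}")
```

In this made-up model the correlation is strong across all children but close to zero among children of the same age, which is what we expect when both variables merely respond to a third one.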