Correlation, Causation, and Control Variables
Sociology 128, Fall 2013

There are three types of questions you might be trying to answer that would make it helpful to use control variables. (To ‘control’ only means that you are comparing results for all the units with the same value on that variable: you are comparing like with like.)

Case 1: Spurious relationships.

I have found a correlation between A and B. Is it spurious? That is, is something else causing both A and B?

For instance: taller kids know more math, on average. Does getting taller increase math knowledge? (Maybe giving kids growth hormone would make them smarter!) No. Instead, increasing age causes both height and math knowledge to increase. The relationship between height and math scores is entirely spurious.

    age --> height
    age --> math knowledge

Relationship of height to test scores

    Height    Test score
              Low     High
    3’6”      70%     30%
    4’0”      40%     60%

Relationship of height to test scores, controlling for age

    Height    Age 6            Age 8
              Low     High     Low     High
    3’6”      80%     20%      35%     65%
    4’0”      80%     20%      35%     65%

Within each age group, shorter and taller kids score the same. The gap in the first table appears only because the taller kids are mostly the older kids.

Case 2: Antecedent variables.

I have found a correlation between A and B. What is the causal effect of A on B? (Note: this is logically the same as Case 1, but expressed differently.) The Stouffer reading is full of examples of this type.

To answer this question, I need to know something more about the individuals (or firms, or states, or whatever) at the beginning. For instance, imagine we want to know about the risk of harm posed by various cancer treatments. If I tell you that the people who took Drug Q died more often than people who took Drug X, you would not immediately conclude that Drug Q is more dangerous. You might first like to know if the people who took Drug Q had less treatable cancers, while the people who took Drug X had early-stage cancers.

Another example: students who take AP classes get better SAT scores than students who don’t take AP classes. Do AP classes teach useful skills, and therefore have a causal effect on SAT scores?
Or is the difference in SAT scores due to some boost in cognitive abilities that kids with highly educated parents get at home from a young age? Watts would notice that we can make a compelling story for either version. We have two competing views of the world.

    Parents’ education --> Taking AP classes
    Parents’ education --> SAT scores

    OR

    Taking AP classes --> SAT scores

We will probably find that kids of highly educated parents are both more likely to take AP classes and more likely to have high SAT scores, so this is a good candidate for an antecedent control variable. But when you chart out the relationship of AP classes to test scores, controlling for parents’ education, I bet you’d find that there is still a relationship between AP classes and SAT scores, as well as a relationship between parents’ education and SAT scores. If this is the real world, then:

    Parents’ education --> Taking AP classes
    Parents’ education --> SAT scores
    Taking AP classes --> SAT scores

For an exercise, try making a table like the one in Case 1 that matches this story.

Note again: Case 2 is logically similar to Case 1, just expressed differently. Both deal with antecedent variables. Age is logically prior to both height and test scores. Parents’ education is logically prior to both kids’ AP classes and SAT scores. If a relationship is spurious, then adding an antecedent variable makes the apparent relationship between A and B disappear. More often in the social world, the relationship is attenuated: the size of the relationship is smaller when you consider the control variable, but the effect may not completely go away. Antecedent variables are also called “pre-treatment” controls.

Case 3: Intervening variables.

I have found a correlation between A and B. I believe that A causes B, but perhaps not directly. What is the mechanism by which A causes B?

For instance: I observe that children whose mothers have attended college are more fluent readers in elementary school, on average, than children whose mothers have a high school degree or less.
But there is nothing about my mother having a piece of paper that certifies her college graduation that should make me a fluent reader: there must be other factors that intervene. Try listing possible intervening variables that might mediate this relationship. I bet you can come up with at least five. Here’s one possibility:

    Mother’s education --> Mother’s vocabulary size --> Child’s reading fluency

There are large differences by education in the number and variety of words that children hear at home. If mother’s vocabulary is an intervening variable, then when I control for vocabulary size, the apparent effect of mother’s education should go away. In other words, children whose mothers use small vocabularies at home should have similar reading fluency whether or not their mothers attended college, and the same should hold among children whose mothers use large vocabularies. The difference is that a large percentage of mothers with large vocabularies have attended college. In this world, mother’s education does cause child’s reading fluency, but the link is explained by differences in mother’s vocabulary size. Try making the tables that match this story.

In reality, vocabulary size is probably one of several intervening variables between mother’s education and child’s reading fluency, so controlling for mother’s vocabulary size might partially explain the link between the two, but wouldn’t completely explain it:

    Mother’s education --> Mother’s vocabulary size --> Child’s reading fluency
    Mother’s education --> Other intervening variables --> Child’s reading fluency

Intervening variables are also called mediator variables. Researchers who are looking for mechanisms search for intervening variables.

When do you stop?

In most cases, you’d be able to carry on elaborating the model indefinitely. In Case 3, for instance, does mother’s college attendance actually explain her vocabulary size, or did she have a large vocabulary as a result of her childhood environment? What other intervening variables could we consider?
How exactly does a mother’s vocabulary get translated into her child’s reading ability? Are there other intervening variables in this stage of the model? The decision of when to stop is a judgment call on the part of the researcher, depending on what question they are trying to answer.

    An Eastern guru affirms that the earth is supported on the back of a tiger. When asked what supports the tiger, he says it stands upon an elephant; and when asked what supports the elephant he says it is a giant turtle. When asked, finally, what supports the giant turtle, he is briefly taken aback, but quickly replies, "Ah, after that it is turtles all the way down."
    (see Wikipedia, “Turtles all the way down.”)
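
Returning to Case 1: if you would like to check the arithmetic behind the two height tables, here is a minimal sketch in Python. The head counts are hypothetical (the handout gives only percentages); they are chosen so that the results match the Case 1 percentages, with most of the 3’6” kids being age 6 and most of the 4’0” kids being age 8.

```python
# Hypothetical counts (an assumption, not data from the handout), chosen so
# that the percentages match the Case 1 tables.
# Keys: (age, height) -> (number scoring low, number scoring high)
counts = {
    (6, "3'6\""): (56, 14),
    (6, "4'0\""): (8, 2),
    (8, "3'6\""): (7, 13),
    (8, "4'0\""): (28, 52),
}

heights = ["3'6\"", "4'0\""]
ages = [6, 8]


def percent_low(cells):
    """Percent of children scoring low across the given (low, high) cells."""
    low = sum(cell[0] for cell in cells)
    total = sum(cell[0] + cell[1] for cell in cells)
    return round(100 * low / total)


# No control variable: pool both ages within each height.
uncontrolled = {
    h: percent_low([counts[(a, h)] for a in ages]) for h in heights
}

# Controlling for age: compare heights only within the same age group.
controlled = {
    a: {h: percent_low([counts[(a, h)]]) for h in heights} for a in ages
}

print("Percent low, by height:", uncontrolled)
print("Percent low, by height within age:", controlled)
```

Pooling the ages gives a height gap; comparing like with like (same age) makes it vanish, which is what "spurious" means here.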