Economic Statistics: Stata Assignment 2

The correlation coefficient can be used in several ways. One is to test whether a “meaningful relationship” exists between two variables. Another is to calculate the predicted relationship between two variables. Suppose we wanted to calculate how a vehicle’s weight affects its fuel efficiency. We might set up the equation as:

Estimated MPG = a + b * weight

We can do this for any pair of variables. The predicted relationship between some outcome y and some explanatory variable x is:

Estimated y = a + b * x

There are two formulas that we use to calculate the values of a and b:

$b = r_{xy} \cdot \dfrac{s_y}{s_x}$

$a = \bar{y} - b \cdot \bar{x}$

Here $r_{xy}$ is the correlation between x and y, $s_x$ and $s_y$ are their standard deviations, and $\bar{x}$ and $\bar{y}$ are their means. Once we calculate these values, we have the predicted relationship between the variables.

Back to the issue of “meaningful” or “statistically significant” relationships. Stata has two commands that will give you the correlation between variables. On their own, pwcorr x y and corr x y give the same output. (The first command stands for “pairwise correlation”.) However, they accept different options. I frequently find myself using these two variations on the basic commands:

corr x y, cov
gives the covariance between the variables.

pwcorr x y, sig
gives the “significance level” of a correlation; that is, the probability of observing a correlation at least this strong purely by chance if there were no true relationship.

Formally, this probability is known as a “p-value”, and a low p-value is strong evidence of a true relationship. If you want some rather arbitrary guidelines for interpretation, here they are:

0.00 ≤ p-value < 0.01    Extremely convincing evidence of a relationship
0.01 ≤ p-value < 0.05    Convincing evidence of a relationship
0.05 ≤ p-value < 0.10    Suggestive of a relationship
0.10 ≤ p-value ≤ 1.00    Probably no relationship

If economists were jurists, p-values less than 0.05 would constitute “beyond a reasonable doubt”.
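The correlation command and the two formulas above can be tried together in a short do-file. This is only a sketch under assumptions: it uses Stata's bundled auto dataset (loaded with sysuse auto), with mpg as y and weight as x, rather than the course data, and the scalar names (rho_xy, s_y, slope, and so on) are made up for the illustration.

```stata
* Assumption: Stata's bundled auto dataset stands in for the course data
sysuse auto, clear

* Correlation with its significance level (p-value)
pwcorr mpg weight, sig

* Store the correlation, then the mean and standard deviation of each variable
quietly corr mpg weight
scalar rho_xy = r(rho)
quietly summarize mpg
scalar mean_y = r(mean)
scalar s_y = r(sd)
quietly summarize weight
scalar mean_x = r(mean)
scalar s_x = r(sd)

* b = r_xy * s_y / s_x   and   a = ybar - b * xbar
scalar slope = scalar(rho_xy) * scalar(s_y) / scalar(s_x)
scalar intercept = scalar(mean_y) - scalar(slope) * scalar(mean_x)
display "b = " scalar(slope) "   a = " scalar(intercept)
```

The scalars are wrapped in scalar() when they are used because Stata otherwise looks for a variable (or an abbreviation of a variable name) before it looks for a scalar with that name.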
When we graph the relationship between two variables in Stata, we usually use the command graph twoway followed by additional instructions about the type of graph and the variables we use. Typically, the basic command looks like:

graph twoway (graphtype ya yb yc x)

We replace the word “graphtype” with the actual type of graph we’re using: scatter for a scatterplot, line for a line graph (like a time series), and so forth. Type help graph twoway in Stata for more suggestions.

The type of graph is always followed by a list of variable names. The last one is the independent variable, which goes on the horizontal axis. The others are all measured on the vertical axis. There can be one or several of these. For example,

graph twoway (line le_male year)

would produce a time-series graph showing le_male over time, while

graph twoway (line le_male le_female year)

would produce a time-series graph tracking both le_male and le_female over time.

We can also combine multiple graphs into the same image. Each graph is listed within its own set of parentheses. Another way to show the time-series graphs for the variables le_male and le_female would be:

graph twoway (line le_male year) (line le_female year)

The advantage of this form is that we can combine different types of graphs. When I create a scatterplot, I usually like to add a graph of the predicted relationship between the two variables to the picture. In Stata, we can get the predicted relationship by using “lfit” (linear fit) as the graphtype. You could see this line on its own by typing:

graph twoway (lfit y x)

To show it on top of the scatterplot, you would type:

graph twoway (scatter y x) (lfit y x)

For each of the problems, do the following:

a. Create a scatterplot with a line showing the relationship between the two variables. (Print this for me.)
b. Find the correlation between the two variables and its significance level. Does this suggest that a relationship is likely? (Write these answers on your assignment.)
c. Find the mean and standard deviation of each variable, plus the covariance between the two variables. Use these to calculate the values of a and b in the prediction line y = a + b ⋅ x. (Write all of these numbers in your answer.)
d. Calculate the prediction line using regression in Stata. Show me your regression output, and then write out the sentences: “The predicted relationship is y = 3.14 + 1.59 ⋅ x. In other words, when x increases by one, then y goes up by 1.59 on average.” (Of course, replace the values in the equation with the values that you calculate. Also replace the variable names, and the units if necessary.)

1. Use the CPS data on workers and their earnings: the outcome (y variable) is salary, and the explanatory variable (x variable) is years of schooling.
2. Use the data on fuel efficiency: the outcome (y variable) is mpg, and the explanatory variable (x variable) is vehicle weight.
3. Use the data on Econ 400 students: the outcome (y variable) is height, and the explanatory variable (x variable) is age.
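The graphing, covariance, and regression steps might be strung together roughly as follows. Again this is a sketch rather than a model answer: it assumes Stata's bundled auto dataset instead of the course files, and slope_cov is a made-up scalar name. It relies on the fact that the covariance divided by the variance of x gives the same b as the correlation formula, since the correlation is the covariance divided by the product of the two standard deviations.

```stata
* Assumption: the bundled auto dataset stands in for the course data
sysuse auto, clear

* (a) Scatterplot with the fitted line drawn on top
graph twoway (scatter mpg weight) (lfit mpg weight)

* (c) Means, standard deviations, and the covariance
summarize mpg weight
corr mpg weight, cov

* After "corr y x, cov", r(cov_12) holds the covariance
* and r(Var_2) holds the variance of the second variable (x)
scalar slope_cov = r(cov_12) / r(Var_2)
display "b from covariance = " scalar(slope_cov)

* (d) Regression: the coefficient on weight is b, and _cons is a
regress mpg weight
display "Estimated mpg = " _b[_cons] " + (" _b[weight] ") * weight"
```

The b computed from the covariance should agree with the coefficient that regress reports, which makes a handy check on the hand calculation.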