Lab 2: Replication of Fish (2002), "Islam and authoritarianism"

Comparative Government/Political Analysis II
Michaelmas 2015
1. Pre-lab assignment
Fish (2002) argues that Muslim countries are “democratic underachievers”, but offers a different explanation
for this than other scholars.
• What are two of the explanations that Fish (2002) discounts?
• On what basis does he rule them out?
• What explanation does he favor?
• What is the causal mechanism underpinning this explanation?
2. Loading and examining the data
Load the data as follows:
r elf+growth+britcol+postcom+opec
This is not exactly the data Fish used in his original article (it appears to be missing a few cases), but as you
will see it yields very similar results.
First let’s take a look at the data, by double-clicking on the fish object in your Data window (upper right)
or typing View(fish) in the console.
Here is a guide to the key variables:
Variable name   Definition
country         Abbreviation for country name
fhrev           Freedom House freedom rating, 1991-92 to 2000-2001 ten-year average (7=most free, 1=least free)
muslim          Indicator for Muslim country
income          Measure of economic development: log GDP per capita in 1990
elf             Ethnolinguistic fractionalization (0=most uniform, 1=most diverse)
growth          Economic growth (avg annual growth, 1975-1998)
britcol         Indicator for former British colony
postcom         Indicator for Communist heritage
opec            Indicator for OPEC member
sexRatio        Population sex ratio
You know some other commands for exploring the data as well: head(), dim(), str().
• What are the units of the fish dataset? i.e., what does each row describe?
• How many countries are there in the fish dataset? How many countries are there in the original article
(check Table 1)?
• Try plotting a few pairs of variables (reminder: plot(x, y)) to see if the relationship makes sense.
Which of these plots are useful?
– income (horizontal axis) and fhrev (vertical axis)
– muslim (horizontal) and fhrev (vertical)
– muslim (horizontal) and britcol (vertical)
• For categorical variables, the table() command may be more useful: try table(fish$muslim,
fish$britcol).
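The exploration commands above can be tried out on a small stand-in data frame first. This is only a sketch: `toy` and every value in it are made up for illustration (they are not Fish's data), and only the column names follow the variable guide.

```r
# Simulated stand-in for `fish` (all values are invented, not Fish's data).
set.seed(1)
toy <- data.frame(
  country = paste0("C", 1:30),
  muslim  = rep(c(0, 0, 1), 10),
  britcol = rep(c(0, 1), 15),
  income  = rnorm(30, mean = 3, sd = 0.6)
)
toy$fhrev <- pmin(7, pmax(1, 1 + 1.2 * toy$income - toy$muslim + rnorm(30, sd = 0.8)))

dim(toy)    # number of rows (units) and columns (variables)
str(toy)    # type of each column
head(toy)   # first six rows

# Two continuous variables: a scatterplot.
plot(toy$income, toy$fhrev, xlab = "income", ylab = "fhrev")

# Two 0/1 indicators: a cross-tabulation of cell counts.
tab <- table(muslim = toy$muslim, britcol = toy$britcol)
tab
```

Replacing `toy` with `fish` gives the corresponding output for the lab data.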
3. Replicating summary statistics
Table 1 in Fish (2002) reports some summary statistics separately for Muslim and non-Muslim countries.
First we’ll try to replicate a few numbers:
• Using the mean() function and square brackets to subset your variables (e.g. fish$elf[fish$britcol
== 1]), calculate the average Freedom House rating for Muslim and non-Muslim countries in the dataset.
Are your numbers close to those in the first row of Table 1?
• Do the same for the sociocultural division variable (elf). Does it match the numbers in row 4 of Table 1?
• Use the table() function to try to replicate the “Communist heritage” numbers in row 7 of Table 1.
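As a sketch of the mechanics in the three steps above, here is the same pattern on a tiny made-up data frame. The object `toy` and its values are hypothetical; only the column names match the lab data.

```r
# Tiny invented data set (not Fish's data) with the lab's column names.
toy <- data.frame(
  muslim  = c(1, 1, 1, 0, 0, 0, 0, 0),
  postcom = c(0, 1, 0, 1, 1, 0, 0, 0),
  fhrev   = c(2.5, 3.0, 2.0, 5.5, 4.0, 6.5, 7.0, 3.5)
)

# Group means via logical subsetting with square brackets.
mean_muslim    <- mean(toy$fhrev[toy$muslim == 1])
mean_nonmuslim <- mean(toy$fhrev[toy$muslim == 0])
c(muslim = mean_muslim, non_muslim = mean_nonmuslim)

# Counts of one indicator within each level of another.
tab <- table(muslim = toy$muslim, postcom = toy$postcom)
tab

# Row shares, in case the published table reports proportions rather than counts.
prop.table(tab, margin = 1)
```

If a variable contains missing values, mean() returns NA unless you add na.rm = TRUE.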
Now for some interpretation:
• What does Table 1 say about how much difference in the level of democracy (measured by the Freedom
House score) there is between Muslim and non-Muslim countries?
• What does Table 1 suggest might be possible causes of these differences?
4. Replicating Table 2: bivariate regressions
Each row of Table 2 reports the coefficient, significance level, adjusted R², and number of observations for a
different bivariate regression (i.e. a regression with just one independent variable).
• First, replicate the first regression. (Recall the procedure: store the output of lm(depvar ~ indvar,
data) in a variable; then use the summary() function on that variable to see the regression output.)
Are your results similar to what is reported in the first row of the table?
• Interpret the coefficient on muslim in this regression – what does it tell us about the relationship
between democracy and Islamic tradition?
• Interpret the (Intercept) coefficient in this regression – what does it tell us?
• What is the relationship between the coefficient in the first row of Table 2 and the means reported in
the first row of Table 1?
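The following sketch shows where each quantity reported in a row of Table 2 can be found in R's regression output. It uses an invented data frame `toy` (not Fish's data), so the numbers themselves mean nothing.

```r
# Tiny invented data set (not Fish's data).
toy <- data.frame(
  muslim = c(1, 1, 1, 0, 0, 0, 0, 0),
  fhrev  = c(2.5, 3.0, 2.0, 5.5, 4.0, 6.5, 7.0, 3.5)
)

# Bivariate regression: one dependent and one independent variable.
fit <- lm(fhrev ~ muslim, data = toy)
summary(fit)                    # full regression output

coef(fit)                       # intercept and slope
summary(fit)$coefficients       # estimates, standard errors, t values, p-values
summary(fit)$adj.r.squared      # adjusted R-squared
nobs(fit)                       # number of observations used in the fit
```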
Now we’ll make some figures to visualize the regression analysis.
• First, produce the scatterplot showing the relationship between economic development and Freedom
House scores in the dataset. (Use the plot() command.)
• Now superimpose the regression line for this relationship. (Hint: run the regression, storing the output
e.g. in a variable called lm.out; then abline(lm.out).)
• If time: Do the same for other regressions in Table 2.
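A minimal sketch of the scatterplot-plus-regression-line steps, run on simulated values rather than the lab data (`toy` and its contents are assumptions made for illustration):

```r
# Simulated stand-in data (not Fish's data).
set.seed(2)
toy <- data.frame(income = rnorm(40, mean = 3, sd = 0.6))
toy$fhrev <- 1 + 1.2 * toy$income + rnorm(40, sd = 0.8)

# Scatterplot: independent variable on the horizontal axis.
plot(toy$income, toy$fhrev,
     xlab = "income (log GDP per capita)", ylab = "fhrev")

# Fit the bivariate regression, then draw its line on the open plot.
lm.out <- lm(fhrev ~ income, data = toy)
abline(lm.out)
```

abline() reads the intercept and slope from the fitted object, so it only works this way for a regression with a single independent variable, and it must be called after plot().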
5. Replicating Table 3: multivariate regression
Each column of Table 3 reports the coefficients, standard errors, adjusted R², and number of observations for
a different multivariate regression (i.e. a regression with multiple independent variables).
• Replicate Models 4 and 5 of Table 3. What explains the difference in the coefficient on muslim between
the two models?
• Replicate Model 1, and interpret the coefficient on muslim.
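To compare a coefficient across two specifications, it helps to store each model and pull out the same term from both. The sketch below does this on simulated data; `toy`, `short` and `long` are hypothetical names, and the two formulas are illustrative rather than the exact specifications of any model in Table 3.

```r
# Simulated stand-in data (not Fish's data); income is built to differ by group.
set.seed(4)
n <- 60
toy <- data.frame(
  muslim = rep(c(0, 0, 1), length.out = n),
  opec   = rep(c(0, 0, 0, 0, 1), length.out = n)
)
toy$income <- rnorm(n, mean = 3 - 0.5 * toy$muslim, sd = 0.6)
toy$fhrev  <- 1 + 1.2 * toy$income - toy$muslim + rnorm(n, sd = 0.8)

# Independent variables are joined with + in the formula.
short <- lm(fhrev ~ muslim, data = toy)
long  <- lm(fhrev ~ muslim + income + opec, data = toy)
summary(long)

# The same coefficient under the two specifications, side by side.
c(short = unname(coef(short)["muslim"]), long = unname(coef(long)["muslim"]))

# How strongly the added control is related to the variable of interest.
cor(toy$muslim, toy$income)
```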
6. Logistic regression version
In the week 3 lecture you learned about logistic regression, in which we model the log-odds of some event
occurring as a linear function of a set of variables.
As a reminder, a logistic regression model looks like:
$$\log\left(\frac{\Pr(y = 1)}{1 - \Pr(y = 1)}\right) = a + b_1 x_1 + b_2 x_2 + \ldots + b_m x_m,$$
where $\log\left(\frac{\Pr(y = 1)}{1 - \Pr(y = 1)}\right)$ is the “log odds”, also known as the “logit” function.
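A short worked example of moving between probabilities and log odds in R. The probability and the coefficient values below are arbitrary numbers chosen for illustration; qlogis() and plogis() are base R's logit and inverse-logit functions.

```r
# Probability -> log odds.
p <- 0.8
log_odds <- log(p / (1 - p))   # odds are 0.8 / 0.2 = 4, so this is log(4)
qlogis(p)                      # the same quantity via the built-in logit

# Log odds -> probability (inverse logit).
plogis(log_odds)               # recovers 0.8

# A linear predictor with made-up coefficients a and b1 and one x value.
a  <- -1
b1 <- 0.5
x1 <- 3
eta <- a + b1 * x1             # log odds implied by the model: 0.5
plogis(eta)                    # implied probability, 1 / (1 + exp(-0.5))
```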
In this section we will convert the outcome in the Fish (2002) analysis into a binary (0-1) variable and
compare the OLS regression (which we run using lm()) to the logistic regression (which we run using glm()).
First, create a new variable fhdummy, which is 1 if the Freedom House score for a country is 4 or higher (and
thus it is a democracy) and otherwise 0.
Here are two ways to do this:
fish$fhdummy <- as.integer(fish$fhrev >= 4)
fish$fhdummy <- ifelse(fish$fhrev >= 4, 1, 0)
Check that this works by tabulating the new variable against fhrev.
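One way to run this check, sketched on a few invented scores (the vector of fhrev values, including the missing one, is made up to exercise the cutoff):

```r
# Invented scores around the cutoff, plus one missing value.
toy <- data.frame(fhrev = c(1.5, 3.9, 4.0, 4.1, 6.8, NA))

toy$fhdummy <- as.integer(toy$fhrev >= 4)
alt         <- ifelse(toy$fhrev >= 4, 1, 0)

# Both approaches give the same 0/1 coding and keep NA as NA.
all.equal(as.numeric(toy$fhdummy), alt)

# Cross-tabulate: every score below 4 should sit in row 0, the rest in row 1.
table(fhdummy = toy$fhdummy, fhrev = toy$fhrev, useNA = "ifany")

# A more compact check: range of fhrev within each value of the dummy.
tapply(toy$fhrev, toy$fhdummy, range)
```

With many distinct fhrev values the cross-tabulation gets wide, so the tapply() version may be easier to read on the full dataset.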
Now, run a “linear probability model” (an OLS regression with a binary outcome) using fhdummy as the
dependent variable and muslim, income, and opec as the independent variables (i.e. the same ones as Model
4 of Table 3). Store the results in an object called mod.lpm.
• What does the coefficient on muslim mean in this model?
• predict(mod.lpm) will give you the predicted values for each country in the dataset under this model.
Store these in an object called lpm_predict and summarize them. What is the maximum predicted
value? What does it mean?
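The linear probability model steps can be sketched as follows on simulated data. The data frame `toy` and all its values are invented; the object names mod.lpm and lpm_predict follow the instructions above.

```r
# Simulated stand-in for `fish` (all values are invented, not Fish's data).
set.seed(3)
n <- 60
toy <- data.frame(
  muslim = rep(c(0, 0, 1), length.out = n),
  opec   = rep(c(0, 0, 0, 0, 1), length.out = n),
  income = rnorm(n, mean = 3, sd = 0.6)
)
toy$fhrev   <- 1 + 1.2 * toy$income - toy$muslim + rnorm(n, sd = 0.8)
toy$fhdummy <- as.integer(toy$fhrev >= 4)

# Linear probability model: ordinary lm() with a 0/1 outcome.
mod.lpm <- lm(fhdummy ~ muslim + income + opec, data = toy)
summary(mod.lpm)

# Fitted values, one per row used in the fit.
lpm_predict <- predict(mod.lpm)
summary(lpm_predict)

# Count fitted values outside [0, 1]; nothing in OLS rules these out.
sum(lpm_predict < 0 | lpm_predict > 1)
```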
Now we will try a logistic regression. Whereas the syntax for OLS is lm(formula, data), the syntax
for logistic regression is glm(formula, family=binomial, data). Store the results in an object called
mod.logit.
• What does the coefficient on muslim mean in this model?
• predict(mod.logit, type = "response") will give you the predicted probabilities for each country
in the dataset under this model. Store these in an object called logit_predict and summarize them.
What is the maximum predicted value? What does it mean?
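A matching sketch for the logistic regression, again on simulated data (`toy` is invented; mod.logit and logit_predict are the names requested above):

```r
# Simulated stand-in for `fish` (all values are invented, not Fish's data).
set.seed(3)
n <- 60
toy <- data.frame(
  muslim = rep(c(0, 0, 1), length.out = n),
  opec   = rep(c(0, 0, 0, 0, 1), length.out = n),
  income = rnorm(n, mean = 3, sd = 0.6)
)
toy$fhrev   <- 1 + 1.2 * toy$income - toy$muslim + rnorm(n, sd = 0.8)
toy$fhdummy <- as.integer(toy$fhrev >= 4)

# Logistic regression: same formula as lm(), plus family = binomial.
mod.logit <- glm(fhdummy ~ muslim + income + opec, family = binomial, data = toy)
summary(mod.logit)   # coefficients are on the log-odds scale

# Exponentiating a coefficient turns it into an odds ratio.
exp(coef(mod.logit)["muslim"])

# Predicted probabilities; without type = "response" predict() returns log odds.
logit_predict <- predict(mod.logit, type = "response")
summary(logit_predict)
```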
7. Role of female subordination
Fish’s explanation for the negative correlation between Islamic religious tradition and Freedom House scores
focuses on the problem of female subordination. Empirically, his goal in this regard is to provide evidence
that: (1) women fare less well in Muslim countries; and (2) this accounts for part of the link between Islam
and authoritarianism.
In this section we will replicate his analysis using the population sex ratio as an indicator of the status of
women. Note that we do not have data on sex ratios for all of the observations that Fish analyses.
First let’s look at the difference in the status of women in Muslim countries:
• Replicate Model 4 in Table 9 (note that the dependent variable here is Sex Ratio, not democracy).
• You’ll find that the results are quite different from those in the table. This is because Fish excludes the
extreme outliers on the Sex Ratio variable. If you plot the values for Sex Ratio you’ll see any outliers
quite clearly (plot(fish$sexRatio)).
• Try replicating Model 4 from Table 9 again, excluding the extreme outlier (which is UAE). You can do
this by adding the subset argument to the end of the lm() call (lm(formula, data, subset
= sexRatio<200)).
• How do the results look now?
• What does the coefficient on muslim tell us?
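The effect of the subset argument can be seen on a small invented example. The data frame `toy`, its values and the single-regressor formula are assumptions made for illustration; they are not the specification of Model 4 in Table 9.

```r
# Invented data (not Fish's data); the last sexRatio value is an artificial outlier.
toy <- data.frame(
  muslim   = c(0, 0, 0, 0, 1, 1, 1, 1),
  sexRatio = c(97, 99, 98, 100, 102, 104, 103, 210)
)

# Index plot: the outlier stands well apart from the other points.
plot(toy$sexRatio)
which(toy$sexRatio >= 200)   # row number(s) of the outlier

# Same formula fitted with and without the outlier.
fit_all  <- lm(sexRatio ~ muslim, data = toy)
fit_trim <- lm(sexRatio ~ muslim, data = toy, subset = sexRatio < 200)

# Check how many rows each fit used, then compare the coefficients.
c(all = nobs(fit_all), trimmed = nobs(fit_trim))
rbind(all = coef(fit_all), trimmed = coef(fit_trim))
```

The condition in subset is evaluated inside the data frame, so sexRatio can be written without the `toy$` prefix.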
Now let’s look at what this means for democracy:
• Replicate Models 1 and 4 in Table 10 (note the dependent variable here).
• What does the coefficient on sexRatio tell us?
• How can we explain the difference in the coefficients on muslim across the two models?