Slides for computer class 2

Before the class starts:
Log in to a computer
Read Data analysis assignment 1 on MyCourses
If you use Stata:
Start Stata
Start a new do file
Open the PDF documentation about regression
If you use RStudio:
Start RStudio
Start a new R script
Open R in Action, chapter 8
Maximum likelihood estimation
A better approach
Fit a curve instead of a line
• The example uses the logistic function (the inverse of the logit)
Interpretation stays the same:
• Expected value of Menarche given Age
• i.e. the probability that Menarche = 1
[Figure: fitted curve of Menarche (y axis) against Age (x axis, 10 to 18)]
Model
Linear regression model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
Nonlinear regression model
$y = g(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k) + u$
$g(x) = \frac{1}{1 + e^{-x}}$
Remarks
• The inverse of g(x), denoted f(x), is called the link function
• $f(x) = \ln\left(\frac{x}{1 - x}\right)$ is the logit function
• $f(x) = x$ reduces the model to the linear regression model
Wooldridge, J. M. (2009). Introductory econometrics: a modern approach (4th ed.). Mason, OH: South Western, Cengage Learning. Section 17.1
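The link between g and f can be checked numerically. A minimal Python sketch using only the standard library: g is the logistic function and f its inverse, the logit link. The coefficients -20.0 and 1.54 are taken from the Menarche example in these slides; the function names are illustrative.

```python
import math

def g(x):
    """Logistic function g(x) = 1 / (1 + e^(-x)); maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def f(p):
    """Logit link f(p) = ln(p / (1 - p)); the inverse of g for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Fitted probability for a girl aged 11.4 under the Menarche example's model.
print(round(g(-20.0 + 1.54 * 11.4), 3))

# Applying the link to a fitted probability gives back the linear index.
print(round(f(g(0.944)), 6))
```

With the identity link instead of f, the fitted value would be the linear index itself, which is the linear regression model.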
Basic principle
Sample of 9 observations
Population has Bernoulli distribution
• Only 0 and 1
• Relative frequencies of 0 and 1 unknown
• The population is very large
The estimation principle:
• Find the relative frequency that maximizes the likelihood of the sample
Observed value   Probability   Cumulative probability
0                ?             ?
0                ?             ?
0                ?             ?
0                ?             ?
0                ?             ?
0                ?             ?
0                ?             ?
1                ?             ?
1                ?             ?
Maximum likelihood estimate
Likelihood of the sample
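The principle can be tried out by brute force. A minimal Python sketch, assuming the sample in the table (seven 0s and two 1s) and independent draws, so that the likelihood of the sample is the product of the individual probabilities. The grid of candidate frequencies is an illustrative choice.

```python
# The sample from the table: seven 0s and two 1s.
sample = [0, 0, 0, 0, 0, 0, 0, 1, 1]

def likelihood(p, data):
    """Probability of observing `data` when each independent draw is 1 with probability p."""
    result = 1.0
    for value in data:
        result *= p if value == 1 else 1.0 - p
    return result

# Try candidate relative frequencies 0.000, 0.001, ..., 1.000 and keep the best one.
candidates = [i / 1000 for i in range(1001)]
p_hat = max(candidates, key=lambda p: likelihood(p, sample))

print(p_hat, likelihood(p_hat, sample))
```

The grid search lands next to 2/9 ≈ 0.222, the share of 1s in the sample, which is the closed-form maximum likelihood estimate of a Bernoulli proportion.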
Example
Menarche = g(-20.0 + 1.54 Age) + u
          Age    Menarche   Fitted   p       ln(p)
Girl 1    13.6   1          73.6%    73.6%   -0.306
Girl 2    11.4   0          8.0%     92.0%   -0.083
Girl 3    12.6   1          35.2%    35.2%   -1.045
Girl 4    13.1   1          56.2%    56.2%   -0.576
Girl 5    12.6   0          34.6%    65.4%   -0.425
Girl 6    10.3   0          1.5%     98.5%   -0.015
Girl 7    10.2   0          1.3%     98.7%   -0.013
Girl 8    15.4   1          97.8%    97.8%   -0.022
Girl 9    15.2   1          96.9%    96.9%   -0.031
Girl 10   13.8   1          79.2%    79.2%   -0.233
Likelihood (product of p): 6.4%
Log-likelihood (sum of ln(p)): -2.749
• Fitted is the model's probability that Menarche = 1; p is the probability of the value actually observed (Fitted if Menarche = 1, 100% - Fitted if Menarche = 0)
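The last two rows of the table can be recomputed from the Menarche and Fitted columns. A minimal Python sketch using only the standard library; the rows are copied from the table above.

```python
import math

# (Menarche, fitted probability of Menarche = 1) for the ten girls in the table.
rows = [(1, 0.736), (0, 0.080), (1, 0.352), (1, 0.562), (0, 0.346),
        (0, 0.015), (0, 0.013), (1, 0.978), (1, 0.969), (1, 0.792)]

# p is the probability the model assigns to the outcome that was actually observed.
ps = [fitted if menarche == 1 else 1.0 - fitted for menarche, fitted in rows]

log_likelihood = sum(math.log(p) for p in ps)   # sum of ln(p)
likelihood = math.prod(ps)                      # product of p (Python 3.8+)

print(round(log_likelihood, 3), round(likelihood, 3))
```

Because the fitted percentages in the table are rounded to one decimal, the sum comes out at about -2.750 rather than -2.749. The likelihood is e raised to the log-likelihood, about 6.4%. Summing logarithms is the usual choice because a product of many probabilities quickly becomes too small to represent.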
Example data
A researcher is interested in how variables, such as
1. GRE (Graduate Record Exam scores),
2. GPA (grade point average) and
3. prestige of the undergraduate institution,
affect admission into graduate school. The response variable,
admit/don't admit, is a binary variable.
The variable rank takes on the values 1 through 4. Institutions
with a rank of 1 have the highest prestige, while those with a
rank of 4 have the lowest.
http://www.ats.ucla.edu/stat/stata/dae/logit.htm
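A logit model like this one is estimated by maximizing the log-likelihood. The sketch below shows one way that maximization can work: Newton-Raphson with step-halving, written in plain Python for a single predictor. It is an illustration, not a reproduction of what Stata or R do internally. It runs on simulated data, not on the admissions data (gre, gpa, rank). The data-generating values -20.0 and 1.54 are borrowed from the Menarche example, and all function names are hypothetical.

```python
import math
import random

def softplus(t):
    """Numerically stable ln(1 + e^t)."""
    if t > 0:
        return t + math.log1p(math.exp(-t))
    return math.log1p(math.exp(t))

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), computed via softplus."""
    return math.exp(-softplus(-z))

def log_likelihood(b0, b1, xs, ys):
    """Sum of ln(p), where p is the fitted probability of the observed outcome."""
    # ln g(z) = -softplus(-z) and ln(1 - g(z)) = -softplus(z)
    return sum(-softplus(-(b0 + b1 * x)) if y == 1 else -softplus(b0 + b1 * x)
               for x, y in zip(xs, ys))

def fit_logit(xs, ys, max_iter=100, tol=1e-10):
    """Maximize the one-predictor logit log-likelihood by Newton-Raphson with step-halving."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score (gradient) and Hessian of the log-likelihood at (b0, b1).
        s0 = s1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            s0 += y - p
            s1 += (y - p) * x
            h00 -= w
            h01 -= w * x
            h11 -= w * x * x
        # Newton direction d = -H^(-1) s, using the closed-form 2x2 inverse.
        det = h00 * h11 - h01 * h01
        d0 = -(h11 * s0 - h01 * s1) / det
        d1 = -(h00 * s1 - h01 * s0) / det
        # Halve the step while it would lower the log-likelihood.
        current = log_likelihood(b0, b1, xs, ys)
        step = 1.0
        while step > 1e-8 and log_likelihood(b0 + step * d0, b1 + step * d1, xs, ys) < current:
            step /= 2
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1

# Hypothetical simulated data: 400 "ages" between 10 and 16 and a 0/1 outcome
# drawn from a logit model with intercept -20.0 and slope 1.54.
random.seed(2)
xs = [random.uniform(10, 16) for _ in range(400)]
ys = [1 if random.random() < sigmoid(-20.0 + 1.54 * x) else 0 for x in xs]

b0_hat, b1_hat = fit_logit(xs, ys)
print(round(b0_hat, 2), round(b1_hat, 2), round(log_likelihood(b0_hat, b1_hat, xs, ys), 3))
```

At the maximum the score is zero, and no other coefficient pair (including the true -20.0 and 1.54) gives a higher log-likelihood for this sample. Note that the ten girls in the Menarche table alone would not work here: every 0 has Age ≤ 12.6 and every 1 has Age ≥ 12.6, so the slope could grow without bound.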
Excel example
Normally distributed example
Sample of 9 observations
Population has normal distribution
• Mean and SD are estimated
Observed value   Probability density   Cumulative probability
-0.897           0.205                 0.205
 0.185           0.407                 0.084
 1.588           0.16                  0.013
-1.13            0.151                 0.002
-0.08            0.386                 0.001
 0.132           0.405                 0.0003
 0.708           0.366                 0.0001
-0.24            0.36                  0.00004
 1.984           0.085                 0.000003
Likelihood of the sample: 0.000003 (the last entry of the cumulative column, which is the running product of the densities)
[Figure: Probability density (y axis) against Value (x axis, -1 to 2)]
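The table can be reproduced under one assumption: the estimated mean and SD are the maximum likelihood estimates, i.e. the sample mean and the standard deviation computed with divisor n (not n - 1). A minimal Python sketch using only the standard library; `normal_pdf` is an illustrative helper.

```python
import math

# Observed values from the table.
values = [-0.897, 0.185, 1.588, -1.13, -0.08, 0.132, 0.708, -0.24, 1.984]
n = len(values)

# Maximum likelihood estimates for a normal population.
mu = sum(values) / n
sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)

def normal_pdf(x, mean, sd):
    """Probability density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Density of each observation and the running product (the "cumulative" column).
running = 1.0
for v in values:
    d = normal_pdf(v, mu, sigma)
    running *= d
    print(f"{v:7.3f}  {d:.3f}  {running:.3g}")

print(round(mu, 3), round(sigma, 3))
```

With mean 0.25 and SD ≈ 0.977 the densities match the table to three decimals. The final running product, about 3.5 × 10^-6, is the likelihood of the sample (the slide displays it as 0.000003). Any other mean or SD would give a smaller product.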
Cumulative probability and probability density
Excel example
Data analysis assignment 2
Task
Do a moderation analysis and a mediation analysis with statistical software of your choice, using the approaches presented by Baron and Kenny (1986) and the Prestige dataset used in class. Answer the following two research questions:
1. Are women-dominated professions rewarded less for prestigiousness than men-dominated professions?
2. To what extent is the positive relationship between education and income mediated by prestigiousness?
As the dependent variable, you can use either income or, if you find it necessary, the logarithm of income.
How to get your analysis file started
Stata
• Load the data following the instructions
• Explore the data using e.g. describe, summarize, inspect, codebook, graph matrix, and stem
RStudio
• Load the data following the instructions
• Load the psych, car, effects, and texreg packages by adding library commands to the start of the R file (if a package is not found, you need to install it first)
• Explore the data using e.g. describe, lowerCor, corr.test, and scatterplotMatrix
How to submit your answer
Stata
• Set your working directory
• Start your do file with log using assignment1, replace text
• End your do file with log close
• After each graph, add graph export plotX.pdf
• Open the Word document template from MyCourses
• Copy-paste the content of assignment1.log into the document template and insert the exported figures in the right places
• In Word, write comments in normal style and use headings where appropriate
RStudio
• Compile a notebook in MS Word format
• In Word, write comments in normal style and use headings where appropriate
Are women-dominated professions rewarded less for prestigiousness than men-dominated professions?
Workflow of the analysis
1. Fit a model with direct effects only (already done as part of the previous assignment)
2. Add the interaction term to the model and compare the models with a nested model test (F test)
3. Draw an interaction plot
4. Interpret the results, paying particular attention to interpreting effect sizes
Stata
• Use nestreg, test, or ftest for the nested model test and margins and marginsplot for marginal effects
RStudio
• Load the effects package
• Use anova for the nested model test and effect and plot for marginal effects
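The nested model test in step 2 compares the residual sums of squares (RSS) of the direct-effects model and the interaction model: F = ((RSS_small - RSS_big) / q) / (RSS_big / (n - k)), where q is the number of added terms and k the number of coefficients in the larger model. A minimal pure-Python sketch follows. It runs on simulated data with hypothetical variables prestige, women (share of women, 0 to 1) and income; it is not the Prestige dataset. Statistical software also reports the p-value from the F(q, n - k) distribution, which this sketch omits.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
         [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def rss(X, y, b):
    """Residual sum of squares of a fitted model."""
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(X, y))

def nested_f(X_small, X_big, y):
    """F statistic and its degrees of freedom for adding the extra columns of X_big."""
    n, k_s, k_b = len(y), len(X_small[0]), len(X_big[0])
    rss_s = rss(X_small, y, ols(X_small, y))
    rss_b = rss(X_big, y, ols(X_big, y))
    return ((rss_s - rss_b) / (k_b - k_s)) / (rss_b / (n - k_b)), (k_b - k_s, n - k_b)

# Hypothetical simulated data: the reward per prestige point shrinks as the
# share of women grows (true interaction coefficient -150).
random.seed(1)
n = 400
prestige = [random.uniform(20, 80) for _ in range(n)]
women = [random.uniform(0, 1) for _ in range(n)]
income = [1000 + 150 * p - 150 * p * w + random.gauss(0, 1500)
          for p, w in zip(prestige, women)]

X_direct = [[1.0, p, w] for p, w in zip(prestige, women)]         # step 1
X_inter = [[1.0, p, w, p * w] for p, w in zip(prestige, women)]   # step 2

F, df = nested_f(X_direct, X_inter, income)
b = ols(X_inter, income)
print(f"F({df[0]}, {df[1]}) = {F:.1f}")
# Effect of one prestige point when women = 0 and when women = 1.
print(round(b[1], 1), round(b[1] + b[3], 1))
```

The last line relates to step 4: with an interaction, the effect of prestige is b1 + b3 × women, so a single coefficient no longer describes the effect size.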
To what extent is the positive relationship between education and income mediated by prestigiousness?
Workflow of the analysis
1. Fit a model of Y on X and controls
2. Fit a model of M on X and controls
3. Fit a model of Y on X, M, and controls
4. Calculate the Sobel test (http://quantpsy.org/sobel/sobel.htm)
5. Interpret the results, paying particular attention to interpreting effect sizes
Stata
• Use the user-written sgmediation command or the online calculator
RStudio
• Calculate the Sobel test manually by computing the z statistic and testing it with pnorm, or use the online calculator
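The manual Sobel calculation needs only four numbers: a and its standard error (the coefficient of X in the model of M, step 2) and b and its standard error (the coefficient of M in the model of Y on X and M, step 3). The test statistic is z = ab / sqrt(b²·se_a² + a²·se_b²). A minimal Python sketch; `statistics.NormalDist().cdf` plays the role of R's pnorm, and the coefficient values are hypothetical.

```python
from statistics import NormalDist

def sobel(a, se_a, b, se_b):
    """Sobel test of the indirect effect a*b.

    a, se_a: coefficient and standard error of X in the model of M on X (step 2)
    b, se_b: coefficient and standard error of M in the model of Y on X and M (step 3)
    """
    indirect = a * b
    se = (b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2) ** 0.5
    z = indirect / se
    # Two-sided p-value, the same as 2 * (1 - pnorm(abs(z))) in R.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return indirect, z, p

# Hypothetical coefficients for illustration only.
indirect, z, p = sobel(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(round(indirect, 3), round(z, 3), round(p, 4))
```

For step 5, the indirect effect a·b is in the units of Y per unit of X and can be compared with the total effect of X from step 1.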
Simulation demonstration: heteroskedasticity