Non-Monotonic Modification to the General Monotone Model

Ashley Lawrence, University of Oklahoma
Rick Thomas, University of Oklahoma
Michael Dougherty, University of Maryland
OKJDM 4.12.2014
Researcher Degrees of Freedom
• Common statistical analyses provide researchers with
flexibility
• Which covariates to include
• Whether or not to transform the data
• How to handle outliers
• Simmons et al. (2011) labeled these decisions
“researcher degrees of freedom” and showed how they
lead to inflated Type-I error rates
• Used an ANCOVA to show that participants were a year-and-a-half younger after listening to a particular song
• Selecting covariates among a number of variables
• Selecting independent variable
• Using a flexible sample size
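The inflation can be illustrated with a small simulation. This is a minimal sketch, not the Simmons et al. (2011) procedure: it assumes two groups with no true difference, two correlated dependent variables (the correlation `rho` is an arbitrary choice), and a z-test with known unit variance in place of an ANCOVA. The "flexible" analyst reports whichever of the two measures reaches significance.

```python
import random
from statistics import NormalDist, mean


def z_test_p(a, b):
    """Two-sided p-value for a difference in means, assuming known unit variance."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_sims=4000, n=20, rho=0.5, alpha=0.05, seed=1):
    """Return (fixed, flexible) false-positive rates under a true null effect."""
    rng = random.Random(seed)
    fixed_hits = flexible_hits = 0
    for _ in range(n_sims):
        groups = []
        for _group in range(2):
            dv1, dv2 = [], []
            for _subject in range(n):
                a = rng.gauss(0, 1)
                # Second measure correlates rho with the first and has unit variance.
                b = rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
                dv1.append(a)
                dv2.append(b)
            groups.append((dv1, dv2))
        p1 = z_test_p(groups[0][0], groups[1][0])
        p2 = z_test_p(groups[0][1], groups[1][1])
        fixed_hits += p1 < alpha               # measure chosen in advance
        flexible_hits += min(p1, p2) < alpha   # report whichever measure "works"
    return fixed_hits / n_sims, flexible_hits / n_sims


if __name__ == "__main__":
    fixed, flexible = simulate()
    print(f"fixed DV: {fixed:.3f}   best of two DVs: {flexible:.3f}")
```

Because both rates are computed on the same simulated data sets, the flexible rate can never fall below the fixed one; any gap between them is due to the extra analytic choice alone.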
Possible Solution
• To reduce use of researcher degrees of
freedom, researchers should use methods that
have less stringent assumptions
• General monotone model (GeMM) is one method
that makes few assumptions about the form of
the data
Overview
1. Propose a modification to the general monotone model (GeMM) that allows it to be applied to non-monotonic associations
2. Compare non-monotonic GeMM to other methods of analyzing non-monotonic data using Monte Carlo simulations
   a. Linear and non-linear monotonic environments
   b. Symmetric and asymmetric non-monotonic environments with and without outliers
GeMM
• Semi-metric alternative to multiple regression
• Squared error replaced with rank-order inversion
• Model for GeMM
• $\hat{Y} = \sum_i \beta_i X_i$
• Parameter weights ($\beta_i$) correspond to the relative importance of $X_i$ to the
rank-order correspondence between $Y$ and $\hat{Y}$
• Uses a genetic algorithm (GA) to find weights that maximize a
metric of monotonic association between the $Y$s and $\hat{Y}$s, while
accounting for model complexity via the BIC-Tau
• Does assume that the data show a monotonic association
(Dougherty & Thomas, 2012)
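The fitting idea can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Kendall's tau-a as the metric of monotonic association, swaps the genetic algorithm for a plain random search over weights, and omits the BIC-Tau complexity penalty. The names `kendall_tau`, `predict`, `random_search`, and `demo` are hypothetical helpers, not part of any GeMM software.

```python
import random


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)  # tied pairs contribute 0
    return s / (n * (n - 1) / 2)


def predict(weights, rows):
    """Y-hat as the weighted sum of predictors for each row."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def random_search(rows, y, n_iter=300, seed=0):
    """Stand-in for the GA: keep the random weight vector with the highest tau."""
    rng = random.Random(seed)
    best_w, best_tau = None, -2.0
    for _ in range(n_iter):
        w = [rng.uniform(-1, 1) for _ in rows[0]]
        tau = kendall_tau(y, predict(w, rows))
        if tau > best_tau:
            best_w, best_tau = w, tau
    return best_w, best_tau


def demo(n=60, seed=0):
    """Criterion is a monotone but nonlinear function of a weighted sum."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [2 ** (0.7 * a + 0.3 * b) for a, b in rows]
    return random_search(rows, y, seed=seed)


if __name__ == "__main__":
    weights, tau = demo()
    print(weights, round(tau, 3))
```

Only the ordering of the fitted values matters to the search, so any positive rescaling of the weights fits equally well; this is why the weights are read as relative importance.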
Examples of Non-Monotonic
Associations
[Figures: example non-monotonic associations from Yerkes & Dodson (1908) and Cepeda, Vul, Rohrer, Wixted, & Pashler (2008)]
Non-Monotonic GeMM
• Adds a reflection parameter for each predictor when needed
• Identify a point of reflection in the criterion
• Typically the maximum or minimum value
• Predictor values on one side of that point are reflected
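The reflection step can be sketched as follows. This is a simplified illustration with assumptions of its own: the reflection point is taken as the predictor value at the maximum of the criterion, whereas nmGeMM searches for it; the data are a noise-free inverted U; and `reflection_point` and `reflect` are hypothetical helper names.

```python
def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)  # tied pairs contribute 0
    return s / (n * (n - 1) / 2)


def reflection_point(x, y):
    """Predictor value at which the criterion is largest (assumed peak)."""
    return x[y.index(max(y))]


def reflect(x, point):
    """Fold predictor values above `point` back onto the other side of it."""
    return [v if v <= point else 2 * point - v for v in x]


# Inverted-U association: the criterion peaks at predictor = 0.
x = list(range(-10, 11))
y = [100 - v * v for v in x]

point = reflection_point(x, y)
tau_raw = kendall_tau(x, y)                        # rising and falling halves cancel
tau_reflected = kendall_tau(reflect(x, point), y)  # monotonic after reflection

if __name__ == "__main__":
    print(point, tau_raw, round(tau_reflected, 3))
```

Before reflection the two halves of the curve cancel and tau is zero; after reflection the association is monotonic and tau rises to roughly .95, with the shortfall from 1 due to the tied pairs that reflection creates.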
[Figure: two panels plotting Criterion against Predictor 1]
Non-Monotonic GeMM
• If the predictor and the criterion are actually monotonically
associated, non-monotonic GeMM will identify a maximum or
minimum value on the predictor as a point of reflection
[Figure: Criterion plotted against Predictor 2]
Non-Monotonic GeMM
• Points of reflection for each predictor are found
using the GA
• BIC-Tau is augmented to take into account the
number of reflection parameters used in the
model
• No reflection parameters for predictors with 0 weight
• No penalty in BIC-Tau if the GA fails to identify a reflection point
• Weights indicate the relative importance of the
predictor in predicting the rank order on the
criterion
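The parameter counting described above can be sketched as a penalty function. The formula here is an assumption, not taken from the slides: it treats BIC-Tau as $n \ln(1 - r^2) + k \ln(n)$ with $r = \sin(\pi\tau/2)$, and simply adds one to $k$ for each reflection parameter actually used. `None` marks a predictor with no reflection point, and `bic_tau` is a hypothetical name.

```python
import math


def bic_tau(tau, n, weights, reflection_points):
    """Assumed form of BIC-Tau with reflection parameters counted in k.

    Requires |tau| < 1, otherwise the logarithm is undefined.
    """
    r = math.sin(math.pi / 2 * tau)  # tau converted to an r-like value (assumption)
    k_weights = sum(1 for w in weights if w != 0)
    # A reflection parameter counts only if its predictor has a nonzero weight
    # and a reflection point was actually identified.
    k_reflect = sum(
        1
        for w, p in zip(weights, reflection_points)
        if w != 0 and p is not None
    )
    return n * math.log(1 - r ** 2) + (k_weights + k_reflect) * math.log(n)


if __name__ == "__main__":
    without = bic_tau(0.5, 100, [0.5, 0.3, 0.0], [None, None, None])
    with_one = bic_tau(0.5, 100, [0.5, 0.3, 0.0], [2.0, None, 1.0])
    print(round(with_one - without, 3))
```

In the usage example the third predictor has zero weight, so its reflection point is ignored; the two models differ by exactly one parameter, that is, by $\ln(100)$.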
Non-Monotonic GeMM
• Designed to capture rank order
• Many researchers are interested in answering
questions about rank
• Makes few assumptions about the form of the
data
• Should be relatively invariant to extreme scores
and non-normality
• Requires the use of few researcher degrees of
freedom
Model Comparisons
• Non-monotonic GeMM (nmGeMM) was compared with
polynomial regression (PR; k ≤ 2) and piecewise polynomial
spline (PPS; n ≤ 1, k ≤ 1)
• Power, Type-I error rate and predictive accuracy
• Monotonic environments
• Linear environment simulated from a continuous
multivariate distribution ($Y = .5x_1 + .3x_2 + .2x_3 + 0x_4 + 0x_5 + 0x_6 + e$)
• Monotonic but non-linear environment used the above
equation with the criterion changed to $2^Y$
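Why a rank-based fit is unaffected by the second environment can be checked directly. The sketch below assumes independent standard normal predictors and errors (the slides say only "continuous multivariate distribution") and reads the nonlinear criterion as $2^Y$. It compares Kendall's tau and Pearson's r between the true weighted sum and the criterion before and after the transformation.

```python
import random

WEIGHTS = [0.5, 0.3, 0.2, 0.0, 0.0, 0.0]


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def pearson_r(a, b):
    """Pearson product-moment correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def simulate_linear(n=100, seed=0):
    """Linear environment; independent N(0, 1) predictors and error are assumptions."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0, 1) for _ in WEIGHTS] for _ in range(n)]
    y = [sum(w * x for w, x in zip(WEIGHTS, row)) + rng.gauss(0, 1) for row in xs]
    return xs, y


xs, y = simulate_linear()
y_hat = [sum(w * x for w, x in zip(WEIGHTS, row)) for row in xs]
y_exp = [2 ** v for v in y]  # monotone but nonlinear transform of the criterion

if __name__ == "__main__":
    print("tau:", kendall_tau(y_hat, y), kendall_tau(y_hat, y_exp))
    print("r:  ", pearson_r(y_hat, y), pearson_r(y_hat, y_exp))
```

Tau is identical in both environments because the transformation preserves order, whereas r changes with the shape of the criterion.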
Model Comparisons
• Non-monotonic environments
• Symmetric: $Y = -1(.1x_1 + .9x_1^2 + .2x_2 + .7x_2^2 + .8x_3 + 0x_4 + 0x_5 + 0x_6 + e)$
• Asymmetric: $Y = .5(x_1 + x_1^2 + x_1^4) + .3(x_2 + x_2^2 + x_2^4) + .2(x_3 + x_3^2 + x_3^4) + 0x_4 + 0x_5 + 0x_6 + e$
• e ~ N(0, 1)
• Five univariate outliers were also added to the data
• All simulations used N=100 for estimation and holdout samples
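The two non-monotonic environments can be generated as follows. This is a sketch of one reading of the equations, not the authors' simulation code: it assumes independent standard normal predictors, groups the asymmetric terms as $.5(x_1 + x_1^2 + x_1^4)$ and so on, and leaves out the outliers because the slides do not say how they were constructed. The function names are hypothetical.

```python
import random


def symmetric_y(x, e):
    """Symmetric non-monotonic criterion; x holds the six predictors x1..x6."""
    return -1 * (
        0.1 * x[0] + 0.9 * x[0] ** 2
        + 0.2 * x[1] + 0.7 * x[1] ** 2
        + 0.8 * x[2]
        + 0 * x[3] + 0 * x[4] + 0 * x[5]
        + e
    )


def asymmetric_y(x, e):
    """Asymmetric non-monotonic criterion (assumed grouping of the polynomial terms)."""
    def poly(v):
        return v + v ** 2 + v ** 4

    return (
        0.5 * poly(x[0]) + 0.3 * poly(x[1]) + 0.2 * poly(x[2])
        + 0 * x[3] + 0 * x[4] + 0 * x[5]
        + e
    )


def make_sample(criterion, n=100, seed=0):
    """Draw n cases with independent N(0, 1) predictors (assumption) and e ~ N(0, 1)."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(n)]
    ys = [criterion(row, rng.gauss(0, 1)) for row in xs]
    return xs, ys


if __name__ == "__main__":
    # Separate seeds give independent estimation and holdout samples of N = 100.
    est_x, est_y = make_sample(asymmetric_y, seed=1)
    hold_x, hold_y = make_sample(asymmetric_y, seed=2)
    print(len(est_y), len(hold_y))
```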
Linear Results
[Figure: Power by method (GeMM, nmGeMM, PR, Spline)]

Predictive Accuracy
          Tau    r
GeMM      .38    .54
nmGeMM    .37    .53
PR        .40    .59
PPS       .32    .47
• nmGeMM showed reduced power for the weakest predictors when
compared to PR
• nmGeMM did not cross-validate as well as PR but better than
PPS
Monotonic but Nonlinear
[Figure: Power by method (GeMM, nmGeMM, PR, Spline)]

Predictive Accuracy
          Tau    r
GeMM      .40    .53
nmGeMM    .40    .52
PR        .39    .51
PPS       .29    .39
• nmGeMM showed better power and smaller Type-I error rates
than the other methods
• nmGeMM had a slight advantage over PR and a large advantage
over PPS in predictive accuracy
Symmetric Non-Monotonic
[Figure: two panels, Power for Predictors and Power for Reflection Parameters (axis labels W1–W6 and L1–L6), by method (nmGeMM, PR, Spline)]

Predictive Accuracy
          Tau    r
nmGeMM    .54    .76
PR        .53    .76
PPS       .58    .80
• nmGeMM had the worst power for non-monotonic predictors but the
best power for the monotonic predictor
• PPS had the best power for reflection parameters but also had the
most Type-I errors
• PPS cross-validated the best with nmGeMM showing a slight advantage
over PR
Asymmetric Non-Monotonic
[Figure: two panels, Power for Predictors and Power for Reflection Parameters (axis labels W1–W6 and L1–L6), by method (nmGeMM, PR, Spline)]

Predictive Accuracy
          Tau    r
nmGeMM    .46    .61
PR        .32    .65
PPS       .45    .67
• PPS had the best power in general, followed by nmGeMM
• PPS also showed more Type-I errors for the reflection parameters
• nmGeMM performed best on predictive accuracy for tau and PPS
performed best for r
Effects of Outliers
• Symmetric data
• Outliers influenced the performance of the spline
• Worse power for monotonic predictor
• Increased Type-I error rates
• Poorer performance for predictive accuracy
• Asymmetric data
• No strong effect of outliers in this data environment
• Pattern of results did not change from the data without
outliers

Predictive Accuracy
          Tau            r
nmGeMM    .50    .54     .71    .76
PR        .49    .53     .71    .76
PPS       .48    .58     .71    .80
(Within each pair, the second value equals the symmetric-environment result without outliers, so the first is presumably the result with outliers.)
Summary of Results
[Table: nmGeMM, PR, and PPS compared on Power, Type-I error, and Predictive Accuracy in the Linear, Monotonic, Symmetric, and Asymmetric environments]
Conclusions
• If one knows that the data show a non-monotonic,
symmetric association, then PR or PPS could be
preferred
• If the structure of the data is unknown, then nmGeMM
may be preferred
• PR and PPS do not perform as well as nmGeMM when the data
are monotonically associated
• PR performs poorly in the asymmetric environment
• nmGeMM rarely showed the worst performance and often
showed the best
• If one is interested in estimating rank, nmGeMM may be
the best option
Importance of nmGeMM
• Provides researchers with a way to estimate rank when
the data are either monotonic or non-monotonic
• Gives researchers a way to analyze data that
requires very few assumptions about the form of the
data
• Uses few researcher degrees of freedom
• PR and PPS require researchers to make a number of decisions
about the form of the data
• More research is needed to determine the effects of researcher
degrees of freedom on these methods
Thank you!