Analysis of Environmental Data Problem Set

Conceptual Foundations:
Frequentist Inference: Maximum Likelihood
1. Consider a hypothetical study of moose movement patterns in relation to development intensity
in Massachusetts. Let’s say that you track 10 moose using GPS telemetry and record the geographic
location of each moose daily over the course of a season. You are interested in knowing whether the
daily movement distance (i.e., Euclidean distance between daily locations) varies among moose in
relation to the intensity of human development in the neighborhood. Let’s say that you have 100
observations per moose, representing a 100 day period. For our purposes, to keep it simple, let’s say
that you randomly draw 1 observation per moose to ensure independence among observations.
Each observation represents a 24 hour period.
The raw data are given here for each moose, including an index of development intensity
(dev.intensity) in the neighborhood of the moose during the 24 hour observation period and the
Euclidean distance (dist.moved) during the corresponding 24 hour period. A histogram of
dist.moved is shown in the accompanying figure.
Obs  dev.intensity  dist.moved
  1            2.9         456
  2            8.5         141
  3            7.0          47
  4            1.3        1362
  5            9.7         128
  6            7.5          21
  7            0.4        2123
  8            6.2         189
  9            0.9         899
 10            3.8          38
1a. To begin, let’s establish the following null
hypothesis:
Hnull: daily movement distance decreases exponentially as given by an exponential error distribution.
Note, the null model is an error model only (i.e., there is no deterministic component), which says
that the distribution of moose daily movement distances is pure error arising from an exponential
error model. Note, the exponential error distribution is a continuous probability distribution that
describes the distribution of waiting times for a single event to happen, given that there is a constant
probability per unit time that it will happen. However, it is commonly used to describe random
samples of anything that decreases exponentially with time or distance. The figure below depicts the
exponential probability density function for varying values of the single parameter lambda (λ) (also
called “rate”), which determines the rate of exponential decrease.
Maximum Likelihood Inference: Problem Set
The corresponding statistical model is as follows:
$$ y_i \sim \mathrm{Exponential}(\lambda) $$
Your problem as a class is to calculate the
maximum likelihood estimate of lambda for the
moose dataset. To accomplish this, each team of
students will be required to compute the negative
log-likelihood for a given value of lambda, as follows:
group1 – 0.0005
group2 – 0.001
group3 – 0.0015
group4 – 0.002
group5 – 0.0025
group6 – 0.003
group7 – 0.0035
group8 – 0.004
To do this you will need to know the probability density function for the exponential distribution,
which is as follows:
$$ f(y \mid \lambda) = \lambda e^{-\lambda y}, \quad y \ge 0 $$
Recall that the probability density function gives the probability density (relative likelihood) of
any particular value of y given lambda (λ).
Suggested steps:
1. First, figure out what the random variable y is in this case.
2. Next, determine how to calculate the likelihood of a single observation.
3. Next, using a hand calculator or a spreadsheet, calculate the likelihood of each observation.
4. Next, take the natural log of each likelihood and negate it to get the negative log-likelihood of
each observation.
5. Lastly, sum the negative log-likelihoods of all observations.
What does this negative log-likelihood for the dataset represent?
Answer: To find the solution for any of the specified values of lambda, you need to
recognize that the likelihood of an observation given a value of the model parameter(s)
is obtained from the corresponding probability distribution. In this case, this means that
the likelihood of each observation is derived by plugging in the assigned value of lambda
and each value of y (dist.moved) into the exponential probability density function. The
product of these likelihoods is the likelihood of the entire dataset. If we instead take the
natural log of each likelihood and sum them, we get the log-likelihood of the dataset.
And if we take the negative of the log-likelihood sum, we get the negative log-likelihood
of the dataset.
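The calculation described in this answer can be sketched in a few lines of code. This is only an illustrative sketch: Python is an assumption here (the course itself uses R, a hand calculator, or a spreadsheet), and the function names `exp_pdf` and `neg_log_lik` are hypothetical. It uses lambda = 0.002, the value assigned to group4.

```python
import math

# dist.moved values for the 10 moose, copied from the data table
dist_moved = [456, 141, 47, 1362, 128, 21, 2123, 189, 899, 38]

def exp_pdf(y, lam):
    """Exponential probability density: lam * exp(-lam * y)."""
    return lam * math.exp(-lam * y)

def neg_log_lik(lam, ys):
    """Sum of the per-observation negative log-likelihoods for a constant rate lam."""
    return sum(-math.log(exp_pdf(y, lam)) for y in ys)

# Negative log-likelihood of the whole dataset for group4's value of lambda
nll_group4 = neg_log_lik(0.002, dist_moved)
print(round(nll_group4, 2))
```

The result should come out close to 72.95, the null-model value quoted in the answer to 1c.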
The negative log-likelihoods of the dataset for the different values of lambda are plotted
here as a (negative log-) likelihood curve. Note, this is a plot of the (negative log-)
likelihood function because it depicts the (negative log-) likelihood of the data given
varying values of the rate parameter lambda. Importantly, this is not the same as a
probability distribution because the likelihood integrated over all possible values of the
parameter is not necessarily finite or equal to 1, which are requirements of a probability
distribution. The maximum likelihood estimate of lambda is the value that maximizes the
likelihood function or, equivalently, minimizes the negative log-likelihood function shown
here. Of the eight values tried, lambda = 0.002 gives the smallest negative log-likelihood,
so it is the value that makes our data the most frequently occurring outcome under
hypothetical repeated sampling, which is the goal of parameter estimation in the
parametric frequentist inference framework.
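As a rough check on the likelihood curve, the eight group values can be evaluated in one pass. The sketch below is an assumption-laden illustration in Python, not the course's own code. It also computes the closed-form maximum likelihood estimate for an exponential rate, n divided by the sum of the observations (that is, 1/mean). That closed form is a standard result, not something stated in the problem set.

```python
import math

# dist.moved values for the 10 moose, copied from the data table
dist_moved = [456, 141, 47, 1362, 128, 21, 2123, 189, 899, 38]

# The eight candidate rates assigned to group1 through group8
candidate_lambdas = [0.0005, 0.001, 0.0015, 0.002, 0.0025, 0.003, 0.0035, 0.004]

def neg_log_lik(lam, ys):
    # -log(lam * exp(-lam * y)) simplifies to lam * y - log(lam)
    return sum(lam * y - math.log(lam) for y in ys)

# Negative log-likelihood curve over the candidate values
curve = {lam: neg_log_lik(lam, dist_moved) for lam in candidate_lambdas}
best_lambda = min(curve, key=curve.get)

# Closed-form estimate: n / sum(y), i.e. 1 / sample mean (standard result)
closed_form = len(dist_moved) / sum(dist_moved)

for lam, nll in curve.items():
    print(f"lambda = {lam:<6}  NLL = {nll:.2f}")
print("best candidate:", best_lambda)
print("closed-form estimate:", round(closed_form, 5))
```

The closed-form estimate falls between two of the grid values, which is why the class result is best read as the best of the candidates tried and not as the exact estimate.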
Now let’s plot what these fits look like on
our original histogram. Specifically, let’s
plot the exponential probability
distribution for each of the values of
lambda that we tried and highlight the one
that gave us the best fit – the maximum
likelihood estimate.
1b. Now let’s consider the original question
regarding the relationship between daily
movement distance and development intensity.
First, let’s plot the relationship between the
independent variable, development intensity, and
the dependent variable, distance moved, as
shown in the figure below. Based on this plot,
let’s say that you propose the following
alternative hypothesis to the null:
Halt: daily movement distance decreases with increasing development intensity according to a power
law, with exponentially distributed errors.
Note, here you have specified a statistical model
that includes both a deterministic component, the
power law relationship between development
intensity and distance moved, and a stochastic
component, the exponential error distribution.
The corresponding statistical model is as follows:
$$ y_i \sim \mathrm{Exponential}\!\left(\lambda_i = \frac{1}{a\,x_i^{\,b}}\right) $$
Note that lambda has been replaced with 1 divided by the power function. This is because
the mean of the exponential distribution is equal to 1 divided by lambda, which means lambda
equals 1/mean, and the mean is given by the power function. Make sure you understand how
to interpret this statistical model.
Your problem as a class is to calculate the maximum likelihood estimate of the exponent b in the
power law that describes the rate at which distance moved decreases with increasing development
intensity, given the parameter a=1,000. To accomplish this, each team of students will be required to
compute the negative log-likelihood for a given value of b, given a=1,000, as follows:
group1: -0.7
group2: -0.8
group3: -0.9
group4: -1.0
group5: -1.1
group6: -1.2
group7: -1.3
group8: -1.4
Suggested steps:
1. First, figure out what the dependent variable y is and the single independent variable x is in this
case?
2. Next, determine how to calculate the likelihood of a single observation. Note, here you will need
to determine how to combine the exponential probability density function with the deterministic
function. Remember, the deterministic function is modeling the mean.
3. Next, using a hand calculator or a spreadsheet, calculate the likelihood of each observation.
4. Next, take the natural log of each likelihood and negate it to get the negative log-likelihood of
each observation.
5. Lastly, sum the negative log-likelihoods of all observations.
6. Report your final result to the instructor to determine if your value of b results in the minimum
negative log-likelihood, and is therefore the “best” estimate of b.
Answer: To find the solution for any of the specified values of the parameter b, you need
to recognize how the deterministic function relates to the stochastic function.
Specifically, you need to recognize that the deterministic function, given here by the
power function, is used to compute the predicted, expected or mean value of y,
dist.moved, for any given value of x, dev.intensity. Thus, for each observation you get a
different mean (or expected value). Now with this mean, you need to compute the
likelihood of each observation based on the exponential error distribution. The trick
here is to recognize that the mean as computed from the deterministic function is not
equal to lambda, but instead is equal to 1/lambda. Thus, by simple algebra, lambda is
equal to 1/mean. So, for each observation, you need to compute the mean (or expected
value) from the power function, then take 1 divided by that value to convert it to
lambda, and then compute the likelihood of the observation as before. So, for each
observation you will have a different lambda and a different y, and you simply compute
the likelihood using the exponential probability density function. The only difference
between this exercise and the previous one based on the null model is that before you used a
single value of lambda, whereas now you are letting lambda vary with each observation
based on the expected relationship of y, dist.moved, to x, dev.intensity.
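The per-observation recipe in this answer (power function to mean, mean to lambda, lambda to likelihood) can be sketched as follows. Python is an assumption here, since the course uses R or a spreadsheet, and the function name `neg_log_lik_power` is hypothetical. The example uses b = -1.1 (group5) with a = 1,000 as given in the problem.

```python
import math

# Data copied from the table: development intensity (x) and distance moved (y)
dev_intensity = [2.9, 8.5, 7.0, 1.3, 9.7, 7.5, 0.4, 6.2, 0.9, 3.8]
dist_moved = [456, 141, 47, 1362, 128, 21, 2123, 189, 899, 38]

def neg_log_lik_power(b, xs, ys, a=1000.0):
    """Negative log-likelihood when the exponential mean follows a * x**b."""
    total = 0.0
    for x, y in zip(xs, ys):
        mean = a * x ** b                      # deterministic part: expected distance
        lam = 1.0 / mean                       # exponential rate for this observation
        likelihood = lam * math.exp(-lam * y)  # exponential density at y
        total += -math.log(likelihood)
    return total

nll_b = neg_log_lik_power(-1.1, dev_intensity, dist_moved)
print(round(nll_b, 2))
```

The result should land close to 66.13, the value used for the alternative model in the answer to 1c.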
The negative log-likelihoods are plotted here against the parameter b. Note, this is a plot
of the (negative log-) likelihood function because it depicts the (negative log-) likelihood
of the data given varying values of the parameter b, which is the exponent of the power
function describing the decrease in movement distance as development intensity increases.
Again, make sure you understand why this is not the same as a probability distribution. The
maximum likelihood estimate of b is the value that maximizes the likelihood function or,
equivalently, minimizes the negative log-likelihood function shown here. Of the eight
values tried (with a fixed at 1,000), b = -1.1 gives the smallest negative log-likelihood,
so it is the value that makes our data the most frequently occurring outcome under
hypothetical repeated sampling.
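The curve over b can be reproduced with a short loop over the eight group values. As before, this is a sketch in Python under stated assumptions (the course works in R or a spreadsheet; the names below are hypothetical), with a held at 1,000 as in the problem statement.

```python
import math

# Data copied from the table: development intensity (x) and distance moved (y)
dev_intensity = [2.9, 8.5, 7.0, 1.3, 9.7, 7.5, 0.4, 6.2, 0.9, 3.8]
dist_moved = [456, 141, 47, 1362, 128, 21, 2123, 189, 899, 38]

# The eight candidate exponents assigned to group1 through group8
candidate_b = [-0.7, -0.8, -0.9, -1.0, -1.1, -1.2, -1.3, -1.4]

def neg_log_lik_power(b, xs, ys, a=1000.0):
    # With mean = a * x**b and lambda = 1 / mean, each term
    # -log(lambda * exp(-lambda * y)) simplifies to log(mean) + y / mean.
    total = 0.0
    for x, y in zip(xs, ys):
        mean = a * x ** b
        total += math.log(mean) + y / mean
    return total

curve = {b: neg_log_lik_power(b, dev_intensity, dist_moved) for b in candidate_b}
best_b = min(curve, key=curve.get)

for b, nll in curve.items():
    print(f"b = {b:<5}  NLL = {nll:.2f}")
print("best candidate:", best_b)
```

Because only eight values of b are tried and a is not estimated, the minimum found this way is the best candidate on the grid, not necessarily the exact maximum likelihood estimate.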
Now let’s plot what these fits look like on our original scatterplot. Specifically, let’s plot
the power function for each of the values of b that we tried and highlight the one that
gave us the best fit – the maximum likelihood estimate. Remember, the best fitted line is
our maximum likelihood estimate of how mean daily distance moved decreases as a
power function of increasing development intensity.
EXTRA CREDIT
1c. For extra credit (worth 2 points), test the null hypothesis that distance moved is
independent of development intensity. Specifically, calculate the p-value for this test and
interpret it in terms of whether to reject or fail to reject the null hypothesis. Note, to
complete this, you will need to refer to the lecture notes. In addition, calculating an exact
p-value requires that you use a rather ugly mathematical formula or compute it directly in R.
If you are in lab, then use R to compute the exact p-value. If you are not in lab, then you
can use published tables (available online) to get an approximate p-value. Either will
suffice in this case.
Answer: To answer this, you need to recall that any two nested statistical models can be
tested with a Likelihood Ratio test based on the Deviance statistic. In this case, the null
model is nested within the alternative model because it is just a simplification of the
more complex model; in other words, the null model can be derived from the more
complex model by simply removing the power function and letting lambda be a constant
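(this is equivalent to setting the b parameter of the power function to zero). Recall that
twice the negative log of the likelihood ratio of the nested models, deviance, is
approximately Chi-square distributed with r (difference between models in number of
parameters) degrees of freedom:
$$ D = -2 \ln\!\left(\frac{L_{\text{restricted}}}{L_{\text{full}}}\right) = 2\,(\mathrm{NLL}_{\text{restricted}} - \mathrm{NLL}_{\text{full}}) \sim \chi^2_r $$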
In this case, the restricted model is the null model and the alternative model is the more
complex power-exponential model. Given the results from questions 1a and 1b, the
deviance is approximately 13.66 (2*(72.95-66.13) = 13.64 using the rounded values shown) and
this is distributed Chi-square with one degree of freedom (since the null model has 1
parameter and the alternative model has 2). We can use the Chi-square cumulative
distribution function to determine the probability of observing a deviance statistic as large
as or larger than the one we observed if in fact the null model were true (i.e., true
deviance = 0). Specifically, we need to compute the cumulative probability of our observed
statistic and take the complement to get the p-value, because this represents the area under
the null probability distribution to the right of our observed value. The cumulative
distribution function for the Chi-square is rather ugly, so we will not compute the p-value
algebraically. An approximate p-value can be determined by using any published table of
p-values for the Chi-square distribution. Alternatively, using R we can calculate it directly
as follows:
1 - pchisq(13.66, df = 1)   # approximately 0.0002
A deviance this large or larger would be expected almost never (p ≈ 0.0002) under the
null model; i.e., if the constant exponential error model were true. So we can reject the
null hypothesis in favor of the alternative power-exponential model as being a
significantly better fit and conclude that distance moved is in fact explained well by
development intensity according to a negative power law.
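For readers without R at hand, the same test can be sketched with only the Python standard library. This is an illustrative assumption, not the course's method: the helper `chisq_sf_df1` is hypothetical and relies on the identity that a Chi-square variable with 1 degree of freedom is a squared standard normal, so its upper-tail probability is erfc(sqrt(x/2)). It is valid for 1 degree of freedom only. The inputs are the rounded negative log-likelihoods quoted above.

```python
import math

def chisq_sf_df1(x):
    """Upper-tail probability of a Chi-square variable with 1 degree of freedom.

    Uses P(Z**2 > x) = erfc(sqrt(x / 2)) for a standard normal Z,
    so it must not be used for other degrees of freedom.
    """
    return math.erfc(math.sqrt(x / 2.0))

nll_null = 72.95  # null model, lambda = 0.002 (rounded value from 1a)
nll_alt = 66.13   # power-exponential model, b = -1.1 (rounded value from 1b)

# Deviance: twice the difference in negative log-likelihoods
deviance = 2.0 * (nll_null - nll_alt)
p_value = chisq_sf_df1(deviance)

print(f"deviance = {deviance:.2f}")
print(f"p-value  = {p_value:.5f}")
```

With the rounded inputs the deviance is 13.64, slightly below the 13.66 quoted in the text; either value gives a p-value of roughly 0.0002, so the conclusion is the same.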