Analysis of Environmental Data
Problem Set
Conceptual Foundations: Frequentist Inference: Maximum Likelihood

1. Consider a hypothetical study of moose movement patterns in relation to development intensity in Massachusetts. Let's say that you track 10 moose using GPS telemetry and record the geographic location of each moose daily over the course of a season. You are interested in knowing whether the daily movement distance (i.e., Euclidean distance between daily locations) varies among moose in relation to the intensity of human development in the neighborhood. Let's say that you have 100 observations per moose, representing a 100-day period. For our purposes, to keep it simple, let's say that you randomly draw 1 observation per moose to ensure independence among observations. Each observation represents a 24-hour period. The raw data are given here for each moose, including an index of development intensity (dev.intensity) in the neighborhood of the moose during the 24-hour observation period and the Euclidean distance (dist.moved) during the corresponding 24-hour period. A histogram of dist.moved is shown in the accompanying figure.

Obs  dev.intensity  dist.moved
1    2.9            456
2    8.5            141
3    7.0            47
4    1.3            1362
5    9.7            128
6    7.5            21
7    0.4            2123
8    6.2            189
9    0.9            899
10   3.8            38

1a. To begin, let's establish the following null hypothesis:

Hnull: daily movement distance decreases exponentially as given by an exponential error distribution.

Note, the null model is an error model only (i.e., there is no deterministic component), which says that the distribution of moose daily movement distances is pure error arising from an exponential error model. Note, the exponential error distribution is a continuous probability distribution that describes the distribution of waiting times for a single event to happen, given that there is a constant probability per unit time that it will happen. However, it is commonly used to describe random samples of anything that decreases exponentially with time or distance. The figure below depicts the exponential probability density function for varying values of the single parameter lambda (λ), also called the "rate," which determines the rate of exponential decrease.

The corresponding statistical model is as follows:

dist.moved ~ Exponential(λ)

Your problem as a class is to calculate the maximum likelihood estimate of lambda for the moose dataset. To accomplish this, each team of students will be required to compute the negative log-likelihood for a given value of lambda, as follows:

group1 – 0.0005
group2 – 0.001
group3 – 0.0015
group4 – 0.002
group5 – 0.0025
group6 – 0.003
group7 – 0.0035
group8 – 0.004

To do this you will need to know the probability density function for the exponential distribution, which is as follows:

f(y | λ) = λ·exp(−λ·y), for y ≥ 0

Recall that the probability density function gives the probability density of any particular value of y given lambda (λ). Suggested steps (a minimal R sketch of the calculation follows this question):

1. First, figure out what the random variable y is in this case.
2. Next, determine how to calculate the likelihood of a single observation.
3. Next, using a hand calculator or a spreadsheet, calculate the likelihood of each observation.
4. Next, calculate the log-likelihood of each observation and then take its negative.
5. Lastly, sum the negative log-likelihoods of all observations.

What does this negative log-likelihood for the dataset represent?
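The steps above can be done in a spreadsheet, but here is a minimal R sketch of the same calculation, assuming the data vector from the table above and using group 4's assigned value of lambda as an example (the variable names are illustrative only):

# Daily movement distances for the 10 moose (from the data table above)
dist.moved <- c(456, 141, 47, 1362, 128, 21, 2123, 189, 899, 38)

lambda <- 0.002  # example: group 4's assigned value; substitute your group's lambda

# Steps 2-3: likelihood of each observation under the exponential pdf,
# f(y | lambda) = lambda * exp(-lambda * y); equivalent to dexp(dist.moved, rate = lambda)
lik <- lambda * exp(-lambda * dist.moved)

# Steps 4-5: negative log-likelihood of each observation, summed over the dataset
nll <- sum(-log(lik))
nll  # roughly 72.95 for lambda = 0.002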
Answer: To find the solution for any of the specified values of lambda, you need to recognize that the likelihood of an observation given a value of the model parameter(s) is obtained from the corresponding probability distribution. In this case, this means that the likelihood of each observation is derived by plugging the assigned value of lambda and each value of y (dist.moved) into the exponential probability density function. The product of these likelihoods is the likelihood of the entire dataset. If we instead take the natural log of each likelihood and sum them, we get the log-likelihood of the dataset. And if we take the negative of the log-likelihood sum, we get the negative log-likelihood of the dataset.

The negative log-likelihoods of the dataset for the different values of lambda are plotted here as a (negative log-) likelihood curve. Note, this is a plot of the (negative log-) likelihood function because it depicts the (negative log-) likelihood of the data given varying values of the rate parameter lambda. Importantly, this is not the same as a probability distribution because the likelihood integrated over all possible values of the parameter is not necessarily finite or equal to 1, which are requirements of a probability distribution. The maximum likelihood estimate of lambda is the value that maximizes the likelihood function or, equivalently, minimizes the negative log-likelihood function shown here. This value of lambda, 0.002, is the value that makes our data the most frequently occurring outcome under hypothetical repeated sampling – which is the goal of parameter estimation in the parametric frequentist inference framework.

Now let's plot what these fits look like on our original histogram. Specifically, let's plot the exponential probability distribution for each of the values of lambda that we tried and highlight the one that gave us the best fit – the maximum likelihood estimate.
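As a sketch of how the class-wide (negative log-) likelihood curve could be assembled in R, assuming the same data vector as before and the eight assigned lambda values (again, variable names are illustrative):

# Negative log-likelihood across all eight assigned values of lambda
dist.moved <- c(456, 141, 47, 1362, 128, 21, 2123, 189, 899, 38)
lambdas <- seq(0.0005, 0.004, by = 0.0005)
nll <- sapply(lambdas, function(l) -sum(dexp(dist.moved, rate = l, log = TRUE)))

# Trace the negative log-likelihood curve and locate its minimum
plot(lambdas, nll, type = "b", xlab = "lambda", ylab = "negative log-likelihood")
lambdas[which.min(nll)]  # 0.002 gives the smallest value among the eight lambdas tried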
1b. Now let's consider the original question regarding the relationship between daily movement distance and development intensity. First, let's plot the relationship between the independent variable, development intensity, and the dependent variable, distance moved, as shown in the figure below. Based on this plot, let's say that you propose the following alternative hypothesis to the null:

Halt: daily movement distance decreases with increasing development intensity according to a power law, with exponentially distributed errors.

Note, here you have specified a statistical model that includes both a deterministic component, the power law relationship between development intensity and distance moved, and a stochastic component, the exponential error distribution. The corresponding statistical model is as follows:

dist.moved ~ Exponential(λ = 1 / (a · dev.intensity^b))

Note that lambda has been replaced with 1 divided by the power function. This is because the mean of the exponential distribution is equal to 1 divided by lambda, which means lambda equals 1/mean, and the mean is given by the power function. Make sure you understand how to interpret this statistical model. Your problem as a class is to calculate the maximum likelihood estimate of the exponent b in the power law that describes the rate at which distance moved decreases with increasing development intensity, given the parameter a = 1,000.

To accomplish this, each team of students will be required to compute the negative log-likelihood for a given value of b, given a = 1,000, as follows:

group1: -0.7
group2: -0.8
group3: -0.9
group4: -1.0
group5: -1.1
group6: -1.2
group7: -1.3
group8: -1.4

Suggested steps:

1. First, figure out what the dependent variable y is and the single independent variable x is in this case.
2. Next, determine how to calculate the likelihood of a single observation. Note, here you will need to determine how to combine the exponential probability density function with the deterministic function. Remember, the deterministic function is modeling the mean.
3. Next, using a hand calculator or a spreadsheet, calculate the likelihood of each observation.
4. Next, calculate the log-likelihood of each observation and then take its negative.
5. Lastly, sum the negative log-likelihoods of all observations.
6. Report your final result to the instructor to determine if your value of b results in the minimum negative log-likelihood, and is therefore the "best" estimate of b.

Answer: To find the solution for any of the specified values of the parameter b, you need to recognize how the deterministic function relates to the stochastic function. Specifically, you need to recognize that the deterministic function, given here by the power function, is used to compute the predicted, expected or mean value of y, dist.moved, for any given value of x, dev.intensity. Thus, for each observation you get a different mean (or expected value). Now with this mean, you need to compute the likelihood of each observation based on the exponential error distribution. The trick here is to recognize that the mean as computed from the deterministic function is not equal to lambda, but instead is equal to 1/lambda. Thus, by simple algebra, lambda is equal to 1/mean. So, for each observation, you need to compute the mean (or expected value) from the power function, then take 1 divided by that value to convert it to lambda, and then compute the likelihood of the observation as before. So, for each observation you will have a different lambda and a different y, and you simply compute the likelihood using the exponential probability density function. The only difference between this exercise and the previous one based on the null model is that before you used a single value of lambda, whereas now you are letting lambda vary with each observation based on the expected relationship of y, dist.moved, to x, dev.intensity. (An R sketch of this calculation follows this answer.)

The negative log-likelihoods are plotted here against the parameter b. Note, this is a plot of the (negative log-) likelihood function because it depicts the (negative log-) likelihood of the data given varying values of the parameter b, which is the exponent of the power function describing the decrease in movement distance as development intensity increases. Again, make sure you understand why this is not the same as a probability distribution. The maximum likelihood estimate of b is the value that maximizes the likelihood function or, equivalently, minimizes the negative log-likelihood function shown here. This value of b, -1.1, is the value that makes our data the most frequently occurring outcome under hypothetical repeated sampling.

Now let's plot what these fits look like on our original scatterplot. Specifically, let's plot the power function for each of the values of b that we tried and highlight the one that gave us the best fit – the maximum likelihood estimate. Remember, the best fitted line is our maximum likelihood estimate of how mean daily distance moved decreases as a power function of increasing development intensity.
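Here is a minimal R sketch of the 1b calculation, assuming the two data vectors from the table, a = 1,000, and the eight assigned values of b (variable names are illustrative):

# Data from the table above
dev.intensity <- c(2.9, 8.5, 7.0, 1.3, 9.7, 7.5, 0.4, 6.2, 0.9, 3.8)
dist.moved    <- c(456, 141, 47, 1362, 128, 21, 2123, 189, 899, 38)
a <- 1000

b.values <- seq(-0.7, -1.4, by = -0.1)   # the eight assigned exponents
nll <- sapply(b.values, function(b) {
  mean.pred <- a * dev.intensity^b       # deterministic power function: predicted mean per observation
  lambda    <- 1 / mean.pred             # exponential rate = 1/mean, so lambda varies by observation
  -sum(dexp(dist.moved, rate = lambda, log = TRUE))
})
cbind(b.values, nll)   # the minimum (about 66.13) occurs at b = -1.1 among the values tried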
EXTRA CREDIT

1c. For extra credit (worth 2 points), test the null hypothesis that distance moved is independent of development intensity. Specifically, calculate the p-value for this test and interpret it in terms of whether to accept or reject the null hypothesis. Note, to complete this, you will need to refer to the lecture notes. In addition, calculating an exact p-value requires that you use a rather ugly mathematical formula or compute it directly in R. If you are in lab, then use R to compute the exact p-value. If you are not in lab, then you can use published tables (available online) to get an approximate p-value. Either will suffice in this case.

Answer: To answer this, you need to recall that any two nested statistical models can be tested with a likelihood ratio test based on the deviance statistic. In this case, the null model is nested within the alternative model because it is just a simplification of the more complex model; in other words, the null model can be derived from the more complex model by simply removing the power function and letting lambda be a constant (this is equivalent to setting the b parameter of the power function to zero). Recall that twice the negative log of the likelihood ratio of the nested models, the deviance, is approximately Chi-square distributed with r degrees of freedom, where r is the difference between the models in number of parameters:

deviance = −2·ln[ L(restricted model) / L(alternative model) ] = 2·[NLL(restricted) − NLL(alternative)] ~ Chi-square(r)

In this case, the restricted model is the null model and the alternative model is the more complex power-exponential model. Given the results from questions 1a and 1b, the deviance equals 13.66 (twice the difference between the null model's negative log-likelihood of 72.95 and the alternative model's 66.13), and this is distributed Chi-square with one degree of freedom (since the null model has 1 parameter and the alternative model has 2). We can use the Chi-square cumulative distribution function to determine the probability of observing a deviance statistic as large or larger than the one we observed if in fact the null model were true (i.e., true deviance = 0). Specifically, we need to compute the cumulative probability of our observed statistic and take the complement to get the p-value, because this represents the area under the null probability distribution to the right of our observed value. The cumulative distribution function for the Chi-square is rather ugly, so we will not compute the p-value algebraically. An approximate p-value can be determined by using any published table of p-values for the Chi-square distribution. Alternatively, using R we can calculate it directly as follows:

1 - pchisq(13.66, df = 1) = 0.0002

A deviance this large or larger would be expected almost never (p ≈ 0.0002) under the null model; i.e., if the constant exponential error model were true. So we can reject the null hypothesis in favor of the alternative power-exponential model as being a significantly better fit and conclude that distance moved is in fact explained well by development intensity according to a negative power law.
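For completeness, a short R sketch of the whole likelihood ratio test, assuming the (rounded) negative log-likelihoods obtained in 1a and 1b:

# Likelihood ratio test of the null (constant-lambda exponential) model
# against the power-exponential alternative
nll.null <- 72.95   # null model, lambda = 0.002 (from 1a)
nll.alt  <- 66.13   # alternative model, a = 1000, b = -1.1 (from 1b)

deviance <- 2 * (nll.null - nll.alt)              # about 13.6
p.value  <- pchisq(deviance, df = 1, lower.tail = FALSE)   # about 0.0002; same as 1 - pchisq(deviance, 1)
deviance; p.value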