Analysis of Environmental Data Problem Set

Conceptual Foundations:
Environmental Data
Answers
1. For each of the following research questions, define a relevant “statistical” population. In doing
so, clearly define the experimental or observation units (i.e., sampling units) that might comprise
the statistical population. Note, there is often more than one viable alternative.
a. What is the relationship between bark beetle abundance and three-toed woodpecker
fecundity (# offspring/breeding female) in the Greater Yellowstone ecosystem over the
course of a bark beetle outbreak?
There are many possible answers. For example, individual breeding female three-toed woodpeckers are one logical observational unit, since both the dependent
variable (fecundity) and independent variable (bark beetle abundance) have the
potential to vary among individual female woodpeckers and their associated
territories, in which case the statistical population would be the entire biological
population of breeding female three-toed woodpeckers in the Greater
Yellowstone ecosystem during the period of study.
Alternatively, disturbance patches exhibiting distinct levels of beetle populations
(or associated tree damage) could be deemed the observational units, wherein
fecundity and beetle abundance would be measured at the disturbance patch
scale (e.g., inclusive of several breeding females), in which case the statistical
population would be the entire collection of disturbance patches comprising the
Greater Yellowstone ecosystem during the period of study.
And there are several other possibilities, such as watershed units in which the
statistical population is the set of all watersheds in the study area, management
compartments in which the statistical population is the set of all compartments in
the study area, or even the entire study area where the unit is the year of study
and the statistical population is the set of all years for the period of concern.
b. What is the relationship between the choice of green building design and educational
awareness of green building options among first-time home builders in Massachusetts?
There are several possibilities, but the most logical observational unit would be
first-time home builders, since both the dependent variable (choice of green
building design, however that response might be scaled) and independent
variable (green building awareness, however that variable might be scaled) have
the potential to vary among individual home builders, in which case the statistical
population would be all first-time home builders in Massachusetts for the period
of study.
c. What is the spatial scale of heterogeneity in soil pH across the Mount Toby State Forest?
The logical observational unit would be a plot of some dimension (e.g., 1 m
quadrat), since the single variable (note that there is no distinction between
dependent and independent variables here, given the question as stated) has the
potential to vary among plots of any dimension, in which case the statistical
population would be the collection of all plots of specified dimension across
Mount Toby State Forest. Note, given the infinite possible plot dimensions, there
are infinite statistical populations that could be defined. The choice of plot size,
and thus the observation unit and statistical population, would depend on
environmental considerations (e.g., the scale of the ecological phenomena you are
ultimately interested in) and logistical/practical considerations related to field
data collection.
d. What are the factors affecting the probability of tree infestation by Asian longhorned beetle in
the city of Worcester, Massachusetts?
There are many possibilities, but a logical choice of observational unit would be
individual trees, since both the dependent variable (tree infestation, however that
response might be scaled, e.g., infested or not, percentage, etc.) and independent
variables (environmental factors, whatever they might be, e.g., tree diameter, and
however they might be scaled) have the potential to vary among individual trees,
in which case the statistical population would be the collection of all trees in the
city of Worcester during the period of study.
Alternatively, city streets or city blocks could be deemed the observational units,
wherein the dependent and independent variables would be measured at the
street or block scale, respectively (e.g., proportion of street trees infested), in
which case the statistical population would be the collection of all streets or
blocks in the city during the period of study.
e. What is the effect of watershed imperviousness on the base flow (annual low flow) of 3rd
order streams in southern New England?
A logical observational unit would be 3rd order watersheds, since both the
dependent variable (base flow) and independent variable (watershed
imperviousness) have the potential to vary among watersheds, in which case the
statistical population would be the collection of all 3rd order watersheds in
southern New England. Note, here the study design might involve measuring
base flow in each sampled watershed over a period of years and taking the
average (or minimum), in which case the annual measurements of base flow are
actually subsamples, since they will get averaged into a single value for each
observational unit (watershed).
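The distinction between subsamples and observational units can be sketched in a few lines of Python. This is only an illustration: the watershed names and flow values below are hypothetical, and the choice of the mean (rather than the minimum) is one of the two options mentioned above.

```python
from statistics import mean

# Hypothetical annual low-flow measurements per watershed (illustrative values only).
# Each list holds the yearly subsamples for one observational unit.
annual_base_flow = {
    "watershed_A": [0.42, 0.38, 0.40],
    "watershed_B": [0.15, 0.21, 0.18],
}

# Collapse the subsamples (years) into a single value per observational unit (watershed).
base_flow_per_watershed = {
    name: mean(flows) for name, flows in annual_base_flow.items()
}

# The sample size counts watersheds, not the yearly measurements.
n = len(base_flow_per_watershed)
print(n, base_flow_per_watershed)
```

Here six measurements were taken, but the analysis would see only two observations, one per watershed.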
f. What is the probability of a Blanding's turtle crossing a road in relation to road width and
traffic rate?
There are several possibilities, but a logical choice of observational unit would be
individual turtles, since both the dependent variable (crossing success) and
independent variables (road width and traffic rate) have the potential to vary
among individual turtles – assuming turtles are crossing different roads at
different times independently, in which case the statistical population would be
the collection of all Blanding's turtles.
Alternatively, sections of street could be deemed the observational units, wherein
the dependent and independent variables would be measured at the street scale
(e.g., proportion of attempts successful), in which case the statistical population
would be the collection of all street sections in the study area.
g. What is the effect of forest stand thinning level (e.g., residual tree basal area) on residual tree
growth?
There are several possibilities, but a logical choice of observational unit would be
the forest stand, since both the dependent variable (residual tree growth) and
independent variable (thinning level; i.e., residual tree basal area) have the
potential to vary among forest stands, in which case the statistical population
would be the collection of all forest stands in the study area. Note, in this case
tree growth might be measured on individual trees, but these would be
subsamples that get averaged to produce a value for the dependent variable for
each stand.
Alternatively, individual trees could be deemed the observational units, wherein
the dependent and independent variables would be measured at the tree scale, in
which case the statistical population would be the collection of all trees in the
study area. Note, in this case, trees are samples, not subsamples, because each
tree would be a separate observation in the subsequent statistical analysis.
h. What is the relationship between invertebrate community diversity (e.g., # taxa) in forested
wetlands and the level of human development within a 100-m radius?
A logical observational unit would be individual forested wetlands (patches),
since both the dependent variable (invertebrate community diversity) and
independent variable (level of human development) have the potential to vary
among wetland patches, in which case the statistical population would be the
collection of all forested wetland patches in the study area.
2. For each of the following research questions and associated data sets, determine the “type” of
dependent (response) data represented (i.e., continuous, count, proportion, binary, time to
death/failure, time series, circular). In addition, determine which, if any, are the dependent and
independent variables.
a. Does three-toed woodpecker hatching success rate vary in relation to bark beetle abundance
in the Greater Yellowstone Ecosystem? Data include: #eggs hatched, #eggs laid, and an
index of bark beetle abundance within the nesting territory for each of 100 nests observed
over the course of the study.
Type of data: proportional, because the count of #eggs hatched is out of a total
#eggs laid, making it a proportion. Note, with proportional data, there is always a
“trial” of some size (trial size, usually greater than 1) and the trial is the
observational unit, which helps to distinguish this from cross-classified
categorical data (see below).
Dependent variable: hatch success (expressed as a proportion, #eggs
hatched/#eggs laid).
Independent variable(s): bark beetle abundance, measured in some appropriate
way.
Observational unit: the individual nest or territory. Note, the nest has a trial size
equal to the number of eggs laid.
Statistical population: the collection of all nests or territories in the study area
(N=??).
Sample: the 100 nests sampled (n=100).
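As a minimal sketch of how proportional data are structured, each nest can be stored as a count of successes together with its trial size. The three nest records below are hypothetical and stand in for the 100 nests in the study.

```python
# Hypothetical nest records: (eggs_hatched, eggs_laid, beetle_abundance_index).
nests = [(3, 4, 0.8), (2, 5, 0.3), (4, 4, 1.2)]

# A proportional response is bounded by its trial size.
for hatched, laid, _ in nests:
    assert 0 <= hatched <= laid

# Dependent variable: hatching success as a proportion for each nest.
hatch_success = [hatched / laid for hatched, laid, _ in nests]

# Trial size: the number of eggs laid in each nest.
trial_sizes = [laid for _, laid, _ in nests]

print(hatch_success)  # [0.75, 0.4, 1.0]
print(trial_sizes)    # [4, 5, 4]
```

Keeping the trial size alongside the proportion matters because a success rate of 0.75 from 4 eggs carries less information than the same rate from 40 eggs.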
b. Is the wind direction on top of Mount Greylock nonrandom? Data include wind direction
measured for 1,000 regularly spaced hours over the course of one year at a weather station
on top of Mount Greylock.
Type of data: circular, because direction is measured in degrees, a circular scale
on which the two ends of the numerical continuum (0 and 360) are actually
identical in meaning. Note, the question does not warrant treating these data as a
time series, but the data could be coerced into a time series format by first
transforming the wind direction (degrees) into a linear variable based on a
reference direction (we’ll discuss this in class) and then analyzing the temporal
pattern of variation in wind direction over time. However, this is not consistent
with the original question.
Dependent/Independent variable(s): there is only one measured variable here, so
there is no distinction between dependence and independence, which requires at
least two variables. Note, the overall study context may suggest that wind be
considered at least conceptually as either a dependent variable or independent
variable, but this will entirely depend on the question being considered. In the
context of the question as stated here, there is no distinction between dependent
and independent.
Observational unit: the hour.
Statistical population: the collection of all hours in the year (N=8,760).
Sample: the 1,000 hours measured (n=1,000).
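To see why circular data need special handling, consider two wind directions just either side of north. The sketch below compares the ordinary arithmetic mean with a circular mean computed from the sine and cosine components. The function name and the two example directions are illustrative assumptions, not part of the problem set.

```python
import math

def circular_mean_degrees(directions):
    """Mean direction in degrees, computed from unit-vector components."""
    sin_sum = sum(math.sin(math.radians(d)) for d in directions)
    cos_sum = sum(math.cos(math.radians(d)) for d in directions)
    # atan2 recovers the angle of the summed vector; wrap it onto [0, 360].
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0

# Hypothetical readings: 10 degrees west of north and 10 degrees east of north.
directions = [350.0, 10.0]

arithmetic_mean = sum(directions) / len(directions)  # 180.0, i.e., due south
circular_mean = circular_mean_degrees(directions)    # approximately north

print(arithmetic_mean, circular_mean)
```

The arithmetic mean points due south, which is the opposite of both readings, while the circular mean lands at north (0 degrees, or equivalently 360, up to floating-point rounding).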
c. What is the dominant spatial scale of variability in soil pH in Mount Toby State Forest? Data
include soil pH measured at 1 m intervals along a 1 km transect across the study area.
Type of data: time series, because the measured variable (pH) is repeatedly
measured in a sequence, in this case a spatial sequence as opposed to a temporal
sequence, and the interest is in the pattern of variation in the measured variable.
Dependent/Independent variable(s): there is only one measured variable here, so
there is no distinction between dependence and independence, which requires at
least two variables. Note, the overall study context may suggest that pH be
considered, at least conceptually, as either a dependent variable (e.g., responding
to plant cover) or independent variable (e.g., affecting plant growth), but this will
entirely depend on the question being considered. In the context of the question
as stated here, there is no distinction between dependent and independent.
Observational unit: the 1-m plot.
Statistical population: the collection of all 1-m plots along the 1 km transect
(N=1,000).
Sample: technically none, since all 1,000 1-m plots are measured. Note, in
practice, we might consider this one transect to be a sample of the Mount Toby
State Forest and thus the statistical population would be all 1-m plots in the
Forest (N=??) and the sample would be the 1,000 plots measured (n=1,000).
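One common way to look at the pattern of variation in a sequence like this is the autocorrelation at different lags. The sketch below is an illustration only and is not necessarily the method intended for this problem: the pH values are simulated with an assumed 20 m periodicity, and the function name is hypothetical.

```python
import math

def lag_autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at a given lag (lag < len(values))."""
    n = len(values)
    mean = sum(values) / n
    variance_sum = sum((v - mean) ** 2 for v in values)
    covariance_sum = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum

# Simulated pH at 1 m intervals along a 200 m stretch, with an assumed 20 m cycle.
ph = [5.0 + 0.5 * math.sin(2 * math.pi * i / 20) for i in range(200)]

# Autocorrelation at lags of 1 m, 10 m (half a cycle), and 20 m (a full cycle).
r_by_lag = {lag: lag_autocorrelation(ph, lag) for lag in (1, 10, 20)}
print(r_by_lag)
```

In this simulated series, plots 10 m apart are negatively correlated and plots 20 m apart are positively correlated, which is the kind of signature that points to a dominant spatial scale.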
d. Is attitude towards motorized recreation independent of education level among users of
Mount Toby State Forest? Data include attitude class (favor motorized use, opposed to
motorized use, neutral to motorized use) and education level (high school, BS, MS, or PhD)
for each of 100 randomly surveyed visitors to Mount Toby State Forest.
Type of data: count data of the cross-classified categorical type, because the data
represent counts in each category of attitude class and education class (i.e., 12
categories derived by combining each level of attitude class with each level of
education class). Note, since we know the total number of visitors, it is tempting
to consider the counts in each class to be proportions, and thus the data to be
proportional. However, with cross-classified categorical data, there are no ‘trials’
and the observational units don’t correspond to ‘trials’ of some size. Here, the
observational unit (in this case, person) is simply classified into one of the
categories, resulting in a count for each category. Each observational unit
(person, in this case) doesn't have a proportional response; it merely falls into
one of the categories, whereas with proportional data, each observational unit has
a proportional response. This is a subtle but important distinction.
Dependent variable: counts in each category.
Independent variable(s): attitude class (categorical, with 3 levels) and education
level (categorical, with 4 levels).
Observational unit: the individual person or visitor.
Statistical population: the collection of all people or visitors to Mount Toby State
Forest (N=??).
Sample: the 100 visitors surveyed (n=100).
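The cross-classification described above can be sketched as a tally of visitors into the 12 attitude-by-education cells. The four survey responses below are hypothetical placeholders for the 100 visitors, and the short category labels are assumptions made for the example.

```python
from collections import Counter

ATTITUDES = ("favor", "opposed", "neutral")
EDUCATION = ("high school", "BS", "MS", "PhD")

# Hypothetical survey responses: one (attitude, education) pair per visitor.
responses = [
    ("favor", "BS"),
    ("opposed", "PhD"),
    ("favor", "BS"),
    ("neutral", "high school"),
]

# Each visitor is classified into exactly one cell; the data are the cell counts.
counts = Counter(responses)
table = {(a, e): counts.get((a, e), 0) for a in ATTITUDES for e in EDUCATION}

for attitude in ATTITUDES:
    print(attitude, [table[(attitude, e)] for e in EDUCATION])
```

Note that no visitor has a trial size or a proportional response. Each one simply adds 1 to a single cell, and the cell counts sum to the sample size.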
e. Do street trees confer greater home energy efficiency in Amherst, Massachusetts? Data
include presence/absence of street trees and a measure of home energy efficiency for each
of 100 homes in Amherst, Massachusetts, controlling for home age, size and construction.
Type of data: continuous, because the dependent variable, home energy
efficiency, is continuously scaled. Note, the binary presence/absence of street
trees is the independent variable, not the dependent data, so it does not
determine the “type” of dependent data.
Dependent variable: home energy efficiency (measured in some appropriate
way).
Independent variable(s): presence/absence of street trees.
Observational unit: the individual home.
Statistical population: the collection of all homes in Amherst, perhaps limited to
those with similar age, size and construction (N=??).
Sample: the 100 homes sampled (n=100).
f. What is the expected longevity of an artificial white pine snag (i.e., created by cutting the
crown off the tree) in western Massachusetts, controlling (i.e., restricting the sampling units
to a single value or narrow range of values such that the sampling units are treated as
effectively identical with respect to these controlled variables) for tree size, soil and slope
position? Data include age of the snag at the time of falling for 100 created snags on
Caldwell State Forest in Pelham, Massachusetts.
Type of data: time to death/failure, because the measured response is the time to
failure (snag fall).
Dependent/Independent variable(s): there is only one measured variable here,
time to snag fall, so there is no distinction between dependence and
independence, which requires at least two variables. Note, the overall study
context may suggest that snag longevity or survival rate be considered, at least
conceptually, as the dependent variable responding to the independent variables
tree size, soil and slope position, but since these independent variables are being
controlled for in this study, there is no variability among the observational units
(snags) with respect to these variables, and thus they are not technically
independent variables in the context of this study.
Observational unit: the individual snag.
Statistical population: the collection of all snags in Caldwell State Forest meeting
the requirements of size, soil and slope position (N=??).
Sample: the 100 snags sampled (n=100).
g. Is the number of eastern hemlock trees infested with the hemlock woolly adelgid affected by
latitude, longitude, slope aspect, topographic position, and/or site index? Data include the
number of infested trees on 100 1-hectare hemlock-dominated forested plots selected
randomly with respect to geographic, topographic and edaphic factors throughout the state
of Massachusetts.
Type of data: simple count, because there is an unbounded count of infested
trees on each 1 hectare plot. Note, we did not record the proportion of hemlock
trees on each plot that were infested, which would have been a much better
approach, so the data represent simple counts rather than proportional data.
Dependent variable: number of infested trees.
Independent variable(s): latitude, longitude, slope aspect, topographic position,
and site index, each measured separately for each 1 hectare plot.
Observational unit: 1-hectare plot of hemlock-dominated forest.
Statistical population: the full collection of all 1-hectare hemlock-dominated
forest plots that exist in Massachusetts. Note, this would be difficult to measure
but could be done by developing a forest cover map using remote sensing
(N=??).
Sample: the 100 1-hectare forest plots sampled (n=100).
h. Is the likelihood of choosing a green building design affected by home owner awareness of
green building options in Massachusetts? Data include use of green building practices (yes or
no) and an index of green building awareness (from none to high derived from several
factors) for a random sample of 100 new home builders in the Pioneer Valley,
Massachusetts.
Type of data: binary, because the dependent variable (choice of building design)
can take on only 1 of 2 values: green building versus conventional building. Note,
here each home builder is the observational unit and they either choose green
building or not, thus the trial size = 1. Viewed this way, binary data represent the
special case of proportional data when trial size equals 1.
Dependent variable: binary choice of green building or conventional building.
Independent variable(s): home owner awareness of green building design
options, measured in some appropriate manner.
Observational unit: the individual new home builder. Note, the home builder is
the trial and it has a trial size of 1.
Statistical population: the collection of all new home builders in the Pioneer
Valley (N=??). Note, there is a mismatch between the desired scope of inference
in the question (Massachusetts) and the realized scope of inference from the
study design (Pioneer Valley). The statistical population is based on the realized
study, so here either the question needs to change in scope or the study design
needs to expand in scope.
Sample: the 100 new home builders (n=100).
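The idea that binary data are proportional data with a trial size of 1 can be shown with a small sketch. The four builder records and the awareness index values below are hypothetical.

```python
# Hypothetical records: (used_green_building, awareness_index) per home builder.
builders = [(True, 0.9), (False, 0.2), (True, 0.6), (False, 0.5)]

# Express each binary response as (successes, trial_size) with trial_size fixed at 1.
responses = [(int(chose_green), 1) for chose_green, _ in builders]

# The resulting "proportions" can only be 0 or 1.
proportions = [successes / trial_size for successes, trial_size in responses]
assert all(p in (0.0, 1.0) for p in proportions)

print(proportions)  # [1.0, 0.0, 1.0, 0.0]
```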
3. Obtain a real data set from your field of study, either one that you collected or from your major
professor, and identify the type of data represented. If more than one type of data is included,
identify each type.
Good luck!