Assignment 3 Inferential statistics

Please answer the questions in a Word document and submit the completed
assignment by November 29, 2016 via email to Jaime Sebastián:
[email protected]. This assignment is worth 10% of your final grade.
1. You are studying the use of two different substrates for oil sands
remediation. You want to know whether these substrates affect the growth of
aspen, so you measured the height of aspens growing in both substrates and
performed a t-test on the data collected. You calculated the t-value by hand
and obtained 1.834; you then ran the following code in R:
qt(0.95, 14)    # 1.761
qt(0.975, 14)   # 2.145
a. What are the null and alternative hypotheses? (1 point)
b. Is the t-test one-sided or two-sided? Is it a one-sample or a two-sample
test? Why? (1 point)
c. Does aspen growth differ significantly between the two substrates? If so,
at what confidence level? (1 point)
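A small R sketch of how the two qt() calls above correspond to tests at alpha = 0.05, plus a hypothetical helper, t_p_value(), that converts a t statistic into a p-value with pt(). The helper's name and its two alternatives ("two.sided" and "greater") are illustrative choices, not something the assignment prescribes.

```r
# Critical t values for 14 degrees of freedom at alpha = 0.05.
dof <- 14
t_crit_one_sided <- qt(0.95, dof)   # upper 5% point: one-sided test
                                    # (also the two-sided critical value at alpha = 0.10)
t_crit_two_sided <- qt(0.975, dof)  # upper 2.5% point: two-sided test

# Hypothetical helper: p-value of an observed t statistic.
# "greater" returns the upper-tail area (alternative: first mean > second mean);
# "two.sided" doubles the tail area beyond |t_obs|.
t_p_value <- function(t_obs, dof, alternative = c("two.sided", "greater")) {
  alternative <- match.arg(alternative)
  if (alternative == "two.sided") {
    2 * pt(abs(t_obs), dof, lower.tail = FALSE)
  } else {
    pt(t_obs, dof, lower.tail = FALSE)
  }
}

round(c(one_sided = t_crit_one_sided, two_sided = t_crit_two_sided), 3)
# one_sided = 1.761, two_sided = 2.145
```

Comparing a calculated t-value with each critical value is equivalent to comparing the corresponding p-value from t_p_value() with the chosen alpha.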
2. Explain the difference between Type I and Type II errors. Give an example in
which one type of error is preferable to the other, and explain why. (2 points)
3. In the macroinvertebrate study from the first assignment, we wanted to know
whether the introduction of rainbow trout affects macroinvertebrate density in
water bodies in Alberta. After the visual exploration, we performed an ANOVA,
and this was the result. Explain what the numbers in the red circles mean. (2
points)
4. Why should we adjust for multiple comparisons when making multiple
inferences? Name one method for doing so. (1 point)
5. After obtaining the first result of the substrate study in question 1, we
wanted to further explore the effect of both substrates under different
moisture conditions. Based on the interaction plot below, what conclusions can
you draw about the use of these substrates under different moisture
conditions? Is there an interaction? (2 points)
6. What statistical analysis would you perform for the following objectives?
Why? (2 points)
a. You want to know if poplar growth is affected by climate. You have
data for diameter increment and several yearly climate variables for
20 years.
b. You want to know if three salinity levels (low, medium and high) affect
pine seedling mortality.
c. You find out that the data you used for the t-test in question 1 are not
normally distributed. What method would you use instead?
d. You want to check if your data follows a normal distribution.
7. In forestry, it is important to know the standing timber volume on a site
in order to manage it properly. However, volume can only be measured
accurately after the tree is cut down. For that reason, it is important to
have accurate models based on variables that are easier to measure while the
tree is standing, such as diameter. We used 100 trees to establish a
relationship between volume (cm³) and diameter at breast height (DBH, cm); the
data are in the file Assignment3.csv.
a. Try a linear model with squared DBH, Volume = a + b*DBH^2 (use
I(DBH^2) in R). Is it a good model? Does it meet the assumptions of
linear regression? (Show figures to support your statements where
possible.) How could we fix it? (2 points)
b. Try transforming the volume with a square root and run the model again.
Is it a good model? Does it meet the assumptions of linear regression?
(Show figures to support your statements.) How could we fix it? (2
points)
c. Try a non-linear model of your choice. How did you choose it? (2
points)
d. Which is the best model? Why? (2 points)
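An R sketch of the fits requested in question 7, written as hypothetical helper functions (fit_dbh2_model(), fit_sqrt_model(), fit_power_model(), show_diagnostics(), compare_models()) so that it runs without the data file. It assumes that Assignment3.csv has numeric columns named Volume and DBH, which the assignment does not state, and that part (b) keeps I(DBH^2) as the predictor. The power model Volume = a*DBH^b in part (c) is only one possible non-linear choice, and the RMSE comparison in part (d) is one reasonable criterion among several.

```r
# Assumes a data frame with numeric columns Volume (cm^3) and DBH (cm);
# these column names are an assumption about Assignment3.csv.

# (a) Linear model on squared DBH: Volume = a + b * DBH^2
fit_dbh2_model <- function(dat) {
  lm(Volume ~ I(DBH^2), data = dat)
}

# (b) Same predictor, square-root-transformed response.
fit_sqrt_model <- function(dat) {
  lm(sqrt(Volume) ~ I(DBH^2), data = dat)
}

# (c) One possible non-linear choice: the power model Volume = a * DBH^b,
#     fitted with nls(). Start values come from the linearised form
#     log(Volume) = log(a) + b * log(DBH), which needs positive Volume and DBH.
fit_power_model <- function(dat) {
  loglog <- lm(log(Volume) ~ log(DBH), data = dat)
  start <- list(a = exp(coef(loglog)[[1]]), b = coef(loglog)[[2]])
  nls(Volume ~ a * DBH^b, data = dat, start = start)
}

# Standard diagnostic figures for an lm fit in a 2 x 2 panel: residuals vs
# fitted, normal Q-Q, scale-location and residuals vs leverage.
show_diagnostics <- function(model) {
  op <- par(mfrow = c(2, 2))
  on.exit(par(op))
  plot(model)
}

# (d) Compare fits on the original Volume scale. R^2 and AIC are not directly
#     comparable between a Volume model and a sqrt(Volume) model, so fitted
#     values from (b) are squared back first. Squaring ignores
#     retransformation bias, so this is only a rough comparison.
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

compare_models <- function(dat, m_dbh2, m_sqrt, m_power) {
  c(
    dbh2  = rmse(dat$Volume, fitted(m_dbh2)),
    sqrt  = rmse(dat$Volume, fitted(m_sqrt)^2),
    power = rmse(dat$Volume, fitted(m_power))
  )
}

# Example use (not run here):
# dat <- read.csv("Assignment3.csv")
# m_a <- fit_dbh2_model(dat); summary(m_a); show_diagnostics(m_a)
# m_b <- fit_sqrt_model(dat); summary(m_b); show_diagnostics(m_b)
# m_c <- fit_power_model(dat); summary(m_c)
# compare_models(dat, m_a, m_b, m_c)
```

Deriving the nls() start values from a log-log regression is a common way to avoid convergence failures; other starting strategies would work as well.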