Simple Comparative Experiments – Introduction to ANOVA
Read Sections 3.2 – 3.4 in the text
Note: These notes were modified from lecture notes created by Tisha Hooks and Christopher Malone.
In the last set of notes we looked at the Tension Bond Strength example with both a sample of the data and the
entire data set. Another important feature to examine when analyzing a data set is the _________________ of
difference due to a particular factor.
Confidence Interval
A confidence interval can be used to ______________________ measure the amount of difference between two
factor levels. That is, we can use a confidence interval to measure the __________ effect of a factor across its levels.
Let’s look at the sample of 4 observations from the Tension Bond Strength example. Recall, the data values used:
To obtain a confidence interval in Minitab, choose Stat → ANOVA → General Linear Model → Fit General Linear
Model and then enter the variables as follows:
Next, choose Stat → ANOVA → General Linear Model → Comparisons and then specify the following:
Note, you need to double-click on “Group” so that it gets highlighted as shown above. Next, click on the Results…
tab and make sure both boxes are checked as shown below. By default 95% confidence intervals will be given. If you
want to change the confidence level, click on the Options… tab and enter a new level.
Click OK twice and you should get the following output. The output is divided into two sections: one for the
confidence interval and one for the hypothesis test.
The following output is given in the output window.
In addition, a second output window should open with the following:
Confidence Interval
The 95% confidence interval for the difference in factor level means is circled in the output above. Additionally, the
confidence interval is given graphically in the second output window.
Questions:
1. Identify the 95% confidence interval.
2. Interpret the 95% confidence interval.
3. Does it make sense this confidence interval includes 0? Explain your reasoning.
4. Sketch a confidence interval that would suggest the tension bond strength of the unmodified group is
statistically higher than the tension bond strength of the modified group.
Hypothesis Test
We might also be interested in conducting a hypothesis test to see if there is a significant difference between the
factor levels. First, let’s write out what is being tested.
H0:
Ha:
The test statistic and p-value for the hypothesis test can be found in the output given above.
Questions:
5. What is the outcome of this test?
6. Note that the adjusted p-value here is exactly the same as the p-value we found in the last set of notes. Why
are these two tests identical?
Another way to compare groups
Towards the top of the output, you’ll see the output given above. This is another way to determine whether or not
the groups are significantly different from one another. If the two groups have _________________ letters, then
they are statistically different from one another. If they have the same letter (as we see here) then the groups are
not statistically different from one another. This part of the output is useful when comparing more than two groups.
Example Revisited: Now, let’s take another look at the complete data set for this example found in the file
cement_mortar.mpj on the course website.
Questions:
7. Based on the ANOVA conducted in the last set of notes, would you expect this confidence interval to contain
0 or not? Explain your reasoning.
8. Using Minitab, find and interpret the 95% confidence interval.
9. Verify the F-statistic from the ANOVA is equivalent to the t-statistic found in the comparisons output.
We’ve just examined the Tension Bond Strength data carrying out the ANOVA while intuitively obtaining the
measures of error by computing the sums of squares “by hand.” These calculations are made much simpler using the
framework for a _______________________________________________________________________ (GLM). The
GLM approach does require the use of matrices and some linear algebra operations.
The Model
In general, we wish to compare _____ different levels of a single factor. Also, there are ______ observations under
each factor level. One way to write the statistical model for the Tension Bond Strength example is given below.
yij = µ + τi + εij
where i = 1, 2 identifies the _________________ and j = 1, 2 identifies the _________________.
Let’s identify the meaning of each term in the model.
yij:
µ:
τi:
εij:
After the data are collected and used to estimate the model terms (parameters), statisticians typically place a “hat”
over the model terms to indicate they have been __________________ from the data. For example, the observed
overall mean of the response is denoted by ______ instead of ______.
Using your intuition, sketch the estimated model parameters on the dotplot below for our data.
ŷij = μ̂ + τ̂i + ε̂ij
Questions:
10. Using the model parameters, what is the mean of the modified group?
11. Using the model parameters, what is the mean of the unmodified group?
The Model in Matrix Notation
We can look at our statistical model in matrix notation. The model for our simple example is given in matrix notation
below.
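\[
\begin{bmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{bmatrix}
=
\begin{bmatrix} \mu \\ \mu \\ \mu \\ \mu \end{bmatrix}
+
\begin{bmatrix} \tau_1 \\ \tau_1 \\ \tau_2 \\ \tau_2 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{bmatrix}
\]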
The above model is equivalent to the one given below.
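\[
\underbrace{\begin{bmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{bmatrix}}_{Y}
=
\underbrace{\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}}_{X}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{bmatrix}
\]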
Using the above notation, Y is the _________________ vector and X is the __________________ matrix. The
estimated model parameters can be obtained as follows:
Model Estimates = _______________________________
Note: This approach to estimating the model parameters is _____________ general and in fact works for any linear
model.
Let’s return to our example:
1
1
X= 
1

1
1 0
1 0 
and Y =
0 1

0 1
16.52 
16.40 

.
16.62 


16.75
Problem: The columns of X’X are NOT linearly independent, therefore the inverse CANNOT be computed.
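The rank problem can be checked numerically. The following is a minimal sketch, assuming Python with NumPy is available (it is not part of the Minitab workflow in these notes); the variable names are illustrative only.

```python
import numpy as np

# Overparameterized design matrix from the notes: columns are mu, tau_1, tau_2.
X_full = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
], dtype=float)

XtX = X_full.T @ X_full
rank = np.linalg.matrix_rank(XtX)

print(XtX)
print("rank of X'X:", rank, "but X'X is", XtX.shape[0], "x", XtX.shape[1])

# The first column of X is the sum of the other two columns, so X'X has
# fewer independent columns than its size and has no ordinary inverse.
print(np.allclose(X_full[:, 0], X_full[:, 1] + X_full[:, 2]))
```

Because the rank is smaller than the number of columns, one parameter must be removed or constrained before the estimates can be computed.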
Solution: We need to re-parameterize the model so the model parameters can be estimated. Consider the following
re-parameterization.
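\[
\underbrace{\begin{bmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{bmatrix}}_{Y}
=
\underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}}_{X}
\begin{bmatrix} \mu \\ \tau_1 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{bmatrix}
\]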
This re-parameterization uses the ________________________ restriction where τ2 = ______. Let’s use this
parameterization to estimate the model parameters where
1
1
X= 
1

1
1
1 
and Y =
0

0
16.52 
16.40 

.
16.62 


16.75
Let’s identify the following values:
μ̂ = _____________
τ̂ 1 = _____________
τ̂ 2 = _____________
Estimated mean for the modified group = _________________________________
Estimated mean for the unmodified group = _________________________________
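As a check on the hand calculations for this parameterization, the estimates can be reproduced numerically. The sketch below assumes Python with NumPy and uses the usual least-squares solution of the normal equations, (X'X)b = X'Y; the names X_ref and beta_ref are illustrative, and the groups are labeled 1 and 2 in the order of the rows of Y.

```python
import numpy as np

# The four observations used in the notes (rows 1-2: group 1, rows 3-4: group 2).
y = np.array([16.52, 16.40, 16.62, 16.75])

# Design matrix with the tau_2 column dropped (columns: mu, tau_1).
X_ref = np.array([
    [1, 1],
    [1, 1],
    [1, 0],
    [1, 0],
], dtype=float)

# Solve the normal equations (X'X) b = X'y rather than forming the inverse.
beta_ref = np.linalg.solve(X_ref.T @ X_ref, X_ref.T @ y)
mu_hat, tau1_hat = beta_ref
tau2_hat = 0.0  # fixed by the restriction, not estimated

mean_group1 = mu_hat + tau1_hat
mean_group2 = mu_hat + tau2_hat

print("mu_hat   =", round(mu_hat, 4))
print("tau1_hat =", round(tau1_hat, 4))
print("group means:", round(mean_group1, 4), round(mean_group2, 4))
```

Under this restriction the intercept plays the role of the second group's mean, and tau_1 is the difference of the first group from it.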
Another possible parameterization uses the ___________________________ restriction where _________________.
For this parameterization, we will use the following model:
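\[
\underbrace{\begin{bmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{bmatrix}}_{Y}
=
\underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & -1 \\ 1 & -1 \end{bmatrix}}_{X}
\begin{bmatrix} \mu \\ \tau_1 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{bmatrix}.
\]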
Therefore, to estimate the model parameters we’ll need
1 1 
1 1 
 and Y =
X= 
1 1


1 1
16.52 
16.40 

.
16.62 


16.75
Let’s identify the following values:
μ̂ = _____________
τ̂ 1 = _____________
τ̂ 2 = _____________
Estimated mean for the modified group = _________________________________
Estimated mean for the unmodified group = _________________________________
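The same numerical check can be done for this second parameterization. This is again only a sketch assuming Python with NumPy; X_sum and beta_sum are illustrative names, and the design matrix is taken to have -1 in the rows for the second group. The parameter estimates change, but the estimated group means do not.

```python
import numpy as np

y = np.array([16.52, 16.40, 16.62, 16.75])

# Columns: mu, tau_1; rows for group 2 carry -1 because tau_2 = -tau_1.
X_sum = np.array([
    [1,  1],
    [1,  1],
    [1, -1],
    [1, -1],
], dtype=float)

beta_sum = np.linalg.solve(X_sum.T @ X_sum, X_sum.T @ y)
mu_hat, tau1_hat = beta_sum
tau2_hat = -tau1_hat  # implied by the restriction

fitted = X_sum @ beta_sum  # one fitted value per observation

print("mu_hat   =", round(mu_hat, 4))
print("tau1_hat =", round(tau1_hat, 4))
print("tau2_hat =", round(tau2_hat, 4))
print("fitted values:", np.round(fitted, 4))
```

Here the intercept is the overall mean of the four observations, and each tau is a group's deviation from that overall mean.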
Using Minitab to obtain parameter estimates
We can obtain the above estimates using Minitab. Choose Stat → ANOVA → General Linear Model. Then click on
Results… shown in the figure below.
Click OK twice and you should get the following output.
Question:
12. What parameterization does Minitab use?
We can also obtain information about the group means by selecting Options… and entering the following
information in the dialogue box. Once you click OK twice, the following output should be displayed.
Error via the General Linear Model Approach
Define ŷ as the ________________ response vector. In our model, this vector simply contains the _______________
for each group. We can compute the predicted response vector for our data in the following manner:
1 1 
1 1  μˆ
*  =
ŷ = 
1 -1  ˆτ 1 


1 -1
The amount of error present in the model is simply the difference between the ___________________ vector and
the ___________________ response vector. Let’s compute the error for our small example.
 16.52  16.46  


16.40   16.46  
ˆ = 
=
Error = (y - y)
 16.62 16.685 
 
 
 
 16.75 16.685 
Now, to obtain SSE = Error’Error =
The average amount of ____________ error is
\[
\hat{\sigma}^2 = MSE = \frac{SSE}{df_{error}} =
\]
Thus, the average amount of _____________ is
\[
\hat{\sigma} = \sqrt{MSE} =
\]
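The error calculations above can be verified with a few lines of code. The sketch below assumes Python with NumPy and the design matrix with 1 and -1 in the second column; the error degrees of freedom are taken as the number of observations minus the number of estimated parameters. All names are illustrative.

```python
import numpy as np

y = np.array([16.52, 16.40, 16.62, 16.75])
X = np.array([[1, 1], [1, 1], [1, -1], [1, -1]], dtype=float)

# Least-squares estimates and the predicted response vector.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

# Error vector, then SSE = Error'Error.
error = y - y_hat
sse = error @ error

# df_error = n - (number of independent columns of X).
df_error = len(y) - np.linalg.matrix_rank(X)
mse = sse / df_error
sigma_hat = np.sqrt(mse)

print("y_hat     =", np.round(y_hat, 4))
print("error     =", np.round(error, 4))
print("SSE       =", round(sse, 6))
print("df_error  =", df_error)
print("MSE       =", round(mse, 6))
print("sigma_hat =", round(sigma_hat, 5))
```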
Question:
13. Where is this quantity in the Minitab output?
The Standard Error of the Model Estimates
The estimate σ̂² is used in computing the standard errors for our parameter estimates. The standard errors of the
model estimates are necessary for conducting ________________________ tests and ________________________
intervals. The standard errors for our parameter estimates are obtained in the following manner:
Variance of parameter estimates =
Compute the variance for the parameter estimates in our example.
Next, let’s compute the standard error of the parameter estimates.
Standard error of parameter estimates =
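For checking the hand calculation, here is a minimal sketch assuming Python with NumPy and the standard least-squares result that the estimated variance-covariance matrix of the parameter estimates is MSE times the inverse of X'X. It uses the design matrix with 1 and -1 in the second column; the names are illustrative.

```python
import numpy as np

y = np.array([16.52, 16.40, 16.62, 16.75])
X = np.array([[1, 1], [1, 1], [1, -1], [1, -1]], dtype=float)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# MSE from the residuals, with df_error = n - number of parameters.
resid = y - X @ beta_hat
mse = (resid @ resid) / (len(y) - X.shape[1])

# Estimated variance-covariance matrix of (mu_hat, tau1_hat).
cov_beta = mse * XtX_inv

# Standard errors are the square roots of the diagonal entries.
se_beta = np.sqrt(np.diag(cov_beta))

print("variance-covariance matrix:\n", cov_beta)
print("standard errors:", np.round(se_beta, 5))
```

With equal group sizes the two columns of this design matrix are orthogonal, so the off-diagonal entries are zero and both estimates have the same standard error.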
Question:
14. Where is this information given in the Minitab output?
Note: If we want to test
H0: τ1 = 0
Ha: τ1 ≠ 0
When comparing two groups, the amount of difference between the two group averages is of interest.
Recall, the following information we’ve already found:
Average tension bond strength for the modified group: ____________________
Average tension bond strength for the unmodified group: ____________________
The difference in averages for the two groups: _________________________________________
The variance of the difference is found in the following manner:
Var(τ̂2 - τ̂1) =
Therefore, se(τ̂2 - τ̂1) =
Question:
15. Find this value in the Minitab output given below.
16. What are the hypotheses being tested?
17. How is the t-value computed?
18. How is the adjusted p-value computed?
19. What is the outcome of this test?
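The variance, standard error, and t-value for the difference can be sketched in code. This assumes Python with NumPy and the parameterization with τ2 = -τ1, so that τ2 - τ1 = -2τ1 is a linear combination c'b of the estimates with c = (0, -2). The names are illustrative, and the sign of the difference depends on which group is listed first. The last lines compare the squared t-value with the one-way ANOVA F-statistic for the same four observations.

```python
import numpy as np

y = np.array([16.52, 16.40, 16.62, 16.75])
X = np.array([[1, 1], [1, 1], [1, -1], [1, -1]], dtype=float)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
mse = (resid @ resid) / (len(y) - X.shape[1])
cov_beta = mse * XtX_inv

# tau_2 - tau_1 = -2 * tau_1 under the restriction tau_2 = -tau_1.
c = np.array([0.0, -2.0])
diff = c @ beta_hat
var_diff = c @ cov_beta @ c
se_diff = np.sqrt(var_diff)
t_stat = diff / se_diff

# One-way ANOVA F: treatment mean square (1 df for two groups) over MSE.
ss_trt = np.sum((X @ beta_hat - y.mean()) ** 2)
f_stat = (ss_trt / 1) / mse

print("difference =", round(diff, 4))
print("Var(diff)  =", round(var_diff, 6))
print("se(diff)   =", round(se_diff, 5))
print("t          =", round(t_stat, 4))
print("t^2, F     =", round(t_stat ** 2, 4), round(f_stat, 4))
```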
Confidence Intervals for a Difference in Means
Finally, let’s look at how a confidence interval for the difference in means is calculated.
Lower limit: (τ̂2 - τ̂1) - ____________________ * √Var(τ̂2 - τ̂1)
Center: (τ̂2 - τ̂1) =
Upper limit: (τ̂2 - τ̂1) + ____________________ * √Var(τ̂2 - τ̂1)
The missing quantity in the interval is obtained from the t-distribution with df = __________.
To obtain this value in Minitab, choose Calc → Probability Distributions → t… and enter the values as shown below.
Question:
20. Verify the lower and upper limits of the confidence interval given in the Minitab output above.
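As a cross-check outside Minitab, the interval can be computed directly. The sketch below assumes Python with NumPy and SciPy, uses scipy.stats.t.ppf for the t quantile, and takes the error degrees of freedom as the number of observations minus the number of groups. The difference is taken as group 2 minus group 1 in the order the data are listed, so the signs may be reversed relative to a given Minitab output. All names are illustrative.

```python
import numpy as np
from scipy import stats

group1 = np.array([16.52, 16.40])
group2 = np.array([16.62, 16.75])

# Center of the interval: difference in group means.
diff = group2.mean() - group1.mean()

# MSE pools the within-group squared deviations over df_error.
sse = np.sum((group1 - group1.mean()) ** 2) + np.sum((group2 - group2.mean()) ** 2)
df_error = len(group1) + len(group2) - 2
mse = sse / df_error

# Standard error of the difference in means.
se_diff = np.sqrt(mse * (1 / len(group1) + 1 / len(group2)))

# Multiplier for a 95% interval: the 0.975 quantile of t with df_error df.
t_crit = stats.t.ppf(0.975, df_error)

lower = diff - t_crit * se_diff
upper = diff + t_crit * se_diff

print("t multiplier =", round(t_crit, 4))
print("95% CI: (", round(lower, 4), ",", round(upper, 4), ")")
```

To use a different confidence level, replace 0.975 with 1 - α/2 for the chosen α.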