Analysis of Messy Data
VOLUME III: ANALYSIS OF COVARIANCE

George A. Milliken
Dallas E. Johnson

CHAPMAN & HALL/CRC
A CRC Press Company
Boca Raton   London   New York   Washington, D.C.

Library of Congress Cataloging-in-Publication Data

Milliken, George A., 1943–
   Analysis of messy data / George A. Milliken, Dallas E. Johnson.
   2 v. : ill. ; 24 cm.
   Includes bibliographies and indexes.
   Contents: v. 1. Designed experiments -- v. 2. Nonreplicated experiments.
   Vol. 2 has imprint: New York : Van Nostrand Reinhold.
   ISBN 0-534-02713-X (v. 1) : $44.00 -- ISBN 0-442-24408-8 (v. 2)
   1. Analysis of variance. 2. Experimental design. 3. Sampling (Statistics)
   I. Johnson, Dallas E., 1938– . II. Title.
   QA279 .M48 1984   519.5′352--dc19   84-000839

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Apart from any fair dealing for the purpose of research or private study, or criticism or review, as permitted under the UK Copyright Designs and Patents Act, 1988, this publication may not be reproduced, stored or transmitted, in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without the prior permission in writing of the publishers, or, in the case of reprographic reproduction, only in accordance with the terms of the licenses issued by the Copyright Licensing Agency in the UK, or in accordance with the terms of the license issued by the appropriate Reproduction Rights Organization outside the UK. The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying. Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2002 by Chapman & Hall/CRC
No claim to original U.S. Government works
International Standard Book Number 1-58488-083-X
Library of Congress Card Number 84-000839
Printed in the United States of America   1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper

Table of Contents

Chapter 1  Introduction to the Analysis of Covariance
1.1 Introduction
1.2 The Covariate Adjustment Process
1.3 A General AOC Model and the Basic Philosophy
References

Chapter 2  One-Way Analysis of Covariance — One Covariate in a Completely Randomized Design Structure
2.1 The Model
2.2 Estimation
2.3 Strategy for Determining the Form of the Model
2.4 Comparing the Treatments or Regression Lines
    2.4.1 Equal Slopes Model
    2.4.2 Unequal Slopes Model — Covariate by Treatment Interaction
2.5 Confidence Bands about the Difference of Two Treatments
2.6 Summary of Strategies
2.7 Analysis of Covariance Computations via the SAS® System
    2.7.1 Using PROC GLM and PROC MIXED
    2.7.2 Using JMP®
2.8 Conclusions
References
Exercise

Chapter 3  Examples: One-Way Analysis of Covariance — One Covariate in a Completely Randomized Design Structure
3.1 Introduction
3.2 Chocolate Candy — Equal Slopes
    3.2.1 Analysis Using PROC GLM
    3.2.2 Analysis Using PROC MIXED
    3.2.3 Analysis Using JMP®
3.3 Exercise Programs and Initial Resting Heart Rate — Unequal Slopes
3.4 Effect of Diet on Cholesterol Level: An Exception to the Basic Analysis of Covariance Strategy
3.5 Change from Base Line Analysis Using Effect of Diet on Cholesterol Level Data
3.6 Shoe Tread Design Data for Exception to the Basic Strategy
3.7 Equal Slopes within Groups of Treatments and Unequal Slopes between Groups
3.8 Unequal Slopes and Equal Intercepts — Part 1
3.9 Unequal Slopes and Equal Intercepts — Part 2
References
Exercises

Chapter 4  Multiple Covariates in a One-Way Treatment Structure in a Completely Randomized Design Structure
4.1 Introduction
4.2 The Model
4.3 Estimation
4.4 Example: Driving a Golf Ball with Different Shafts
4.5 Example: Effect of Herbicides on the Yield of Soybeans — Three Covariates
4.6 Example: Models That Are Quadratic Functions of the Covariate
4.7 Example: Comparing Response Surface Models
Reference
Exercises

Chapter 5  Two-Way Treatment Structure and Analysis of Covariance in a Completely Randomized Design Structure
5.1 Introduction
5.2 The Model
5.3 Using the SAS® System
    5.3.1 Using PROC GLM and PROC MIXED
    5.3.2 Using JMP®
5.4 Example: Average Daily Gains and Birth Weight — Common Slope
5.5 Example: Energy from Wood of Different Types of Trees — Some Unequal Slopes
5.6 Missing Treatment Combinations
5.7 Example: Two-Way Treatment Structure with Missing Cells
5.8 Extensions
Reference
Exercises

Chapter 6  Beta-Hat Models
6.1 Introduction
6.2 The Beta-Hat Model and Analysis
6.3 Testing Equality of Parameters
6.4 Complex Treatment Structures
6.5 Example: One-Way Treatment Structure
6.6 Example: Two-Way Treatment Structure
6.7 Summary
Exercises

Chapter 7  Variable Selection in the Analysis of Covariance Model
7.1 Introduction
7.2 Procedure for Equal Slopes
7.3 Example: One-Way Treatment Structure with Equal Slopes Model
7.4 Some Theory
7.5 When Slopes are Possibly Unequal
References
Exercises

Chapter 8  Comparing Models for Several Treatments
8.1 Introduction
8.2 Testing Equality of Models for a One-Way Treatment Structure
8.3 Comparing Models for a Two-Way Treatment Structure
8.4 Example: One-Way Treatment Structure with One Covariate
8.5 Example: One-Way Treatment Structure with Three Covariates
8.6 Example: Two-Way Treatment Structure with One Covariate
8.7 Discussion
References
Exercises

Chapter 9  Two Treatments in a Randomized Complete Block Design Structure
9.1 Introduction
9.2 Complete Block Designs
9.3 Within Block Analysis
9.4 Between Block Analysis
9.5 Combining Within Block and Between Block Information
9.6 Determining the Form of the Model
9.7 Common Slope Model
9.8 Comparing the Treatments
    9.8.1 Equal Slopes Models
    9.8.2 Unequal Slopes Model
9.9 Confidence Intervals about Differences of Two Regression Lines
    9.9.1 Within Block Analysis
    9.9.2 Combined Within Block and Between Block Analysis
9.10 Computations for Model 9.1 Using the SAS® System
9.11 Example: Effect of Drugs on Heart Rate
9.12 Summary
References
Exercises

Chapter 10  More Than Two Treatments in a Blocked Design Structure
10.1 Introduction
10.2 RCB Design Structure — Within and Between Block Information
10.3 Incomplete Block Design Structure — Within and Between Block Information
10.4 Combining Between Block and Within Block Information
10.5 Example: Five Treatments in RCB Design Structure
10.6 Example: Balanced Incomplete Block Design Structure with Four Treatments
10.7 Example: Balanced Incomplete Block Design Structure with Four Treatments Using JMP®
10.8 Summary
References
Exercises

Chapter 11  Covariate Measured on the Block in RCB and Incomplete Block Design Structures
11.1 Introduction
11.2 The Within Block Model
11.3 The Between Block Model
11.4 Combining Within Block and Between Block Information
11.5 Common Slope Model
11.6 Adjusted Means and Comparing Treatments
    11.6.1 Common Slope Model
    11.6.2 Non-Parallel Lines Model
11.7 Example: Two Treatments
11.8 Example: Four Treatments in RCB
11.9 Example: Four Treatments in BIB
11.10 Summary
References
Exercises

Chapter 12  Random Effects Models with Covariates
12.1 Introduction
12.2 The Model
12.3 Estimation of the Variance Components
12.4 Changing Location of the Covariate Changes the Estimates of the Variance Components
12.5 Example: Balanced One-Way Treatment Structure
12.6 Example: Unbalanced One-Way Treatment Structure
12.7 Example: Two-Way Treatment Structure
12.8 Summary
References
Exercises

Chapter 13  Mixed Models
13.1 Introduction
13.2 The Matrix Form of the Mixed Model
13.3 Fixed Effects Treatment Structure
13.4 Estimation of Fixed Effects and Some Small Sample Size Approximations
13.5 Fixed Treatments and Locations Random
13.6 Example: Two-Way Mixed Effects Treatment Structure in a CRD
13.7 Example: Treatments are Fixed and Locations are Random with a RCB at Each Location
References
Exercises

Chapter 14  Analysis of Covariance Models with Heterogeneous Errors
14.1 Introduction
14.2 The Unequal Variance Model
14.3 Tests for Homogeneity of Variances
    14.3.1 Levene’s Test for Equal Variances
    14.3.2 Hartley’s F-Max Test for Equal Variances
    14.3.3 Bartlett’s Test for Equal Variances
    14.3.4 Likelihood Ratio Test for Equal Variances
14.4 Estimating the Parameters of the Regression Model
    14.4.1 Least Squares Estimation
    14.4.2 Maximum Likelihood Methods
14.5 Determining the Form of the Model
14.6 Comparing the Models
    14.6.1 Comparing the Nonparallel Lines Models
    14.6.2 Comparing the Parallel Lines Models
14.7 Computational Issues
14.8 Example: One-Way Treatment Structure with Unequal Variances
14.9 Example: Two-Way Treatment Structure with Unequal Variances
14.10 Example: Treatments in Multi-location Trial
14.11 Summary
References
Exercises

Chapter 15  Analysis of Covariance for Split-Plot and Strip-Plot Design Structures
15.1 Introduction
15.2 Some Concepts
15.3 Covariate Measured on the Whole Plot or Large Size of Experimental Unit
15.4 Covariate is Measured on the Small Size of Experimental Unit
15.5 Covariate is Measured on the Large Size of Experimental Unit and a Covariate is Measured on the Small Size of Experimental Unit
15.6 General Representation of the Covariate Part of the Model
    15.6.1 Covariate Measured on Large Size of Experimental Unit
    15.6.2 Covariate Measured on the Small Size of Experimental Units
    15.6.3 Summary of General Representation
15.7 Example: Flour Milling Experiment — Covariate Measured on the Whole Plot
15.8 Example: Cookie Baking
15.9 Example: Teaching Methods with One Covariate Measured on the Large Size Experimental Unit and One Covariate Measured on the Small Size Experimental Unit
15.10 Example: Comfort Study in a Strip-Plot Design with Three Sizes of Experimental Units and Three Covariates
15.11 Conclusions
References
Exercises

Chapter 16  Analysis of Covariance for Repeated Measures Designs
16.1 Introduction
16.2 The Covariance Part of the Model — Selecting R
16.3 Covariance Structure of the Data
16.4 Specifying the Random and Repeated Statements for PROC MIXED of the SAS® System
16.5 Selecting an Adequate Covariance Structure
16.6 Example: Systolic Blood Pressure Study with Covariate Measured on the Large Size Experimental Unit
16.7 Example: Oxide Layer Development Experiment with Three Sizes of Experimental Units Where the Repeated Measure is at the Middle Size of Experimental Unit and the Covariate is Measured on the Small Size Experimental Unit
16.8 Conclusions
References
Exercises

Chapter 17  Analysis of Covariance for Nonreplicated Experiments
17.1 Introduction
17.2 Experiments with a Single Covariate
17.3 Experiments with Multiple Covariates
17.4 Selecting Non-null and Null Partitions
17.5 Estimating the Parameters
17.6 Example: Milling Flour Using Three Factors Each at Two Levels
17.7 Example: Baking Bread Using Four Factors Each at Two Levels
17.8 Example: Hamburger Patties with Four Factors Each at Two Levels
17.9 Example: Strength of Composite Material Coupons with Two Covariates
17.10 Example: Effectiveness of Paint on Bricks with Unequal Slopes
17.11 Summary
References
Exercises

Chapter 18  Special Applications of Analysis of Covariance
18.1 Introduction
18.2 Blocking and Analysis of Covariance
18.3 Treatments Have Different Ranges of the Covariate
18.4 Nonparametric Analysis of Covariance
    18.4.1 Heart Rate Data from Exercise Programs
    18.4.2 Average Daily Gain Data from a Two-Way Treatment Structure
18.5 Crossover Design with Covariates
18.6 Nonlinear Analysis of Covariance
18.7 Effect of Outliers
References
Exercises

Preface

Analysis of covariance is a statistical procedure that enables one to incorporate information about concomitant variables into the analysis of a response variable. Sometimes this is done in an attempt to reduce experimental error. Other times it is done to better understand the phenomenon being studied.
In this book the analysis of covariance model is described as a method of comparing a series of regression models — one for each of the levels of a factor or combination of levels of factors being studied. Since covariance models are regression models, analysts can use all of the methods of regression analysis to deal with problems such as lack of fit, outliers, etc. The strategies described in this book will enable the reader to appropriately formulate and analyze various kinds of covariance models.

When covariates are measured and incorporated into the analysis of a response variable, the main objective of analysis of covariance is to compare treatments or treatment combinations at common values of the covariates. This is particularly true when the experimental units assigned to each of the treatment combinations may have differing values of the covariates. Comparing treatments depends on the form of the covariance model, so care must be taken to avoid mistakes when drawing conclusions.

The goal of this book is to present the structure and philosophy of the analysis of covariance by describing methodologies, illustrating the methodologies by analyzing numerous data sets, and occasionally furnishing some theory when required. Our aim is to provide data analysts with tools for analyzing data with covariates and to enable them to appropriately interpret the results. Some of the methods and techniques described in this book are not available in other books, but two issues of Biometrics (1957, Volume 13, Number 3, and 1982, Volume 38, Number 3) were dedicated to the topic of analysis of covariance. The topics presented are among those that we, as consulting statisticians, have found to be most helpful in analyzing data when covariates are available for possible inclusion in the analysis.

Readers of this book will learn how to:

• Formulate appropriate analysis of covariance models
• Simplify analysis of covariance models
• Compare levels of a factor or levels of combinations of factors when the model involves covariates
• Construct and analyze a model with two or more factors in the treatment structure
• Analyze two-way treatment structures with missing cells
• Compare models using the beta-hat model
• Perform variable selection within the analysis of covariance model
• Analyze models with blocking in the design structure and use combined intra-block and inter-block information about the slopes of the regression models
• Use random statements in PROC MIXED to specify random coefficient regression models
• Carry out the analysis of covariance in a mixed model framework
• Incorporate unequal treatment variances into the analysis
• Specify the analysis of covariance models for split-plot, strip-plot, and repeated measures designs, both in terms of the regression models and the covariance structures of the repeated measures
• Incorporate covariates into the analysis of nonreplicated experiments, thus extending some of the results in Analysis of Messy Data, Volume II

The last chapter consists of a collection of examples that deal with (1) using the covariate to form blocks, (2) crossover designs, (3) nonparametric analysis of covariance, (4) using a nonlinear model for the covariate model, and (5) the process of examining mixed analysis of covariance models for possible outliers.
The approach used in this book is similar to that used in the first two volumes. Each topic is covered from a practical viewpoint, emphasizing the implementation of the methods much more than the theory behind them. Some theory is presented for some of the newer methodologies. The book uses procedures of the SAS® system and the JMP® software package to carry out the computations, and few computing formulae are presented. Either SAS® system code or JMP® menus are presented for the analysis of the data sets in the examples. The data in the examples (except for those using chocolate chips) were generated to simulate real-world applications that we have encountered in our consulting experiences.

This book is intended for everyone who analyzes data. The reader should have a knowledge of analysis of variance and regression analysis as well as basic statistical ideas including randomization, confidence intervals, and hypothesis testing. The first four chapters contain the information needed to form a basic philosophy for using the analysis of covariance with a one-way treatment structure and should be read by everyone. As one progresses through the book, the topics become more complex, going from designs with blocking to split-plot and repeated measures designs. Before reading about a particular topic in the later chapters, read the first four chapters. Knowledge of Chapters 13 and 14 from Analysis of Messy Data, Volume I: Designed Experiments would be useful for understanding the part of Chapter 5 involving missing cells. The information in Chapters 4 through 9 of Analysis of Messy Data, Volume II: Nonreplicated Experiments is useful for comprehending the topics discussed in Chapter 17.

This book is the culmination of more than 25 years of writing. The earlier editions of this manuscript were slanted toward providing an appropriate analysis of split-plot type designs by using fixed effects software such as PROC GLM of the SAS® system. With the development of mixed models software, such as PROC MIXED of the SAS® system and JMP®, the complications of the analysis of split-plot type designs disappeared, which enabled the manuscript to be completed without including the difficult computations that are required when using fixed effects software. Over the years, several colleagues made important contributions. Discussions with Shie-Shien Yang were invaluable for the development of the variable selection process described in Chapter 7. Vicki Landcaster and Marie Loughin read some of the earlier versions and provided important feedback. Discussions with James Schwenke, Kate Ash, Brian Fergen, Kevin Chartier, Veronica Taylor, and Mike Butine were important for improving the chapters involving combining intra- and inter-block information and the strategy for the analysis of repeated measures designs. Finally, we cannot express enough our thanks to Jane Cox, who typed many of the initial versions of the chapters. If it were not for Jane’s skills with the word processor, the task of finishing this book would have been much more difficult.

We dedicate this volume to all who have made important contributions to our personal and professional lives. This includes our wives, Janet and Erma Jean; our children, Scott and April and Kelly and Mark; and our parents and parents-in-law who made it possible for us to pursue our careers as statisticians.
We were both fortunate to study with Franklin Graybill and we thank him for making sure that we were headed in the right direction when our careers began.

1 Introduction to the Analysis of Covariance

1.1 INTRODUCTION

The statistical procedure termed analysis of covariance has been used in several contexts. The most common description of analysis of covariance is that it adjusts the analysis for variables that could not be controlled by the experimenter. For example, if a researcher wishes to compare the effect that ten different chemical weed control treatments have on the yield of a specific wheat variety, the researcher may wish to control for the differential effects of a fertility trend occurring in the field and for the number of wheat plants per plot that happen to emerge after planting. The differential effects of a fertility trend can possibly be removed by using a randomized complete block design structure, but it may not be possible to control the number of wheat plants per plot (unless the seeds are sown thickly and the emerging plants are then thinned to a given number of plants per plot). The researcher wishes to compare the treatments as if each treatment were grown on plots with the same average fertility level and as if every plot had the same number of wheat plants. The use of a randomized complete block design structure in which the blocks are constructed such that the fertility levels of plots within a block are very similar enables the treatments to be compared by averaging over the fertility levels, while the analysis of covariance is a procedure that can compare treatment means after first adjusting for the differential number of wheat plants per plot. The adjustment procedure involves constructing a model that describes the relationship between yield and the number of wheat plants per plot for each treatment, which is in the form of a regression model. The regression models, one for each level of the treatment, are then compared at a predetermined common number of wheat plants per plot.

1.2 THE COVARIATE ADJUSTMENT PROCESS

To demonstrate the type of adjustment process that is carried out when the analysis of covariance methodology is applied, the set of data in Table 1.1 is used, in which there are two treatments and five plots per treatment in a completely randomized design structure. Treatment 1 is a chemical application to control the growth of weeds and Treatment 2 is a control without any chemicals to control the weeds. The data in Table 1.1 consist of the yield of wheat plants of a specific variety from plots of identical size along with the number of wheat plants that emerged after planting per plot. The researcher wants to compare the yields of the two treatments for the condition when there are 125 plants per plot.

TABLE 1.1  Yield and Plants per Plot Data for the Example in Section 1.2

                 Treatment 1                       Treatment 2
    Yield per plot   Plants per plot    Yield per plot   Plants per plot
          951              126                930              135
          957              128                790              119
          776              107                764              110
         1033              142                989              140
          840              120                740              102
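For readers who want to reproduce the computations in this section, the data in Table 1.1 can be entered into the SAS® system with a short DATA step. This is only a sketch; the data set and variable names (wheat, trt, plants, yield) are our own choices and are not part of the original example.

   data wheat;                      /* yields and stand counts from Table 1.1 */
      input trt plants yield @@;    /* trt = treatment, plants = covariate, yield = response */
      datalines;
   1 126  951  1 128  957  1 107  776  1 142 1033  1 120  840
   2 135  930  2 119  790  2 110  764  2 140  989  2 102  740
   ;
   run;

The same ten observations are used for every analysis in this section, so the later code sketches refer back to this data set.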
Figure 1.1 is a graphical display of the plot yields for each of the treatments, where the circles represent the data points for Treatment 1 and the boxes represent the data points for Treatment 2. An "X" is used to mark the mean of each of the treatments.

FIGURE 1.1  Plot of the data for the two treatments, with the "X" denoting the respective means. (Vertical axis: yield per plot; horizontal axis: treatment number.)

If the researcher uses the two-sample t-test or a one-way analysis of variance to compare the two treatments without taking into account information about the number of plants per plot, a t statistic of 1.02 or an F statistic of 1.05 is obtained, indicating the two treatment means are not significantly different (p = 0.3361). The results of the analysis are in Table 1.2, in which the estimated standard error of the difference of the two treatment means is 67.23.

TABLE 1.2  Analysis of Variance Table and Means for Comparing the Yields of the Two Treatments Where No Information about the Number of Plants per Plot is Used

Source            df   SS          MS         FValue   ProbF
Model              1   11833.60    11833.60   1.05     0.3361
Error              8   90408.40    11301.05
Corrected total    9   102242.00

Source   df   SS (Type III)   MS         FValue   ProbF
TRT       1   11833.60        11833.60   1.05     0.3361

Parameter       Estimate   StdErr   tValue   Probt
Trt 1 – Trt 2   68.8       67.23    1.02     0.3361

TRT   LSMean   ProbtDiff
1     911.40   0.3361
2     842.60

The next step is to investigate the relationship between the yield per plot and the number of plants per plot. Figure 1.2 is a display of the data where the number of plants is on the horizontal axis and the yield is on the vertical axis. The circles denote the data for Treatment 1 and the boxes denote the data for Treatment 2. The two lines on the graph, denoted by Treatment 1 model and Treatment 2 model, were computed from the data by fitting the model yij = αi + βxij + εij, i = 1, 2 and j = 1, 2, …, 5, a model with different intercepts and a common or equal slope. The results are included in Table 1.3.

FIGURE 1.2  Plot of the data and the estimated regression models for the two treatments. (Vertical axis: yield per plot; horizontal axis: number of plants per plot.)

TABLE 1.3  Analysis of Covariance to Provide the Estimates of the Slope and Intercepts to be Used in Adjusting the Data

Source         df   SS            MS            FValue    ProbF
Model           3   7787794.74    2595931.58    3167.28   0.0000
Error           7   5737.26       819.61
Uncorr Total   10   7793532.00

Source   df   SS (Type III)   MS         FValue   ProbF
TRT       2   4964.18         2482.09    3.03     0.1128
Plants    1   84671.14        84671.14   103.31   0.0000

Parameter       Estimate   StdErr   tValue   Probt
Trt 1 – Trt 2   44.73      18.26    2.45     0.0441

Parameter   Estimate   StdErr   tValue   Probt
TRT 1        29.453    87.711    0.34    0.7469
TRT 2       –15.281    85.369   –0.18    0.8630
Plants        7.078     0.696   10.16    0.0000
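An analysis like that summarized in Table 1.3 can be obtained from PROC GLM by fitting separate intercepts and a common slope. The following sketch uses the hypothetical wheat data set entered above; it is one way to carry out the computations, not a reproduction of the authors' program.

   proc glm data=wheat;
      class trt;
      model yield = trt plants / noint solution;  /* separate intercepts, common slope for plants */
      estimate 'Trt 1 - Trt 2' trt 1 -1;          /* treatment difference adjusted for plants     */
      lsmeans trt / at plants=125 stderr pdiff;   /* compare treatments at 125 plants per plot    */
   run;

The NOINT and SOLUTION options produce intercept and slope estimates in the form shown at the bottom of Table 1.3, and the AT option evaluates the least squares means at the chosen value of 125 plants per plot. Dropping plants and the NOINT option from the MODEL statement gives the unadjusted analysis of Table 1.2.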
Now analysis of covariance is used to compare the two treatments when there are 125 plants per plot. The process of the analysis of covariance is to slide or move the observations from a given treatment along the estimated regression model (parallel to the model) to intersect the vertical line at 125 plants per plot. This sliding is demonstrated in Figure 1.3, where the solid circles represent the adjusted data for Treatment 1 and the solid boxes represent the adjusted data for Treatment 2. The lines join the open circles to the solid circles and the open boxes to the solid boxes; they indicate that the respective data points slid to the vertical line at which there are 125 plants per plot.

FIGURE 1.3  Plot of the data and estimated regression models showing how to compute adjusted yield values at 125 plants per plot. (The observations are slid parallel to the regression lines to meet the vertical line at 125 plants per plot.)

The adjusted data are computed as

    y^{A}_{ij} = y_{ij} - (\hat{\alpha}_i + \hat{\beta}x_{ij}) + (\hat{\alpha}_i + 125\hat{\beta}) = y_{ij} + \hat{\beta}(125 - x_{ij}).

The terms yij – (α̂i + β̂xij), i = 1, 2 and j = 1, 2, …, 5, are the residuals, or deviations of the observations from the estimated regression models. The preliminary computations of the adjusted yields are in Table 1.4. These adjusted yields are the predicted yields of the plots as if each plot had 125 plants.

TABLE 1.4  Preliminary Computations Used in Computing Adjusted Data for Each Treatment as If All Plots Had 125 Plants per Plot

Treatment   Yield per Plot   Plants per Plot   Residual    Adjusted Yield
1                951              126           29.6905       943.922
1                957              128           21.534        935.765
1                776              107          –10.8232       903.408
1               1033              142           –1.5611       912.67
1                840              120          –38.8402       875.391
2                930              135          –10.2795       859.218
2                790              119          –37.0279       832.469
2                764              110            0.6761       870.173
2                989              140           13.3294       882.827
2                740              102           33.3019       902.799

The next step is to compare the two treatments through the adjusted yield values by computing a two-sample t statistic or the F statistic from a one-way analysis of variance. The results of these analyses are in Table 1.5. A problem with this analysis is that it treats the adjusted data as if they were unadjusted observations, so no degrees of freedom for error are removed for estimating the slope of the regression lines. Hence the final step is to recalculate the statistics by changing the degrees of freedom for error in Table 1.5 from 8 to 7 (the cost of estimating the slope). The error sum of squares is identical in Tables 1.3 and 1.5, but in Table 1.5 it is based on 8 degrees of freedom instead of 7. To account for this change in degrees of freedom, the estimated standard error for comparing the two treatments needs to be multiplied by √(8/7), the t statistic needs to be multiplied by √(7/8), and the F statistic needs to be multiplied by 7/8. The recalculated statistics are presented in Table 1.6. Here the estimated standard error of the difference between the two means is 18.11, a 3.7-fold reduction over the analysis that ignores the information from the covariate.
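As a check on the recalculation, the entries of Table 1.6 follow directly from Table 1.5 by applying the degrees-of-freedom adjustment just described:

\[
\widehat{se}_{\mathrm{adj}} = 16.937\sqrt{8/7} \approx 18.11, \qquad
t_{\mathrm{adj}} = 2.641\sqrt{7/8} \approx 2.47, \qquad
F_{\mathrm{adj}} = 6.98 \times \tfrac{7}{8} \approx 6.10 .
\]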
TABLE 1.5  Analysis of the Adjusted Yields (Too Many Degrees of Freedom for Error)

Source            df   SS         MS        FValue   ProbF
Model              1   5002.83    5002.83   6.98     0.0297
Error              8   5737.26    717.16
Corrected Total    9   10740.09

Source   df   SS (Type III)   MS        FValue   ProbF
TRT       1   5002.83         5002.83   6.98     0.0297

Parameter       Estimate   StdErr   tValue     Probt
Trt 1 – Trt 2   44.734     16.937   2.641197   0.0297

TRT   LSMean    ProbtDiff
1     914.231   0.0297
2     869.497

TABLE 1.6  Recalculated Statistics to Reflect the Loss of Error Degrees of Freedom Due to Estimating the Slope before Computing the Adjusted Yields

Recalculated estimated standard error   18.11
Recalculated t-statistic                 2.47
Recalculated F-statistic                 6.10
Recalculated significance level          0.0428

Thus, by taking into account the linear relationship between the yield of a plot and the number of plants in that plot, there is a tremendous reduction in the variability of the data. In fact, the analysis of the adjusted data shows there is a significant difference between the yields of the two treatments when adjusting for the unequal number of plants per plot (p = 0.0428), whereas the analysis of variance in Table 1.2 did not indicate a significant difference between the treatments (p = 0.3361).

The final issue is that, since this analysis of the adjusted data overlooks the fact that the slope has been estimated, the estimated standard error of the difference of the two means is a little smaller than the estimated standard error obtained from the analysis of covariance. The estimated standard error of the difference of the two means computed from the analysis of covariance in Table 1.3 is 18.26, compared to 18.11 for the analysis of the adjusted data. Thus the two analyses are not quite identical.

This example shows the power of being able to use information about covariates or independent variables to make decisions about the treatments included in the study. The analysis of covariance uses a model to adjust the data as if all the observations were from experimental units with identical values of the covariates. A typical discussion of analysis of covariance indicates that the analyst should include the number of plants as a term in the model so that the term accounts for variability in the observed yields, i.e., the variance of the model is reduced. If including the number of plants in the model reduces the variability enough, then it is used to adjust the data before the treatment means are compared. It is important to remember that a model is being assumed when the covariate or covariates are included in the analysis.

1.3 A GENERAL AOC MODEL AND THE BASIC PHILOSOPHY

In this text, the analysis of covariance is described in more generality than that of adjusting for variation due to uncontrollable variables. The analysis of covariance is defined as a method for comparing several regression surfaces or lines, one for each treatment or treatment combination, where a different regression surface is possibly used to describe the data for each treatment or treatment combination. A one-way treatment structure with t treatments in a completely randomized design structure (Milliken and Johnson, 1992) is used as a basis for setting up the definitions for the analysis of covariance model.
The experimental situation involves selecting N experimental units from a population of experimental units and measuring k characteristics x1ij, x2ij, …, xkij on each experimental unit. The variables x1ij, x2ij, …, xkij are called covariates, independent variables, or concomitant variables. It is important to measure the values of the covariates before the treatments are applied to the experimental units so that the levels of the treatments do not affect the values of the covariates. At a minimum, the values of the covariates should not be affected by the applied levels of the treatments. In the chemical weed treatment experiment, the number of plants per plot is determined after a particular treatment has been applied to a plot, so the value of the covariate (number of plants per plot) could not be determined before the treatments were applied to the plots. If the germination rate is affected by the applied treatments, then the number of plants per plot cannot be used as a covariate in the conventional manner (see Chapter 2 for further discussion).

After the set of experimental units is selected and the values of the covariates are determined (when possible), ni experimental units are randomly assigned to treatment i, where N = n1 + n2 + … + nt. One generally assigns equal numbers of experimental units to the levels of the treatment, but equal numbers of experimental units per level of the treatment are not necessary. After an experimental unit is subjected to its specific level of the treatment, the response or dependent variable, denoted by yij, is measured. Thus the variables used in the discussions are summarized as:

   yij    is the dependent measure
   x1ij   is the first independent variable or covariate
   x2ij   is the second independent variable or covariate
     ⋮
   xkij   is the kth independent variable or covariate

At this point, the experimental design is a one-way treatment structure with t treatments in a completely randomized design structure with k covariates. If there is a linear relationship between the mean of y for the ith treatment and the k covariates or independent variables, an analysis of covariance model can be expressed as

    y_{ij} = \beta_{0i} + \beta_{1i}x_{1ij} + \beta_{2i}x_{2ij} + \cdots + \beta_{ki}x_{kij} + \varepsilon_{ij}        (1.1)

for i = 1, 2, …, t, and j = 1, 2, …, ni, where the εij ~ iid N(0, σ²); i.e., the εij are independently, identically distributed normal random variables with mean 0 and variance σ². The important thing to note about this model is that the mean of the y values for a given treatment depends on the values of the x's as well as on the treatment applied to the experimental units.

The analysis of covariance is a strategy for making decisions about the form of the covariance model through testing a series of hypotheses and then making treatment comparisons by comparing the estimated responses from the final regression models. Two important hypotheses that help simplify the regression models are

    H01: βh1 = βh2 = … = βht = 0   vs.   Ha1: (not H01),

that is, all of the treatments' slopes for the hth covariate are zero, h = 1, 2, …, k, or

    H02: βh1 = βh2 = … = βht   vs.   Ha2: (not H02),

that is, the slopes for the hth covariate are equal across the treatments, meaning the surfaces are parallel in the direction of the hth covariate, h = 1, 2, …, k.
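With a single covariate, the hypothesis H02 of equal slopes can be examined in PROC GLM by fitting the unequal-slopes model and testing the treatment-by-covariate interaction. The sketch below uses the hypothetical wheat data set entered earlier and is only one way to set up the test, not a prescription.

   proc glm data=wheat;
      class trt;
      model yield = trt plants trt*plants / solution;  /* trt*plants is the test of equal slopes (H02) */
   run;

If the trt*plants term is not significant, the common-slope model fit earlier is a reasonable simplification; a subsequent test of plants in that reduced model addresses whether the common slope is zero, in the spirit of H01 once equal slopes have been adopted.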
The analysis of covariance model in Equation 1.1 is a combination of an analysis of variance model and a regression model. It is part analysis of variance model, since the intercepts and slopes are functions of the levels of the treatments, and part regression model, since the model for each treatment is a regression model. An experiment is designed to purchase a certain number of degrees of freedom for error (generally without the covariates), and the experimenter is willing to sell some of those degrees of freedom for good or effective covariates that help reduce the magnitude of the error variance. The philosophy in this book is to select the simplest possible expression for the covariate part of the model before making treatment comparisons. This process of model building to determine the simplest adequate form of the regression models follows the principle of parsimony and helps guard against foolishly selling degrees of freedom for error to retain unnecessary covariate terms in the model. Thus the strategy for analysis of covariance begins with testing hypotheses such as H01 and H02 to make decisions about the form of the covariate or regression part of the model. Once the form of the covariate part of the model is finalized, the treatments are compared by comparing the regression surfaces at predetermined values of the covariates.

The structure of the following chapters leads one through the forest of analysis of covariance by starting with the simple model with one covariate and building through the complex process involving analysis of covariance in split-plot and repeated measures designs.