THE EFFECT OF NONRANDOM SELECTION OF CLUSTERS IN A TWO STAGE CLUSTER DESIGN

by

JASON PARROTT

DISSERTATION

Submitted to the Graduate School of Wayne State University, Detroit, Michigan, in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

2012

MAJOR: EVALUATION & RESEARCH

Approved by: ______________________________ Advisor / Date

© COPYRIGHT BY JASON PARROTT 2012 All Rights Reserved

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Shlomo Sawilowsky, whose support and guidance helped make this dissertation possible. His willingness to meet and discuss the foundations of my dissertation is greatly appreciated. I would also like to thank my committee, Dr. Gail Fahoome, Dr. Barry Markman, and Dr. Hermina Anghelescu, who supported me in the completion of my dissertation. I appreciate the flexibility and guidance that my committee provided.

TABLE OF CONTENTS

Acknowledgements
List of Tables
List of Figures
Chapter One: Introduction
Chapter Two: Literature Review
Chapter Three: Methodology
Chapter Four: Results
Chapter Five: Conclusions and Recommendations
Appendix A: Formulas
Appendix B: Rho Chart
Bibliography
Abstract
Autobiographical Statement

LIST OF TABLES

Table 4.1: Rho Table
Table 4.2: Random vs. Purposeful Lower Limit Cluster Results
Table 4.3: Random vs. Purposeful Upper Limit Cluster Results
Table 4.4: Proportion of Purposeful Cluster over Random Cluster

LIST OF FIGURES

Figure 4.1: Random vs. Purposeful Lower Limit Cluster Results
Figure 4.2: Random vs. Purposeful Upper Limit Cluster Results
Figure 4.3: Proportion of Purposeful Cluster over Random Cluster

CHAPTER I

INTRODUCTION

It would be ideal to test every subject within the target population when conducting experiments. Such research, however, often proves too cumbersome, unmanageable, and unrealistic. A more reasonable, although less accurate, approach is to take a sample from the desired population and conduct an experiment on the sample. Inferences can then be made regarding the characteristics of the population.
Sampling reduces costs and improves speed in conducting experiments (Cochran, 1977). There are many sampling procedures that can be used to represent populations, and each procedure is based on certain principles and assumptions. One of the principles that sampling must adhere to is randomization, the process that gives each subject in the population a non-zero chance of being selected (Weisberg, Krosnick, & Bowen, 1996). In the context of experimental design, randomization is an important step in ensuring that the variability among participants is equally distributed across treatment groups, thus eliminating the possibility that any portion of the population will be overrepresented (Weisberg, Krosnick, & Bowen, 1996).

Randomization is essential in both the selection and assignment processes. The selection of participants must occur before assignment can occur (Runyon, Coleman, & Pittenger, 2000). Selection is the process by which participants for a study are chosen (Campbell & Stanley, 1963). The treatment group receives the intervention that the researcher is attempting to study, while the control group receives no treatment and is held constant. Random selection is essential for generalization of the study: when random selection is done correctly, the results of the study can be applied to the population as a whole within the significance level set by the researcher. Assignment is the process of deciding how the treatment will be distributed (Runyon, Coleman, & Pittenger, 2000). It is possible either to assign treatments to the groups or to assign the groups to the treatment. When random assignment is violated, the study loses internal validity, meaning that the researcher cannot be certain whether the results obtained were due to the intervention or to an undefined extraneous variable.

Ceteris paribus, simple random sampling provides the most accurate inferences of the sampling procedures and is often the easiest to conduct. However, it may not be practical in some research contexts. For example, a study sampling from the population of the United States can be conducted more efficiently by defining a sampling frame of groups or clusters and then selecting participants from those clusters. The problem with this method is that the savings in time and cost generally come at the loss of statistical efficiency. The dilemma the researcher is left to solve is when to use simple random sampling as opposed to cluster sampling. Kish (1965) suggested that when the lower cost per element of sampling outweighs the increase in variance and the problems associated with statistical analysis, cluster sampling is the more realistic choice. This scenario often occurs in large, widespread samples.

Once it has been determined that the researcher is going to use a cluster sample, it is important that the clusters are well defined (Sudman, 1976). A desirable cluster is determined by the researcher's objectives (Kish, 1965). The researcher's participants can either be organized into clusters or fall naturally into clusters that share common characteristics. Clusters can be formed for many different purposes, including medical practices, school classrooms, voting districts, counties, states, etc. Once the participants are assigned and divided into clusters, the researcher can then test the entire cluster or draw another sample from the cluster. In the latter method, individual participants are randomly selected to represent the entire cluster.
These participants are tested and then the results are analyzed according to the study parameters. This form of cluster sample is called a two stage cluster sample. Cluster sampling must adhere to the same rules as individual sampling with respect to randomization at all levels of the study. This process can be violated when the researcher wants to ensure that certain participants are included in the study.

For example, Michigan has 83 counties. Each of the counties could serve as a defined cluster when examining the state, and if each were selected at random this would be a valid experiment. However, 4,052,201 residents (40%) of the state's population are located in the tri-county area of Wayne, Macomb, and Oakland counties (Michigan government website, retrieved July 21, 2010, from http://www.michigan.gov/cgi/0,1607,7-158-54534-240589--,00.html). Because a considerable amount of the population, wealth, and state interests is located within these three Michigan counties, they are frequently included automatically in many studies, and the participants from the remaining counties are then randomly selected to complete the sample. For a variety of possible reasons (e.g., lack of understanding, political pressure), it appears that some researchers believe a study that did not include these three counties would be discounted or considered irrelevant.

This same scenario occurs in the state of New York. During 2002-2006, the average population was 19,228,641. Of those, 8,177,449 (43%) lived in the five boroughs of the Bronx, Brooklyn, Manhattan, Queens, and Staten Island (New York State Department of Labor, July 1, 2009, retrieved July 21, 2010, from http://www.labor.ny.gov/stats/nys/statewide_population_data.asp). It is the belief of some that a study of the State of New York that did not include these highly populated, well-known boroughs would not be considered as important as one that included them. Some researchers would argue that automatically including these New York counties would be valid as long as the individual participants are randomly selected from the clusters. However, if at any point in a study the principles of randomization are violated, the sampling frame is no longer representative of the entire state, and conclusions regarding that population are, at least to some nonarbitrary degree, invalid.

Statement of the Problem: The question that this study will answer is: To what extent is the validity of purposeful cluster sample studies compromised? More specifically, it is the purpose of this study to demonstrate how the principles of randomization are violated in two-stage cluster sampling and how much this violation affects the results of the studies. The study will examine the impact of cluster sampling when randomization in the selection of clusters has been violated. More specifically, it is concerned with Type I error properties (false positives) that may occur after failure to randomly select clusters in the first stage of a two stage cluster design.

Limitations: In order to conduct the simulation, there needs to be a distribution from which to draw data. In this study, a normal distribution will be used. A normal curve is not necessarily the best representation of real data sets; however, it will permit a close view of the best case scenario.
Hence, if the results are unsatisfactory for normally distributed data, perforce the substitution of real data will yield even less satisfactory results.

CHAPTER II

LITERATURE REVIEW

When conducting experiments, the researcher often manipulates an independent variable and observes the dependent variable in order to see whether the manipulation produces a significant change. In order to improve the power of the study, the researcher may match samples or use stratification to help ensure the samples have baseline equality. In the process of doing so, the sample can become more restricted to meet the parameters of the design, and as a result, the internal validity of the study may be compromised. This problem can become compounded with cluster sampling, which is a complex procedure because the researcher has to account for the variance between clusters as well as the variance between individual participants.

The purpose of this review is to examine what cluster sampling is and how it differs from the sampling of individual participants. Unique considerations such as unit of analysis, intercluster correlation, sample size, and effect size will be discussed in greater detail, as well as equations to account for these differences. The fact that these procedures are well known yet still not consistently incorporated will also be examined. Procedures used to increase the power of statistical tests following cluster sampling will also be discussed, along with how these procedures can influence the internal validity of the results and the external validity of the study. Finally, the review will discuss a brief history of how cluster sampling evolved to its current state and how accurately it has been incorporated recently. The final discussion will relate to the purpose of this study.

Uniqueness of Cluster Designs: Cluster randomized statistical designs or experiments are designs where clusters of individuals, rather than the individuals themselves, are randomly assigned to treatment and control groups (Donner, 1998). A two stage cluster sample is a cluster sample where clusters are either selected randomly or by design, and from those clusters individuals are then selected to represent the clusters. According to Donner and Klar (2000), there are three commonly used designs when setting up cluster randomized trials: completely random, stratified, and matched pair.

The completely randomized cluster design assigns intact clusters without consideration of other factors (Donner, 1998). For example, if researchers in a state were examining whether to enact a reading program for its students and wanted to test the effects of the program, they could use a completely randomized design. To accomplish this, they would randomly select districts from the state and assign them to an experimental group and to a control group. If each district in the state had an equal chance of being selected, without any predisposed criteria, it would be a completely randomized design. The completely randomized design is the most simplistic yet most powerful design to set up. This design is most appropriate for a large number of clusters (Donner & Klar, 2000). It is also easy and inexpensive with respect to sample selection, data analysis, and sampling variance (Sudman, 1976). The largest drawback to this design is that although it is theoretically and mathematically the most efficient, it is often logistically difficult to carry out. It can be a long and tedious process to construct this design if the sample size or population is large (Sudman, 1976).
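To make the mechanics of the completely randomized cluster design concrete, the following is a minimal sketch, written in Python for readability (the dissertation's own simulation, described in Chapter III, was written in Fortran). The district names and group sizes are hypothetical and are not taken from any study cited here.

```python
import random

# Hypothetical sampling frame of intact clusters (school districts).
districts = [f"district_{i:02d}" for i in range(1, 21)]

random.seed(42)            # fixed seed so the illustration is reproducible
random.shuffle(districts)  # every district is equally likely to land in either arm

half = len(districts) // 2
experimental = districts[:half]  # districts that would receive the reading program
control = districts[half:]       # districts that would keep the usual curriculum

print("Experimental:", sorted(experimental))
print("Control:     ", sorted(control))
```

Because intact clusters are allocated with no predisposed criteria, this corresponds to the completely randomized design described above; the analysis, however, must still account for the clustering, as discussed later in this chapter.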
An example of this form of design is the ACEH trial conducted by Abdeljaber (1991). In this study, 229 villages from a sample of 450 were selected and given vitamin A supplements. They were then compared against a control group to determine the effectiveness of the supplements on various health factors. Because selection was made at random with no other predisposed criteria, it is considered a randomized design. This study differs from a stratification or matching study because the sample was not further categorized after the original selection.

A stratified cluster design is a more stringent cluster design that assigns two or more clusters to each combination of statistical subpopulation and intervention (Donner & Klar, 2000). This subpopulation can be any population that seems to be a relevant contributor to the study. For instance, some cluster designs are stratified according to factors such as cluster size, geographic area, or socioeconomic status (Donner, 1998). Using the previously discussed example on reading programs, if the districts were divided according to size before they were randomly selected and assigned, this would be a form of stratification. Stratification designs are most effective in small studies (Donner & Klar, 2000). When used correctly, stratification nearly always results in a smaller variance for the estimated mean or total than is given by a simple random sample (Cochran, 1977). This reduction in variance can make the sample more efficient (Sudman, 1976). There are four primary reasons to stratify (Sudman, 1976):

1. Strata themselves are of primary interest.
2. Variances differ between strata.
3. Costs differ between strata.
4. Prior information differs by strata.

There are no disadvantages to using stratification; however, unless the units of stratification are exceedingly large and only small amounts of variation remain within the strata, the gains will be moderate at best (Hansen, Hurwitz, & Madow, 1953). Although there are no disadvantages, there are certain situations where stratification should not be used as a means to strengthen a design. Stratification should not be used for the following purposes (Sudman, 1976):

1. To ensure randomness
2. With nonprobability samples
3. To adjust for noncooperation

An example of a stratification design is the Child and Adolescent Trial for Cardiovascular Health, or CATCH, trial (Perry et al., 1997). In this trial, the objective was to compare the change in serum cholesterol levels from baseline to the end of the intervention period. Students were assigned to either a school-based intervention, a school-based and family-based intervention, or the standard curriculum. The school-based intervention consisted of the CATCH curricula, means to provide more nutritious meals, and health-based classroom activities. In addition, the group that received the additional family-based intervention also received at-home activities. The stratifying factor was geographic area, or city, with each city contributing 24 schools across the control and experimental groupings (Donner & Klar, 2000). In this design, the researchers had the flexibility to determine by which factors to stratify. This flexibility does not exist in the matched pair design. The matched pair design is the most stringent design of the three.
In a matched design, two clusters in a stratum are randomly assigned to an experimental group and a control group (Donner, 1998), With a matched pair design, the presumed extraneous variables form a tight match reducing imbalance of baseline risk factors. The strength of the matched pair design is it can lead to an increase in power for the study. The power of the matched design will continue to improve as the effectiveness of the match increases (Donner and Klar, 2000). However, the matched pair design has many disadvantages. It is difficult to determine the inter-cluster correlation between match pairs. Also, if the matching variables are not related to the outcome there may actually be a loss in power due to loss of degrees of freedom (Klar & Donner, 2000). Finally, there is also the concern that if participants are being matched according to one 10 variable they may differ widely in many other unknown variables. In general, matching should be avoided in studies with a small sample. They typically work better for designs that have larger than 10 pairs (Donner & Klar, 2000). In the hypothetical example used earlier, two districts with similar previously agreed upon criteria would be matched then randomly assigned to a control and experimental group as in the following example. Royce et al. (1993) examined the smoking cessation attempts between African Americans and Caucasian Americans. They used baseline data from the COMMIT research group which was designed to match groups according to the following factors: community size, population density, demographic profile, community structure, and geographic proximity. These matching factors helped form equal groupings among these variables. However, they could exacerbate differences in other areas. In addition to the uniqueness of each form of cluster design, cluster designs as a whole differ from other designs in many other statistical factors. Cluster designs are unique from their individual design counterparts because the effects of clustering must be accounted for statistically. When participants are clustered, they may either contain similar traits or may acquire them as a result of their clustering. These similarities often cause the participants to not be statistically independent and as a result cause inflated results when rejecting the null hypothesis (Simpson et al., 1995). The similarities are unknown but usually occur for one of two reasons: either participants are grouped together because of similarities not accounted for in the design, or because they have been together for a long enough period of time that they have influenced each other through discussions or other actions. These influences have led them to make similar observations or responses. 11 Consider the following example for the former of these two situations. Schools in separate districts are chosen to partake in achievement testing. Some of the schools are giving an intervention which is supposed to raise reading achievement and the remainder serves as a control. In the selection process, many of the more affluent districts are giving the intervention and it is determined that a significant result has been returned. The result could be attributed to the students’ higher socioeconomic status or other supplemental interventions the district offered. An example of the latter of these two instances often occurs in a workplace environment. As people work to together for longer period of time, they tend to influence each other’s decision making. 
They also have a greater chance of being exposed to the same extraneous variables. For instance, if one member of a cluster was exposed to an infectious disease the other groups would be more likely to acquire this disease due to increased exposure regardless of interventions used (Simpson et al., 1995). This cluster effect has to be quantified to determine the validity of a cluster design. In addition to clustering effect, there are other statistical challenges compared to their non- clustered counterparts. Before explaining the intricacies of each individual design that is used in cluster randomized trials, it is important to explain the different variables or adaptations of the standard components that can become an issue in cluster randomized trials if not addressed. Among these issues that the researcher needs to consider are the unit of analysis, inter-cluster correlation, sample size, and effect sizes. Unit of Analysis: Conducting traditional experiments using individual participants is standard methodology. The researcher randomly samples from a pool of individuals and then randomly assigns those individuals to a treatment or a control group. The treatment is given to 12 the individual directly and the researcher evaluates the results of that treatment. Often in cluster designs, the cluster itself is assigned to a treatment or a control group and the individuals are then given the treatment. The entire cluster is then evaluated as to the success of the treatment. This presents a problem because it cannot be determined if the change is a result of the treatment or the effect of the clustering of the individuals. The unit of analysis differs from the unit of selection and as a result poses a statistical challenge. Cluster trials have a unit of analysis issue that needs to be addressed. This unit of analysis problem is a problem that occurs when the level of assignment to study conditions and the level of analysis of the data differ (Rooney & Murray, 1996). For example, consider participants attempting to lose weight through an exercise program. The treatment group could be placed in a class where the intervention is given and compared to that of a control group. The individuals would then be weighed prior to and at the conclusion of the class. The success of the program would be judged on the amount of weight loss by each individual participant even though the class was the unit that was used to assign participants. The effects that the clustering had on the group could be the cause of its success, not the program itself. Cornfield (1978), discussed the extent clusters can have on the outcome of a research study. He demonstrated how to account for the clustering effect involved in group trials, and to what extent sample size will need to be increased to neutralize this effect. He concluded that randomization studies by cluster with evaluation at the individual level can yield information and should not be discouraged. However, when using these studies the analysis must be appropriate and treating them as standard individual studies “is an exercise in selfdeception and should be discouraged.” (Cornfield, 1978). In order to properly use cluster designs, the researcher must account for clustering. The first steps in accounting for the 13 clustering effect are evaluations of the interclass correlation coefficient, sample size, and effect size. Each of these will be discussed in greater detail below. 
Intercluster correlation coefficient (ICC): In order to properly account for the clustering effect, researchers quantify it using an intraclass correlation coefficient (ρ) (Donner & Klar, 2000). The intraclass correlation, or intercluster correlation, is defined as the standard Pearson correlation between any two subjects in the same cluster (Donner & Klar, 2000). This correlation can be quantified by using the following formula:

A1.) ρ = σb² / (σb² + σw²),

where σb² is the variance between clusters and σw² is the variance within clusters. There are many reasons for the possibility of variation between clusters, including the following:

a.) Individuals frequently select clusters with which they share common characteristics. For example, census data that are clustered by county would have similar people living together. The residents of Wayne County, Michigan, could be substantially different from the residents of Oakland County, Michigan.

b.) Covariates at one cluster or level affect many of the participants within that level. For example, smoking cessation participants who live in a highly industrial area may show increased lung damage compared to their counterparts. This could occur due to pollution in the area rather than the effects of smoking.

c.) Participants within clusters have more exposure to each other compared to other clusters and as a result influence each other. For example, voters who tend to be moderate or in the center on issues may be swayed by an event or organization put on by a particular party. This may occur as a result of comments made by the other participants in the group (Donner et al., 1990).

Intraclass correlation values bear on the interpretation of the overall success of the intervention. If these values are too large, it is difficult to determine whether the intervention or an extraneous variable was the cause of the change in behavior. Acceptable intraclass correlations can differ according to the design of the experiment. However, there are some guidelines that have been established. In a study conducted by Hedges and Hedberg (2007), it was determined that the average ICC value for educational research was .22. For studies that used primarily low-socioeconomic schools, that level dropped to an average of .19, and for low achievement schools the ICC decreased to .09. ICCs for medical research can range anywhere from .2 to .6 depending on the study (Donner et al., 1981). The acceptable intraclass correlation is dependent on the parameters of the study. An ICC value of .6 would not be acceptable in a study concerning low achieving schools; however, it may be acceptable in some areas of epidemiological research.

Once the correlation coefficient has been determined, the variance inflation factor can be calculated. The variance inflation factor, or design effect, results in a loss of statistical efficiency for the design. This loss can be quantified by the following equation:

A2.) D = 1 + (m - 1)ρ,

where m is the number of individuals per cluster and ρ is the intra-cluster correlation value (Hayes et al., 2000). Not only does the researcher have to adjust for the variance inflation factor caused by the clustering of individuals, they must also determine the sample sizes for two different groupings. This variance between groups is a key determinant in setting up an experiment. An inflated variance causes decreased power in the study and in turn means that sample size needs to be increased (Feng et al., 1999).
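The following sketch, offered here only as an illustration and not as part of the original study, simulates clustered scores, estimates ρ with the usual one-way ANOVA decomposition (consistent with formula A1), and applies the design effect in formula A2. The number of clusters, cluster size, and variance components are assumed values.

```python
import numpy as np

rng = np.random.default_rng(1)

k, m = 50, 20                   # assumed: 50 clusters with 20 individuals each
sigma_b2, sigma_w2 = 0.25, 1.0  # assumed between- and within-cluster variances

# Simulate clustered scores: shared cluster effect plus individual error.
cluster_effects = rng.normal(0, np.sqrt(sigma_b2), size=k)
scores = cluster_effects[:, None] + rng.normal(0, np.sqrt(sigma_w2), size=(k, m))

# One-way ANOVA estimate of the intraclass correlation.
grand_mean = scores.mean()
msb = m * np.sum((scores.mean(axis=1) - grand_mean) ** 2) / (k - 1)
msw = np.sum((scores - scores.mean(axis=1, keepdims=True)) ** 2) / (k * (m - 1))
rho_hat = (msb - msw) / (msb + (m - 1) * msw)

# Variance inflation factor / design effect, D = 1 + (m - 1) * rho.
deff = 1 + (m - 1) * rho_hat

print(f"true rho = {sigma_b2 / (sigma_b2 + sigma_w2):.3f}, "
      f"estimated rho = {rho_hat:.3f}, design effect = {deff:.2f}")
```

With ρ near .2 and 20 individuals per cluster, the design effect is roughly 4.8, meaning a cluster sample would need nearly five times as many individuals as a simple random sample for the same precision; this is the sample size pressure taken up next.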
Sample size: Sample size determination is a function of three things that will typically be in conflict with one another: cost, practicality, and scientific objectives (Hayes et al., 2000). When deciding upon the appropriate sample size, there are two levels to consider. The first is the number of clusters chosen, and the second is the number of participants chosen per cluster (Campbell, 2000). In many cases, the researcher cannot control the number of participants per cluster, since many of them are already set (Raudenbush, 1997). For example, in studying classrooms within schools, the researcher is limited to the set number of students within each class. In determining sample size, it is important to consider that allocating large numbers of people per cluster can constrain the number of clusters that can be allocated (Raudenbush, 1997). The rationale behind this is that selecting a large number of participants per cluster causes the overall number of individuals needed for the experiment to increase. At that point, it may be more difficult or costly to examine or recruit the necessary number of individuals, and in turn it may be more difficult to fill an adequate number of clusters. Increasing the number of individuals per cluster will not necessarily improve efficiency. In determining the number of individuals to cluster, it is important to remember that this is a prelude to determining the number of clusters that can be constructed (Raudenbush, 1997).

Inflating the number of clusters does not ultimately cause increased statistical efficiency either. Hayes et al. (2000) detailed this phenomenon when discussing an experiment regarding malaria transmission among African villages. In that study, they stated that malaria, as well as other infectious diseases, varies considerably in concentration from one village to the next. Using villages as the cluster unit will cause the intraclass correlation to be exceptionally large, and a better cluster unit may be a grouping of villages within a geographic area. The parameters of the study should dictate the appropriate balance of sample size between clusters and individuals.

There is no set formula for determining the appropriate sample size for all studies. However, there are guidelines and formulas derived to assist in the planning of studies. Hsieh (1988) developed formulae regarding sample sizes in cluster trials. He allowed for sample size calculations using either the between-cluster variance or an estimate of the within-cluster variance. Given this information, along with the intercluster correlation, the adequate number of individuals per cluster can be determined, and using the power contours that are also provided, the number of clusters can then be calculated (Hsieh, 1988). Campbell (2000) also developed sample size formulas. He used the following formula to determine the number of patients per practice:

A3.)

These formulas may be useful in determining whether to increase the number of practices or to increase the number of patients per practice in medical studies. Using the above formulas and variations of them, the researcher can determine which combination of clusters and individuals will work for their study. Each design will encompass its own intricacies which make it unique. The researcher must balance the factors of cost, practicality, and scientific objectives when determining the appropriate sample size (Hayes et al., 2000). In addition to sample size considerations, the researcher must also calculate the effect size in a different manner than with an individual design.
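As a rough back-of-the-envelope illustration of this balancing act, the sketch below inflates a simple random sample size by the design effect and converts the result into a required number of clusters. It is a generic calculation under assumed values of n, m, and ρ; it is not the Hsieh (1988) or Campbell (2000) formula referenced above.

```python
import math

def clusters_needed(n_srs: int, m: int, rho: float) -> int:
    """Clusters of size m needed to match the precision of a simple random
    sample of n_srs individuals, using the design effect 1 + (m - 1) * rho."""
    deff = 1 + (m - 1) * rho
    n_cluster = n_srs * deff         # individuals needed once clustering is accounted for
    return math.ceil(n_cluster / m)  # spread those individuals across clusters of size m

# Assumed scenario: a simple random sample of 400 students would suffice,
# classrooms hold 25 students, and several candidate ICC values are considered.
for rho in (0.05, 0.10, 0.22):
    print(f"rho = {rho:.2f}: {clusters_needed(400, 25, rho)} classrooms needed")
```

The larger ρ is, the faster the required number of clusters grows, which is why simply adding more individuals to each cluster does little to restore efficiency.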
Effect Sizes: Much like sample sizes, formulas and sample size allocation are different for cluster sampling designs as opposed to individual design. The desired effect sizes from these designs are also altered. An effect size measures the interventions effect on the individuals it has been given to. The effect size is the standardized difference between the treatment and the control group (Rooney & Murray, 1996). By calculating the effect size, the researcher can determine the sample size needed to achieve the desired power for their study. Effect sizes can be affected by the cluster effect of individuals within a group. This cluster effect (ICC) can cause a significant reduction in effect size and needs to be taken into account. For instance, Rooney and Murray (1996) stated that an ICC as low as .002 can cause the effect size to be reduced 30% in experiments containing at least 100 students per school. To account for the clustering effect, they suggest an adjusted effect size using the following formula: A4.) Donner and Klar (2002) suggest using meta-analysis studies to account for the clustering effect in defining an effect size for cluster randomized trials. They suggest four methods for obtaining a more accurate effect size when working with binary data in metaanalysis trials. The first approach they suggest is the ratio estimator approach. The ratio estimator approach developed by Rao and Scott (1992) divides the observed sample frequencies in a given study by the estimated design effect. 18 The second approach Donner and Klar suggested to evaluate effect size is the Adjusted Mantel-Haenszel test. This procedure is a commonly known procedure for evaluating binary data in individually randomized trials. This procedure compares the outcomes of each individual trial and weighs the differences by their variance. The trials with the most stable outcomes are more influential than those with the least (Donner & Klar, 2002). This procedure can be adjusted to fit cluster data, if the clusters and sample sizes are the same size by dividing the original equation by its inflation factor (Donner & Klar, 2002). The third procedure that Donner and Klar referenced is the Woolf procedure. It takes the effect sizes of trials with a small number of trials and a large size and transforms the intervention odds ratios to a logarithmitic scale. This scale is averaged using a weighting scale by Woolf (1955) and modified for clustering by Donner and Donald (1997). The final approach that Donner and Klar (2002) suggested was to use a randomization procedure such as Fischer’s permutation test. The advantages of this approach are its statistical validity. However, this comes at the expense of loss of power and the inability to easily make a covariate adjustment. These methods are used with binary data in metaanalysis trials, but they do demonstrate how the effects of clustering must be taken into account when determining accurate effect sizes in these trials. Hedges (2007) developed another method for adjusting effect sizes in cluster randomized trials. He reasoned that in cluster randomized trials there are several different mean differences to choose from. Each of these differences will yield a different definition for a population effect size. The three main effect size parameters that Hedges discusses are the within mean difference, the between mean difference, and the total mean difference between 19 the treatment and control groups. 
In order to determine which one will be the most practical is dependent on the interest of the researcher (Hedges, 2007). The within mean difference effect size can be defined as: A5.) This effect size would be typically used in single site studies. For example, if a school was determining to enact a certain reading program they may have several classrooms randomly assigned to receive the program and compare them to a control in which traditional methods were incorporated. The desired effect size could be determined using the within mean difference. The second effect size parameter is defined as: A6.) This effect size would be used in studies that have multiple sites but are allocated on the basis of the individual rather than the cluster. An example of this type of assignment strategy could be the assignments of students to different high schools within a district. The students could be assigned to different schools and then into classes from these schools to receive treatment or be held as the control. The key is that the individual is the level of assignment. The final effect size parameter that Hedges discussed was A7.) 20 This effect size parameter could be used to estimate effect sizes where the treatment effect is to be determined at the cluster level. For example, if students were allocated to different classrooms (clusters) to evaluate different teaching methods. The mean score of the class would serve as the test statistic and the effect of the teaching method would be evaluated to see which produced the most favorable results. The appropriate definition of mean differences must be chosen before attempting to achieve desired effect size. Once the appropriate parameter is determined, Hedges (2007) provided equations to estimate the effect size for the study, and it is from these estimates the sample size and power needed to achieve the desired effect can be determined. It is essential to account for the effect of clustering in unit of analysis, sample size, effect size, and power. Even though it well known these factors need to be addressed, it has been proven through meta-analysis studies that it is not always done. These statistical issues mentioned earlier are well known to researchers. However sometimes they are disregarded or ignored. In a study conducted by Isaakidis and Ioannidis (2003), It was determined that only 20% of studies in their sample (51) took clustering into account in their sample size and power and power calculations and only 37% took clustering into account in the analysis. Intracluster correlations and design effects were only reported in 2% and 6% of the trials respectively. The previous variables discussed occur in all cluster randomized trials and need to be accounted for. There are, however, different types of trials that can be designed using cluster or group randomized trials each of which have their own strengths and weaknesses. Cluster randomized trials can be done according to randomized, stratified, or matched designs. They can also be done with or without the use of covariates. The rationale for using stratification, 21 matching, or covariate schemes is to increase efficiency of designs to increase power. Due to the cost of gathering samples for studies, it becomes essential to use prior information when available to increase the likelihood of adequate power (Raudenbush, et al. (2007). However, in deciding to use these schemes, one needs to be careful not to sacrifice the integrity of a valid sample to increase power. 
Prior to treatment, experimental units can be separated or stratified into subclasses called blocks in which these blocks are perceived to be similar (Raudenbush, et al., (2007). A stratified randomized design is essentially a completely randomized design with the exception of two or more stratifying factors to increase the chance of a well balanced intervention groups (Lewsey, 2004). Some examples of stratification factors that can be used are cluster size, socioeconomic status, geographic location, or any other categorical factor that may be believed to influence groupings. Cluster randomized trials using stratification are chosen at the designs stage. If the strength of the stratification factors is believed to be high and does not seem to have an adequate number of clusters to achieve balance by not stratifying, then this design can increase power compared to that of its completely randomized counterpart (Lewsey, 2004). In a simulation study conducted by Lewsey (2004), he determined that stratifying by cluster size did indeed increase the power cluster randomized trials. The increase in power was most beneficial when the amount of clusters in the study were small. As the number of clusters was increased, the samples had a more appropriate chance to balance each other out. A general rule stated by Klar and Donner (2004), is that stratification should only be used when there is evidence that the strata represent important factors and when there are few individuals in the trial. Stratification is also more beneficial in cases where there are two or 22 more clusters per strata if there are twenty or more pairs. This will help to determine whether the matches have distinct rather than similar attributes. However, if given the choice between stratification and matching, Hayes et al. (2000) stated stratification is a more desirable option than that of matching. Matching is a form of pre-randomization blocking in which the blocks consist of two units that are believed to be equivalent on all variables with exception of the intervention (Raudenbush, et al., 2007). Matching is another alternative to attempt to increase the power of cluster randomized designs. Prior to treatment, experimental units can be separated into subclasses called blocks in which these blocks are perceived to be similar (Raudenbush, et al., 2007). Matching is not typically seen in trials randomizing individual participants. However they are the design choice of many community intervention trials because of their perceived ability to match groups on similar characteristics (Campbell, et al., 2007). Due to the inability to often obtain adequate sample sizes to individually randomize these trials, effectively matching helps to reduce the probability of creating groups that are substantially different in important baseline characteristics. However if matching is not correctly done, matching can cause more harm than good. (Campbell et al., 2007). According to Raudenbush et al. (2007), matching will enhance statistical power when groups are well matched and those characteristics strongly predict outcomes. The key factor to consider is the factor of variation that lies between groups which can be indexed by the ICC. If the ICC is large, then matching will be beneficial. However, if it is small, matching will not be effective and could possible hurt the study due to loss of degrees of freedom. The objective in matching is to select a matching variable that is highly correlated with the outcome measure (Hayes et al., 2000). 
Matching is also problematic because it is not 23 possible to distinguish the between cluster variance from the treatment effect heterogeneity. Because of this, the researcher cannot estimate separately the conventional intraclass cluster correlation and the variance of the estimated treatment effect between cluster members Campbell (2000). Matching does have certain advantages over using covariates. It does not require linear associations as the use of covariates do, and does provide more flexibility in design (Raudenbush, et al. (2007). Klar and Donner, (1997) were critical in their findings regarding matching. They stated that small samples are likely to achieve effective matching but large samples also have drawbacks when prior information is limited or matches that are close cannot be sought. For these reasons, stratified designs rather than matching designs are recommended. The third method to attempt to increase power in cluster designs is the use of covariates. Covariates are characteristics that are strong predictors of the outcome and are built into the study. Prior to treatment, experimental units can be separated into subclasses called blocks which are perceived to be similar (Raudenbush, et al., 2007). The covariate is added as part of the linear association along with other predicators. By adding so-called extraneous variables into the equation, the researcher can help to limit residuals factor thus greatly reducing the number of units needed to achieve a given power. Covariates may be inexpensive to acquire and to use and can greatly increase power depending on the ICC (Raudenbush, et al., 2007). Much like matching, if the ICC is large then the use of a group level covariate can be strong. However, like matching, if the ICC is small the impact of the covariate will also be small. The use of covariates also shares some of the other faults that matching does. For example, Stevens, (1992) noted that even using multiple covariates will not necessarily equate intact groups and that the variables used to equate 24 groups may cause a greater difference on others. Sawilowsky (2007) demonstrated this via a Monte Carlo simulation where a covariate adjustment of reading levels was incorrectly made on the basis of a pre-test and as a result incorporated to the design. When the post-test, which had less emphasis on reading ability, was given and the covariate adjustment was accounted for the study led to the false conclusion of the effectiveness of the treatment variable. In the previous sections, the difference between individualized randomized trials and cluster randomized trials were discussed as well as the different types of cluster designs. It is well known that ignoring these differences can affect the validity of cluster randomized trials, yet it is still done. In the following discussion, the validity of cluster randomized trials will be examined. Randomization and Validity: Randomization allows rationalizing that independent participants or groups of participants are, at least in theory, equal. It allows us to address the implications of internal and external validity that can cause a study to be flawed. The issues in validity were described by Campbell and Stanley (1963). They addressed some of the many forms of validity problems a study can have. Any one validity issue can make the experiment flawed. Validity problems in cluster randomization have been improving but still exist. Eldrige et al. 
(2008) found that 25% of their sample of cluster designs was potentially biased due to recruitment and identification of patients, and approximately 50% of the studies used blinding of either allocation or assessors. Approximately 50% of the studies adequately assessed generalizability of clusters, and external validity seemed to be poorly addressed in many of the trials.

To limit issues in validity, it is important that samples are drawn correctly. There are both differences and similarities in how to draw standard independent samples and cluster samples. Cochran (1977) stated the principal steps in any sample survey as follows:

1. State the objectives of the survey
2. Identify the population to be sampled
3. Identify the data that need to be collected
4. Determine the degree of precision desired
5. Determine the method of measurement
6. Determine the frame or sampling units that will be used
7. Initiate a pretest on a small area to identify weaknesses in the survey
8. Organize field work and training effectively
9. Summarize and analyze the data
10. Evaluate the information gained for future surveys

The purpose of sampling theory is to make samples more efficient and random (Cochran, 1977). The steps above are basic steps in conducting survey sampling. These steps are not independent of cluster sampling; however, there are some subtle differences. Cluster sampling is not as accurate as simple random sampling; however, its use is appropriate when the lower cost per element more than makes up for its disadvantages. This scenario often occurs in large, widespread samples (Kish, 1965).

When it is appropriate to randomize by clusters, there are certain procedures that need to be followed. The parameters of the study must be defined. The researcher's randomization scheme can be strong; however, if the sample is drawn from an improper population, then the results may be misleading. If the sampling measures are sound, then they should mirror the overall population as a whole (Upton, 1978). It is important to pay careful attention to time, location, and any other variables that may cause the sample to not be representative of the population. This is also true for subsampling. When subsampling, the individual participants need to be representative of the clusters they are drawn from. According to Sudman (1976), clusters must be well defined and every element must belong to one and only one cluster, the number of population elements must be known or have a reasonable estimate, clusters must be small enough to make clustering worthwhile, and clusters should be chosen to limit the sampling error caused by clustering.

Once clusters have been carefully designed and organized, the process of selecting them can begin. Hansen, Hurwitz, and Madow (1953) proposed the following procedure for selecting individuals from clusters:

1. Number the primary sampling units (psu's) accurately
2. Select at random a page from a random numbers table and scan down until a number fits within your sample; continue scanning random numbers from the start point until the sample is satisfied
3. Divide each sampled block into four compact segments with roughly equal numbers of elementary units
4. Number the segments in each block and take a random number from that selection
5. Collect the desired information from the selection

Researchers prefer to use clusters of equal size. This is obviously not always possible due to the nature of the experiment. Unequal clusters cause additional complexities in an experiment.
According to Kish (1965), once unequal clusters are chosen, sample size is no longer fixed and thus becomes a random variable. The ratio mean is not an unbiased estimate of the population mean, and practical variances are not unbiased estimates of true variances. To limit the problems caused by unequal clusters, Kish (1965) suggested selecting large numbers of clusters, stratifying clusters according to size, defining and combining natural clusters, and subsampling with probabilities proportionate to size (pps). To subsample with probabilities proportionate to size, the researcher assigns to each cluster a sequence of random numbers equal to its size and then samples systematically (Sudman, 1976). A size measure is assigned to each cluster and then cumulatively summed over all clusters. A sampling interval is then determined by taking the cumulatively summed total and dividing it by the number of clusters desired. A random start is then determined, and the sample is drawn systematically from the start point (Sudman, 1976). However, when sampling pps, there are certain clusters that will be automatically included and thus not randomly selected.

Even if proper randomization procedures are believed to be followed, there can be oversights. In cluster trials, the risk of potential bias can occur both in the selection at the cluster level and in the selection of the individual participants (Puffer et al., 2003). One of the ways to reduce possible bias is through blinding. Blinding is an attempt to keep trial participants, investigators, and/or assessors unaware of the interventions being assigned. The blinding can occur for one group of the study or for all. When blinding is used, it is important for the researchers to clearly define what form of blinding is being used and to what extent (Schulz & Grimes, 2002).

In a study conducted by Johnson et al. (2008), it was discovered that there were biases in studies of conflict mortality. The researchers had used randomly selected main streets as starting points and then proceeded to conduct their sample. Johnson et al. (2008) determined that the results had been biased because main streets are more highly trafficked and more likely to have casualties than random neighborhood streets. The researchers had properly randomized; however, they did not clearly consider a major contributing variable. In another case, the Edinburgh Trial, which involved breast cancer screening (Alexander et al., 1989), there was a bias involving socioeconomic status that was not accounted for. In the Edinburgh Trial, participants were involved in a study to determine whether breast cancer screening reduced mortality. It was discovered that the mortality rate, regardless of intervention, was higher for those of lower socioeconomic status. In this case, the researchers may have been better served by accounting for socioeconomic status through stratification in the analysis stage (Alexander et al., 1989). Once again, this is an issue that could have been addressed in the planning stages of the experiment. To increase statistical power and decrease bias, researchers need to consider possible covariates (Arceneaux, 2005).

The examples listed above are common validity issues that can occur when not considering all possible variables. They have been discovered throughout the years when conducting cluster designs. To better understand the progression and evolution of the cluster randomized design, a brief history will be discussed.
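Before turning to that history, the pps selection procedure described earlier in this section can be sketched in code. This is a minimal illustration under assumed cluster sizes, not a reproduction of any procedure used in this dissertation: the size measures are cumulated, a sampling interval is computed, and a random start is walked systematically through the cumulative totals.

```python
import random

def pps_systematic(cluster_sizes, n_select, seed=0):
    """Systematic selection with probability proportionate to size: cumulate the
    size measures, choose a random start within the sampling interval, and select
    the cluster whose cumulative range covers each successive selection point."""
    random.seed(seed)
    total = sum(cluster_sizes)
    interval = total / n_select           # sampling interval
    start = random.uniform(0, interval)   # random start point
    points = [start + i * interval for i in range(n_select)]

    selected, cumulative = [], 0
    points_iter = iter(points)
    point = next(points_iter, None)
    for index, size in enumerate(cluster_sizes):
        cumulative += size
        while point is not None and point <= cumulative:
            selected.append(index)        # this cluster's range covers the point
            point = next(points_iter, None)
    return selected

# Assumed example: ten clusters of unequal size, three to be selected.
sizes = [1200, 150, 300, 90, 2500, 400, 220, 180, 700, 260]
print(pps_systematic(sizes, 3))
```

Note that the cluster with 2,500 elements exceeds the sampling interval (2,000), so it is certain to be selected, which is exactly the automatic inclusion noted in the pps discussion above.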
History of cluster designs: Randomization by cluster has been slower to develop compared to random assignment by individuals due to the added design and analysis requirements that cluster sampling entails. Donner and Klar (2000) stated that initial studies of cluster randomization can be traced back to a 1648 study done by Van Helmont. In this study, participants were assigned in lots to either the experimental group which received the 29 treatment of bloodletting or to the control group. This, however, cannot be defined as true randomization due to the fact that the study was not replicated. The statistical implication of cluster randomization was noted by Lindquist, (1940). He stated when employing cluster sampling in educational research that there is the possibility of a large systematic difference from school to school which could account for variability. Lindquist also stated that clustering in these statistical anomalies can be accounted for by testing the cluster means with standard statistical methods. Glass and Hopkins, (1996), argued that standard statistical methods could not be used in cluster samples. They stated that a common flaw in educational research is to select schools or classes at random and then students from those schools or classes. This violates the assumption of interdependence and can’t be considered a true random sample and the proper method of analysis for these types of studies (cluster sampling) often eludes the most seasoned researchers. Hansen and Hurwitz (1942) addressed the statistical anomalies of cluster sampling compared to individual sampling stating that the increase in variance due to clustering can be quite substantial even if the correlations among clusters is small. They stated that with increased cluster size the intra-cluster correlation will drop but not at rate that is slower than linear (Donner & Klar, 2000.) According to Donner and Klar (2000), many studies prior to 1978 did not properly account for clustering either in the design or the sampling with exception of the following; Comstock (1962), Ferbee et al. (1963), and Horwitz and Magnus (1974). Many of the ideas to account for clustering effects were adapted by Pollack (1966), who studied the organization and evaluation of trials of prophylactic agents for the control of communicable diseases. He noted that randomizing clusters rather than individuals are less likely to be balanced for 30 extraneous variables. However, these trials do provide administrative convenience, reduce the risk of treatment contamination, and increase the likelihood of subject participation (Donner and Klar, 2000). The Taichung experiment (Berelson &Freedman 1964) is also referenced as a noteworthy cluster design acknowledging the authors took detailed approaches to both randomize and analyze the clustered data (Donner & Klar, 2000). Cornfield (1978) discussed the statistical implications for clustering. Cornfield discussed the statistical efficacy of clustering and concluded that sample size must be inflated to account for the effects of randomizing clusters rather than individuals. He summarized his conclusions by stating that randomizing by cluster should not be discouraged, but the study will yield less information than if it was conducted with individuals as the unit of analysis. This limitation can be accounted for by raising the sample size included in the study. The analysis performed in cluster studies must account clustering or the results could prove to be misleading. 
According to Donner and Klar (2000), after 1978, researchers began to understand the complexities involved in clustering but had few resources to use to develop appropriate cluster designs. This problem began to be addressed by (Gilliam et al., 1980; Donner et al., 1981). However, many authors continued to publish articles that did not address these issues. Donner (1990) examined studies that employed clustering in their design He evaluated these on the following factors: justification for employing cluster randomization, between cluster variation accounted for in sample size and/or power calculations, between cluster variation accounted for in the analysis, baseline reporting of prognostic factors consideration of prognostic factors in the analysis, and the reports of participants loss due to follow-up. In his evaluation of sixteen studies, he found that only four of them gave reason for justification of clustering. Only three of the sixteen designs accounted for between cluster variation in the 31 sample size or power of the design. Half of the designs accounted for the between cluster variation in the analysis of the studies. Thirteen of the sixteen studies accounted for baseline prognostic factors and thirteen of the sixteen considered those prognostic factors in the analysis. Fourteen of the sixteen studies included the loss of participants in the analysis. It should also be noted that half of the trial reviewed used traditional statistical methods to interpret the results. The Type-I error associated with these procedures is likely to be greatly increased as a result of clustering (Donner, 1990). The studies that have not taken the necessary steps regarding clustering cannot be deemed valid and thus their results, although they may be significant, can be misleading. A similar review was published by (Simpson et. al, 1995). They examined primary prevention trials through the years of 1990-1993. They evaluated 21 articles during this time period and determined that only 4 (19%) accounted for clustering in the sample size and power analysis of their design. They also discovered that only 12 (57%) accounted for clustering in their statistical analysis. The methodology and criteria for conducting cluster randomized trials are clearly available however many still seem to either ignore them or are still unaware of them. This oversight or neglecting of proper experimental design affects the validity of the results of experiments that are conducted. It is the intent of this study to determine to what extent the validity has been affected. 32 CHAPTER III METHODOLOGY In order for a study to be valid, it is essential to randomize selection at both levels of a cluster design. There are often times when the cluster level of a two stage cluster design will be purposely selected. When this occurs, the study violates the principle for randomization. It is the purpose of this study to demonstrate the extent of the effects of this violation. To accomplish this, the use of Monte Carlo Methods will be incorporated. Monte Carlo is repeated sampling from a probability distribution to determine the long run average of a parameter that the researcher intends to study (Sawilowsky & Fahoome, 2003). These methods will be used to create a simulation which is the representation with a model to a real life characteristic (Sawilowsky & Fahoome, 2003). For the purpose of this study, the simulation will be used to represent students’ achievement scores. 
The simulation will be generated on a WINTEL-compatible personal computer using Compaq 6.6c Visual Fortran. The simulation will answer the following research question: To what extent is a cluster sample biased when the researcher fails to randomly select clusters in the initial stage of a two stage cluster sample? The simulation will create data for a two stage cluster design. The design will have a population of 100 clusters of equal size, each of which has 100 individual scores randomly assigned to it from a normal (Gaussian) data set. Throughout the 20th century, it was believed that the Gaussian curve was a good model of the likely outcomes of educational or psychological testing (Sawilowsky & Fahoome, 2003). In this model, the majority of the data is centered around the mean, with µ = 0 and σ = 1. Many educational tests are norm referenced, meaning that they compare the individual being tested to the general population. In this form of testing, the overall results will fall along the normal curve by definition. There are often times when results from educational or psychological testing do not fall within the normal curve; however, for the purposes of this study, the normal model shows results under the best possible circumstances. After the scores have been assigned, each cluster will have its mean computed. The clusters will then be rank ordered from highest to lowest to determine purposeful selection later in the study. The initial number of clusters will be set to two. Individual scores from each selected cluster will be randomly chosen, representing the second stage. After the lower and upper limits of the confidence interval have been obtained, they will be stored. The simulation will be repeated for a total of 10,000 replications. At the conclusion, an overall mean will be computed for the upper and lower limits of the confidence intervals and recorded. The simulation will then be repeated, this time using the two clusters with the greatest means. This process will be repeated 10,000 times, and the overall mean of these confidence intervals will be computed and stored. The upper and lower limits of the confidence interval for the randomly selected group and for the purposefully selected group will be compared and the difference computed. The width of the confidence interval for the random selection will also be computed and compared to the width of the confidence interval for the purposeful selection using a proportion. This process will be completed 19 times, increasing the number of clusters by one (i.e., 2 clusters, 3 clusters, 4 clusters, etc.) until 20 of the 100 clusters are chosen randomly and purposefully and compared. In the majority of educational testing, the researcher must account for extraneous variables, meaning outside influences that the researcher cannot account for or test. One of the advantages of using a Monte Carlo design is that the study operates in a controlled environment; therefore, there are no extraneous variables that can influence the study. As long as the distribution is representative of the population that is tested, the results will not be skewed by an outside influence. The simulation uses the data that are provided and does exactly what the researcher programs it to do. Due to the control the researcher has in setting up the simulation, it is not possible for extraneous variables to affect the study.
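The original Fortran program is not reproduced in the text, so the Python sketch below is only a hedged approximation of the design just described: 100 clusters of 100 standard normal scores, random versus purposeful (highest-mean) selection of clusters in the first stage, random selection of individuals in the second stage, and comparison of the averaged confidence-interval limits. The second-stage subsample size and the confidence level are not stated in the text, so the values used here are assumptions, and the sketch is not expected to reproduce the exact figures reported in Chapter IV.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

N_CLUSTERS, CLUSTER_SIZE = 100, 100   # population: 100 clusters of 100 N(0,1) scores
K = 2                                 # clusters selected in the first stage
SUBSAMPLE = 10                        # individuals drawn per cluster (assumed)
REPS = 10_000                         # replications
Z = 1.96                              # 95% z-interval (assumed)

population = rng.standard_normal((N_CLUSTERS, CLUSTER_SIZE))
rank = population.mean(axis=1).argsort()[::-1]   # clusters ranked by mean, highest first

def ci(clusters):
    """Second stage: randomly subsample each selected cluster, then compute a
    z-based confidence interval for the mean of the pooled subsample."""
    scores = np.concatenate([rng.choice(c, SUBSAMPLE, replace=False) for c in clusters])
    half = Z * scores.std(ddof=1) / np.sqrt(scores.size)
    return scores.mean() - half, scores.mean() + half

random_cis = np.array([ci(population[rng.choice(N_CLUSTERS, K, replace=False)])
                       for _ in range(REPS)])
purposeful_cis = np.array([ci(population[rank[:K]]) for _ in range(REPS)])

lo_r, hi_r = random_cis.mean(axis=0)      # averaged limits, random first stage
lo_p, hi_p = purposeful_cis.mean(axis=0)  # averaged limits, purposeful first stage
print("random CI:    ", lo_r, hi_r)
print("purposeful CI:", lo_p, hi_p)
print("width ratio (purposeful / random):", (hi_p - lo_p) / (hi_r - lo_r))
```

Repeating the run with K set to each value from 2 through 20 corresponds to the sweep over cluster counts described above.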
CHAPTER IV RESULTS

Ceteris paribus, equal cluster sampling already lacks the power available in a simple random sample. The inefficiency can be quantified using formula A8 (see Appendix A), following Sudman (1976). This gives rho (ρ) for a cluster sample compared to a simple random sample. The magnitude of ρ can be computed in this manner or referenced from previous studies (e.g., Sudman, 1976). After ρ has been determined, the ratio of sampling error between cluster sampling and simple random sampling can be computed using formula A9 (Appendix A). For the purposes of this study, a completed rho chart (see Appendix B) has been compiled. A section of that chart appears below (Table 4.1). Using the number of participants in each cluster and the approximate rho value, the sampling error of a cluster sample relative to that of a simple random sample can be determined. For example, a cluster containing 10 participants and a ρ value of .02 would have a sampling error 1.18 times that of a simple random sample of the same size. Clearly, using a cluster sample rather than a simple random sample affects the integrity of the study, which may only be acceptable if cost savings are paramount.

Table 4.1: Rho Table
N_bar  rho=.01  rho=.02  rho=.03  rho=.04  rho=.05
 1     1        1        1        1        1
 2     1.01     1.02     1.03     1.04     1.05
 3     1.02     1.04     1.06     1.08     1.1
 4     1.03     1.06     1.09     1.12     1.15
 5     1.04     1.08     1.12     1.16     1.2
 6     1.05     1.1      1.15     1.2      1.25
 7     1.06     1.12     1.18     1.24     1.3
 8     1.07     1.14     1.21     1.28     1.35
 9     1.08     1.16     1.24     1.32     1.4
10     1.09     1.18     1.27     1.36     1.45

In order to determine the results of the study, the confidence interval was analyzed for both random and purposeful clusters at each number of clusters. Graphs and tables were developed for the lower limit and the upper limit of each confidence interval. The widths of the random and purposeful confidence intervals were also compared using a proportion. First, the lower limits of the confidence intervals for random and purposeful selection will be analyzed. Figure 4.1 shows a graphical comparison between the lower limit of the confidence intervals for random cluster selection and the lower limit of the confidence intervals for purposeful cluster selection. Below this graph is a table (Table 4.2) with the actual results. The first observation to note is that the lower limit of the confidence intervals for the random selection of clusters remains consistent regardless of the number of clusters chosen. The same cannot be said for the lower limit of the confidence intervals for the purposeful selection of clusters, which varies with the number of clusters being chosen. The largest discrepancy between the purposeful and random lower limits of the confidence intervals occurred at the initial selection size of two clusters. At this stage, the difference between the lower limit of the confidence interval for purposeful selection and the lower limit of the confidence interval for random selection was 1.80 (-1.9485 versus -0.1534). The next selection size of three clusters narrowed the difference between the purposeful and random lower limits to 1.15 (-1.2990 versus -0.1503). The difference in the lower limits of the confidence intervals between purposeful and random selection of clusters continues to decrease until the last simulation is compiled using 20 clusters.
At this stage, the difference between the purposeful lower limit of the confidence interval and the random lower limit of the confidence interval was .02 (-0.1949 versus -0.1774).

Figure 4.1: Random vs. Purposeful Lower Limit Cluster Results (lower confidence-interval limits for purposeful and random selection, plotted against the number of clusters, 2 through 20)

Table 4.2: Random vs. Purposeful Lower Limit Cluster Results
Number of Clusters  Lower Purposeful Selection  Lower Random Selection  Difference
 2   -1.9485   -0.1534   1.7951
 3   -1.2990   -0.1503   1.1488
 4   -0.9743   -0.1549   0.8194
 5   -0.7794   -0.2029   0.5765
 6   -0.6495   -0.1606   0.4889
 7   -0.5567   -0.1368   0.4199
 8   -0.4871   -0.1362   0.3509
 9   -0.4330   -0.1348   0.2982
10   -0.3897   -0.1487   0.2410
11   -0.3543   -0.1401   0.2142
12   -0.3248   -0.1413   0.1834
13   -0.2998   -0.1581   0.1417
14   -0.2784   -0.1495   0.1289
15   -0.2598   -0.1603   0.0995
16   -0.2436   -0.1587   0.0849
17   -0.2292   -0.1636   0.0657
18   -0.2165   -0.1740   0.0425
19   -0.2051   -0.1805   0.0246
20   -0.1949   -0.1774   0.0174

Similar results were compiled for the upper limits of the confidence intervals. Figure 4.2 shows a graphical comparison between the upper limit of the confidence intervals for purposeful cluster selection and that for random cluster selection. Below this graph is a table (Table 4.3) with the actual results. Once again, the first observation is that the upper limit of the confidence intervals for random cluster selection remained consistent regardless of the number of clusters. The largest discrepancy between the upper limits of the confidence intervals for purposeful versus random cluster selection occurred at the initial selection size of two clusters. At this stage, the difference between the purposeful and random upper limits was 1.85 (2.1066 versus 0.2564). The next purposeful selection size of three clusters narrowed the difference to 1.16 (1.4044 versus 0.2476). The difference continues to decrease until the last simulation is compiled using 20 clusters. At this stage, the difference between the purposeful and random upper limits of the confidence intervals was -.01 (0.2107 versus 0.2185).

Figure 4.2: Random vs. Purposeful Upper Limit Cluster Results (upper confidence-interval limits for purposeful and random selection, plotted against the number of clusters, 2 through 20)

Table 4.3: Random vs. Purposeful Upper Limit Cluster Results
Number of Clusters  Upper Purposeful Selection  Upper Random Selection  Difference
 2   2.1066   0.2564    1.8502
 3   1.4044   0.2476    1.1568
 4   1.0533   0.2409    0.8124
 5   0.8427   0.1949    0.6478
 6   0.7022   0.2384    0.4638
 7   0.6019   0.2584    0.3435
 8   0.5267   0.2603    0.2663
 9   0.4681   0.2641    0.2041
10   0.4213   0.2487    0.1726
11   0.3830   0.2592    0.1239
12   0.3511   0.2564    0.0947
13   0.3241   0.2388    0.0853
14   0.3009   0.2471    0.0539
15   0.2809   0.2349    0.0460
16   0.2633   0.2370    0.0263
17   0.2478   0.2291    0.0188
18   0.2341   0.2203    0.0138
19   0.2218   0.2172    0.0046
20   0.2107   0.2185   -0.0078

It is evident that there is a difference between purposeful and random selection in both the lower and upper limits of the confidence intervals. This difference makes the width of the purposeful confidence interval greater than the width of the random confidence interval, and the size of that difference depends on the number of clusters chosen.
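The proportions reported in the next section (Table 4.4) follow directly from the averaged limits in Tables 4.2 and 4.3: each interval's width is its upper limit minus its lower limit, and the proportion is the purposeful width divided by the random width. A minimal Python check using the two-cluster row of those tables is sketched below.

```python
# Two-cluster row of Tables 4.2 and 4.3 (rounded values from the tables).
lower_purposeful, upper_purposeful = -1.9485, 2.1066
lower_random, upper_random = -0.1534, 0.2564

width_purposeful = upper_purposeful - lower_purposeful   # 4.0551
width_random = upper_random - lower_random               # 0.4098

# About 9.9, in line with the 9.8952 reported in Table 4.4.
print(round(width_purposeful / width_random, 4))
```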
The extent of that width difference is examined in the graph and table below (Figure 4.3 and Table 4.4).

Figure 4.3: Proportion of purposeful cluster over random cluster (ratio of the purposeful confidence-interval width to the random confidence-interval width, plotted against the number of clusters, 2 through 20)

Table 4.4: Proportion of purposeful cluster over random cluster
Number of Clusters   Proportion of purposeful cluster over random cluster
 2    9.8952
 3    6.7948
 4    5.1229
 5    4.0782
 6    3.3882
 7    2.9314
 8    2.5566
 9    2.2593
10    2.0407
11    1.8468
12    1.6994
13    1.5719
14    1.4608
15    1.3680
16    1.2811
17    1.2150
18    1.1429
19    1.0733
20    1.0242

It is evident that the proportion of the width of the confidence interval for purposeful cluster selection to that for random cluster selection also differs across cluster counts, in a pattern similar to the previous two graphs of the lower and upper limits. As the number of clusters increases, the ratio between the purposeful and random samples decreases. For example, the width of the confidence interval for purposeful cluster selection using two clusters was 9.9 times greater than the width for random cluster selection using two clusters. The width of the confidence interval for purposeful cluster selection using three clusters was 6.8 times greater than that of its random counterpart. The difference in the overall width of the confidence intervals continues to decrease as the number of clusters increases, until the last simulation at 20 clusters. Using 20 clusters, the width of the purposeful selection is 1.02 times greater than that of the random selection. According to these results, it is obvious that purposeful selection shows a greater confidence-interval width at every number of clusters compared to random selection. This ratio of widths between purposeful and random selection decreases as the number of clusters selected increases, showing that the number of clusters has an effect on the results of the simulation.

CHAPTER V CONCLUSIONS AND RECOMMENDATIONS

The most precise sampling approach is a simple random sample. However, in cases where populations are located in specific areas or clusters, cluster sampling can be done to save time or cost. Conducting a cluster sample will cause a decrease in the efficiency of the results. When the researcher compounds this with improper or purposeful selection of clusters, the results of the study can become greatly skewed, even under the most optimal conditions. This study determined how much those results would be skewed. Chapters II and IV discussed the effect that clustering can have on a study. Using the ρ (rho) chart (Appendix B), it is easily determined via mathematical methods that the effect clustering has on a study depends on the ρ value and the number of participants in a cluster, at least under the assumption of normality. The chart shows that even under normal conditions with proper randomization, cluster sampling is still not as statistically efficient as simple random sampling. When the researcher adds purposeful selection to the inherent limitations of cluster sampling, the study becomes even more flawed. Using normal data, it was determined that the width of the confidence interval for purposeful selection of clusters was almost ten times greater than the width of the confidence interval for random cluster selection using two clusters.
Although two is a small number of clusters for a study, it is still significant to note that a study using these parameters would yield a confidence interval ten times wider than that of its random counterpart. A study already compromised by the effects of clustering will be made worse by purposeful cluster sampling. Such results would, at best, be useless, and at worst, detrimental to those who acted on them. A study consisting of two clusters is unlikely, so consider the next three results. The simulation containing three clusters had a confidence interval for purposeful cluster selection 6.79 times wider than that of its random counterpart. The purposeful cluster selection containing four clusters had a confidence interval 5.12 times wider than its random counterpart, and the simulation for purposeful selection containing five clusters was 4.1 times wider than its random counterpart. Two observations can be drawn from this. The first is that in the early stages of cluster selection (number of clusters equal to 2, 3, 4, and 5), the width of the confidence interval for purposeful selection is, at best, about four times larger than that of its random counterpart. Using results from any of these studies would be useless or incredibly misleading. The second observation is that the number of clusters has a significant impact on the results in the early stages of cluster selection. In the first four simulations, the width ratio decreased by a minimum of 1.1 (between four and five clusters) and a maximum of 3.1 (between two and three clusters). These differences begin to narrow in the next four simulations. In the next four simulations (number of clusters equal to 6, 7, 8, and 9), the width of the purposeful confidence interval ranges from 3.38 down to 2.26 times that of the corresponding random cluster sample. As one can see, the differences between consecutive cluster counts in these four simulations are considerably smaller than in the first four simulations. The largest difference in the ratio between any two consecutive cluster counts in these four simulations was 0.69 and the smallest was 0.3. These results show the gap in confidence-interval width between purposeful and random selection beginning to narrow. This trend continues in the final eleven selections, as the ratio of the purposeful to the random confidence-interval width is as high as 2.04 (number of clusters equal to 10) and as low as 1.02 (number of clusters equal to 20). In the final eleven cluster simulations, the results truly begin to narrow, from a difference in ratio of 0.19 (between 10 and 11 clusters) to a difference in ratio of 0.05 (between 19 and 20 clusters). Even at 20 clusters, which is the best case scenario according to this study, the purposeful cluster selection yielded a confidence interval 1.02 times as wide as that of a random sample. According to this result, the width of the purposeful confidence interval is increased by 2% compared to that of a random sample. A 2% increase may seem minor; however, in the perspective of medical research it could be critical. The results from the random cluster selections are also important to review.
Looking at the random-selection results of this study, one notes that on average there is a width of about 0.4 between the upper and lower confidence-interval limits. Two conclusions can be derived from this. The first is that the number of clusters has a minimal effect on the results of the random selection; the second is that true randomization works. If the researcher randomizes both the selection and the assignment in a study, the results will be more accurate than those of purposeful selection. The second conclusion is important for the researcher who insists on including specific clusters in a study to make that study "relevant." Using these results, such a researcher can conclude that regardless of the number of clusters and of which specific clusters are chosen, the confidence interval for random cluster selection will have a width of approximately 0.40. This indicates that if the desired clusters do not end up in a study but randomization is done correctly, the results will be consistent over time. It is also important to note that these results are based on data from a normal distribution. Participants in real life will not be as predictable as this. By using data from a normal (Gaussian) distribution, the researcher knows that half of the scores will fall below the mean and half will fall above the mean in the shape of a perfect bell curve. This is not the case in applied situations, because participants do not fit a bell curve model (Micceri, 1989). Participants in applied situations are influenced by a variety of factors. This is especially true in cluster sampling due to the clustering effect. The method for cluster sampling is clearly defined by past research, yet the need to include important clusters in a study clouds the judgment of some researchers. If it is crucial to include certain participants in the study, there are two possibilities. The first is to increase the likelihood of those clusters being chosen by increasing the number of clusters selected. However, this is not a failsafe method and may defeat the rationale for cluster sampling, which is to save time and money. The number of clusters could be increased and the selection still might not return the desired clusters. Furthermore, if it does not, and the researcher "randomly" selects clusters again until the desired clusters appear, the researcher is essentially conducting an ex post facto study, which is not a valid form of study. The second option is to change the design of the study. For example, if it is necessary to include the major counties of a state in a study, then the study's parameters should be defined as those counties. The study would not be pertinent to the entire state, even though it would include the desired population. The best option is to select the desired population and conduct the study in complete accordance with the rules of randomization at all levels. If the selection does not include the desired clusters, the researcher may or may not be discouraged, but at least the study would not lead to what Cornfield (1978, p. 101) called "an exercise in self deception."

APPENDIX A
FORMULAS

A1. Intraclass Correlation Coefficient

A2. Variance Inflation Factor
    D = 1 + (m - 1)ρ

A3. Sample size formula for cluster trials

A4. Adjusted effect size formula for cluster trials

A5. Within mean difference effect size I

A6. Within mean difference effect size II

A7. Within mean difference effect size III

A8. Rho formula for cluster trials

A9. Proportion of error between simple random and cluster sample
50 APPENDIX B RHO CHART N_bar rho=.01 rho=.02 rho=.03 rho=.04 rho=.05 rho=.06 rho=.07 rho=.08 1 2 3 4 5 6 7 1 1.01 1.02 1.03 1.04 1.05 1.06 1 1.02 1.04 1.06 1.08 1.1 1.12 1 1.03 1.06 1.09 1.12 1.15 1.18 1 1.04 1.08 1.12 1.16 1.2 1.24 1 1.05 1.1 1.15 1.2 1.25 1.3 1 1.06 1.12 1.18 1.24 1.3 1.36 1 1.07 1.14 1.21 1.28 1.35 1.42 1 1.08 1.16 1.24 1.32 1.4 1.48 8 9 10 11 12 13 14 1.07 1.08 1.09 1.1 1.11 1.12 1.13 1.14 1.16 1.18 1.2 1.22 1.24 1.26 1.21 1.24 1.27 1.3 1.33 1.36 1.39 1.28 1.32 1.36 1.4 1.44 1.48 1.52 1.35 1.4 1.45 1.5 1.55 1.6 1.65 1.42 1.48 1.54 1.6 1.66 1.72 1.78 1.49 1.56 1.63 1.7 1.77 1.84 1.91 1.56 1.64 1.72 1.8 1.88 1.96 2.04 15 16 17 18 19 20 21 1.14 1.15 1.16 1.17 1.18 1.19 1.2 1.28 1.3 1.32 1.34 1.36 1.38 1.4 1.42 1.45 1.48 1.51 1.54 1.57 1.6 1.56 1.6 1.64 1.68 1.72 1.76 1.8 1.7 1.75 1.8 1.85 1.9 1.95 2 1.84 1.9 1.96 2.02 2.08 2.14 2.2 1.98 2.05 2.12 2.19 2.26 2.33 2.4 2.12 2.2 2.28 2.36 2.44 2.52 2.6 22 23 24 25 26 27 28 1.21 1.22 1.23 1.24 1.25 1.26 1.27 1.42 1.44 1.46 1.48 1.5 1.52 1.54 1.63 1.66 1.69 1.72 1.75 1.78 1.81 1.84 1.88 1.92 1.96 2 2.04 2.08 2.05 2.1 2.15 2.2 2.25 2.3 2.35 2.26 2.32 2.38 2.44 2.5 2.56 2.62 2.47 2.54 2.61 2.68 2.75 2.82 2.89 2.68 2.76 2.84 2.92 3 3.08 3.16 29 30 31 32 33 34 35 1.28 1.29 1.3 1.31 1.32 1.33 1.34 1.56 1.58 1.6 1.62 1.64 1.66 1.68 1.84 1.87 1.9 1.93 1.96 1.99 2.02 2.12 2.16 2.2 2.24 2.28 2.32 2.36 2.4 2.45 2.5 2.55 2.6 2.65 2.7 2.68 2.74 2.8 2.86 2.92 2.98 3.04 2.96 3.03 3.1 3.17 3.24 3.31 3.38 3.24 3.32 3.4 3.48 3.56 3.64 3.72 36 1.35 1.7 2.05 2.4 2.75 3.1 3.45 3.8 37 1.36 1.72 2.08 2.44 2.8 3.16 3.52 3.88 51 N_bar rho=.01 rho=.02 rho=.03 rho=.04 rho=.05 rho=.06 rho=.07 rho=.08 38 39 40 1.37 1.38 1.39 1.74 1.76 1.78 2.11 2.14 2.17 2.48 2.52 2.56 2.85 2.9 2.95 3.22 3.28 3.34 3.59 3.66 3.73 3.96 4.04 4.12 41 42 43 44 45 46 47 1.4 1.41 1.42 1.43 1.44 1.45 1.46 1.8 1.82 1.84 1.86 1.88 1.9 1.92 2.2 2.23 2.26 2.29 2.32 2.35 2.38 2.6 2.64 2.68 2.72 2.76 2.8 2.84 3 3.05 3.1 3.15 3.2 3.25 3.3 3.4 3.46 3.52 3.58 3.64 3.7 3.76 3.8 3.87 3.94 4.01 4.08 4.15 4.22 4.2 4.28 4.36 4.44 4.52 4.6 4.68 48 49 50 51 52 53 54 1.47 1.48 1.49 1.5 1.51 1.52 1.53 1.94 1.96 1.98 2 2.02 2.04 2.06 2.41 2.44 2.47 2.5 2.53 2.56 2.59 2.88 2.92 2.96 3 3.04 3.08 3.12 3.35 3.4 3.45 3.5 3.55 3.6 3.65 3.82 3.88 3.94 4 4.06 4.12 4.18 4.29 4.36 4.43 4.5 4.57 4.64 4.71 4.76 4.84 4.92 5 5.08 5.16 5.24 55 56 57 58 59 60 61 1.54 1.55 1.56 1.57 1.58 1.59 1.6 2.08 2.1 2.12 2.14 2.16 2.18 2.2 2.62 2.65 2.68 2.71 2.74 2.77 2.8 3.16 3.2 3.24 3.28 3.32 3.36 3.4 3.7 3.75 3.8 3.85 3.9 3.95 4 4.24 4.3 4.36 4.42 4.48 4.54 4.6 4.78 4.85 4.92 4.99 5.06 5.13 5.2 5.32 5.4 5.48 5.56 5.64 5.72 5.8 62 63 64 65 1.61 1.62 1.63 1.64 2.22 2.24 2.26 2.28 2.83 2.86 2.89 2.92 3.44 3.48 3.52 3.56 4.05 4.1 4.15 4.2 4.66 4.72 4.78 4.84 5.27 5.34 5.41 5.48 5.88 5.96 6.04 6.12 66 67 68 1.65 1.66 1.67 2.3 2.32 2.34 2.95 2.98 3.01 3.6 3.64 3.68 4.25 4.3 4.35 4.9 4.96 5.02 5.55 5.62 5.69 6.2 6.28 6.36 69 70 71 72 73 74 75 1.68 1.69 1.7 1.71 1.72 1.73 1.74 2.36 2.38 2.4 2.42 2.44 2.46 2.48 3.04 3.07 3.1 3.13 3.16 3.19 3.22 3.72 3.76 3.8 3.84 3.88 3.92 3.96 4.4 4.45 4.5 4.55 4.6 4.65 4.7 5.08 5.14 5.2 5.26 5.32 5.38 5.44 5.76 5.83 5.9 5.97 6.04 6.11 6.18 6.44 6.52 6.6 6.68 6.76 6.84 6.92 76 77 1.75 1.76 2.5 2.52 3.25 3.28 4 4.04 4.75 4.8 5.5 5.56 6.25 6.32 7 7.08 52 N_bar rho=.01 rho=.02 rho=.03 rho=.04 rho=.05 rho=.06 rho=.07 rho=.08 78 79 80 1.77 1.78 1.79 2.54 2.56 2.58 3.31 3.34 3.37 4.08 4.12 4.16 4.85 4.9 4.95 5.62 5.68 5.74 6.39 6.46 6.53 7.16 7.24 7.32 81 82 83 84 85 86 87 1.8 1.81 1.82 1.83 1.84 1.85 
1.86 2.6 2.62 2.64 2.66 2.68 2.7 2.72 3.4 3.43 3.46 3.49 3.52 3.55 3.58 4.2 4.24 4.28 4.32 4.36 4.4 4.44 5 5.05 5.1 5.15 5.2 5.25 5.3 5.8 5.86 5.92 5.98 6.04 6.1 6.16 6.6 6.67 6.74 6.81 6.88 6.95 7.02 7.4 7.48 7.56 7.64 7.72 7.8 7.88 88 89 90 91 92 93 94 1.87 1.88 1.89 1.9 1.91 1.92 1.93 2.74 2.76 2.78 2.8 2.82 2.84 2.86 3.61 3.64 3.67 3.7 3.73 3.76 3.79 4.48 4.52 4.56 4.6 4.64 4.68 4.72 5.35 5.4 5.45 5.5 5.55 5.6 5.65 6.22 6.28 6.34 6.4 6.46 6.52 6.58 7.09 7.16 7.23 7.3 7.37 7.44 7.51 7.96 8.04 8.12 8.2 8.28 8.36 8.44 95 96 97 98 99 100 1.94 1.95 1.96 1.97 1.98 1.99 2.88 2.9 2.92 2.94 2.96 2.98 3.82 3.85 3.88 3.91 3.94 3.97 4.76 4.8 4.84 4.88 4.92 4.96 5.7 5.75 5.8 5.85 5.9 5.95 6.64 6.7 6.76 6.82 6.88 6.94 7.58 7.65 7.72 7.79 7.86 7.93 8.52 8.6 8.68 8.76 8.84 8.92 53 N_bar 1 2 3 4 rho=.09 1 1.09 1.18 1.27 rho=.1 1 1.1 1.2 1.3 rho=.11 1 1.11 1.22 1.33 rho=0.12 1 1.12 1.24 1.36 rho=0.13 1 1.13 1.26 1.39 rho=0.14 1 1.14 1.28 1.42 5 6 7 8 9 10 11 1.36 1.45 1.54 1.63 1.72 1.81 1.9 1.4 1.5 1.6 1.7 1.8 1.9 2 1.44 1.55 1.66 1.77 1.88 1.99 2.1 1.48 1.6 1.72 1.84 1.96 2.08 2.2 1.52 1.65 1.78 1.91 2.04 2.17 2.3 1.56 1.7 1.84 1.98 2.12 2.26 2.4 12 13 1.99 2.08 2.1 2.2 2.21 2.32 2.32 2.44 2.43 2.56 2.54 2.68 14 15 16 17 18 2.17 2.26 2.35 2.44 2.53 2.3 2.4 2.5 2.6 2.7 2.43 2.54 2.65 2.76 2.87 2.56 2.68 2.8 2.92 3.04 2.69 2.82 2.95 3.08 3.21 2.82 2.96 3.1 3.24 3.38 19 20 21 22 23 24 25 2.62 2.71 2.8 2.89 2.98 3.07 3.16 2.8 2.9 3 3.1 3.2 3.3 3.4 2.98 3.09 3.2 3.31 3.42 3.53 3.64 3.16 3.28 3.4 3.52 3.64 3.76 3.88 3.34 3.47 3.6 3.73 3.86 3.99 4.12 3.52 3.66 3.8 3.94 4.08 4.22 4.36 26 27 28 3.25 3.34 3.43 3.5 3.6 3.7 3.75 3.86 3.97 4 4.12 4.24 4.25 4.38 4.51 4.5 4.64 4.78 29 30 31 32 3.52 3.61 3.7 3.79 3.8 3.9 4 4.1 4.08 4.19 4.3 4.41 4.36 4.48 4.6 4.72 4.64 4.77 4.9 5.03 4.92 5.06 5.2 5.34 33 34 35 36 37 38 39 3.88 3.97 4.06 4.15 4.24 4.33 4.42 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.52 4.63 4.74 4.85 4.96 5.07 5.18 4.84 4.96 5.08 5.2 5.32 5.44 5.56 5.16 5.29 5.42 5.55 5.68 5.81 5.94 5.48 5.62 5.76 5.9 6.04 6.18 6.32 40 4.51 4.9 5.29 5.68 6.07 6.46 54 N_bar 41 42 43 rho=.09 4.6 4.69 4.78 rho=.1 5 5.1 5.2 rho=.11 5.4 5.51 5.62 rho=0.12 5.8 5.92 6.04 rho=0.13 6.2 6.33 6.46 rho=0.14 6.6 6.74 6.88 44 45 46 47 48 49 50 4.87 4.96 5.05 5.14 5.23 5.32 5.41 5.3 5.4 5.5 5.6 5.7 5.8 5.9 5.73 5.84 5.95 6.06 6.17 6.28 6.39 6.16 6.28 6.4 6.52 6.64 6.76 6.88 6.59 6.72 6.85 6.98 7.11 7.24 7.37 7.02 7.16 7.3 7.44 7.58 7.72 7.86 51 52 53 54 55 56 57 5.5 5.59 5.68 5.77 5.86 5.95 6.04 6 6.1 6.2 6.3 6.4 6.5 6.6 6.5 6.61 6.72 6.83 6.94 7.05 7.16 7 7.12 7.24 7.36 7.48 7.6 7.72 7.5 7.63 7.76 7.89 8.02 8.15 8.28 8 8.14 8.28 8.42 8.56 8.7 8.84 58 59 60 61 62 63 64 6.13 6.22 6.31 6.4 6.49 6.58 6.67 6.7 6.8 6.9 7 7.1 7.2 7.3 7.27 7.38 7.49 7.6 7.71 7.82 7.93 7.84 7.96 8.08 8.2 8.32 8.44 8.56 8.41 8.54 8.67 8.8 8.93 9.06 9.19 8.98 9.12 9.26 9.4 9.54 9.68 9.82 65 66 67 68 69 70 71 6.76 6.85 6.94 7.03 7.12 7.21 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8 8.04 8.15 8.26 8.37 8.48 8.59 8.7 8.68 8.8 8.92 9.04 9.16 9.28 9.4 9.32 9.45 9.58 9.71 9.84 9.97 10.1 9.96 10.1 10.24 10.38 10.52 10.66 10.8 72 73 74 75 76 77 78 7.39 7.48 7.57 7.66 7.75 7.84 7.93 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.81 8.92 9.03 9.14 9.25 9.36 9.47 9.52 9.64 9.76 9.88 10 10.12 10.24 10.23 10.36 10.49 10.62 10.75 10.88 11.01 10.94 11.08 11.22 11.36 11.5 11.64 11.78 79 8.02 8.8 9.58 10.36 11.14 11.92 80 8.11 8.9 9.69 10.48 11.27 12.06 55 N_bar rho=.09 rho=.1 rho=.11 rho=0.12 rho=0.13 rho=0.14 81 82 83 84 8.2 8.29 8.38 8.47 9 9.1 9.2 9.3 9.8 9.91 10.02 10.13 10.6 10.72 10.84 10.96 11.4 11.53 11.66 
11.79 12.2 12.34 12.48 12.62 85 86 87 88 89 90 91 8.56 8.65 8.74 8.83 8.92 9.01 9.1 9.4 9.5 9.6 9.7 9.8 9.9 10 10.24 10.35 10.46 10.57 10.68 10.79 10.9 11.08 11.2 11.32 11.44 11.56 11.68 11.8 11.92 12.05 12.18 12.31 12.44 12.57 12.7 12.76 12.9 13.04 13.18 13.32 13.46 13.6 92 93 94 95 96 97 98 9.19 9.28 9.37 9.46 9.55 9.64 9.73 10.1 10.2 10.3 10.4 10.5 10.6 10.7 11.01 11.12 11.23 11.34 11.45 11.56 11.67 11.92 12.04 12.16 12.28 12.4 12.52 12.64 12.83 12.96 13.09 13.22 13.35 13.48 13.61 13.74 13.88 14.02 14.16 14.3 14.44 14.58 99 100 9.82 9.91 10.8 10.9 11.78 11.89 12.76 12.88 13.74 13.87 14.72 14.86 56 N_bar 1 2 3 4 rho=0.15 1 1.15 1.3 1.45 rho= .16 1 1.16 1.32 1.48 rho=.17 1 1.17 1.34 1.51 rho=.18 1 1.18 1.36 1.54 rho=.19 1 1.19 1.38 1.57 5 6 7 8 9 10 11 1.6 1.75 1.9 2.05 2.2 2.35 2.5 1.64 1.8 1.96 2.12 2.28 2.44 2.6 1.68 1.85 2.02 2.19 2.36 2.53 2.7 1.72 1.9 2.08 2.26 2.44 2.62 2.8 1.76 1.95 2.14 2.33 2.52 2.71 2.9 12 13 2.65 2.8 2.76 2.92 2.87 3.04 2.98 3.16 3.09 3.28 14 15 16 17 18 2.95 3.1 3.25 3.4 3.55 3.08 3.24 3.4 3.56 3.72 3.21 3.38 3.55 3.72 3.89 3.34 3.52 3.7 3.88 4.06 3.47 3.66 3.85 4.04 4.23 19 20 21 22 23 24 25 3.7 3.85 4 4.15 4.3 4.45 4.6 3.88 4.04 4.2 4.36 4.52 4.68 4.84 4.06 4.23 4.4 4.57 4.74 4.91 5.08 4.24 4.42 4.6 4.78 4.96 5.14 5.32 4.42 4.61 4.8 4.99 5.18 5.37 5.56 26 27 28 4.75 4.9 5.05 5 5.16 5.32 5.25 5.42 5.59 5.5 5.68 5.86 5.75 5.94 6.13 29 30 31 32 5.2 5.35 5.5 5.65 5.48 5.64 5.8 5.96 5.76 5.93 6.1 6.27 6.04 6.22 6.4 6.58 6.32 6.51 6.7 6.89 33 34 35 36 37 38 39 5.8 5.95 6.1 6.25 6.4 6.55 6.7 6.12 6.28 6.44 6.6 6.76 6.92 7.08 6.44 6.61 6.78 6.95 7.12 7.29 7.46 6.76 6.94 7.12 7.3 7.48 7.66 7.84 7.08 7.27 7.46 7.65 7.84 8.03 8.22 40 41 42 6.85 7 7.15 7.24 7.4 7.56 7.63 7.8 7.97 8.02 8.2 8.38 8.41 8.6 8.79 57 N_bar 43 44 45 46 rho=0.15 7.3 7.45 7.6 7.75 rho= .16 7.72 7.88 8.04 8.2 rho=.17 8.14 8.31 8.48 8.65 rho=.18 8.56 8.74 8.92 9.1 rho=.19 8.98 9.17 9.36 9.55 47 48 49 50 51 52 53 7.9 8.05 8.2 8.35 8.5 8.65 8.8 8.36 8.52 8.68 8.84 9 9.16 9.32 8.82 8.99 9.16 9.33 9.5 9.67 9.84 9.28 9.46 9.64 9.82 10 10.18 10.36 9.74 9.93 10.12 10.31 10.5 10.69 10.88 54 55 8.95 9.1 9.48 9.64 10.01 10.18 10.54 10.72 11.07 11.26 56 57 58 59 60 9.25 9.4 9.55 9.7 9.85 9.8 9.96 10.12 10.28 10.44 10.35 10.52 10.69 10.86 11.03 10.9 11.08 11.26 11.44 11.62 11.45 11.64 11.83 12.02 12.21 61 62 63 64 65 66 67 10 10.15 10.3 10.45 10.6 10.75 10.9 10.6 10.76 10.92 11.08 11.24 11.4 11.56 11.2 11.37 11.54 11.71 11.88 12.05 12.22 11.8 11.98 12.16 12.34 12.52 12.7 12.88 12.4 12.59 12.78 12.97 13.16 13.35 13.54 68 69 70 11.05 11.2 11.35 11.72 11.88 12.04 12.39 12.56 12.73 13.06 13.24 13.42 13.73 13.92 14.11 71 72 73 74 11.5 11.65 11.8 11.95 12.2 12.36 12.52 12.68 12.9 13.07 13.24 13.41 13.6 13.78 13.96 14.14 14.3 14.49 14.68 14.87 75 76 77 78 79 80 81 12.1 12.25 12.4 12.55 12.7 12.85 13 12.84 13 13.16 13.32 13.48 13.64 13.8 13.58 13.75 13.92 14.09 14.26 14.43 14.6 14.32 14.5 14.68 14.86 15.04 15.22 15.4 15.06 15.25 15.44 15.63 15.82 16.01 16.2 82 83 84 13.15 13.3 13.45 13.96 14.12 14.28 14.77 14.94 15.11 15.58 15.76 15.94 16.39 16.58 16.77 58 N_bar 85 86 87 88 rho=0.15 13.6 13.75 13.9 14.05 rho= .16 14.44 14.6 14.76 14.92 rho=.17 15.28 15.45 15.62 15.79 rho=.18 16.12 16.3 16.48 16.66 rho=.19 16.96 17.15 17.34 17.53 89 90 91 92 93 94 95 14.2 14.35 14.5 14.65 14.8 14.95 15.1 15.08 15.24 15.4 15.56 15.72 15.88 16.04 15.96 16.13 16.3 16.47 16.64 16.81 16.98 16.84 17.02 17.2 17.38 17.56 17.74 17.92 17.72 17.91 18.1 18.29 18.48 18.67 18.86 96 97 15.25 15.4 16.2 16.36 17.15 17.32 18.1 18.28 19.05 19.24 98 99 
100 15.55 15.7 15.85 16.52 16.68 16.84 17.49 17.66 17.83 18.46 18.64 18.82 19.43 19.62 19.81 59 N_bar 1 2 3 4 rho=.20 1 1.2 1.4 1.6 rho=.21 1 1.21 1.42 1.63 rho=.22 1 1.22 1.44 1.66 rho=.23 1 1.23 1.46 1.69 rho=.24 1 1.24 1.48 1.72 5 6 7 8 9 10 11 1.8 2 2.2 2.4 2.6 2.8 3 1.84 2.05 2.26 2.47 2.68 2.89 3.1 1.88 2.1 2.32 2.54 2.76 2.98 3.2 1.92 2.15 2.38 2.61 2.84 3.07 3.3 1.96 2.2 2.44 2.68 2.92 3.16 3.4 12 13 3.2 3.4 3.31 3.52 3.42 3.64 3.53 3.76 3.64 3.88 14 15 16 17 18 3.6 3.8 4 4.2 4.4 3.73 3.94 4.15 4.36 4.57 3.86 4.08 4.3 4.52 4.74 3.99 4.22 4.45 4.68 4.91 4.12 4.36 4.6 4.84 5.08 19 20 21 22 23 24 25 4.6 4.8 5 5.2 5.4 5.6 5.8 4.78 4.99 5.2 5.41 5.62 5.83 6.04 4.96 5.18 5.4 5.62 5.84 6.06 6.28 5.14 5.37 5.6 5.83 6.06 6.29 6.52 5.32 5.56 5.8 6.04 6.28 6.52 6.76 26 27 28 6 6.2 6.4 6.25 6.46 6.67 6.5 6.72 6.94 6.75 6.98 7.21 7 7.24 7.48 29 30 31 32 6.6 6.8 7 7.2 6.88 7.09 7.3 7.51 7.16 7.38 7.6 7.82 7.44 7.67 7.9 8.13 7.72 7.96 8.2 8.44 33 34 35 36 37 38 39 7.4 7.6 7.8 8 8.2 8.4 8.6 7.72 7.93 8.14 8.35 8.56 8.77 8.98 8.04 8.26 8.48 8.7 8.92 9.14 9.36 8.36 8.59 8.82 9.05 9.28 9.51 9.74 8.68 8.92 9.16 9.4 9.64 9.88 10.12 40 41 42 8.8 9 9.2 9.19 9.4 9.61 9.58 9.8 10.02 9.97 10.2 10.43 10.36 10.6 10.84 60 N_bar 43 44 45 46 rho=.20 9.4 9.6 9.8 10 rho=.21 9.82 10.03 10.24 10.45 rho=.22 10.24 10.46 10.68 10.9 rho=.23 10.66 10.89 11.12 11.35 rho=.24 11.08 11.32 11.56 11.8 47 48 49 50 51 52 53 10.2 10.4 10.6 10.8 11 11.2 11.4 10.66 10.87 11.08 11.29 11.5 11.71 11.92 11.12 11.34 11.56 11.78 12 12.22 12.44 11.58 11.81 12.04 12.27 12.5 12.73 12.96 12.04 12.28 12.52 12.76 13 13.24 13.48 54 55 11.6 11.8 12.13 12.34 12.66 12.88 13.19 13.42 13.72 13.96 56 57 58 59 60 12 12.2 12.4 12.6 12.8 12.55 12.76 12.97 13.18 13.39 13.1 13.32 13.54 13.76 13.98 13.65 13.88 14.11 14.34 14.57 14.2 14.44 14.68 14.92 15.16 61 62 63 64 65 66 67 13 13.2 13.4 13.6 13.8 14 14.2 13.6 13.81 14.02 14.23 14.44 14.65 14.86 14.2 14.42 14.64 14.86 15.08 15.3 15.52 14.8 15.03 15.26 15.49 15.72 15.95 16.18 15.4 15.64 15.88 16.12 16.36 16.6 16.84 68 69 70 14.4 14.6 14.8 15.07 15.28 15.49 15.74 15.96 16.18 16.41 16.64 16.87 17.08 17.32 17.56 71 72 73 74 15 15.2 15.4 15.6 15.7 15.91 16.12 16.33 16.4 16.62 16.84 17.06 17.1 17.33 17.56 17.79 17.8 18.04 18.28 18.52 75 76 77 78 79 80 81 15.8 16 16.2 16.4 16.6 16.8 17 16.54 16.75 16.96 17.17 17.38 17.59 17.8 17.28 17.5 17.72 17.94 18.16 18.38 18.6 18.02 18.25 18.48 18.71 18.94 19.17 19.4 18.76 19 19.24 19.48 19.72 19.96 20.2 82 83 84 17.2 17.4 17.6 18.01 18.22 18.43 18.82 19.04 19.26 19.63 19.86 20.09 20.44 20.68 20.92 61 N_bar 85 86 87 88 rho=.20 17.8 18 18.2 18.4 rho=.21 18.64 18.85 19.06 19.27 rho=.22 19.48 19.7 19.92 20.14 rho=.23 20.32 20.55 20.78 21.01 rho=.24 21.16 21.4 21.64 21.88 89 90 91 92 93 94 95 18.6 18.8 19 19.2 19.4 19.6 19.8 19.48 19.69 19.9 20.11 20.32 20.53 20.74 20.36 20.58 20.8 21.02 21.24 21.46 21.68 21.24 21.47 21.7 21.93 22.16 22.39 22.62 22.12 22.36 22.6 22.84 23.08 23.32 23.56 96 97 20 20.2 20.95 21.16 21.9 22.12 22.85 23.08 23.8 24.04 98 99 100 20.4 20.6 20.8 21.37 21.58 21.79 22.34 22.56 22.78 23.31 23.54 23.77 24.28 24.52 24.76 62 BIBLIOGRAPHY Abdeljaber, M. H., A. S. Monto, et al. (1991). "The Impact of Vitamin-A Supplementation on Morbidity - A Randomized Community Intervention Trial." American Journal of Public Health 81(12), 1654-1656. Alexander, F., M. M. Roberts, et al. (1989). "Randomization By Cluster and the Problem of Social-Class Bias." Journal of Epidemiology and Community Health 43(1), 29-36. Arceneaux, K. (2005). 
"Using cluster randomized field experiments to study voting behavior." Annals of the American Academy of Political and Social Science 601, 169-179. Berg, D. T. (2010). The Fear of Terrorist Attacks in the Southwestern United States: A Cross Sectional Analysis. PhD, Arizona State. Campbell, D. T. S., Julian C. (1963). Experimental and Quasi-Experimental Designs for Research. Boston, Houghton Mifflin Company. Campbell, M. J. (2000). "Cluster randomized trials in general (family) practice research." Statistical Methods in Medical Research 9(2), 81-94. Campbell, M. J., A. Donner, et al. (2007). "Developments in cluster randomized trials and Statistics in Medicine." Statistics in Medicine 26(1), 2-19. Cochran, W. G. (1977). Sampling Techniques. New York, John Wiley & Sons: 428. 63 Coghlan, B., P. M. Ngoy, et al. (2009). "Update on Mortality in the Democratic Republic of Congo: Results From a Third Nationwide Survey." Disaster Medicine & Public Health Preparedness 3(2), 8. Cornfield, J. (1978). "Randomization by group: A formal analysis." American journal of epidemiology 108, 100-102. Cox, R. G., L. Zhang, et al. (2007). "Academic performance and substance use: Findings from a state survey of public high school students." Journal of School Health 77(3), 109-115. Donald, A. and A. Donner (1997). "Adjustments to the Mantel-Haenszel chi-square statistic and odds ration variance estimator when the data are clustered (vol 6, pg 491, 1987)." Statistics in Medicine 16(24), 2927-2927. Donner (1990). "A Methodological Review of Non-Therapeutic Intervention Trials Employing Cluster Randomization." International journal of epidemiology 19(4), 5. Donner, A. (1998). "Some aspects of the design and analysis of cluster randomization trials." Journal of the Royal Statistical Society Series C-Applied Statistics 47, 95-113. Donner, A., N. Birkett, et al. (1981). "Randomization By Cluster - Sample-Size Requirements and Analysis." American Journal of Epidemiology 114(6), 906-914. Donner, A. and N. Klar (2000). Design and Analysis of Cluster Randomization Trials in Health Research. London, Arnold. 64 Donner, A. and N. Klar (2002). Issues in the meta-analysis of cluster randomized trials, John Wiley & Sons Ltd. Donner, A. and N. Klar (2004). "Pitfalls of and controversies in cluster randomization trials." American Journal of Public Health 94(3), 416-422. Eldridge, S., D. Ashby, et al. (2008). "Internal and external validity of cluster randomised trials: systematic review of recent trials." British Medical Journal 336(7649), 876-880. Falco, D. (2008). Assesing Students’ Views Towards Punishment:A Comparison Of Punitiveness Among Criminology And Non-Criminology Students. PhD, Indiana University, Pennsylvania. Feng, Z. D., P. Diehr, et al. (1999). "Explaining community-level variance in group randomized trials." Statistics in Medicine 18(5), 539-556. Glass, G., V. and K. D. Hopkins (1984). Statistical Methods in Education and Psychology. Boston, Allyn & Bacon. Guo, J., R. Whittemore, et al. (2009). "Factors that influence health quotient in Chinese college undergraduates." Journal of Clinical Nursing 19, 10. Hansen, M. H., W. N. Hurwitz, et al. (1953). Sample Survey and Methods. New York, John Wiley and Sons. Hansen MH, H. W. (1942). "Relative efficiencies of various sampling units in population inquiries." Journal of the American Statistical Association 37, 89-94. 65 Hayes, R. J., N. D. E. Alexander, et al. (2000). "Design and analysis issues in clusterrandomized trials of interventions against infectious diseases." 
Statistical Methods in Medical Research 9(2), 95-116. Hedges, L. V. and E. C. Hedberg (2007). "Intraclass correlation values for planning grouprandomized trials in education." Educational Evaluation and Policy Analysis 29(1), 6087. Hsieh, F. Y. (1988). "Sample-Size Formulas For Intervention Studies With The Cluster As Unit of Randomization." Statistics in Medicine 7(11), 1195-1201. Hsiu-Lan, T. (1997). The Vocational Interest Structure of Tawianese High School Students. A. P. Association. Ping Tung City, National Pingtung Teachers College: 19. Isaakidis, P. and J. P. A. Ioannidis (2003). "Evaluation of cluster randomized controlled trials in sub-Saharan Africa." American Journal of Epidemiology 158(9), 921-926. Johnson, N. F., M. Spagat, et al. (2008). "Bias in epidemiological studies of conflict mortality." Journal of Peace Research 45(5), 653-663. Kazumune, U., Fujiwara Takeo, et al. (2010). "Does Social Capital Promote Physical Activity? A Population-Based Study in Japan." PLOS One 5(8), 6. Kish, L. (1965). Survey Sampling. New York, John Wiley and Sons. Klar, N. and A. Donner (1997). "The merits of matching in community intervention trials: A cautionary tale." Statistics in Medicine 16(15), 1753-1764. 66 Lewsey, J. D. (2004). "Comparing completely and stratified randomized designs in cluster randomized trials when the stratifying factor is cluster size: a simulation study." Statistics in Medicine 23(6), 897-905. Lindquist, E. F. (1940). Statistical analysis in educational research. Boston, Houghton Mifflin Company. Micceri, T. (1989). "The Unicorn, The Normal Curve, and Other Improbable Creatures." Psychological Bulletin 105(1), 156-166. Michigan Governmental Website (2010). Retrieved July 21, 2010, from http://www.michigan.gov/cgi/0,1607,7-158-54534-240589-,00.html.Montana Office of Public Education (2007). OPI Report. Retrieved June 22, 2010, from http://www.opi.mt.gov/PDF/Measurement/rptDistCrtResults2007.pdf New York Governmental Website (2009), Department of Labor. Retrieved June 22, 2010, from http://www.labor.ny.gov/stats/nys/statewide_population_data.asp. New York State Department of Education (2010). Retrieved June 22, 2010, from http://www.emsc.nysed.gov/irts/ela-math/. Neyman, J. (1935). "Statistical problems in agricultural experimentation." Journal of the Royal Statistical Society Supplement 2, 19. Olowa, O. W. (2009). "Effects of the Problem Solving and Subject Matter Approaches on the Problem Solving Ability of Secondary School Agricultural Education." Journal of Industrial Teacher Education 46(1), 15. 67 Perry, C. L., D. E. Sellers, et al. (1997). "The child and adolescent trial for cardiovascular health (CATCH): Intervention, implementation, and feasability for elementary schools in the United States." Health Education & Behavior 24(6), 716-735. Puffer, S., D. J. Torgerson, et al. (2003). "Evidence for risk of bias in cluster randomised trials: review of recent trials published in three general medical journals." British Medical Journal 327(7418), 785-787. Rao, J. N. K. and A. J. Scott (1992). "A Simple Method For The Analysis Of Clustered Binary Data." Biometrics 48(2), 577-585. Rao, P. S. R. S. (2000). Sampling Methodologies with Applications. New York, Chapman and Hall/CRC. Raudenbush, S. W. (1997). "Statistical analysis and optimal design for cluster randomized trials." Psychological Methods 2(2), 173-185. Raudenbush, S. W., A. Martinez, et al. (2007). "Strategies for improving precision in grouprandomized experiments." 
Educational Evaluation and Policy Analysis 29(1), 5-29. Rooney, B. L. and D. M. Murray (1996). "A meta-analysis of smoking prevention programs after adjustment for errors in the unit of analysis." Health Education Quarterly 23(1), 48-64. Royce, J. M., N. Hymowitz, et al. (1993). "Smoking cessation factors among African Americans and whites." American Journal of Public Health 83(2), 220-226. Runyon, R., K. Coleman, et al. (2000). Fundamentals of Behavioral Statistics, McGraw-Hill. Sawilowsky, S. S. (2007). Real Data Analysis, Information Age Pub Inc. Sawilowsky, S. S. and G. G. Fahoome (2003). Statistics Through Monte Carlo Simulation With Fortran. Oak Park, Mi., JMASM. Schulz, K. F. and D. A. Grimes (2002). "Blinding in randomised trials: hiding who got what." Lancet 359(9307), 696-700. Simpson, J. M., N. Klar, et al. (1995). "Accounting for cluster randomization - A review of primary prevention trials, 1990 through 1993." American Journal of Public Health 85(10), 1378-1383. Sudman, S. (1976). Applied Sampling. New York, Academic Press. Tassitano, R., M. Barros, et al. (2010). "Enrollment in Physical Education Is Associated With Health-Related Behavior Among High School Students." Journal of School Health 80(3), 8. Upton, G. J. G. (1978). The Analysis of Cross Tabulation Data. Chichester, New York, Brisbane, Toronto, John Wiley and Sons. Wang, L. and X. Fan (1997). The Effect of Cluster Sampling Design in Survey Research on the Standard Error Statistic. Chicago, American Educational Research Association: 17. Weisberg, H. F., J. A. Krosnick, et al. (1996). An introduction to survey research, polling, and data analysis. Thousand Oaks, Sage Publications. Yang, L. T. L. a. L. (2008). "Duration of Sleep and ADHD Tendency Among Adolescents in China." Journal of Attention Disorders 11, 9.

ABSTRACT

THE EFFECT OF NONRANDOM SELECTION OF CLUSTERS IN A TWO STAGE CLUSTER DESIGN

by JASON PARROTT

December 2012

Advisor: Dr. Shlomo Sawilowsky
Major: Evaluation and Research
Degree: Doctor of Philosophy

Although not as efficient as simple random sampling, cluster sampling has been regarded as a valid sampling technique when the researcher is attempting to save cost. However, in order for it to be valid, random selection must occur in all stages of sampling. This simulation study examines purposeful selection of clusters in the first stage of a two stage cluster design. Using Monte Carlo methods, a simulation was conducted comparing random selection in both stages of a two stage cluster sample to purposeful selection in the first stage of a two stage cluster sample. The study compares purposeful selection to random selection by examining the width of the confidence intervals returned by each simulation. After conducting the study, it was evident that using purposeful selection can yield a confidence interval nearly ten times wider than that of its random counterpart.

AUTOBIOGRAPHICAL STATEMENT

My relatively short life of 34 years has been spent immersed in education. I was born and raised in Royal Oak, Michigan where I attended Royal Oak Schools. I graduated from Royal Oak Kimball High School in 1996, and attended Central Michigan University the following year. At Central Michigan University, I earned undergraduate degrees in the areas of political science and special education. After graduating from Central Michigan University, I became a special education teacher at Lakeland High School in White Lake, Michigan.
While teaching at Lakeland, I continued my educational career enrolling at Wayne State University, earning a Masters Degree in the area of Educational Leadership. After I earned my Masters Degree, I enrolled in the research and evaluation program at Wayne State University and will graduate with the completion of my dissertation. While completing my doctoral program, I began to pursue a career in the field of educational administration. I started as the Dean of Students at Lakeland High School, transitioned to middle school Assistant Principal at Royal Oak Middle School, and am currently the Principal of Oak Ridge Elementary, in Royal Oak, Michigan. I have the unique pleasure of being the Principal of the elementary school I attended as a child. Personally, I have been married for ten years. My wife Laci and I have three children ages 6, 4, and 2. Laci, Isabela, Olivia, and Maddox are my inspiration and I thank them for their patience and encouragement during this process. My doctoral program has been a unique experience and I am looking forward to applying what I have learned in this process to the next stage of my life.