Running Head: G*Power 3 G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences (in press). Behavior Research Methods. Franz Faul Christian-Albrechts-Universität Kiel Kiel, Germany Edgar Erdfelder Universität Mannheim Mannheim, Germany Albert-Georg Lang and Axel Buchner Heinrich-Heine-Universität Düsseldorf Düsseldorf, Germany Please send correspondence to: Prof. Dr. Edgar Erdfelder Lehrstuhl für Psychologie III Universität Mannheim Schloss Ehrenhof Ost 255 D-68131 Mannheim, Germany Email: [email protected] Phone +49 621 / 181 – 2146 Fax: + 49 621 / 181 - 3997 G*Power 3 (BSC702) Page 2 Abstract G*Power (Erdfelder, Faul, & Buchner, Behavior Research Methods, Instruments, & Computers, 1996) was designed as a general stand-alone power analysis program for statistical tests commonly used in social and behavioral research. G*Power 3 is a major extension of, and improvement over, the previous versions. It runs on widely used computer platforms (Windows XP, Windows Vista, Mac OS X 10.4) and covers many different statistical tests of the t-, F-, and !2-test families. In addition, it includes power analyses for z tests and some exact tests. G*Power 3 provides improved effect size calculators and graphic options, it supports both a distribution-based and a design-based input mode, and it offers all types of power analyses users might be interested in. Like its predecessors, G*Power 3 is free. G*Power 3 (BSC702) Page 3 G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences Statistics textbooks in the social, behavioral, and biomedical sciences typically stress the importance of power analyses. By definition, the power of a statistical test is the probability of rejecting its null hypothesis given that it is in fact false. Obviously, significance tests lacking statistical power are of limited use because they cannot reliably discriminate between the null hypothesis (H0) and the alternative hypothesis (H1) of interest. However, although power analyses are indispensable for rational statistical decisions, it took until the late 1980s until power charts (e.g., Scheffé, 1959) and power tables (e.g., Cohen, 1988) were supplemented by more efficient, precise, and easy-to-use power analysis programs for personal computers (Goldstein, 1989). G*Power 2 (Erdfelder, Faul, & Buchner, 1996) can be seen as a second-generation power analysis program designed as a stand-alone application to handle several types of statistical tests commonly used in social and behavioral research. In the past ten years, this program has been found useful not only in the social and behavioral sciences but also in many other disciplines that routinely apply statistical tests, for example, biology (Baeza & Stotz, 2003), genetics (Akkad et al., 2006), ecology (Sheppard, 1999), forest- and wildlife research (Mellina, Hinch, Donaldson, & Pearson, 2005), the geosciences (Busbey, 1999), pharmacology (Quednow et al., 2004), and medical research (Gleissner, Clusmann, Sassen, Elger, & Helmstaedter, 2006). G*Power 2 was evaluated positively in the reviews we are aware of (Kornbrot, 1997; Ortseifen, Bruckner, Burke, & Kieser, 1997; Thomas & Krebs, 1997), and it has been used in several power tutorials (e.g., Buchner, Erdfelder, & Faul, 1996, 1997; Erdfelder, Buchner, Faul & Brandt, 2004; Levin, 1997; Sheppard, 1999) as well as in statistics textbooks (e.g., Field, 2005; Keppel & Wickens, 2004; Myers & Well, 2003; Rasch, Friese, Hofmann, & Naumann, 2006a, 2006b). Nevertheless, the user feedback that we received converged with our own experience in showing some limitations and weaknesses of G*Power 2 that required a major extension and revision. The present article describes G*Power 3, a program that was designed to address the problems of G*Power 2. We will first outline the major improvements in G*Power 3 (Section 1) before we discuss the types of power analyses covered by this program (Section 2). Next, we G*Power 3 (BSC702) Page 4 describe program handling (Section 3) and the types of statistical tests to which it can be applied (Section 4). The fifth section is devoted to the statistical algorithms of G*Power 3 and their accuracy. Finally, program availability and some internet resources supporting users of G*Power 3 are described in Section 6. 1) Improvements in G*Power 3 compared to G*Power 2 G*Power 3 is an improvement over G*Power 2 in five major aspects. First, whereas G*Power 2 requires the DOS and Mac-OS 7-9 operating systems that were common in the 1990s but are now outdated, G*Power 3 runs on the currently most widely used personal computer platforms, that is, Windows XP, Windows Vista, and Mac OS X 10.4. The Windows and the Mac versions of the program are essentially equivalent. They use the same computational routines and share very similar user interfaces. For this reason, we will not differentiate between both versions in the sequel; users simply have to make sure to download the version appropriate for their operating system. Second, whereas G*Power 2 is limited to three types of power analyses, G*Power 3 supports five different ways to assess statistical power. In addition to a priori analyses, post hoc analyses, and compromise power analyses that were already covered by G*Power 2, the new program also offers sensitivity analyses and criterion analyses. Third, G*Power 3 provides dedicated power analysis options for a variety of frequentlyused t tests, F tests, z tests, !2 tests, and exact tests, rather than just the standard tests covered by G*Power 2. The tests captured by G*Power 3 are described in Section 3 along with their effect size parameters. Importantly, users are not limited to these tests because G*Power 3 also offers power analyses for generic t-, F-, z-, !2, and binomial tests for which the noncentrality parameter of the distribution under H1 may be entered directly. In this way, users are provided with a flexible tool for computing the power of basically any statistical test that uses t-, F-, z-, !2, or binomial reference distributions. Fourth, statistical tests can be specified in G*Power 3 using two different approaches, the distribution-based approach and the design-based approach. In the distribution-based approach, G*Power 3 (BSC702) Page 5 users select (a) the family of the test statistic (i.e., t-, F-, z-, !2, or exact test) and (b) the particular test within this family. This is the way in which power analyses were specified in G*Power 2. Additionally, a separate menu in G*Power 3 provides access to power analyses via the designbased approach: Users select (a) the parameter class the statistical test refers to (i.e., correlations, means, proportions, regression coefficients, variances) and (b) the design of the study (e.g., number of groups, independent vs. dependent samples, etc.). Based on the feedback we received to G*Power 2, we expect that some users might find the design-based input mode more intuitive and easier to use. Fifth, G*Power 3 supports users with enhanced graphics features. The details of these features will be outlined in Section 3, along with a description of program handling. 2) Types of Statistical Power Analyses The power (1-") of a statistical test is the complement of ", which denotes the type-2 or beta error probability of falsely retaining an incorrect H0. Statistical power depends on three classes of parameters: (1) the significance level (or, synonymously, the type-1 error probability) # of the test, (2) the size(s) of the sample(s) used for the test, and (3) an effect size parameter defining H1 and thus indexing the degree of deviation from H0 in the underlying population. Depending on the available resources, the actual phase of the research process, and the specific research question, five different types of power analysis can be reasonable (cf. Erdfelder et al., 2004; Erdfelder, Faul, & Buchner, 2005). We describe these methods and their uses in turn. 1) In a priori power analyses (Cohen, 1988), the sample size N is computed as a function of the required power level (1-"), the pre-specified significance level #, and the population effect size to be detected with probability (1-"). A priori analyses provide an efficient method of controlling statistical power before a study is actually conducted (e.g., Bredenkamp, 1969; Hager, 2006) and can be recommended whenever resources such as time and money required for data collection are not critical. 2) In contrast, post hoc power analyses (Cohen, 1988) often make sense after a study has already been conducted. In post hoc analyses, the power (1-") is computed as a function of #, the G*Power 3 (BSC702) Page 6 population effect size parameter, and the sample size(s) used in a study. It thus becomes possible to assess whether a published statistical test in fact had a fair chance to reject an incorrect H0. Importantly, post-hoc analyses, like a priori analyses, require an H1 effect size specification for the underlying population. They should not be confused with so-called retrospective power analyses in which the effect size is estimated from sample data and used to calculate the “observed power”, a sample estimate of the true power1. Retrospective power analyses are based on the highly questionable assumption that the sample effect size is essentially identical to the effect size in the population from which it was drawn (Zumbo & Hubley, 1998). Obviously, this assumption is likely to be false, the more so the smaller the sample. In addition, sample effect sizes are typically biased estimates of their population counterparts (Richardson, 1996). For these reasons, we agree with other critics of retrospective power analyses (e.g., Gerard, Smith & Weerakkody, 1998; Hoenig & Heisey, 2001; Kromrey & Hogarty, 2000; Lenth, 2001; Steidl, Hayes, & Schauber, 1997). Rather than using retrospective power analyses, researchers should specify population effect sizes on a priori grounds. Effect size specification simply means to define the minimum degree of violation of H0 a researcher would like to detect with a probability not less than (1-"). Cohen’s (1988) definitions of “small”, “medium”, and “large” effects can be helpful in such effect size specifications (see, e.g., Smith & Bayen, 2005). However, researchers should be aware of the fact that these conventions may have different meanings for different tests (cf. Erdfelder et al., 2005). (3) In compromise power analyses (Erdfelder, 1984; Erdfelder et al., 1996; Müller, Manz, & Hoyer, 2002), both # and 1-" are computed as functions of the effect size, N, and an error probability ratio q = " /#. To illustrate, q =1 would mean that the researcher prefers balanced type1 and type-2 error risks (# = "), whereas q = 4 would imply that " = 4 · # (cf. Cohen, 1988). Compromise power analyses can be useful both before and after data collection. For example, an a priori power analysis might result in a sample size that exceeds the available resources. In such a situation, a researcher could specify the maximum affordable sample size and, using a compromise power analysis, compute # and (1-") associated with, say, q! = " /# = 4. Alternatively, if a study has already been conducted but not yet been analyzed, a researcher could ask for a reasonable decision criterion that guarantees perfectly balanced error risks (i.e. # = "), given the size of this G*Power 3 (BSC702) Page 7 sample and a critical effect size she is interested in. Of course, compromise power analyses can easily result in unconventional significance levels larger than # = .05 (in case of small samples or effect sizes) or less than # = .001 (in case of large samples or effect sizes). However, we believe that the benefit of balanced type-1 and type-2 error risks often offsets the costs of violating significance level conventions (cf. Gigerenzer, Kraus, & Vitouch, 2004). (4) In sensitivity analyses the critical population effect size is computed as a function of #, 1-", and N. Sensitivity analyses may be particularly useful for evaluating published research. They provide answers to questions like “What is the effect size a study was able to detect with a power of 1-" = .80, given its sample size and # as specified by the author? In other words, what is the minimum effect size the test was sufficiently sensitive to?” In addition, sensitivity analyses may be useful before conducting a study to see whether, given a limited N, the size of the effect that can be detected is at all realistic (or, for instance, way too large to be expected realistically). (5) Finally, criterion analyses compute # (and the associated decision criterion) as a function of 1-", the effect size, and a given sample size. Criterion analyses are alternatives to posthoc power analyses after a study has already been conducted. They may be reasonable whenever the control of # is less important than the control of ". In case of goodness-of-fit tests for statistical models, for example, it is most important to minimize the "-risk of wrong decisions in favor of the model (H0). Researchers could thus use criterion analyses to compute the significance level # compatible with " = .05 for a small effect size. Whereas G*Power 2 was limited to the first three types of power analysis, G*Power 3 now covers all five types. Based on the feedback we received from G*Power 2 users, we believe that any question related to statistical power that occurs in research practice can be translated into one of these analysis types. G*Power 3 (BSC702) Page 8 3) Program Handling Using G*Power 3 typically involves the following four steps: (1) Select the statistical test appropriate for your problem, (2) choose one of the five types of power analysis defined in the previous section, (3) provide the input parameters required for the analysis, and (4) click on “calculate” to obtain the results. In the first step, the statistical test is chosen using the distribution-based or the design-based approach. G*Power 2 users probably have adapted to the distribution-based approach: One first selects the family of the test statistic (i.e., t-, F-, z-, !2, or exact test) using the “Test family” menu in the main window. The “Statistical test” menu adapts accordingly, showing a list of all tests available for the test family. For the two groups t test, for example, one would first select the t family of distributions and then “Means: Differences between two independent means (two groups)” in the “Statistical test” menu (see Figure 1). Alternatively, one might use the designbased approach of test selection. With the “Tests” pull-down menu in the top row it is possible to select (a) the parameter class the statistical test refers to (i.e., correlations, means, proportions, regression coefficients, variances) and (b) the design of the study (e.g., number of groups, independent vs. dependent samples, etc.). For example, researchers would select “Means” ! “Two independent groups” to specify the two-groups t test (see Figure 2). The design-based approach has the advantage that test options referring to the same parameter class (e.g., means) are located in close proximity, whereas they may be scattered across different distribution families in the distribution-based approach. Please insert Figures 1 and 2 about here. In the second step the “Type of power analysis” menu in the center of the main window should be used to choose the appropriate analysis type. In the third step, the power analysis input parameters are specified in the lower left of the main window. To illustrate, an a priori power analysis for a two groups t test would require a decision between a one-tailed and a two-tailed test, G*Power 3 (BSC702) Page 9 a specification of Cohen’s (1988) effect size measure d under H1, the significance level #, the required power (1-") of the test, and the preferred group size allocation ratio n2/n1. The final step consists of clicking “Calculate” to obtain the output in the lower right of the main window. For instance, input parameters specifying a one-tailed t test, a medium effect size of d = .5, # = .05, (1-") = .95, and an allocation ratio of n2/n1 = 1 would result in a total sample size of N=176 (88 observation units in each group; see Figures 1 and 2). The noncentrality parameter $ defining the t distribution under H1, the decision criterion to be used (i.e., the critical value of the t statistic), the degrees of freedom2 of the t test and the actual power value are also displayed. Note that the actual power will often be slightly larger than the pre-specified power in a priori power analyses. The reason is that non-integer sample sizes are always rounded up by G*Power to obtain integer values consistent with a power level not less than the pre-specified one. In addition to the numerical output, G*Power 3 displays the central (H0) and the noncentral (H1) test statistic distributions along with the decision criterion and the associated error probabilities in the upper part of the main window (see Figure 1)3. This supports understanding the effects of the input parameters and is likely to be a useful visualization tool in the teaching of, or the learning about, inferential statistics. The distributions plot may be printed, saved, or copied by clicking the right mouse button inside the plot area. The input and output of each power calculation in a G*Power session is automatically written to a protocol that can be displayed by selecting the “Protocol of power analyses” tab in the main window. It is possible to clear the protocol, or to print, save, and copy the protocol in the same way as the distributions plot. Because Cohen’s (1988) book on power analysis appears to be well known in the social and behavioral sciences, we made use of his effect size measures whenever possible. Researchers not familiar with these measures and users preferring to compute Cohen’s measures from more basic parameters can click on the “Determine” button to the left the effect size input field (see Figures 1 and 2). A drawer will open next to the main window and provide access to an effect size calculator tailored to the selected test (see Figure 2). For the two-groups t test, for example, users can specify the means (µ1, µ2) and the common standard deviation (%) in the populations underlying the groups G*Power 3 (BSC702) Page 10 to calculate Cohen’s d = |µ1-µ2|/%. Clicking the “Calculate and transfer to main window” button copies the computed effect size to the appropriate field in the main window. Please insert Figure 3 about here. Another useful option is the Power Plot window (see Figure 3) which is opened by clicking the “X-Y plot for a range of values” in the lower right corner of the main window (see Figures 1 and 2). By selecting the appropriate parameters for the y- and the x-axis, one parameter (#, power (1–"), effect size, or sample size) can be plotted as a function of any other parameter. Of the remaining two parameters, one can be chosen to draw a family of graphs, while the fourth parameter is kept constant. For instance, power (1–") can be drawn as a function of the sample size for several different population effects sizes, keeping # at a particular value. The plot may be printed, saved, or copied by clicking the right mouse button inside the plot area. Selecting the “table” tab reveals the data underlying the plot; they may be copied to other applications. The Power Plot window inherits all input parameters of the analysis that is active when the “X-Y plot for a range of values” button is pressed. Only some of these parameters can be directly manipulated in the Power Plot window. For instance, switching from a plot of a two-tailed test to that of a one-tailed test requires choosing the “Tail(s): One” option in the main window, followed by pressing the “X-Y plot for a range of values” button. 4) Types of Statistical Tests. G*Power 3 provides power analyses for test statistics following t-, F-, !2- or standard normal distributions under H0 (either exact or asymptotic) and noncentral distributions of the same test families under H1. In addition, it includes power analyses for some exact tests. In Tables 2 to 9 we briefly describe the tests currently covered by G*Power 3. Table 1 lists the symbols and their meanings as used in Tables 2 to 9. G*Power 3 (BSC702) Page 11 Tests for Correlation and Regression Table 2 summarizes the procedures supported for testing hypotheses on correlation and regression. One-sample tests are provided for the point-biserial model4, i.e. correlations between a binary variable and a continuous variable, and for correlations between two normally distributed variables (Cohen, 1988, Chapter 3). The latter test uses the exact sample correlation coefficient distribution (Barabesi & Greco, 2002) or, optionally, a large sample approximation based on Fisher’s r-to-z transformation. The two-sample test for differences between two correlations uses Cohen’s effect size q (Cohen, 1988, Chapter 4) and is based on Fisher’s r-to-z transformation. Cohen defines q = .10, q = .30, and q = .50 as “small”, ”medium”, and ”large” effects, respectively. The two procedures available for the multiple regression model handle the cases of (a) a test of an overall effect, i.e. the hypothesis that the population value of R2 is different from zero, and (b) a test of the hypothesis that adding additional predictors increases the value of R2 (Cohen, 1988, Chapter 9). According to Cohen’s criteria effects of size f 2 = 0.02, f 2 = 0.15, and f 2 = 0.35 are considered “small”, ”medium”, and ”large”, respectively. Tests for Means (univariate case) Table 3 summarizes the power analysis procedures for tests on means. G*Power 3 supports all cases of the t test for means described by Cohen (1988, Chapter 2): the test for independent means, the test of the null hypothesis that the population mean equals some specified value (one sample case), and the test on the means of two dependent samples (matched pairs). Cohen’s d and dz are used as effect size indices. Cohen defines d = 0.2, d = 0.5, and d = 0.8 as “small”, ”medium”, and ”large” effects, respectively. Effect size dialogs are available to compute the appropriate effect size parameter from means and standard deviations. For example, assume we want to compare visual search times for targets embedded in rare versus frequent local contexts in a within-subject design (cf. Hoffmann & Sebald, 2005, Exp. 1). It is expected that the mean search time for targets in rare contexts (say, 600 ms) should decrease by at least 10 ms in frequent contexts (i.e., to 590 ms) as a consequence of local contextual cuing. If prior evidence suggests population standard deviations G*Power 3 (BSC702) Page 12 of, say, % = 25 ms in each of the conditions and a correlation of & = .70 between search times in both conditions we can use the effect size drawer of G*Power 3 for the matched pairs t test to calculate the effect size dz = .516 (see Table 3, row 2, for the formula). By selecting a post hoc power analysis for one-tailed matched pairs t tests, we easily see that for dz = .516, # = .05, and N = 16 participants the power is only 1-" = .47. Thus, provided that the above assumptions are appropriate, the nonsignificant statistic t(15) = 1.475 obtained by Hoffmann and Sebald (2005, Exp. 1, p. 34) might in fact be due to a type 2 error. This interpretation would be consistent with the fact that Hoffmann and Sebald (2005) observed significant local contextual cuing effects in all other four experiments they reported. The procedures provided by G*Power 3 to test effects in between-subjects designs with more than two groups (i.e., one-way ANOVA designs and general main effects and interactions in factorial ANOVA designs of any order) are identical to those in G*Power 2 (Erdfelder et al., 1996). In all these cases the effect size f as defined in Cohen (1988) is used. In a one-way ANOVA the effect size drawer can be used to compute f from the means and group sizes of k groups and a standard deviation common to all groups. For tests of effects in factorial designs, the effect size drawer offers the possibility to compute the effect size f from the variance explained by the tested effect and the error variance. Cohen defines f = 0.1, f = 0.25, and f = 0.4 as “small”, ”medium”, and ”large” effects, respectively. New in G*Power 3 are procedures for analyzing main effects and interactions for A ! B mixed designs, where A is a between-subjects factor (or an enumeration of the groups generated by cross-classification of several between-subject factors) and B is a within-subjects factor (or an enumeration of the repeated measures generated by cross-classification of several within-subject factors). Both the univariate and the multivariate approach to repeated measures (O’Brien & Kaiser, 1985) are supported. The multivariate approach will be discussed below. The univariate approach is based on the sphericity assumption. This assumption is correct if (in the population) all variances of the repeated measurements are equal and all correlations between pairs of repeated measurements are equal. If all the distributional assumptions are met, then the univariate approach is the most powerful method (Muller & Barton, 1989; O’Brien & Kaiser, 1985). Unfortunately, especially the assumption of equal correlations is often violated, which can lead to very misleading G*Power 3 (BSC702) Page 13 results. In order to compensate for such adverse effects in tests of within effects or between-within interactions, the noncentrality parameter and the degrees of freedom of the F-distribution can be multiplied by a correction factor ' (Geisser & Greenhouse, 1958; Huynh & Feldt, 1970). ' is 1 if the sphericity assumption is met and approaches 1/(m-1) with increasing degrees of violation of sphericity, where m denotes the number of repeated measurements. G*Power provides three separate yet very similar routines to calculate power in the univariate approach for between effects, within effects, and interactions. If the to-be-detected effect size f is known, these procedures are very easy to apply. To illustrate, Berti, Münzer, Schröger, and Pechmann (2006) compared the pitch discrimination ability of 10 musicians and 10 control subjects (between-subjects factor A) for 10 different interference conditions (within-subject factor B). Assuming that A, B, and A ! B effects of “medium” size (f = .25, cf. Cohen, 1988; Table 3) should be detected given a correlation of & = .50 between repeated measures and a significance level of # = .05, the power values of the F tests for the A main effect, the B main effect, and the A ! B interaction are easily computed as .30, .95, and .95, respectively, by inserting f = .25, # = .05, the total sample size (20), the number of groups (2), the number of repetitions (10) and & = .50 into the appropriate input fields of the procedures designed for these tests. If the to-be-detected effect size f is unknown, it is necessary to compute f from more basic parameters characterizing the expected population scenario under H1. To demonstrate the general procedure we will show how to do post-hoc power analyses in the scenario illustrated in Figure 4 assuming the variance and correlations structure defined in matrix SR1. We first consider the power of the within effect: We select the “F tests” family, the “ANOVA: Repeated measures, within factors” test, and “Post hoc” as the type of power analysis. Both the “Number of groups” and the “Repetitions” fields are set to 3. The “Total sample size” is set to 90 and the “# error probability” to 0.05. Referring to matrix SR1, we insert 0.3 in the input field “corr among rep measures”, and -- since sphericity obviously holds in this case -- we set the “nonsphericity correction '” to 1. To determine the effect size f, we first calculate ! µ2 , the variance of the within effect. From the three column means µ • j of matrix M and the grand mean µ •• we get ! µ2 = ((1012.889)2 + (13-12.889)2 + (15.667-12.889)2)/3 = 5.35679. Clicking the “Determine” button next to the effect size label opens the effect size drawer. We choose “From variances” and set the G*Power 3 (BSC702) Page 14 “Variance explained by special effect” to 5.357 and the “Variance within groups” to 92 = 81. Clicking the “Calculate and transfer to main window” button calculates an effect size f = 0.2572 and transfers f to the effect size field in the main window. Clicking “Calculate” yields the results: The power is 0.997, the critical F value with df1=2 and df2=174 is 3.048, and the noncentrality parameter ( is 25.52. The procedure for tests of between-within interactions effects (“ANOVA: Repeated measures, within-between interaction”) is almost identical to that just described. The only difference is in how the effect size f is computed: Here we first calculate the variance of the residual values µ ij ! µ i• ! µ • j + µ •• of matrix M: ! µ2 = ((10-10-15+12.889)2 + … + (12-15.66711.333 + 12.889)2)/9.0 = 1.90123. Using the effect size drawer in the same way as above we get an effect size f = 0.1532 which results in a power of 0.653. To test between effects we choose “ANOVA: Repeated measures, between factors” and set all parameters to the same values as before. Note that in this case we do not need to specify '—no correction is necessary because tests of between factors do not require the sphericity assumption. To calculate the effect size, we use “Effect size from means” in the effect size drawer. We select 3 groups, set “SD % within each group” to 9, and insert for each group the corresponding row mean µ i• of M (15,12.3333, 11.3333) and an equal group size of 30. An effect size f = 0.1719571 is calculated and the resulting power is 0.488. Note that G*Power 3 can easily handle pure repeated measures designs without any between-subject factors (e.g., Frings & Wentura, 2005; Schwarz & Müller, 2006) by choosing the “ANOVA: Repeated Measures, within factors” procedure and setting the number of groups to 1. Tests for Means (multivariate case) G*Power 3 contains several procedures for performing power analyses in multivariate designs (see Table 3). All these tests belong to the F test family. The Hotelling T2 tests are extensions of univariate t tests to the multivariate case, where more than one dependent variable is measured: Instead of two single means two mean vectors are compared, and instead of a single variance, a variance-covariance matrix is considered (Rencher, 1998). In the one-sample case H0 posits that the vector of population means is identical to a G*Power 3 (BSC702) Page 15 specified constant mean vector. The effect size drawer can be used to calculate the effect size r r ) from the difference µ ! c and the expected variance-covariance matrix under H1. For example, r r assume that we have two variables, a difference vector µ ! c = {1.88, 1.88} under H1, variances ! 12 = 56.79, ! 22 = 29.28, and a covariance of 11.98 (Rencher, 1998, p. 106). To perform a post hoc power analysis, choose “F tests”, then “Hotellings T2: One group mean vector” and set the analysis type to “Post hoc”. Enter 2 in the “Response variables” field, and then press the “Determine” button next to effect size label. In the effect size drawer at “Input method: Means and…” choose “variance-covariance matrix” and press “Specify/Edit input values”. Under the “Means” tab insert “1.88” in both input fields; under the “Cov Sigma” tab insert 56.79 and 29.28 in the main diagonal and 11.98 as off-diagonal element in the lower left cell. Pressing the button “Calculate and transfer to main window” initiates the calculation of the effect size (0.380) and transfers it to the main window. For this effect size, # = 0.05, and a total sample size of N = 100 the power amounts to .9282. The procedure in the two-group case is exactly the same, with the following exceptions. First, in the effect size drawer two mean vectors have to be specified. Second, the group sizes may differ. The MANOVA tests in G*Power 3 refer to the multivariate general linear model (O’Brien & Muller, 1993; O’Brien & Shieh, 1999): Y = XB + " , where Y is N ! p of rank p; X is N ! r of rank r ; and the r ! p matrix B contains fixed coefficients. The rows of " are taken to be independent p-variate normal random vectors with mean 0 and p ! p positive-definite covariance ! ! = " 0 , where C is c ! r with full matrix " . The multivariate general linear hypothesis is H0: CBA row rank, and A is p ! a with full column rank (in G*Power 3, " 0 is assumed to be zero). H0 has H0 refer to the matrices ! df1 = a ! c degrees of freedom. All tests of the hypothesis ! ˙ T W˙X ˙ )"1 CT ] "1 (CBU " # ) = NH* H = N(CBU " # 0 )T [C( ˙X 0 ! and ! E = UT "U( N ! rX ) , && is a q ! q “essence model matrix”, W is a q ! q diagonal matrix containing weights where X && T WX && ) (see O’Brien & Shieh, 1999, p.14). Let {! * , K , ! * } be the w j = n j / N , and X T X = N ( X 1 s s = min(a, c) eigenvalues of E !1 H * and {!1 , K , ! s } the s eigenvalues of E !1 H /( N ! rX ) , i.e. "i = "i* N /( N ! rX ) . G*Power 3 (BSC702) Page 16 G*Power 3 offers power analyses for the multivariate model following either the approach outlined in Muller and Peterson (1984; Muller, LaVange, Landesmann-Ramey & Ramey,1992) or alternatively the approach of O’Brien and Shieh (1999; Shieh, 2003). Both approaches approximate the exact distributions of Wilks U (Rao, 1951), the Hotelling-Lawley T1 (Pillai & Samson, 1959), the Hotelling-Lawley T2 (McKeon, 1974), and Pillai’s V (Pillai & Mijares, 1959) by F distributions and are asymptotically equivalent. Table 5 outlines details of both approximations. The type of statistic (U, T1, T2, V) and the approach (Muller-Peterson or O’BrienShieh) can be selected in an Options dialog that can be evoked by clicking the button “Options” at the bottom of the main window. The approach of Muller and Peterson (1984) has found widespread use; for instance, it has been adopted in the SPSS software package. We nevertheless recommend the approach of O’Brien and Shieh (1999) because it has a number of advantages: (a) Unlike the method of Muller and Peterson it provides the exact noncentral F distribution whenever the hypothesis involves at most s = 1 positive eigenvalues, (b) the approximations for s > 1 eigenvalues are almost always more accurate then the Muller and Peterson method (the Muller and Peterson method systematically underestimates power), and (c) it provides a simpler form of the noncentrality parameter, i.e. ( = (* N, where (* is not a function of the total sample size. G*Power 3 provides procedures to calculate the power for global effects in a “one-way MANOVA” and for special effects and interactions in factorial MANOVA designs. These procedures are the direct multivariate analogues of the ANOVA routines described above. Table 5 summarizes information that is needed in addition to the formulae given above to calculate the effect size f from hypothesized values for the mean matrix M (corresponding to matrix B in the model), the covariance matrix *, and the contrast matrix C describing the effect under scrutiny. The effect size drawer can be used to calculate f from known values of the statistic U, T1, T2, or V. Note, however, that the transformation of T2 to f depends on the sample size. Thus this test statistic seems not very well suited for a priori analyses. In line with Bredenkamp and Erdfelder (1985) we recommend V as the multivariate test statistic. Another group of procedures in G*Power 3 supports the multivariate approach to power analyses of repeated measures designs. G*Power provides separate but very similar routines for G*Power 3 (BSC702) Page 17 the analysis of between effects, within effects and interactions in simple A ! B designs, where A is a between-subjects factor and B a within-subjects factor. To illustrate the general procedure we describe in some detail a post hoc analysis of the within-effect for the scenario illustrated in Fig. 4 assuming the variance and correlations structure defined in matrix SR2. We first choose “F tests”, then “MANOVA: Repeated measures, within factors”. In the “Type of power analysis” menu we choose “Post hoc”. We click the “Options” button to open a dialog in which we deselect the “Use mean correlation in effect size calculation” option. We choose the “Pillai V” statistic and the “O’Brien and Shieh” algorithm. Back to the main window we set both “Number of groups” and “Repetitions” to 3, the “Total sample size” to 90, and the “# error probability” to 0.05. To compute the effect size f(V) for the Pillai statistic we open the effect size drawer by clicking on the “Determine” button next to the effect size label. In the effect size drawer select, as procedure, “Effect size from mean and variance-covariance matrix” and, as input method, “SD and correlation matrix”. Clicking on “Specify/Edit matrices” opens another window in which we specify the hypothesized parameters. In the “means” tab we insert our means matrix M, in the “Cov Sigma” tab we choose “SD and Correlation” and insert the values of SR2. Because this matrix is always symmetric, it suffices to specify the lower diagonal values. After closing the dialog and clicking on “Calculate and transfer to main window” we get a value 0.1791 for Pillai’s V and the effect size f(V) = 0.4672. Clicking on “Calculate” shows that the power is 0.980. The analysis of between effects and interaction effects is done in an analogous way. Tests for Proportions The support for tests on proportions has been greatly enhanced in G*Power 3. Table 6 summarizes the tests that are currently implemented. In particular, all tests on proportions considered by Cohen (1988) are now available: (a) the sign test (Cohen, 1988, Chapter 5), (b) the z-test for the difference between two proportions (Cohen, 1988, Chapter 6), and (c) the !2-test for goodness-offit and contingency tables (Cohen, 1988, Chapter 7). The sign test is implemented as a special case (c = 0.5) of the more general binomial test (also available in G*Power 3) that a single proportion has a specified value c. In both procedures G*Power 3 (BSC702) Page 18 Cohen’s effect size g is used and exact power values based on the binomial distribution are calculated. Note, however, that due to the discrete nature of the binomial distribution the nominal value of # usually cannot be realized. Since the tables in Chapter 5 of Cohen’s book use the # value closest to the nominal value, even if it is higher than the nominal value, the tabulated power values are sometimes larger than those calculated by G*Power 3. G*Power 3 always requires the actual # not to be larger than the nominal value. Numerous procedures have been proposed to test the null hypothesis that two independent proportions are identical (D’Agostino, Chase & Belanger, 1988; Cohen, 1988; Suissa & Shuster, 1985; Upton, 1982), and G*Power 3 implements several of them. The simplest procedure is a z test with optional arcsin transformation and optional continuity correction. Besides these two computational options it can also be chosen whether Cohen’s effect size measure h or, alternatively, two proportions are used to specify the alternate hypothesis. Using the options (“Use continuity correction” off, “Use arcsin transform” on) the procedure calculates power values close to those tabulated in Cohen (1988, Chapter 6). The options (“Use continuity correction” off, “Use arcsin transform” off) compute the uncorrected !2-approximation (Fleiss, 1981) whereas (“Use continuity correction” on, “Use arcsin transform” off) computes the corrected !2-approximation (Fleiss, 1981). A second variant is Fisher’s exact conditional test (Haseman, 1978). Normally, G*Power 3 calculates the exact unconditional power. However, despite the highly optimized algorithm used in G*Power 3, long computation times may result for large sample sizes (say N > 1000). Therefore a limiting N can be specified in the Options dialog that determines at which sample size G*Power 3 switches to a large sample approximation. A third variant calculates the exact unconditional power for approximate test statistics T (Table 7 summarizes the supported statistics). The logic underlying this procedure is to enumerate all possible outcomes for the 2 x 2 binomial table, given fixed sample sizes n1, n2 in the two groups. This is done by choosing as success frequency x1 and x2 in each group any combination of the values 0 < x1 " n1 and 0 < x2 " n2. Given the success probabilities +1, +2 in each group, the probability of observing a table X with success frequencies x1, x2 is: #n & #n & P( X | "1 ," 2 ) =% 1 ("1x1 (1) "1 ) n1 )x1 % 2 (" 2x 2 (1) " 2 ) n2 )x 2 $ x1 ' $ x2 ' G*Power 3 (BSC702) Page 19 To calculate power (1-") and the actual type-I error #,, the test statistic T is computed for each table and compared with the critical value T#. If A denotes the set of all tables X rejected by this criterion, i.e. those with T > T#, then the power and the # level are given by: 1" # = & X % A P( X | $1 ,$ 2 ) and " * = % X $ A P( X | # 2 ,# 2 ) , where +2 denotes the success probability in both groups as assumed in the null hypothesis. Please note that the actual # level can be larger that ! the nominal level! The ! preferred input method (proportions, difference, risk ratio, odds ratio; see Table 6) and the test statistic to use (see Table 7) can be changed in the Options dialog. Note that the test statistic actually used to analyze the data should be chosen. For large sample sizes the exact computation may take too much time. Therefore, a limiting N can be specified in the Options dialog that determines at which sample size G*Power switches to large sample approximations. G*Power 3 also provides a group of procedures to test the hypothesis that the difference/risk ratio/odds ratio of a proportion with respect to a specified reference proportion + is different under H1 than a difference/risk ratio/odds ratio to the same reference proportion assumed in H0. These procedures are available in the “exact” test family as “Proportions: Inequality (offset), two independent groups (unconditional)”. The enumeration procedure described above for the tests on differences between proportions without offset is also used in this case. In the tests without offset, the different input parameters (differences, risk ratio, etc.) are equivalent ways of specifying two proportions. The specific choice has no influence on the results. In the case of tests with offset, however, each input method has a different set of available test statistics. The preferred input method (proportions, difference, risk ratio, odds ratio; see Table 6) and the test statistic to use (see Table 8) can be changed in the Options dialog. As in the other exact procedures, the computation may be time consuming and a limiting N can be specified in the Options dialog that determines at which sample size G*Power switches to large sample approximations. Also new in G*Power 3 is an exact procedure to calculate the power for the McNemar test. The null hypothesis of this test states that the proportions of successes are identical in two dependent samples. Figure 5 shows the structure of the underlying design: A binary response is sampled from the same subject or a matched pair in a standard and in a treatment condition. The G*Power 3 (BSC702) Page 20 null hypothesis +s = +t is formally equivalent to the hypothesis for the odd ratio: OR =! 12 / ! 21 = 1 . To fully specify H1 we not only need to specify the odds ratio, but also the proportion +D of discordant pairs, that is, the expected proportion of responses that differ in the standard and the treatment condition. The exact procedure used in G*Power 3 calculates the unconditional power for the exact conditional test, which calculates the power conditional on the number nD of discordant pairs. Let p(nD = i) be the probability that the number of discordant pairs is i. Then the unconditional power is the sum over all i - {0, …. N} of the conditional power for nD = i weighted with p(nD = i). This procedure is very efficient, but for very large sample sizes the exact computation nevertheless may take too much time. Again, a limiting N can be specified in the Options dialog that determines at which sample size G*Power switches to a large sample approximation. The large sample approximation calculates the power based on an ordinary one sample binomial test with Bin(N+D, 0.5) as the distribution under H0 and Bin(N+D, OR/(1+OR)) as the H1 distribution. Tests for Variances Table 9 summarizes important properties of the two procedures for testing hypotheses on variances that are currently supported by G*Power 3. In the one-group case the null hypothesis is tested that the population variance %2 has a specified value c. The variance ratio %2/c is used as the effect size. The “central and noncentral” distributions, corresponding to H0 and H1, respectively, are central !2 distributions with N–1 degrees of freedom (because H0 and H1 are based on the same mean). To compare the variance distributions under both hypotheses, the H1 distribution is scaled with the value r postulated for the ratio %2/c in the alternate hypothesis, i.e. the noncentral distribution is r!2N-1 (Ostle & Malone, 1988). In the two-groups case H0 states that the variances in two populations are identical (%2/%1 = 1). Analogous to the one sample case, two central F distributions are compared, the H1 distribution being scaled by the value of the variance ratio %2/%1 postulated in H1. Generic tests G*Power 3 (BSC702) Page 21 Besides the specific routines described in Tables 2 to 9 that cover a considerable part of the tests commonly used, G*Power 3 provides “generic” power analysis routines that may be used for any test based on the t, F, !2, z, and binomial distribution. In generic routines the parameters of the central and noncentral distributions are specified directly. To demonstrate the uses and limitations of these generic routines we will show how to do a two-tailed power analysis for the one-sample t test by using the generic routine. You may compare the results with those of the specific routine available in G*Power for that test. First, we select the “t tests” family and then “Generic t test” (the generic test option is always located at the end of the list of tests). Next, we select “Post hoc” as the type of power analysis. We choose a two-tailed test and 0.05 as “# error probability”. We now need to specify the noncentrality parameter $ and the degrees of freedom for our test. We look up the definitions for the one-sample test in Table 3 and find " = d N and df = N ! 1 . Assuming a “medium” effect of d = 0.5 and N = 25 we arrive at $ = 0.5·5 = 2.5, and df = 24. Inserting these values and clicking “Calculate” we obtain a power of (1-") ! = 0.6697. The critical value t = 2.0639 corresponds to the specified #. In this post hoc power analysis, the generic routine is almost as simple as the specific routine. The main disadvantage of the generic routines is, however, that the dependence of the noncentrality parameter on the sample size is implicit. As a consequence, we cannot perform a priori analyses automatically. Rather, we need to iterate N by hand until we find an appropriate power value. 5) Statistical Methods and Numerical Algorithms The subroutines used to compute the distribution functions (and the inverse) of the noncentral t, F, !2, the z and the binomial distribution are based on the C-version of the DCDFLIB (available from http://www.netlib.org/random/) which was slightly modified for our purposes. G*Power 3 no longer provides approximate power analyses that were available in the speed mode of G*Power 2. Two arguments guided us in supporting exact power calculations only. First, four-digit precision of power calculations may be mandatory in many applications. For example, both compromise power analyses for very large samples and error probability adjustments in case of multiple tests of significance may result in very small values of # or " (Westermann & Hager, 1986). Second, as a G*Power 3 (BSC702) Page 22 consequence of improved computer technology, exact calculations have become so fast that the speed gain associated with approximate power calculations is not even noticeable. Thus, from a computational standpoint, there is little advantage to using approximate rather than exact methods (cf. Bradley, Russel, & Reeve, 1998). 6) Program Availability and Internet Support To summarize, G*Power 3 is a major extension of, and improvement over, G*Power 2 in that it offers easy-to-apply power analyses for a much larger variety of common statistical tests. Program handling is more flexible, easier to understand, and more intuitive compared to G*Power 2, reducing the risk of erroneous applications. The added graphical features should be useful for both research and teaching purposes. Thus, G*Power 3 is likely to become a useful tool for empirical researchers and students of applied statistics. Like its predecessor, G*Power 3 is a noncommercial program that can be downloaded free of charge. Copies of the Mac and the Windows versions are available at http://www.psycho.uniduesseldorf.de/abteilungen/aap/gpower3/ only. Users interested in distributing the program in another way must ask for permission from the authors. Commercial distribution is strictly forbidden. The G*Power 3 webpage will offer an expanding web-based tutorial describing how to use the program, along with examples. Users who let us know their e-mail addresses will be informed of updates. Although considerable effort has been put into program development and evaluation, there is no warranty whatsoever. Users are kindly asked to report possible bugs and difficulties in program handling to [email protected]. G*Power 3 (BSC702) Page 23 References Akkad, D.A., Jagiello, P., Szyld, P., Goedde, R., Wieczorek, S., Gross, W.L., & Epplen, J.T. (2006). Promoter polymorphism rs3087456 in the MHC class II transactivator gene is not associated with susceptibility for selected autoimmune diseases in German patient groups. International Journal of Immunogenetics, 33, 59-61. Back, M.D., Schmukle, S.C., & Egloff, B. (2005). Measuring Task-Switching ability in the Implicit Association Test. Experimental Psychology, 52, 167-179. Baeza, J. A. & Stotz, W. (2003). Host-use and selection of differently colored sea anemones by the symbiotic crab Allopetrolisthes spinifrons. Journal of Experimental Marine Biology and Ecology, 284, 25-39. Barabesi, L. & Greco, L (2002). A note on the exact computation of the Student t Snedecor F and sample correlation coefficient distribution functions. Journal of the Royal Statistical Society: Series D (The Statistician), 51, 105-110. Berti, S., Münzer, S., Schröger, E., & Pechmann, T. (2006). Different interference effects in musicians and a control group. Experimental Psychology, 53, 111-116 Bradley, D. R., Russell, R. L., & Reeve, C.P. (1998). The accuracy of four approximations to noncentral F. Behavior Research Methods, Instruments, & Computers, 30, 478-500. Bredenkamp, J. (1969). Über die Anwendung von Signifikanztests bei theorie-testenden Experimenten [The application of significance tests in theory-testing experiments]. Psychologische Beiträge, 11, 275-285. Bredenkamp, J., & Erdfelder, E. (1985). Multivariate Varianzanalyse nach dem VKriterium [Multivariate analysis of variance based on the V-criterion]. Psychologische Beiträge, 27, 127-154. Buchner, A., Erdfelder, E., & Faul, F. (1996). Teststärkeanalysen. In E. Erdfelder, R. Mausfeld, T. Meiser & G. Rudinger (Eds.), Handbuch Quantitative Methoden [Handbook quantitative methods, pp. 123-136}. Weinheim, Germany: Psychologie Verlags Union. Buchner, A., Erdfelder, E., & Faul, F. (1997). How to Use G*Power [WWW document]. URL http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/how_to_use_gpower.html G*Power 3 (BSC702) Page 24 Busbey, A. B. I. (1999). Macintosh shareware/freeware earthscience software. Computers & Geosciences, 25, 335-340. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ, Lawrence Erlbaum Associates. D’Agostino, R.B, Chase, W. & Belanger, A. (1988). The appropriateness of some common procedures for testing the equality of two independent binomial populations. The American Statistican, 42,198-202. Erdfelder, E. (1984). Zur Bedeutung und Kontrolle des beta-Fehlers bei der inferenzstatistischen Prüfung log-linearer Modelle [Significance and control of the beta error in statistical tests of log-linear models]. Zeitschrift für Sozialpsychologie, 15, 18-32. Erdfelder, E., Buchner, A., Faul, F., & Brandt, M. (2004). GPOWER: Teststärkeanalysen leicht gemacht. In E. Erdfelder & J. Funke (Eds.), Allgemeine Psychologie und deduktivistische Methodologie [Experimental psychology and deductive methodology, pp. 148-166]. Göttingen, Germany: Vandenhoeck & Ruprecht. Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28, 1-11. Erdfelder, E., Faul, F., & Buchner, A. (2005). Power analysis for categorical methods. In B. S. Everitt & D. C. Howell (Eds.), Encyclopedia of statistics in behavioral science (pp. 1565 – 1570). Chichester, GB: Wiley. Farrington & Manning (1990). Test statistics and sample size formulae for comparative binomial trials with null hypothesis of non-zero risk difference or non-unity relative risk. Statistics in Medicine, 9, 1447-1454. Field, A.P. (2005). Discovering statistics with SPSS (2nd ed.). London: Sage. Fleiss, J.L. (1981). Statistical methods for rates and proportions, 2nd ed., New York: Wiley. Frings, C. & Wentura, D. (2005). Negative priming with masked distractor-only prime trials: Awareness moderates negative priming. Experimental Psychology, 52, 131-139. Gerard, P.D., Smith, D.R., Weerakkody, G. (1998). Limits of retrospective power analysis. Journal of Wildlife Management, 62, 801-807. G*Power 3 (BSC702) Page 25 Gart, J.J. & Nam, J. (1988). Approximate interval estimation of the ratio in binomial parameters: A review and correction for skewness. Biometrics, 44, 323-338. Gart, J.J. & Nam, J. (1990). Approximate interval estimation of the difference in binomial parameters: Correction for skewness and extension to multiple tables. Biometrics, 46, 637-643. Geisser, S. & Greenhouse, S.W. (1958). An extension of Box's results on the use of the F distribution in multivariate analysis. The Annals of Mathematical Statistics, 29, 885-891. Gigerenzer, G., Krauss, S., & Vitouch, O. (2004). The null ritual: What you always wanted to know about significance testing but were afraid to ask. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 391-408). Thousand Oaks, CA: Sage. Gleissner, U., Clusmann, H., Sassen, R., Elger, C.E., & Helmstaedter, C. (2006). Postsurgical outcome in pediatric patients with epilepsy: A comparison of patients with intellectual disabilities, subaverage intelligence, and average-range intelligence. Epilepsia, 47, 406-414. Goldstein, R. (1989). Power and sample size via MS/PC-DOS computers. The American Statistician, 43, 253-26. Hager, W. (2006). Die Fallibität empirischer Daten und die Notwendigkeit der Kontrolle von falschen Entscheidungen [The fallibility of empirical data and the need for controlling for false decisions]. Zeitschrift für Psychologie, 214, 10-23. Haseman, J.K. (1978). Exact sample sizes for use with the Fisher-Irwin test for 2 ! 2 tables. Biometrics, 34, 106-109. Hoenig, J.N. & Heisey, D.M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55, 19-24. Hoffmann, J. & Sebald, A. (2005). Local contextual cuing in visual search. Experimental Psychology, 52, 31-38. Huynh, H. & Feldt, L.S. (1970). Conditions under which mean square ratios in repeated measurements designs have exact F-distribution. Journal of the American Statistical Association, 65, 1582-1589. Keppel, G. & Wickens, T.D. (2004). Design and analysis. A researcher’s handbook (4th ed.). Upper Saddle River, NJ, Pearson Education International. G*Power 3 (BSC702) Page 26 Kornbrot, D. E. (1997). Review of statistical shareware G*Power. British Journal of Mathematical and Statistical Psychology, 50, 369-370. Kromrey, J. & Hogarty, K.Y. (2000). Problems with probabilistic hindsight: A comparison of methods for retrospective statistical power analysis. Multiple Linear Regression Viewpoints, 26, 7-14. Lenth, R.V. (2001). Some practical guidelines for effective sample size determination. The American Statistician, 55, 187-193. Levin, J. R. (1997). Overcoming feelings of powerlessness in "aging" researches: A primer on statistical power in analysis of variance designs. Psychology and Aging, 12, 84-106. McKeon, J.J. (1974). F approximations to the distribution of Hotelling’s T02 . Biometrika, 61, 381-383. Mellina, E., Hinch, S.G., Donaldson, E.M., & Pearson, G. (2005). ! Stream habitat and rainbow trout (Oncorhynchus mykiss) physiological stress responses to streamside clear-cut logging in British Columbia. Canadian Journal of Forest Research, 35, 541-556. Miettinen, O. & Nurminen, M. (1985). Comparative analysis of two rates. Statistics in Medicine, 4, 213-226. Müller, J., Manz, R., & Hoyer, J. (2002). Was tun, wenn die Teststärke zu gering ist? Eine praktikable Strategie für Prä-Post-Designs. [What to do if the power is low? A useful strategy for pre-post designs]. Psychotherapie, Psychosomatik, Medizinische Psychologie, 52, 408-416. Muller, K.E. & Barton, C.N. (1989). Approximate power for repeated-measures ANOVA lacking sphericity. Journal of the American Statistical Association, 84, 549-555. Muller, K.E., LaVange, L.M., Landesman-Ramey, S. & Ramey, C.T. (1992). Power calculations for general linear multivariate models including repeated measures applications. Journal of the American Statistical Association, 87,1209-1226. Muller, K.E. & Peterson, B.L. (1984). Practical methods for computing power in testing the multivariate general linear hypothesis. Computational Statistics and Data Analysis, 2, 143-158. Myers, J. L. & Well, A.D. (2003). Research design and statistical analysis (2nd ed.). Mahwah, NJ, Lawrence Erlbaum Associates. G*Power 3 (BSC702) Page 27 O’Brien, R.G. & Kaiser, M.K. (1985). MANOVA method for analyzing repeated measures designs: An extensive primer. Psychological Bulletin, 97, 316-333. O'Brien, R.G., & Muller, K.E. (1993). Unified power analysis for t-tests through multivariate hypotheses. In Edwards, L.K. (Ed.). Applied analysis of variance in behavioral science. (pp. 297-344). New York, NY, US: Marcel Dekker. O’Brien, R.G. & Shieh, G. (1999). Pragmatic, unifying algorithm gives power probabilities for common F tests of the multivariate general linear hypothesis. UnifyPow website: www.bio.ri.ccf.org/UnifyPow. Ortseifen, C., Bruckner, T., Burke, M. & Kieser, M. (1997). An overview of software tools for sample size determination. Informatik, Biometrie und Epidemiologie in Medizin und Biologie, 28, 91-118. Ostle, B., & Malone, L.C. (1988). Statistics in research: Basic Concepts and Techniques for Research Workers. Fourth Edition. Ames, Iowa: Iowa State Press. Pillai, K.C.S. & Mijares, T.A. (1959). On the moments of the trace of a matrix and approximations to its distribution. Annals of Mathematical Statistics, 30, 1135-1140. Pillai, K.C.S. & Samson, P, Jr. (1959). On Hotelling’s generalization of T2. Biometrika, 46, 160-168. Quednow, B.B., Kuhn, K.U., Stelzenmueller, R., Hoenig, K., Maier, W., & Wagner, M. (2004). Effects of serotonergic and noradrenergic antidepressants on auditory startle response in patients with major depression. Psychopharmacology, 175, 399-406. Rao, C.R. (1951). An asymptotic expansion of the distribution of Wilk’s criterion. Bulletin of the International Statistical Institute, 33, 177-180. Rasch, B., Friese, M., Hofmann, W.J., Naumann, E. (2006a). Quantitative Methoden 1. Einführung in die Statistik (2. Auflage) [Quantitative methods 1. Introduction to statistics]. Heidelberg: Springer Rasch, B., Friese, M., Hofmann, W.J., Naumann, E. (2006b). Quantitative Methoden 2. Einführung in die Statistik (2. Auflage) [Quantitative methods 1. Introduction to statistics]. Heidelberg: Springer Rencher, A.C. (1998). Multivariate statistical inference and applications. New York: John G*Power 3 (BSC702) Page 28 Wiley & Sons. Richardson, J. T. E. (1996). Measures of effect size. Behavior Research Methods, Instruments, & Computers, 28, 12-22. Scheffé, H. (1959). The analysis of variance. New York: John Wiley & Sons. Schwarz, W. & Müller, D. (2006). Spatial associations in number-related tasks. A comparison of manual and pedal responses. Experimental Psychology, 53, 4-15. Sheppard, C. (1999). How large should my sample be? Some quick guides to sample size and the power of tests. Marine Pollution Bulletin, 38, 439-447. Shieh, G. (2003). A comparative study of power and sample size calculations for multivariate general linear models. Multivariate Behavioral Research, 38, 285-307. Smith, R.E. & Bayen, U.J. (2005). The effects of working memory resource availability on prospective memory: A formal modeling approach. Experimental Psychology, 52, 243-256. Steidl, R.J., Hayes, J.P., & Schauber, E. (1997). Statistical power analysis in wildlife research. Journal of Wildlife Management, 61, 270-279. Suissa, S. & Shuster, J.J. (1985). Exact unconditional sample sizes for 2 ! 2 binomial trial. Journal of the Royal Statistical Society, A, 148, 317-327. Thomas, L. & Krebs, C.J. (1997). A review of statistical power analysis software. Bulletin of the Ecological Society of America, 78, 126-139. Upton, G.J.G. (1982). A comparison of alternative tests for the 2 ! 2 comparative trial. Journal of the Royal Statistical Society, A, 145, 86-105. Westermann, R. & Hager, W. (1986). Error probabilities in educational and psychological research. Journal of Educational Statistics,11, 117-146. Zumbo, B.D. & Hubley, A.M. (1998). A note on misconceptions concerning prospective and retrospective power. The Statistician, 47, 385-388. G*Power 3 (BSC702) Page 29 Footnotes 1 The “observed power” is reported in many frequently used computer programs (e.g., the MANOVA procedure of SPSS). 2 We recommend to check the df’s reported by G*Power, for example, by comparing them to the df’s reported by the program used to analyze the sample data. If the df’s do not match, the input provided to G*Power is incorrect and the power calculations do not apply. 3 Plots of the central and non-central distribution are only shown for tests based on the t-, F-, z-, !2, or binomial distribution. No plots are shown for tests that involve an enumeration procedure (e.g. the McNemar test). 4 We would like to thank Dave Kenny for making us aware of the fact the t test (correlation) power analyses of G*Power 2 are correct only in the point-biserial case (i.e., correlations between a binary variable and continuous variable, the latter being normally distributed for each value of the binary variable). For correlations between two continuous variables following a bivariate normal distribution, the t test (correlation) procedure of G*Power 2 overestimates power. For this reason, G*Power 3 offers separate power analyses for point-biserial correlations (in the t family of distributions) and correlations between two normally distributed variables (in the exact distribution family). However, power values will usually differ only slightly between procedures. To illustrate, assume we are interested in the power of a two-tailed test of H0: & = .00 for continuously distributed measures derived from two Implicit Association Tests (IATs) differing in content. Assume further that, due to method-specific variance in both versions of the IAT, the true Pearson correlation is actually & = .30 (effect size). Given # = .05 and N = 57 (see Back, Schmukle, & Egloff, 2005, Study 3, p. 173), an exact post-hoc power analysis for “Correlations: Differences from constant (one sample case)” reveals the correct power value of (1-") = .63. Choosing the G*Power 3 (BSC702) Page 30 incorrect “Correlation: point biserial model” procedure from the t test family would result in (1-") = .65. G*Power 3 (BSC702) Page 31 Authors Note Franz Faul, Institut für Psychologie, Christian-Albrechts-Universität, Kiel, Germany; Edgar Erdfelder, Lehrstuhl für Psychologie III, Universität Mannheim, Mannheim, Germany; AlbertGeorg Lang and Axel Buchner, Institut für Experimentelle Psychologie, Heinrich-HeineUniversität, Düsseldorf, Germany. Manuscript preparation has been supported by grants from the Deutsche Forschungsgemeinschaft (SFB 504, Project A12) and the state of Baden-Württemberg, Germany (Landesforschungsprogramm “Evidenzbasierte Stressprävention”). Correspondence concerning this article should be addressed to Franz Faul, Institut für Psychologie, Christian-Albrechts-Universität, Olshausenstr. 40, D-24098 Kiel, Germany (email: [email protected]), or Edgar Erdfelder, Lehrstuhl für Psychologie III, Universität Mannheim, Schloss, D-68131 Mannheim, Germany (email: [email protected]). G*Power 3 (BSC702) 32 Page Figure Captions Figure 1: The distribution-based approach of test specification in G*Power 3.0 Figure 2: The design-based approach of test specification in G*Power 3.0 and the effect size drawer Figure 3: The Power Plot window of G*Power 3.0 Figure 4: Sample 3 x 3 Repeated Measures designs: Three groups are repeatedly measured at three different times. The shaded portion of the table is the postulated matrix M of population means µij. The last column of the table contains the sample size in each group. The symmetric matrices SRi specify two different covariance structures between measurements taken at different times: The main diagonal contains the standard deviations of the measurements at each time, the off-diagonal elements contain the correlations between pairs of measurements taken at different times. Figure 5: Matched binary response design (McNemar test). G*Power 3 (BSC702) Page 33 Figure 1 G*Power 3 (BSC702) Page 34 Figure 2 G*Power 3 (BSC702) Page 35 Figure 3 G*Power 3 (BSC702) Page 36 Figure 4 Group 1 Group 2 Group 3 µ• j Time 1 Time 2 Time 3 µ i• ni 10 10 10 10 15 12 12 13 20 15 12 15.667 15 12.333 11.333 30 30 30 µ•• = 12.889 & 10 0.3 0.1# $ ! SR 2 = $ 0.3 9 0.3 ! $ 0.1 0.3 8 ! % " & 9 0.3 0.3 # $ ! SR1 = $ 0.3 9 0.3 ! $ 0.3 0.3 9 ! % " G*Power 3 (BSC702) Page 37 Figure 5 Treatment Yes No ! ! ! Standard Yes No "11 "12 " 21 " 22 "s 1" # s ! ! ! ! ! Proportion of discordant pairs: " D = "12 + " 21 "t 1" # t 1 Hypothesis: ! s = ! t or equivalently "12 = " 21 ! ! Tabl% 1( )ymb,ls and th%ir m%anings as us%d in th% tabl%s in this articl%7 )ymb,ls 8%aning ! ; : !i 9 <,pulati,n m%an :in gr,up i9 ! ; : !i 9 ! V%ct,r ,f p,pulati,n m%ans :in gr,up i9 !x" y <,pulati,n m%an ,f th% diff%r%nc% N T,tal sampl% siz% ni )ampl% siz% in gr,up i # )tandard d%viati,n in th% p,pulati,n #! )tandard d%viati,n ,f th% %ff%ct # x" y )tandard d%viati,n ,f th% diff%r%nc% $ N,nc%ntrality param%t%r ,f th% n,nc%ntral F and !2 distributi,n % N,nc%ntrality param%t%r ,f th% n,nc%ntral t distributi,n df D%gr%%s ,f fr%%d,m df1 ; df 2 Num%rat,r E d%n,minat,r d%gr%%s ,f fr%%d,m & ; : &i 9 <,pulati,n c,rr%lati,n :in gr,up i9 RY2' A ; RY2' A; B )quar%d multipl% c,rr%lati,n c,%ffici%nts; c,rr%sp,nding t, th% pr,p,rti,n ,f Y varianc% that can b% acc,unt%d f,r by multipl% r%gr%ssi,n ,n th% s%t ,f pr%dict,r variabl%s A and A ( B ; r%sp%ctiv%ly ! <,pulati,n varianc%-c,varianc% matrix M 8atrix ,f r%gr%ssi,n param%t%rs :p,pulati,n m%ans9 C I,ntrast matrix :c,ntrasts b%tw%%n r,ws ,f M 9 A I,ntrast matrix :c,ntrasts b%tw%%n c,lumns ,f M 9 ) ; :) i 9 <r,p,rti,n :pr,bability9 ,f succ%ss :in gr,up i9 ! Tabl% 2( T%sts f,r c,rr%lati,n and r%gr%ssi,n I,rr%lati,n and R%gr%ssi,n T%st T%st family Null Hyp,th%sis Eff%ct )iz% Diff%r%nc% fr,m z%r,( <,int bis%rial m,d%l t t%sts & *O & Diff%r%nc% fr,m c,nstant :bivariat% n,rmal9 %xact & *c & Pn%quality ,f tw, c,rr%lati,n c,%ffici%nts z t%sts &1 * & 2 q * z1 " z 2 Nth%r <aram%t%rs N,nc%ntrality <aram%t%r % * &2 ' N 1" & 2 df * N " 2 I,nstant c,rr%lati,n c zi * 8ultipl% R%gr%ssi,n( d%viati,n ,f R2 fr,m z%r, F t%sts 8ultipl% R%gr%ssi,n( incr%as% ,f R2 F t%sts RY2'A * O RY2' A; B * RY2' A 1 1 + &i ln 2 1 " &i f2 * f 2 * m1 * q s s* n1 + n 2 " R :n1 " Q9:n2 " Q9 RY2' A 1 " RY2' A Numb%r ,f pr%dict,rs p :SA9 $ * f 2N RY2' A; B " RY2' A T,tal numb%r ,f pr%dict,rs p :SA U SB9 $ * f 2N 1 " RY2' A; B Numb%r ,f t%st%d pr%dict,rs q :SB9 df1 * p df 2 * N " p " 1 df1 * q df 2 * N " p " 1 Tabl% Q( T%sts f,r univariat% m%ans 8%ans :univariat%9 T%st T%st Family Null Hyp,th%sis Diff%r%nc% fr,m c,nstant :,n% sampl% cas%9 t t%sts ! *c Pn%quality ,f tw, d%p%nd%nt m%ans :match%d pairs9 t t%sts ! x" y * O Eff%ct )iz% d* Nth%r <aram%t%rs ! "c # dz * N,nc%ntrality <aram%t%r and D%gr%%s ,f Fr%%d,m % *d N df * N " 1 X ! x" y X # x "7 y % * dz N ; df * N " 1 # x" y * # x2 + # y2 " 2 &# x# y Pn%quality ,f tw, ind%p%nd%nt m%ans t t%sts !1 * ! 2 ANNVA; fix%d %ff%cts; ,n%way( Pn%quality ,f multipl% m%ans F t%sts !i " ! * O; i * 1;"; k !1 " ! 2 # f * #! ; # n1n2 n1 + n2 df * N " 2 % *d k # !2 * ANNVA; fix%d %ff%cts; multifact,r d%signs and plann%d c,mparis,ns F t%sts ANNVA( r%p%at%d m%asur%m%nts; b%tw%%n %ff%cts F t%sts ANNVA( r%p%at%d m%asur%m%nts within %ff%cts d* !i " ! * O; i * 1;"; k f * , n :! " ! 9 j Numb%r ,f gr,ups k $ * f 2N T,tal numb%r ,f c%lls in th% d%sign k $ * f 2N 2 i i *1 #! # N df1 * q df 2 * : N " k 9 D%gr%%s ,f fr%%d,m ,f th% t%st%d %ff%ct q7 !i " ! * O; i * 1;"; k F t%sts !i " ! * O; i * 1;"; m f * f * #! # #! # Y%v%ls ,f th% b%tw%%n fact,r k Y%v%ls ,f th% r%p%at%d m%asur% fact,r m <,pulati,n c,rr%lati,n am,ng r%p%at%d m%asur%m%nts & ANNVA( r%p%at%d m%asur%m%nts b%tw%%n-within int%racti,ns df1 * k " 1 df 2 * N " k F t%sts !ij " !i " # ! j + ! * O; i * 1;"; k j * 1;"; m f * #! # F,r within and within-b%tw%%n int%racti,ns( N,nsph%ricity c,rr%cti,n ! 1 . - .1 m "1 $ * f 2uNu* m 1 + :m " 19 & df1 * :k " 19 df 2 * : N " k 9 $ * f 2uN m 1" & df1 * :m " 19- u* df 2 * : N " k 9:m " 19- $ * f 2uNm 1" & df1 * :k " 19:m " 19- u* df 2 * : N " k 9:m " 19- Tabl% Z( T%sts f,r multivariat% m%ans 8%ans :multivariat%9 T%st H,t%lling T2 diff%r%nc% fr,m c,nstant m%an v%ct,r H,t%lling T2 diff%r%nc% b%tw%%n tw, m%an v%ct,rs 8ANNVA gl,bal %ff%cts T%st Family F t%sts F t%sts Null Hyp,th%sis ! ! ! *c ! ! !1 * ! 2 ! ! / * v T ! "1v ! ! v * ! "c Nth%r <aram%t%rs Numb%r ,f r%sp,ns% variabl%s k ! ! / * v T ! "1v ! ! v * !1 " ! 2 Numb%r ,f r%sp,ns% variabl%s k N,nc%ntrality <aram%t%r and D%gr%%s ,f Fr%%d,m $ * /2 N df1 * k df 2 * N " k $ * /2 n1n2 n2 + n2 df1 * k df 2 * N " k " 1 F t%sts CM * 0 8%ans matrix M 8ANNVA sp%cial %ff%cts Eff%ct )iz% F t%sts I,ntrast matrix C Eff%ct siz% f mult d%p%nds ,n th% t%st statistics( " [ilks U " H,t%lling-Yawl%y T1 " H,t%lling-Yawl%y T2 " <illai V and alg,rithms( 8ANNVA( r%p%at%d m%asur%m%nts; b%tw%%n %ff%cts F t%sts 8ANNVA( r%p%at%d m%asur%m%nts within %ff%cts F t%sts 8ANNVA( r%p%at%d m%asur%m%nts b%tw%%n-within int%racti,ns F t%sts CMA * 0 8%ans matrix M B%tw%%n c,ntrast matrix C [ithin c,ntrast matrix A " 8ull%r and <%t%rs,n :1^_Z9 " N`Bri%n and )hi%h :1^^^9 Numb%r ,f gr,ups g Numb%r ,f r%sp,ns% variabl%s k Numb%r ,f gr,ups g Numb%r ,f pr%dict,rs p Numb%r ,f r%sp,ns% variabl%s k Y%v%ls ,f th% b%tw%%n fact,r k Y%v%ls ,f th% r%p%at%d m%asur% fact,r m N,nc%ntrality param%t%r and d%gr%%s ,f fr%%d,m d%p%nd ,n th% t%st statistic and alg,rithm us%d :s%% %ff%ct siz% c,lumn and Tabl% R97 Tabl% a( Appr,ximating univariat% statistics f,r multivariat% hyp,th%s%s Appr,ximating univariat% statistics )tatistic F,rmula [ilks U 8< U* Num%rat,r D%gr%% ,f Fr%%d,m df2 s 0 :1 + 1 9 "1 k k *1 df 2 * g : N " g1 9 " g 2 c g1 * r + :a " c + 19 E 2 g 2 * :ca " 29 E 2 [ilks U N) <illai V 8< U* s 0 :1 + 1 9 b "1 k k *1 V* s 01 k E:1 + 1k 9 1 5 2 g * 4 :ca9 2 " Z 2 c2 + a2 " a 3 Eff%ct )iz% and N,nc%ntrality <aram%t%r f :U 9 2 * 1 " U 1E g U 1E g $ * f :U 9 2 df 2 ca . Q ca 6 Z df 2 * s : N " r " a + s 9 f :U 9 2 * 1 " U 1E g U 1E g $ * Ng f :U 9 2 f :V 9 2 * k *1 V :s " V 9 $ * f :V 9 2 df 2 <illai V N) V* s 01 b k f :V 9 2 * E:1 + 1kb 9 k *1 V :s " V 9 $ * Nsf :V 9 2 H,t%llingYawl%y T1 8< T* H,t%llingYawl%y T1 N) T* H,t%llingYawl%y T2 8< T* H,t%llingYawl%y T2 N) T* df 2 * s : N " r " a " 19 + 2 s 01 k $ * f :T 9 2 df 2 k *1 s 01 f :T 9 2 * T E s b k $ * Nsf :T 9 2 k *1 s 0 1k k *1 s 01kb k *1 f :T 9 2 * T E s df 2 * Z + :ca + 29 g g* : N " r 92 " : N " r 9 g Z + gQ : N " r 9 g 2 " g1 g1 * c + 2a + a 2 " 1 g2 * c + a + 1 g Q * a:a + Q9 g Z * 2a + Q h * :df 2 " 29 E: N " r " a " 19 f :T 9 2 * T E h $ * f :T 9 2 df 2 f :T 9 2 * T E h $ * Nhf :T 9 2 Tabl% R( T%sts f,r pr,p,rti,ns <r,p,rti,ns T%st T%st Family Hyp,th%sis I,nting%ncy tabl%s d e,,dn%ss ,f Fit !2 t%sts ) 1i * ) Oi i * 1;"; k , k i *1 Eff%ct )iz% k , w* Nth%r <aram%t%rs N,nc%ntrality <aram%t%r $ * w2 N :) 1i " ) Oi 9 2 i *1 ) Oi ) Oi * 1 Diff%r%nc% fr,m c,nstant :,n% sampl% cas%9 %xact ) *c g *) "c I,nstant pr,p,rti,n c Pn%quality ,f tw, d%p%nd%nt pr,p,rti,ns :8cN%mar9 %xact ) 12 E) 21 * 1 Ndds rati, * ) 12 E ) 21 <r,p,rti,n ,f disc,rdant pairs * ) 12 + ) 21 )ign t%st %xact ) * 1E 2 g * ) " 1E 2 Pn%quality ,f tw, ind%p%nd%nt pr,p,rti,ns z t%sts )1 * ) 2 :A9 Alt%rnat% pr,p7 ) 2 ; :A9 Null pr,p7 ) 1 :B9 h * 11 " 12 1i * 2 arcsin ) i Pn%quality ,f tw, ind%p%nd%nt pr,p,rti,ns :Fish%r`s %xact t%st9 %xact )1 * ) 2 alt%rnat% pr,p7 ) 1 Null pr,p7 ) 2 Pn%quality ,f tw, ind%p%nd%nt pr,p,rti,ns :unc,nditi,nal9 %xact )1 * ) 2 :A9 Alt%rnat% pr,p7( ) 1 :B9 Diff%r%nc%( ) 2 " ) 1 :I9 Risk rati,( ) 2 E ) 1 :D9 Ndds rati,( ) 1 E:1 " ) 1 9 ) 2 E:1 " ) 2 9 Null pr,p7 ) 2 Pn%quality with ,ffs%t ,f tw, ind%p%nd%nt pr,p,rti,ns :unc,nditi,nal9 %xact )1 * ) 2 + c :A9 Alt%rnat% pr,p( ) 1X H 1 :A9 <r,p7( ) 1X H :B9 Diff%r%nc%( ) 2 " ) 1X H :B9 Diff%r%nc%( ) 2 " ) 1X H :I9 Risk rati,( ) 2 E ) 1X H :D9 Ndds rati,( ) 1X H E:1 " ) 1X H 9 1 ) 2 E:1 " ) 2 9 1 1 1 O :I9 Risk rati,( ) 2 E ) 1X H :D9 Ndds rati,( ) 1X H E:1 " ) 1X H 9 O ) 2 E:1 " ) 2 9 Null pr,p7 ) 2 O O O Tabl% f( T%st statistics us%d in t%sts ,f th% diff%r%nc% b%tw%%n tw, ind%p%nd%nt pr,p,rti,ns7 Th% z-t%sts in th% tabl% ar% m,r% c,mm,nly kn,wn as 72 t%sts :th% %quival%nt z t%st is us%d t, pr,vid% tw,-sid%d t%sts97 Nr Nam% 1 z-t%st p,,l%d varianc% 2 z-t%st p,,l%d varianc% with c,ntinuity c,rr%cti,n Q z-t%st unp,,l%d varianc% Z z-t%st unp,,l%d varianc% with c,ntinuity c,rr%cti,n a 8ant%l-Ha%nsz%l t%st )tatistic z* z* z* z* z* =1 1: n )g + n )g )g1 " )g 2 c #g * )g :1 " )g 9;; + 88 c )g * 1 1 2 2 #g n1 + n2 < n1 n2 9 k=1 1 : + 8 2 < n1 n2 89 5" 1 l,w%r tail c #g s%% :19c k * 4 g # 3+ 1 upp%r tail )g1 " )g 2 + ;; )g1 " )g 2 )g1 :1 " )g1 9 )g 2 :1 " )g 2 9 + c #g * #g n1 n2 k=1 1 : + 8 2 < n1 n2 89 5" 1 l,w%r tail c #g s%% :Q9c k * 4 g # 3+ 1 upp%r tail )g1 " )g 2 + ;; x1 " E : x1 9 V : x1 9 c E : x1 9 * n1 : x1 + x2 9 n n : x + x 9: N " x1 " x2 9 c V : x1 9 * 1 2 1 2 2 N N : N " 19 R Yik%lih,,d rati, :Upt,n; 1^_29 = t : x1 9 + t : x2 9 + t :1 " x1 9 + t :1 " x2 9 + t : N 9 : 8 c t : x9 (* x ln: x9 lr * 2;; 8 " " " + " " " t n t n t x x t N x x : 9 : 9 : 9 : 9 1 2 1 2 1 2 < 9 f t t%st with df = N-2 :D`Ag,stin, %t al; 1^__9 t N " 2 * : x1 :1 " x2 9 " x2 :1 " x1 99 N "2 N jn2 x1 :1 " x1 9 + n1 x2 :1 " x2 9i N,t%--- xi = succ%ss fr%qu%ncy in gr,up ic ni = sampl% siz% in gr,up ic N * n1 + n2 = t,tal sampl% siz%c )gi * xi E ni Tabl% _( T%st statistics us%d in t%sts ,f th% diff%r%nc% with ,ffs%t b%tw%%n tw, ind%p%nd%nt pr,p,rti,ns7 Nr Nam% )tatistic 1 z-t%st p,,l%d varianc% z* )g1 " )g 2 " % n )g + n )g c #g * )g :1 " )g 9>1 E n1 + 1 E n2 ? c )g * 1 1 2 2 #g n1 + n2 2 z-t%st p,,l%d varianc% with c,nt7c,rr7 z* 5" 1 l,w%r tail )g1 " )g 2 " % + k E 2>1 E n1 + 1 E n2 ? c #g s%% :19c k * 4 #g 3+ 1 upp%r tail Q z-t%st unp,,l%d varianc% z* )g1 " )g 2 " % c #g * )g1 :1 " )g1 9 E n1 + )g 2 :1 " )g 2 9 E n2 #g Z z-t%st unp,,l%d varianc% with c,nt7c,rr7 z* 5" 1 l,w%r tail )g1 " )g 2 " % + k E 2>1 E n1 + 1 E n2 ? c #g s%% :Q9c k * 4 #g 3+ 1 upp%r tail a t t%st with df = N-2 t N " 2 * :: x1 + % n1 9:1 " x2 9 " x2 :1 " x1 " % n1 99 K c :D`Ag,stin, %t al; K * : N " 29 ElN jn2 x1 :1 " x1 9 + n1 x2 :1 " x2 9ik 1^__9 R Yik%lih,,d sc,r% rati, :diff%r%nc%9 :8i%ttin%n d Nurmin%n; 1^_a9 Farringt,n d 8anning :1^^O9 eart d Nam :1^^O9 )g1 " )g 2 " % c #g * >)m1 :1 " )m1 9 E n1 + )m2 :1 " )m2 9 E n2 ?K #g 8i%ttin%n d Nurmin%n( K = NE:N-19; Farringt,n d 8anning( K =1 z* )m1 * 2u c,s:w9 " b E:Qa9 c )m2 * )m1 " % c @ * n2 E n1 c a * 1 + @ c b * ":1 + @ + )g1 + @)g 2 + % :@ + 299 c c * % 2 + % :2)g1 + @ + 19 + )g1 + @)g 2 c d * ")g1% :1 + % 9c v * bQ E:Qa 9Q " bc E:Ra 2 9 + d E: 2a 9 c w * :Q71Z1a^ + c,s "1 :v E u Q 99 E Q c u * sgn:v9 b 2 E:Qa9 2 " c E:Qa9 )k%wn%ss c,rr%ct%d z n :eart d Nam9c z acc,rding t, Farringt,n d 8anning( "1 z n * 1 + ZA :A + z 9 " 1 E 2A c V * >)m :1 " )m 9 E n + )m :1 " )m 9 E n ? c > A *V f Yik%lih,,d sc,r% rati, :risk rati,9 :8i%ttin%n d Nurmin%n; 1^_a9 Farringt,n d 8anning :1^^O9 eart d Nam :1^__9 _ Yik%lih,,d sc,r% rati, :,dds rati,9 z* ? 2EQ )g1 " )g 21 c #g * #g 1 1 2 2 2 >)m :1 " )m 9 E n + 1 )m :1 " )m 9 E n ?K 2 1 1 1 2 2 2 8i%ttin%n d Nurmin%n( K = NE:N-19; Farringt,n d 8anning( K =1 )m1 * 1)m2 c )m2 * =; " b " b 2 " Z N1 : x1 + x2 9 :8 E:2 N1 9c b * ":n1 + x2 91 " x1 " n2 c < 9 )k%wn%ss c,rr%ct%d z n :eart d Nam9c z acc,rding t, Farringt,n d 8anning( z n * 1 + ZA :A + z 9 " 1 E 2A c V * :1 " )m 9 E:)m n 9 + :1 " )m 9 E:)m n 9 c > A * 1 E:RV z* 2EQ > ? 1 1 1 2 2 2 9 :1 " )m1 9:1 " 2)m1 9 E: n1)m1 9 2 + :1 " )m2 9:1 " 2)m2 9 E: n2)m2 9 2 ? :)g1 " )m1 9 E:)m1 :1 " )m1 99 " :)g 2 " )m2 9 E:)m2 :1 " )m2 99 1 E:n )m :1 " )m 99 + 1 E: n )m :1 " )m 99 K 1 1 :8i%ttin%n d Nurmin%n; 1^_a9 1 E R>)m1 :1 " )m1 9:1 " 2)m1 9 E n1 + )m2 :1 " )m2 9:1 " 2)m2 9 E n2 ? 1 2 2 2 8i%ttin%n d Nurmin%n( K=NE:N-19; Farringt,n d 8anning( K =1 )m1 * )m2B E:1 + )m2 :B " 199 c )m2 * =; " b + b 2 + Za: x1 + x2 9 :8 E:2a9c < a * n2 :B " 19 c b * n1B + n2 " : x1 + x2 9:B " 19 9 N,t%---xi = succ%ss fr%qu%ncy in gr,up ic ni = sampl% siz% in gr,up ic N * n1 + n2 = t,tal sampl% siz%c )gi * xi E ni c %C*Cdiff%r%nc% b%tw%%n pr,p,rti,ns p,stulat%d in "0$ 1 * risk rati, p,stulat%d in HOc B * ,dds rati, p,stulat%d in HO Tabl% ^( T%sts f,r varianc%s Varianc%s T%st T%st Family Null Hyp,th%sis Eff%ct )iz% Diff%r%nc% fr,m c,nstant :,n% sampl% cas%9 !2 t%sts #2 Varianc% rati, c *1 r* #2 c Nth%r <aram%t%rs N,nc%ntrality <aram%t%r $ *O :H1( c%ntral !2 distributi,n; scal%d with r9 df * N " 1 Pn%quality ,f tw, varianc%s F t%sts # 22 *1 # 12 Varianc% rati, #2 r * 22 #1 $ *O :H1( c%ntral F distributi,n; scal%d with r9 df1 * n1 " 1 df 2 * n2 " 1
© Copyright 2026 Paperzz