CHAPTER 4: GOODNESS OF FIT TESTING

PURPOSE: Up until now, we have decided whether a distribution is clumped, uniform, or stochastic in two ways: 1) we visually compared the observed frequencies to the expected frequencies, and 2) we computed the CD and compared it to its expected value. In both cases, if the differences are large we have no trouble believing the result, but if the differences are small, we really aren't sure. Now we will learn how to use a statistical test to make this decision with a stated level of confidence. We will use the Log-likelihood Ratio Goodness of Fit test (a.k.a. the G-test) to decide whether the differences between observed and expected frequencies are real. If we decide that the differences are not real, we will conclude that the distribution in question is stochastic. If we decide that the differences are real, we will compute the CD to tell us whether the distribution is clumped or uniform.

Hypotheses:
The Null Hypothesis (Ho) is: The observed frequency distribution FITS the expected frequency distribution with the given parameters.
The Alternative Hypothesis (Ha) is: The observed frequency distribution DOES NOT FIT the expected frequency distribution with the given parameters.

Interpretation of Hypotheses: If you accept Ho, you conclude that the observed frequency distribution is stochastic. If you reject Ho and accept Ha, you conclude that the observed frequency distribution is either clumped or uniform; you then need to inspect the graph or the Coefficient of Dispersion (CD) to determine which.

Computing the Test - Basic Steps
1) First determine what your results would mean with respect to your original question. For example, what would it mean if the pattern were uniform, clumped, or stochastic?
2) Determine which statistical error to avoid. There are two types of statistical errors that you can make with this test.
You can accept Ho and conclude that the expected distribution fits the observed when it really doesn't, or you can reject Ho and conclude that the expected distribution does not fit the observed when it really does. Accepting Ho incorrectly is a Type II error. Rejecting Ho (and accepting Ha) incorrectly is a Type I error. To decide which error to avoid, determine which error would cause the most harm by completing the following table.

Table 4-1: Selecting which statistical error to avoid (Error Avoidance Decision Matrix).
Statistical Decision | Conclusion | Action | What if I'm wrong? | Type of error
Accept Ho: Observed fits the expected | | | | II
Reject Ho: Observed does not fit the expected | | | | I

If you decide to avoid a Type I error, set alpha (α) to 0.025. If you decide to avoid a Type II error, set α to 0.050.
3) Determine whether the parameters are Extrinsic or Intrinsic. They are EXTRINSIC if their values are KNOWN in advance (e.g., p and q for a coin flip are both 0.5). They are INTRINSIC if they had to be estimated (i.e., computed from the data).
4) Compute the appropriate Expected Frequency distribution.
5) For Poisson only: lump classes when an Expected Frequency is < 5. If any class with a value greater than 2 has an expected frequency less than 5, group it with adjacent classes before testing, so that every class with a value greater than 2 has an expected frequency of 5 or greater.
6) Compute the degrees of freedom (ν) for the test.

Table 4-2: Determining degrees of freedom for the Goodness of Fit test.
Parameters | Lumping: Yes | Lumping: No
Extrinsic | a - 2 | a - 1
Intrinsic | a - 3 | a - 2
where a = number of classes after lumping.

7) Compute the G-statistic:
$G = 2\sum f \ln\left(\frac{f}{\hat{f}}\right)$
where f is the observed frequency and $\hat{f}$ is the expected frequency of each class.
8) Compute Gadj (G with Williams' correction):
$G_{adj} = \frac{G}{q}$, where $q = 1 + \frac{a^2 - 1}{6n\nu}$
n = total frequency (Σf), a = number of classes, and ν = degrees of freedom.
9) Test the statistic. There are two ways to do this.
Method a) is the oldest method, but method b) is more useful for reporting results. Both give the same conclusion.
a) Look up the critical value $\chi^2_{(\alpha,\nu)}$ in the Chi-Square table or in Excel and compare it to Gadj.
i) If Gadj ≤ χ², ACCEPT Ho and conclude that the observed distribution is stochastic.
ii) If Gadj > χ², REJECT Ho and compute the CD to determine whether the observed distribution is uniform or clumped.
b) Use Excel to look up the probability (P) and compare it to alpha.
i) If α ≤ P, ACCEPT Ho and conclude that the observed distribution is stochastic.
ii) If α > P, REJECT Ho and compute the CD to determine whether the observed distribution is uniform or clumped.
10) If you reject Ho in Step 9, compute the CD and compare it to the Expected CD to determine whether the distribution is clumped or uniform. The expected CD for the Binomial is 1 - p (i.e., q), and the expected CD for the Poisson is 1. If the computed CD is greater than the expected CD, the distribution is clumped. Otherwise, the distribution is uniform.
11) Draw a conclusion.

EXAMPLE 1
Someone has published an article claiming that the environment can influence the gender of offspring, so that families end up with either more males or more females than expected by chance. You suspect that the finding may be inaccurate, so you set up a sampling scheme to find out. You randomly select 30 families that have four children.
What is the measured variable? Female (yes or no).
Is the sample size fixed? Yes, k = 4.
What does "p" represent in this experiment? The probability of having a female child by chance alone.
Do you have to compute it from the data, or do you already know the expected result? We already know that p = 0.5.
What does "q" represent in this experiment? The probability of a child not being female.
What is the value of q? q = 1 - p = 1 - 0.5 = 0.5.
How would you conduct the experiment? You would randomly select 30 families from all families with four children, then record the gender of each child.
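Before working through Example 1 step by step, the arithmetic of Steps 4 and 7 through 10 can be checked with a short Python sketch. It is an illustration, not part of the original lab, which uses Excel: all function names are hypothetical, and chi2_sf and chi2_critical are hand-rolled stand-ins for Excel's CHIDIST and CHIINV that only handle whole-number degrees of freedom (in SciPy, scipy.stats.chi2.sf and scipy.stats.chi2.isf do the same job). The data are the Example 1 frequencies from Tables 4-4 to 4-6 below.

```python
import math

def binomial_expected(n_units, k, p):
    """Step 4 (Binomial): expected frequency of Y = 0..k when p is known (extrinsic)."""
    q = 1 - p
    return [n_units * math.comb(k, y) * p**y * q**(k - y) for y in range(k + 1)]

def g_statistic(observed, expected):
    """Step 7: G = 2 * sum(f * ln(f / f_hat)). A class with f = 0 contributes 0."""
    return 2 * sum(f * math.log(f / e) for f, e in zip(observed, expected) if f > 0)

def williams_q(a, n, df):
    """Step 8: correction factor q = 1 + (a^2 - 1) / (6 * n * df); Gadj = G / q."""
    return 1 + (a**2 - 1) / (6 * n * df)

def chi2_sf(x, df):
    """Right-tail probability P(chi-square > x) for a whole-number df.

    Stand-in for Excel's CHIDIST(x, df): starts from the closed form for df = 1 or 2
    and applies Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    half = x / 2
    k = 1 if df % 2 else 2
    prob = math.erfc(math.sqrt(half)) if k == 1 else math.exp(-half)
    term = half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
    while k < df:
        prob += term
        term *= half / (k / 2 + 1)
        k += 2
    return prob

def chi2_critical(alpha, df):
    """Critical value chi^2(alpha, df) by bisection; stand-in for Excel's CHIINV(alpha, df)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def coefficient_of_dispersion(values, freqs):
    """Step 10: CD = s^2 / mean, using the same sums as Table 4-6."""
    n = sum(freqs)
    sum_fy = sum(f * y for y, f in zip(values, freqs))
    sum_fy2 = sum(f * y * y for y, f in zip(values, freqs))
    mean = sum_fy / n
    variance = (sum_fy2 - sum_fy**2 / n) / (n - 1)
    return variance / mean

# Example 1: 30 families of four children, p = 0.5 (extrinsic), alpha = 0.05.
values = [0, 1, 2, 3, 4]
observed = [7, 4, 2, 7, 10]
p, alpha = 0.5, 0.05
exact_expected = binomial_expected(sum(observed), 4, p)
table_expected = [1.9, 7.5, 11.3, 7.5, 1.9]   # rounded values used in Table 4-5

a, n, df = len(values), sum(observed), len(values) - 1   # extrinsic, no lumping
g = g_statistic(observed, table_expected)
gadj = g / williams_q(a, n, df)
p_value = chi2_sf(gadj, df)
critical = chi2_critical(alpha, df)
cd = coefficient_of_dispersion(values, observed)

decision = "REJECT Ho" if p_value < alpha else "ACCEPT Ho"   # Step 9, method b)
pattern = "clumped" if cd > 1 - p else "uniform"              # Step 10, expected CD = q
print(f"G = {g:.3f}, Gadj = {gadj:.3f}, critical = {critical:.3f}, P = {p_value:.1e}")
print(decision, f"CD = {cd:.3f} ({pattern})")
```

With the rounded expected frequencies of Table 4-5, the sketch reproduces the G total of 38.550 and a CD of 1.144; Gadj differs slightly from a hand calculation because q is kept unrounded instead of being rounded to 1.033. Passing exact_expected (1.875, 7.5, 11.25, 7.5, 1.875) instead raises G to about 39.02, but the decision is still to reject Ho. The same helpers give chi2_critical(0.05, 4) ≈ 9.488 and chi2_critical(0.025, 3) ≈ 9.348, the CHIINV values quoted in the examples.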
1) Determine what your results would mean with respect to your original question.
What would it mean if the results showed a clumped distribution? It would mean that families were more likely to have mostly males or mostly females than would be expected by chance alone; something was controlling the distribution of males and females.
What would it mean if the results showed a uniform distribution? It would mean that families tended to have the same mix of genders, so that families were too alike to be a result of chance.
What would it mean if the observed frequencies were essentially the same as the expected frequencies? It would mean that there was no pattern to the distribution of females or males.
2) Determine the statistical error to avoid (alpha).
Table 4-3: Determining which statistical error to avoid - Example 1 - Binomial.
Statistical Decision | Conclusion | Action | What if I'm wrong? | Type of error
Accept Ho: Observed fits the expected | There is no pattern to gender | Publish a rebuttal to the previous author's findings | Incorrectly criticized a fellow scientist's work | II
Reject Ho: Observed does not fit the expected | There is a pattern to gender | Do nothing | Lost out on a chance for recognition | I
The worse error is Type II, so alpha (α) will equal 0.05.
3) Determine whether the parameters are Extrinsic or Intrinsic. If the number of females is stochastic, then p should be 0.5, because there should be an equal chance of obtaining a male or a female. Since p is not computed from the data, the parameters are Extrinsic. Otherwise, you would have to estimate p.
4) Compute the appropriate Expected Frequency distribution.
Table 4-4: Computing the expected frequencies for Example 1 - Binomial.
# of Females in Family (Y) | Observed Frequency (f) | Probability equation | Probability | Expected Frequency
0 | 7 | $q^4$ | 0.0625 | 1.9
1 | 4 | $4pq^3$ | 0.2500 | 7.5
2 | 2 | $6p^2q^2$ | 0.3750 | 11.3
3 | 7 | $4p^3q$ | 0.2500 | 7.5
4 | 10 | $p^4$ | 0.0625 | 1.9
TOTAL | 30 | | 1.0000 | 30
5) Do any necessary lumping (Poisson only) where an Expected Frequency is < 5. Not necessary here, because we are not using the Poisson.
6) Compute the degrees of freedom for the test. The parameters are Extrinsic with no lumping and a = 5, so the degrees of freedom (ν) = a - 1 = 4.
7) Compute the G-statistic.
Table 4-5: Computing the G-statistic for Example 1 - Binomial.
# of Females in Family (Y) | Observed Frequency (f) | Expected Frequency | 2 × Observed Freq × ln(Observed Freq/Expected Freq)
0 | 7 | 1.9 | 18.257
1 | 4 | 7.5 | -5.029
2 | 2 | 11.3 | -6.927
3 | 7 | 7.5 | -0.966
4 | 10 | 1.9 | 33.215
TOTAL | 30 | 30.1 | 38.550
G = 38.550 (the expected frequencies total 30.1 rather than 30 because each was rounded to one decimal place).
8) Compute Gadj. First compute the correction factor $q = 1 + \frac{a^2 - 1}{6n\nu} = 1 + \frac{5^2 - 1}{6 \times 30 \times 4} = 1.033$, where ν = degrees of freedom. Then $G_{adj} = G/q = 38.550/1.033 = 37.318$.
9) Test the statistic. There are two ways to do this. Method a) is the oldest method, but method b) is more useful for reporting results. Both give the same conclusion.
a) Look up the critical value in the Chi-Square table or in Excel and compare it to Gadj. The critical value of the Chi-square distribution can be computed with the CHIINV(α, ν) function in Excel. Enter =CHIINV(0.05,4) into a cell and you will get 9.488 (rounded to 3 decimal places), so $\chi^2_{(0.05,4)} = 9.488$. Because Gadj (37.318) > χ² (9.488), we REJECT Ho.
b) Use Excel to look up the probability and compare it to alpha. P (the probability of obtaining a Gadj at least this large if Ho is true, i.e., the actual Type I error rate if you reject Ho) can be computed directly with the CHIDIST(X, ν) function, where X is the computed value (in this case Gadj). Enter =ROUND(CHIDIST(37.318,4),3) into a cell and you will get 0. P is never zero, but it is a very small number in this case, so you would report it as p < 0.001. Because α (0.05) > p (< 0.001), we REJECT Ho.
10) If you reject Ho in Step 9, compute the CD and compare it to the Expected CD to determine whether the distribution is clumped or uniform.
Table 4-6: Computing the CD.
# of Females in Family (Y) | Observed Frequency (f) | f·Y | f·Y²
0 | 7 | 0 | 0
1 | 4 | 4 | 4
2 | 2 | 4 | 8
3 | 7 | 21 | 63
4 | 10 | 40 | 160
TOTAL | 30 | 69 | 235
$\bar{Y} = \frac{\sum fY}{\sum f} = \frac{69}{30} = 2.3$
$s^2 = \frac{\sum fY^2 - (\sum fY)^2/\sum f}{\sum f - 1} = \frac{235 - 69^2/30}{30 - 1} = 2.631$
$CD = \frac{s^2}{\bar{Y}} = \frac{2.631}{2.3} = 1.144$
Since this is a Binomial distribution, we compare 1.144 to the Binomial Expected CD = 1 - p = 1 - 0.5 = 0.5. Because 1.144 > 0.5, the distribution is clumped.
11) Draw a conclusion. The distribution of female births in families of four is clumped (Gadj = 37.318, ν = 4, p < 0.001); there are more families with all females or all males than expected by chance. Therefore, the data support the published idea.

EXAMPLE 2: Suppose you are working for Fish and Game and you have been assigned to investigate road kills on Highway 26 south of Hollister, about 100 miles from San Jose. In this case, your biggest priority is to find out whether there are specific areas where road kills are excessively high. If you identify problem areas, you will try to develop safe corridors (e.g., culverts) for the animals. However, you have a tight budget, so you must be sure that there really are problem areas before you authorize the construction of corridors. You have divided the 100-mile distance into 1-mile segments (units of space) and counted the number of road kills in each segment (you are measuring the abundance of road kills).
What is the measured variable? Road kills (number per segment).
Is the sample size fixed? No.
1) Determine what your results would mean with respect to your original question.
What would it mean if the results showed a clumped distribution? You would conclude that there are "hot spots" where animals are more likely to be killed, and "safe spots" where animals are less likely to be killed, than expected by chance alone.
What would it mean if the results showed a uniform distribution? It would mean that virtually every segment has the same mortality rate.
This would be very strange, and you would suspect some outside influence.
What would it mean if the observed frequencies were essentially the same as the expected frequencies? There is no pattern to the number of road kills per segment.
2) Determine the statistical error to avoid (alpha).
Table 4-7: Determining which statistical error to avoid - Example 2 - Poisson.
Statistical Decision | Conclusion | Action | What if I'm wrong? | Type of error
Accept Ho: Observed fits the expected | There is no pattern | No special control measures are needed | Animals are dying needlessly | II
Reject Ho: Observed does not fit the expected | There is a pattern | Develop safe corridors | Extra work and expense for nothing | I
We want to avoid a Type I error, so alpha (α) = 0.025.
3) Determine whether the parameters are Extrinsic or Intrinsic. Because the mean had to be determined from the data, the parameters are Intrinsic.
4) Determine the expected frequencies.
Table 4-8: Determining expected frequencies for Example 2 - Poisson.
# of kills per segment | Expected probability | Observed frequency | Expected frequency
0 | 0.124 | 13 | 12.4
1 | 0.259 | 26 | 25.9
2 | 0.270 | 26 | 27.0
3 | 0.188 | 18 | 18.8
4 | 0.098 | 10 | 9.8
5 | 0.041 | 5 | 4.1
≥6 | 0.020 | 2 | 2.0
TOTAL | 1.000 | 100 | 100.0
5) Do any necessary lumping (Poisson only) where an Expected Frequency is < 5. Notice that the expected frequency for class ≥6 is 2.0. Because this is lower than 5, we must combine the observed and expected frequencies for class ≥6 with those for class 5 to make a new class, ≥5 (Table 4-9). The new class, ≥5, now has an expected frequency of 6.1, which is fine. However, if its expected frequency had still been less than 5, we would have had to lump again.
Table 4-9: Lumping - combining observed and expected frequencies for classes that have an expected frequency less than 5.
Class | 5 | ≥6 | becomes ≥5
Expected probability | 0.041 | 0.020 | 0.061
Observed frequency | 5 | 2 | 7
Expected frequency | 4.1 | 2.0 | 6.1
6) Compute the degrees of freedom for the test.
This is Intrinsic with lumping and a = 6, so the degrees of freedom (ν) = a - 3 = 3.
7) Compute the G-statistic.
Table 4-10: Computing the G-statistic for Example 2 - Poisson.
# of kills per segment | Observed Frequency (f) | Expected Frequency | 2 × Observed Freq × ln(Observed Freq/Expected Freq)
0 | 13 | 12.4 | 1.294
1 | 26 | 25.9 | 0.300
2 | 26 | 27.0 | -1.989
3 | 18 | 18.8 | -1.603
4 | 10 | 9.8 | 0.336
≥5 | 7 | 6.1 | 1.895
TOTAL | 100 | 100.0 | 0.233
G = 0.233 (the last column was computed from the unrounded expected frequencies, so it differs slightly from what the rounded values shown would give).
8) Compute Gadj. First compute the correction factor $q = 1 + \frac{a^2 - 1}{6n\nu} = 1 + \frac{6^2 - 1}{6 \times 100 \times 3} = 1.019$, where ν = degrees of freedom. Then $G_{adj} = G/q = 0.228$ (computed from the unrounded G and q).
9) Test the statistic. There are two ways to do this. Method a) is the oldest method, but method b) is more useful for reporting results. Both give the same conclusion.
a) Look up the critical value in the Chi-Square table or in Excel and compare it to Gadj. The critical value of the Chi-square distribution can be computed with the CHIINV(α, ν) function in Excel. Enter =CHIINV(0.025,3) into a cell and you will get 9.348 (rounded to 3 decimal places), so $\chi^2_{(0.025,3)} = 9.348$. Because Gadj (0.228) < χ² (9.348), we ACCEPT Ho.
b) Use Excel to look up the probability and compare it to alpha. P (the probability of obtaining a Gadj at least this large if Ho is true, i.e., the actual Type I error rate if you reject Ho) can be computed directly with the CHIDIST(X, ν) function, where X is the computed value (in this case Gadj). Enter =ROUND(CHIDIST(0.228,3),3) into a cell and you will get 0.973. Because α (0.025) < p (0.973), we ACCEPT Ho.
10) If you reject Ho in Step 9, compute the CD and compare it to the Expected CD to determine whether the distribution is clumped or uniform. Because we did not reject Ho, we do not need to compute the CD.
11) Draw a conclusion. There is no pattern (Gadj = 0.228, ν = 3, p = 0.973) to the distribution of road kills along the 100-mile stretch of Hwy 26 south of Hollister. Therefore, it is not necessary to construct safe corridors.

Name ____________ Pts ______ Lab Section ______
On your own
PROBLEM #1: You are studying the distribution of a seastar, Pisaster ochraceus, which has two color morphs, black and ochre.
You want to know whether color is adaptive and may provide some type of advantage in certain environments. You randomly sample 50 locations and record the color morph, black or not-black, of the first three seastars you encounter at each location. Assume that, if a pattern exists, you will spend time and effort trying to find out why. If not, you will just move on to a different problem.
1) What is Ho? What is Ha?
2) What is the measured variable?
3) Is the sample size fixed? If so, what is its value?
4) What is the appropriate theoretical probability distribution for this problem?
5) What would it mean if the results showed a clumped distribution?
6) What would it mean if the results showed a uniform distribution?
7) What would it mean if the observed frequencies were essentially the same as the expected frequencies?
8) Determine the statistical error to avoid. Complete Table 1.
Table 1: Select the statistical error to avoid for Problem 1 - Pisaster ochraceus color morph frequencies.
Statistical Decision | Conclusion | Action | What if I'm wrong? | Type of error
Accept Ho: Observed fits the expected | | | | II
Reject Ho: Observed does not fit the expected | | | | I
9) Alpha (α) =
10) Are the parameters Intrinsic or Extrinsic?
Data Table 2: Data for Problem 1 - Color morph frequencies for Pisaster ochraceus.
# of black morphs per quadrat (Y) | Observed Frequency (f) | fY | fY²
0 | 13 | |
1 | 7 | |
2 | 8 | |
3 | 22 | |
TOTAL | | |
11) How many seastars did you examine?
12) How many of those were black?
13) What is the value of p?
14) What is the value of q?
15) Compute the appropriate Expected Frequency distribution. Complete Table 3.
Table 3: Compute expected frequencies for color morphs of Pisaster ochraceus.
# of black morphs per quadrat (Y) | Observed Frequency (f) | Probability equation | Probability | Expected Frequency
0 | 13 | | |
1 | 7 | | |
2 | 8 | | |
3 | 22 | | |
TOTAL | | | |
16) For Poisson only: lump classes when an Expected Frequency is < 5.
17) Compute the degrees of freedom for the test.
18) Compute the G-statistic. Complete Table 4.
Table 4: Compute the G statistic for Problem 1 - Pisaster ochraceus color morph data.
# of black morphs per quadrat (Y) | Observed Frequency (f) | Expected Frequency | 2 × Observed Freq × ln(Observed Freq/Expected Freq)
0 | 13 | |
1 | 7 | |
2 | 8 | |
3 | 22 | |
TOTAL | | |
G = __________________
19) Compute q and Gadj.
q =
Gadj =
20) Test the statistic using method b).
21) If you reject Ho, compute the CD and compare it to the Expected CD to determine whether the distribution is clumped or uniform.
22) Draw a conclusion.

PROBLEM #2: Seeds are the main food source for Dipodomys deserti, the desert kangaroo rat. Because the seeds appear to have a clumped distribution, you expect that the rodents will also have a clumped distribution. If you find a pattern, you will try to find out why. You have counted rats in 335 quadrats selected randomly from a desert region where the rodents are found.
23) What is Ho?
24) What is the measured variable?
25) Is the sample size fixed? If so, what is its value?
26) What is the appropriate theoretical probability distribution for this problem?
27) What would it mean if the results showed a clumped distribution?
28) What would it mean if the results showed a uniform distribution?
29) What would it mean if the observed frequencies were essentially the same as the expected frequencies?
30) Determine the statistical error to avoid (alpha). Complete Table 5.
Table 5: Select the statistical error to avoid for Problem 2 - distribution of Dipodomys deserti.
Statistical Decision | Conclusion | Action | What if I'm wrong? | Type of error
Accept Ho: Observed fits the expected | | | | II
Reject Ho: Observed does not fit the expected | | | | I
31) Alpha (α) =
32) Determine whether the parameters are Extrinsic or Intrinsic.
Data Table 6: Data for Problem 2 - Frequency distribution for Dipodomys deserti.
# of Rats per Quadrat (Y) | Observed Frequency (f) | fY | fY²
0 | 25 | |
1 | 118 | |
2 | 97 | |
3 | 54 | |
4 | 32 | |
5 | 7 | |
6 | 2 | |
TOTAL | | |
33) Mean =
34) Variance =
35) Compute the appropriate Expected Frequencies. Complete Table 7.
Table 7: Compute expected frequencies for Problem 2 - distribution of Dipodomys deserti.
# of Rats per Quadrat (Y) | Observed Frequency (f) | Probability equation | Probability | Expected Frequency
0 | 25 | | |
1 | 118 | | |
2 | 97 | | |
3 | 54 | | |
4 | 32 | | |
5 | 7 | | |
6 | 2 | | |
TOTAL | | | |
36) For Poisson only: lump classes when an Expected Frequency is < 5. SHOW WORK in Table 8.
Table 8: Lump small frequencies for Problem 2 - distribution of Dipodomys deserti.
# of Rats per Quadrat (Y) | Observed Frequency (f) | Probability equation | Probability | Expected Frequency
TOTAL | | | |
37) Compute the degrees of freedom for the test.
38) Compute the G-statistic. Complete Table 9.
Table 9: Compute the G statistic for Problem 2 - distribution of Dipodomys deserti.
# of Rats per Quadrat (Y) | Observed Frequency (f) | Expected Frequency | 2 × Observed Freq × ln(Observed Freq/Expected Freq)
0 | 25 | 48 | -32.616
1 | 118 | 94 | 53.664
2 | 97 | 91 | 12.387
3 | 54 | 58 | -7.718
4 | 32 | 28 | 8.546
5 | 7 | 11 | -6.328
6 | 2 | 5 | -3.665
TOTAL | | | 24.270
G = ______________
39) Compute q and Gadj.
q =
Gadj =
40) Test the statistic using method b).
41) If you reject Ho, compute the CD and compare it to the Expected CD to determine whether the distribution is clumped or uniform.
42) Draw a conclusion.
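The Poisson steps of Example 2 (expected frequencies, lumping, degrees of freedom, G and Gadj) can be sketched the same way, and the same functions can help you check your arithmetic in Problem 2. This sketch is an illustration, not part of the lab: the function names are hypothetical, it assumes a sample mean of 2.09 kills per segment (what the Table 4-8 observed frequencies give if the two segments in class ≥6 each had exactly 6 kills, and the value that reproduces that table's probabilities), and its lumping routine only merges classes in the upper tail, which is all Example 2 needs.

```python
import math

def poisson_expected(mean, n_units, top):
    """Step 4 (Poisson): expected frequencies for classes 0..top-1 plus a 'top or more' class."""
    probs = [math.exp(-mean) * mean**y / math.factorial(y) for y in range(top)]
    probs.append(1 - sum(probs))              # tail class, e.g. ">=6"
    return [n_units * pr for pr in probs]

def lump_upper_tail(observed, expected, minimum=5):
    """Step 5: merge the last two classes until the tail class has expected >= minimum.

    Simplification (assumption): only the upper tail is lumped.
    """
    observed, expected = list(observed), list(expected)
    lumped = False
    while len(expected) > 1 and expected[-1] < minimum:
        observed[-2:] = [observed[-2] + observed[-1]]
        expected[-2:] = [expected[-2] + expected[-1]]
        lumped = True
    return observed, expected, lumped

def degrees_of_freedom(a, intrinsic, lumped):
    """Step 6, Table 4-2: start from a - 1, subtract 1 if intrinsic and 1 if lumped."""
    return a - 1 - int(intrinsic) - int(lumped)

def g_statistic(observed, expected):
    """Step 7: G = 2 * sum(f * ln(f / f_hat))."""
    return 2 * sum(f * math.log(f / e) for f, e in zip(observed, expected) if f > 0)

# Example 2: road kills in 100 one-mile segments, classes 0, 1, 2, 3, 4, 5, >=6.
observed = [13, 26, 26, 18, 10, 5, 2]
mean = 2.09   # assumption: sample mean implied by Table 4-8 (see lead-in)
expected = poisson_expected(mean, sum(observed), top=6)

obs_l, exp_l, lumped = lump_upper_tail(observed, expected)
a, n = len(obs_l), sum(obs_l)
df = degrees_of_freedom(a, intrinsic=True, lumped=lumped)
g = g_statistic(obs_l, exp_l)
q = 1 + (a**2 - 1) / (6 * n * df)             # Step 8
gadj = g / q

print("lumped expected:", [round(e, 1) for e in exp_l])
print(f"a = {a}, df = {df}, G = {g:.3f}, q = {q:.3f}, Gadj = {gadj:.3f}")
```

Rounded to one decimal, the lumped expected frequencies match Tables 4-8 and 4-9, ν comes out as 3, and G and Gadj agree with the hand calculation for Example 2 (about 0.233 and 0.228). To get P, pass Gadj and ν to Excel's CHIDIST as described in Step 9.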
© Copyright 2024 Paperzz