CORRELATION AND REGRESSION

Introduction to Correlation Analysis

The statistical methods discussed so far are primarily intended to describe a single variable, i.e., univariate populations. This chapter discusses techniques that are useful in studying relationships when data on two or more variables are available. If, on the same individual, data on two variables, say X and Y, are listed, it is called a bivariate population. In this bivariate population, for every value of X there is a corresponding value of Y. By treating the variables X and Y separately, measures of central tendency, dispersion, etc., can be worked out. In addition to these measures, it may be of interest to study the strength of the relationship between the variables and the nature of that relationship. The study of the former aspect is referred to as 'correlation' and the latter as 'regression' analysis.

Measure of Simple Correlation

Correlation analysis is a statistical tool to study the degree of association or relationship between two variables, when the relationship is linear or approximately linear. The degree of relationship is quantified by a coefficient called Karl Pearson's product moment correlation coefficient, or simply the 'correlation coefficient'. It is denoted by r. The working formula for r is

r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2]\,[n\sum Y^2 - (\sum Y)^2]}}

In the above expression, X and Y denote the measurements on variables X and Y, and n is the number of pairs of observations, i.e., the sample size.

Properties of the correlation coefficient:
(i) It is a pure number without units or dimensions.
(ii) It lies between -1 and 1, i.e., -1 ≤ r ≤ 1.
(iii) It is independent of the origin and the scale of measurement of the variables.

The variables are said to be positively correlated if r is positive and negatively correlated if r is negative. Positive correlation indicates that the two variables move in the same direction, i.e., as one increases the other increases, or as one decreases the other decreases. Negative correlation indicates that the two variables move in opposite directions, i.e., as one increases the other decreases.

Let r be the observed correlation coefficient in a sample of n pairs of observations from a bivariate normal population. To test the hypothesis H0: ρ = 0, i.e., that the population correlation coefficient is zero, the following test procedure is used:

(i) Hypotheses
H0: ρ = 0; H1: ρ ≠ 0

(ii) Test statistic
Compute

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}

which is distributed as t with (n-2) df.

(iii) Statistical decision
If the calculated value of t is greater than the table value of t with (n-2) degrees of freedom at the desired level of significance, the correlation between the variables is significant. Note, however, that the significance of r is not an indication of the strength of the relationship; it is simply a test of whether ρ equals zero. The degree of relationship between two variables is measured by the square of the correlation coefficient, r², which is called the coefficient of determination. Unless r² is very high, one variable should not be used to predict the other.
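For readers who want to verify these formulas numerically, the following sketch (in Python, with made-up x and y values) computes r by the working formula and then carries out the t-test for H0: ρ = 0; scipy's built-in pearsonr should give the same r and p-value.

```python
# Sketch: Pearson's r via the working formula, then the t-test for H0: rho = 0.
# The x and y data below are hypothetical, purely for illustration.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.3, 4.8, 6.2, 5.9, 7.8, 8.4])
n = len(x)

num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
den = np.sqrt((n * np.sum(x**2) - np.sum(x)**2) *
              (n * np.sum(y**2) - np.sum(y)**2))
r = num / den

t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)    # t with n-2 df
p = 2 * stats.t.sf(abs(t), df=n - 2)          # two-tailed p-value
print(f"r = {r:.4f}, t = {t:.3f}, p = {p:.4f}")

# Cross-check with scipy (same r and p-value):
print(stats.pearsonr(x, y))
```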
Introduction to Simple Linear Regression

If two variables are found to be highly correlated, a more useful approach is to study the nature of their relationship. Regression analysis achieves this by formulating statistical models that can best describe the relationship. These models enable prediction of the value of one variable, called the dependent variable, from the known values of the other variable(s). Regression differs from correlation in that regression estimates the nature of the relationship, whereas the correlation coefficient estimates the degree or intensity of the relationship. Further, in regression analysis it is necessary to designate one of the variables as dependent and the other as independent, which is not necessary in correlation analysis. Simple linear regression deals with the study of linear relationships involving two variables, whereas relationships among more than two variables are studied using multiple regression techniques.

Estimation of parameters of the regression equation

A scatter diagram gives some idea of the nature of the relationship between the variables. If it indicates that the relationship is linear, the next step is to develop a statistical model and proceed to estimate the underlying relationship. A linear relationship of the form

Y = a + bX + e .................................... (1)

is assumed. In expression (1), e is a random variable (random error) assumed to be normally distributed with mean zero and variance σ², i.e., e ~ N(0, σ²); a and b are constants (parameters). In this model each Yi is assumed to be normally distributed with mean a + bXi and constant variance σ². In the classical regression model it is further assumed that the values of the independent variable X are fixed or pre-selected by the researcher, and that X is measured without error.

Linear regression of Y on X

Fitting a linear relationship of the form (1) is equivalent to estimating the constants a and b from the observed data. The standard method of estimating a and b is the method of 'least squares'. Informally, it means finding the line for which the total of the squared distances of all points from the line is minimum, i.e., the sum of e² is minimum. In other words, the values of a and b are chosen to minimize

\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - a - bX_i)^2 .................................... (2)

where n stands for the number of pairs of observations. The estimates of a and b which minimize (2) are obtained from

b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}, \qquad a = \bar{Y} - b\bar{X}

The estimated values of these constants are substituted in the equation Y = a + bX to get the regression equation. From this equation the value of Y can be estimated for a given value of X.

Different Regression Lines

Earlier it was mentioned that the independent variable X is fixed, i.e., not a random variable. If both X and Y are random variables and the choice of which affects which is open, the following regression lines may be conceived:

(i) Regression equation of Y on X. If Y is considered the dependent variable, the regression equation of Y on X is Y = a + bX. The regression coefficient b is called the regression coefficient of Y on X and is usually denoted by byx. In this equation a and b are estimated so as to minimize the residual variation (deviations from regression) of Y, i.e., \sum (Y_i - a - bX_i)^2 is minimized.

(ii) Regression equation of X on Y. If X is considered the dependent variable, the regression equation is X = a1 + b1Y. The regression coefficient b1 is called the regression coefficient of X on Y and is usually denoted by bxy. In this equation a1 and b1 are estimated so as to minimize the residual variation of X, i.e., \sum (X_i - a_1 - b_1 Y_i)^2 is minimized.

The values of a and b obtained in (i) and (ii) will usually be different.
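As an illustration, here is a minimal least-squares fit in Python (again with hypothetical data); scipy's linregress should return the same slope and intercept.

```python
# Sketch: least-squares estimates of a and b for Y = a + bX + e,
# on the same hypothetical data used for the correlation example.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.3, 4.8, 6.2, 5.9, 7.8, 8.4])
n = len(x)

b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / \
    (n * np.sum(x**2) - np.sum(x)**2)
a = y.mean() - b * x.mean()
print(f"fitted line: Y = {a:.3f} + {b:.3f} X")

# scipy.stats.linregress returns the same slope and intercept:
res = stats.linregress(x, y)
print(res.slope, res.intercept)
```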
RANK ORDER CORRELATION

The version of correlation seen earlier applies to cases where the values of X and of Y are both measured on an equal-interval scale. It is also possible to apply the apparatus of linear correlation to cases where X and Y are measured on a merely ordinal scale. When applied to ordinal data, the measure of correlation is spoken of as the Spearman rank-order correlation coefficient, typically symbolized as rs.

The Simple Formula for rs, for Rankings without Ties

r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}

If this formula seems a bit odd to you, you are in good company. Generations of statistics students have been presented with it, and generations have puzzled over such mind-bending questions as: why do you start out with "1" and subtract something from it?; where does that N(N² - 1) in the denominator come from?; and, above all, how does that peculiar "6" get into the numerator? Here are the answers to these age-old questions in a nutshell. For any set of N paired bivariate ranks, the minimum possible value of \sum D^2 occurs in the case of perfect positive correlation. In this case, rank 1 for X is paired with rank 1 for Y, rank 2 for X with rank 2 for Y, and so on. Each value of D will accordingly be equal to zero, and so too will be the sum of the squared values of D. Conversely, the maximum possible value of \sum D^2 occurs in the case of perfect negative correlation. This maximum possible value is in every instance equal to

maximum \sum D^2 = \frac{N(N^2 - 1)}{3}
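A small numerical sketch of the rs formula follows, using hypothetical rankings without ties; scipy's spearmanr agrees with the simple formula in the no-ties case.

```python
# Sketch: Spearman's rs by the simple no-ties formula, on hypothetical ranks.
import numpy as np
from scipy import stats

rank_x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
rank_y = np.array([2, 1, 4, 3, 6, 5, 8, 7])
D = rank_x - rank_y
N = len(D)

rs = 1 - 6 * np.sum(D**2) / (N * (N**2 - 1))
print(rs)    # 0.9048 for these ranks

# scipy.stats.spearmanr gives the same value when there are no ties:
print(stats.spearmanr(rank_x, rank_y))
```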
STATISTICAL INFERENCE

Introduction

Statistical inference is the branch of statistics which deals with the theory and techniques of making decisions regarding the statistical nature of a population using samples drawn from that population. Statistical inference has two branches: (i) the theory of estimation and (ii) testing of hypotheses.

Basic Terminology

1. Population: The totality of individuals in a statistical investigation. For example, in a study of the mean fish catch per boat in a landing centre, the fish catches of all boats operating in the centre constitute the statistical population. A study of the fish catches of all boats is called a census survey.
2. Sample: A representative selection of a few individuals from a population. In the same example, a few boats selected randomly from those operating in the centre form a sample. A study of the fish catches of the selected boats is called a sample survey.
3. Parameter: Quantitative characteristics of the population, such as the population mean, population standard deviation, and population size, are called parameters.
4. Statistic: Functions of the sample values, such as the sample mean and sample standard deviation, are called statistics.
5. Parameter space: The set of all admissible values of the population parameters.
6. Sampling distribution: The distribution of the values of a statistic over all different samples of the same size.
7. Standard error: The standard deviation of the sampling distribution of a statistic.
8. Estimator: A statistic used to estimate a population parameter.
9. Estimate: A value or range of values used to estimate a population parameter.
10. Point estimate: A single value used to estimate a population parameter.
11. Interval estimate: A range of values used to estimate a population parameter at a given level of confidence, called a confidence interval.
12. Confidence coefficient: The probability that a confidence interval contains the parameter.
13. Confidence limits: The limits of the confidence interval.
14. Estimation: The process of estimating a population parameter.

Standard errors of some common statistics:

Statistic                             Mean       Standard Error
Sample mean x̄                        µ          σ/√n
Difference of means x̄1 - x̄2         µ1 - µ2    √(σ1²/n1 + σ2²/n2)
Sample proportion p                   P          √(PQ/n), where Q = 1 - P
Difference of proportions p1 - p2     P1 - P2    √(P1Q1/n1 + P2Q2/n2)
p1 - p2 (when P1 = P2 = P)            0          √(PQ(1/n1 + 1/n2))

Point estimates: The sample mean is the point estimator of the population mean. The sample proportion is the point estimator of the population proportion.

Utilities of the standard error: The standard error (S.E.) is a measure of the variability of a statistic. It is useful in estimation and in testing of hypotheses.
1. The standard error is used to judge the efficiency and consistency of a statistic as an estimator.
2. In interval estimation, the standard error is used to write down confidence intervals.
3. In testing of hypotheses, the standard error of the test statistic is used to standardize the distribution of the test statistic.

Unbiased estimator: An estimator is said to be unbiased if the average of all values taken by the estimator equals the population parameter. For example, the sample mean is an unbiased estimator of the population mean, because the average of the means of all samples of the same size taken from a population equals the population mean. Similarly, the sample proportion is an unbiased estimator of the population proportion, and the sample variance is an unbiased estimator of the population variance.

Note: The sample variance is given by

s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}

Interval Estimators

CASE 1: The 100(1 - α)% interval estimate for the population mean when σ is known is

\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = \left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)

NOTE: If σ is unknown, it is replaced by the sample standard deviation s. The maximum error is given by E = z_{\alpha/2}\,\sigma/\sqrt{n}.

Note: The larger the sample size n, the smaller the error E. The sample size required for a given confidence level and maximum error can be determined using the formula n = (z_{\alpha/2}\,\sigma / E)^2.

Example

A random sample of sixteen subjects taken from a locality gave the following fasting sugar level measurements in mg/dl: 95, 108, 97, 112, 99, 106, 105, 100, 99, 98, 104, 110, 107, 111, 103, 110. Assume that sugar level follows a normal distribution with variance 25 (mg/dl)² and unknown mean.

1. What is the mean? \bar{X} = 104
2. Determine the 95% confidence interval for the mean sugar level.

The confidence interval is given by \left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right). For 95%, z_{\alpha/2} = 1.96, so the interval is

\left(104 - 1.96\cdot\frac{5}{\sqrt{16}},\ 104 + 1.96\cdot\frac{5}{\sqrt{16}}\right) = (101.55,\ 106.45)
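The interval in the example can be reproduced in a few lines of Python (the data and σ = 5 are exactly as given above):

```python
# Reproducing the fasting-sugar confidence interval (sigma known = 5).
import numpy as np
from scipy import stats

x = np.array([95, 108, 97, 112, 99, 106, 105, 100,
              99, 98, 104, 110, 107, 111, 103, 110])
sigma = 5.0                      # population sd, sqrt(25)
n, xbar = len(x), x.mean()       # n = 16, xbar = 104

z = stats.norm.ppf(0.975)        # 1.96 for a 95% interval
E = z * sigma / np.sqrt(n)       # maximum error
print(xbar - E, xbar + E)        # approximately (101.55, 106.45)
```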
TESTING OF HYPOTHESES

Introduction

Many situations call for verification of statements on the basis of available information. For example, a researcher may be interested in verifying the statement 'Women are more prone to cancer than men'. Verifying a statement concerning a population by examining a sample from that population is called testing of hypothesis. Estimation and testing of hypotheses are not as different as they appear; for example, confidence intervals may be used to arrive at the same conclusions that are reached by using the testing-of-hypothesis procedure.

Terminology

Before carrying out any statistical test, it is necessary to understand the following terminology involved in testing of hypotheses.

Statistical hypothesis

A statistical hypothesis is a statement about the population under study. It is usually a statement about one or more parameters of the population. Such a statement may or may not be true. Examples of hypotheses are: 1) The mean weight of newly born babies is 3 kg. 2) Drugs A and B are equally effective in decreasing the sugar level.

Null hypothesis

The hypothesis to be tested is commonly designated the 'null hypothesis' and is usually denoted by H0.

Alternative hypothesis

Any admissible hypothesis that differs from the null hypothesis is called an alternative hypothesis and is denoted by H1.

Test statistic

A test statistic is a function of the sample values. It extracts the information about the population parameter contained in the sample. The observed value of the test statistic serves as a guide in rejecting or not rejecting the null hypothesis. For example, in testing the null hypothesis that the population mean equals µ0, i.e., H0: µ = µ0, the statistic used is

Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}

Level of significance

In testing a given hypothesis, the maximum probability with which we are willing to risk a Type I error is called the level of significance of the test, denoted by α. In other words, it quantifies the amount of risk one is willing to take in rejecting a true hypothesis. Usually the 5% or 1% level of significance is chosen; the choice, however, depends on the gravity of the risk involved in the decision. To illustrate, if a 5% level of significance is chosen in designing a test of hypothesis, there are about 5 chances in 100 that the hypothesis will be rejected when it should be accepted, i.e., one is 95% confident of making the right decision. The decision as to which values go into the rejection region and which into the acceptance region is made on the basis of the desired level of significance α. Tests of hypothesis are sometimes called tests of significance because of the term 'level of significance', and a computed value of the test statistic that falls in the rejection region is said to be significant.

One-tailed and two-tailed tests

If the null hypothesis H0: µ = µ0 is tested against H1: µ ≠ µ0 (which implies µ < µ0 or µ > µ0), then the interest is in extreme values of Z on both tails of the distribution. In such cases the critical region is split between the two tails of the distribution of the test statistic, as shown in Figs. 1 and 2. Tests applied in such situations are called 'two-tailed' tests. If the null hypothesis H0: µ = µ0 is tested against H1: µ > µ0, then the interest is in extreme values to one side of the mean, and the test is called 'one-tailed'.

Test for the mean of a single sample (Z test)

Let x1, x2, ..., xn be the values of a variable X in a large random sample of size n from a population with mean µ and variance σ². On the basis of this sample, the hypothesis regarding the value of µ is tested as follows.

(i) Hypotheses
H0: µ = µ0; H1: µ ≠ µ0, where µ0 is a specified value.

(ii) Test statistic
The following test statistic is computed:

Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma}

where \bar{x} is the sample mean.

(iii) Statistical decision
If |Z| ≥ 1.96, reject H0 at the 5% level of significance; otherwise do not reject it. If |Z| ≥ 2.58, reject H0 at the 1% level of significance; otherwise do not reject it.

Test for equality of two population means

Testing a single population proportion

Testing hypotheses about population proportions is carried out in the same way as for means when the conditions necessary for using the normal distribution are met. When the sample is sufficiently large and H0: P = P0 is true, the test statistic is

Z = \frac{p - p_0}{\sqrt{p_0 q_0 / n}} = \frac{(p - p_0)\sqrt{n}}{\sqrt{p_0 q_0}}

which is approximately distributed as a standard normal variate.
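A sketch of both Z tests follows; all numbers below are hypothetical and serve only to show the mechanics of the decision rule.

```python
# Sketch: one-sample Z tests for a mean and for a proportion
# (all values hypothetical, chosen only for illustration).
import numpy as np

# Mean: H0: mu = 50 vs H1: mu != 50, with sigma known
xbar, mu0, sigma, n = 52.1, 50.0, 6.0, 64
Z_mean = (xbar - mu0) / (sigma / np.sqrt(n))
print(Z_mean, abs(Z_mean) >= 1.96)   # True -> reject H0 at the 5% level

# Proportion: H0: P = 0.5 vs H1: P != 0.5
p_hat, P0, m = 0.46, 0.5, 200
Z_prop = (p_hat - P0) / np.sqrt(P0 * (1 - P0) / m)
print(Z_prop, abs(Z_prop) >= 1.96)   # False -> do not reject H0
```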
Test of significant difference between two proportions

Let x1 and x2 be the numbers of individuals possessing a given attribute, say A, in random samples of sizes n1 and n2 from the two populations respectively. The sample proportions are then

p_1 = \frac{x_1}{n_1}, \qquad p_2 = \frac{x_2}{n_2}

If P1 and P2 are the proportions of populations 1 and 2 respectively, the variance of (p1 - p2) is given by

V(p_1 - p_2) = \frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}

Under the null hypothesis P1 = P2 = P (say) and Q1 = Q2 = Q, so that V(p_1 - p_2) = PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right). The steps in testing equality of population proportions are as follows:

(i) Hypotheses
H0: P1 = P2 = P (say); H1: P1 ≠ P2

(ii) Test statistic
The following test statistic is used:

Z = \frac{p_1 - p_2}{\sqrt{\hat{P}\hat{Q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}

Given: x1 = 18, x2 = 25, n1 = 110, n2 = 140; p1 = 18/110 = 0.1636, p2 = 25/140 = 0.1786. Hence

\hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = 0.172, \qquad \hat{Q} = 1 - \hat{P} = 0.828

Z = \frac{0.1636 - 0.1786}{\sqrt{(0.172)(0.828)\left(\frac{1}{110} + \frac{1}{140}\right)}} = \frac{-0.015}{0.04803} = -0.3123

(iii) Statistical decision
As |Z| < 1.96, the hypothesis is not rejected at the 5% level of significance.
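The example above can be checked with a few lines of Python:

```python
# Reproducing the two-proportion example (x1=18, n1=110; x2=25, n2=140).
import numpy as np

x1, n1, x2, n2 = 18, 110, 25, 140
p1, p2 = x1 / n1, x2 / n2
P = (x1 + x2) / (n1 + n2)        # pooled proportion = 0.172
Q = 1 - P

Z = (p1 - p2) / np.sqrt(P * Q * (1 / n1 + 1 / n2))
print(Z)                          # about -0.312; |Z| < 1.96, H0 not rejected
```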
Small-sample tests (n < 30)

The Student t-distribution

When the size of the sample is small, the distributions of various statistics are far from normal, and hence tests of hypothesis based on the normal variate cannot be applied. In such cases tests of hypothesis based on the exact sampling distributions of t and F are applied. When applying these tests it is assumed that the population from which the sample is drawn is normal. The t-distribution, popularly known as Student's t-distribution, is a sampling distribution derived from the parent normal distribution. This distribution is symmetrical about the mean but is slightly flatter than the normal distribution. Unlike the normal distribution, it differs for different sample sizes n, or degrees of freedom (n-1). When the sample size is very small (n < 30), the t-distribution differs markedly from the normal distribution, but as n increases the t-distribution resembles the normal distribution more and more (Fig. 1). The t-distribution has mean zero and variance n/(n-2) for n > 2. The variable t ranges theoretically from -∞ to +∞. The values of t have been tabulated for different degrees of freedom at different levels of significance (Fisher and Yates, 1963).

CASE 1: Test for a single mean

Let x1, x2, ..., xn be a random sample of size n drawn from a normal population with mean µ. Let \bar{x} and s² denote the mean and variance of the sample. To test the hypothesis H0: µ = µ0, the following test procedure is used:

(i) Hypotheses
H0: µ = µ0; H1: µ ≠ µ0

(ii) Test statistic

t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s}, \qquad \text{where } s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2

This test statistic follows the t-distribution with (n-1) degrees of freedom.

(iii) Statistical decision
If |t| > the table value of t at the 5% level of significance, reject H0 at the 5% level; otherwise accept H0. If |t| > the table value of t at the 1% level of significance, reject H0 at the 1% level; otherwise accept H0.

CASE 2: Testing the difference between two means (population variances assumed equal)

Let \bar{x} and s1 be the mean and standard deviation of a sample of size n1 from a normal population with mean µ1, and let \bar{y} and s2 be the mean and standard deviation of another sample of size n2 from a normal population with mean µ2. To test whether the population means differ significantly, the following null hypothesis is set up:

(i) Hypotheses
H0: µ1 = µ2; H1: µ1 ≠ µ2

(ii) Test statistic

t = \frac{\bar{x} - \bar{y}}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

which is distributed as t with n1 + n2 - 2 degrees of freedom. Here s is computed using the formula

s = \sqrt{\frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}

(iii) Statistical decision
If |t| > the table value of t at the specified level of significance, reject the null hypothesis at that level; otherwise accept it.

CASE 3: Test of the difference between two means of paired observations (paired t-test)

When two samples of equal size are drawn from two normal populations and the samples are not independent, the paired t-test is used. Dependent samples arise, for instance, in experiments in which an individual is tested first under one condition and then under another condition, so that there are two observations for the same individual. Let n be the size of each of the two samples and d1, d2, ..., dn the differences between the corresponding members of the samples. Let \bar{d} denote the mean of the differences and s_d the standard deviation of these differences.

(i) Hypotheses
H0: µD = 0; H1: µD ≠ 0

(ii) Test statistic
To test this hypothesis, compute

t = \frac{\bar{d}}{s_d/\sqrt{n}}, \qquad \text{where } \bar{d} = \frac{\sum d}{n} \text{ and } s_d = \sqrt{\frac{\sum d^2 - (\sum d)^2/n}{n - 1}}

It is distributed as t with (n-1) degrees of freedom.

(iii) Statistical decision
If |t| > the table value at the specified level of significance, reject H0 at that level.

Exercise for practice:

I. Samples of eleven and fifteen animals were fed on different diets A and B respectively. The gains in weight for the individual animals over the same period were as follows. Test whether the mean gain in weight differs significantly between the two diets.
Diet A (in gm): 20 25 30 23 29 19 30 12 19 27 26
Diet B (in gm): 39 29 19 17 41 28 23 27 31 28 31 18 16 24 26

II. Length measurements of sampled mackerel made on the same day in two landing centres are given below. Do the data support the claim that the mean length of mackerel is the same in both landing centres?
Landing centre A: 24 21 20 19 17 18 15 13 20
Landing centre B: 21 22 14 18 16 19 20 22 21
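As a hint for Exercise II, the CASE 2 test can be run in Python; scipy's ttest_ind pools the variances by default, matching the formula above.

```python
# One way to attempt Exercise II: a pooled-variance two-sample t test
# on the mackerel lengths given above.
import numpy as np
from scipy import stats

A = np.array([24, 21, 20, 19, 17, 18, 15, 13, 20])
B = np.array([21, 22, 14, 18, 16, 19, 20, 22, 21])

t, p = stats.ttest_ind(A, B)      # equal variances assumed; df = n1 + n2 - 2 = 16
print(t, p)                       # compare |t| with the table value
```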
Chi-square Test

Introduction to the chi-square (χ²) distribution

Theoretically, the chi-square (χ²) distribution can be defined as the distribution of a sum of squares of independent standard normal variates. If X1, X2, ..., Xn are n independent standard normal variates, then the sum of squares X1² + X2² + ... + Xn² follows the χ² distribution with n degrees of freedom. Alternatively, if a sample of size n is drawn from a normal population with variance σ², the quantity (n-1)s²/σ² follows the χ² distribution with (n-1) degrees of freedom, where s² is the sample variance. The shape of the χ² distribution depends on the degrees of freedom, which is also its mean (Fig. 9.5). When n is small, the χ² distribution is markedly different from the normal distribution, but as n increases the shape of the curve becomes more and more symmetrical, and for n > 30 it can be approximated by a normal distribution. The values of χ² have been tabulated for different degrees of freedom at different levels of probability (Fisher and Yates, 1963). χ² is always greater than or equal to zero, i.e., χ² ≥ 0.

Test for a fixed-ratio hypothesis

Many investigations are carried out to verify empirically some biological phenomenon that is expected to occur under given assumptions. In common carp, normal pigmentation is due to a dominant gene B, and the recessive genotype bb produces blue pigmentation. The F1 generation will have all individuals (Bb) with normal pigmentation. When F1's are crossed to obtain the F2 generation, there will be common carp with normal and blue pigmentation in the ratio 3:1. Whether this hypothesis of a 3:1 ratio is substantiated by the actual observed data can be ascertained by the χ² test.

This χ² test can be applied to test any fixed-ratio hypothesis, provided the expected ratio is specified before the investigation commences. If Oi refers to the observed frequency and Ei to the expected frequency based on the fixed-ratio hypothesis, then χ² is computed as

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{k} \frac{O_i^2}{E_i} - n ………………… (1)

where n is the total number of observations and k is the number of classes. The χ² in (1) has k-1 degrees of freedom. In this test the expected frequency of each class should be more than 5. If any such frequency is small, adjacent classes may be grouped so that the expected frequency exceeds 5. If the calculated value of χ² is greater than the table value of χ² with (k-1) df at the specified level of significance, the null hypothesis of the specified ratio is rejected.
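A sketch of the fixed-ratio test in Python follows, with hypothetical F2 counts of 152 normal and 48 blue carp out of 200:

```python
# Sketch: the 3:1 fixed-ratio chi-square test with hypothetical F2 counts.
import numpy as np
from scipy import stats

observed = np.array([152, 48])                       # hypothetical: normal, blue
expected = np.array([0.75, 0.25]) * observed.sum()   # 150, 50 under the 3:1 ratio

chi2 = np.sum((observed - expected)**2 / expected)
p = stats.chi2.sf(chi2, df=len(observed) - 1)        # k - 1 = 1 df
print(chi2, p)

# stats.chisquare(observed, f_exp=expected) gives the same result.
```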
χ² test for independence of attributes in a 2 x 2 contingency table

Suppose that attribute data of size N are classified according to two attributes, say A and B, with attribute A subdivided into two classes A1 and A2 and attribute B into B1 and B2. Such data can be presented in a table called a 2 x 2 contingency table, as shown below:

Table 2: 2 x 2 contingency table

B\A      A1     A2     Total
B1       a      b      a+b
B2       c      d      c+d
Total    a+c    b+d    N = a+b+c+d

It may be of interest to test the independence of the attributes A and B:

H0: The two attributes A and B are independent.
H1: The two attributes A and B are dependent.

This is tested by the χ² test. A simple formula for computing χ² for a 2 x 2 contingency table is

\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}

where a, b, c and d are the cell frequencies and N is the total frequency. This χ² has 1 degree of freedom.

Yates' correction for continuity

If the expected cell frequencies are large, the discrete distribution of the frequencies approximates the normal distribution. This approximation holds fairly well when the degrees of freedom are more than 1 and the expected cell frequencies in the various classes are not small. As the χ² statistic of a 2 x 2 contingency table has 1 degree of freedom, the χ² approximation in this case is not satisfactory and leads to overestimation of significance. This is corrected by the method suggested by Yates, known as 'Yates' correction'. The correction consists of adding ½ to the observed minimum frequency, adjusting the other cell frequencies for the observed marginal totals, and then computing χ². The formula for χ² with Yates' correction in a 2 x 2 contingency table is

\chi^2 = \frac{N\left(|ad - bc| - \frac{N}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)}

This correction is appropriate when the expected frequency of a class is less than 5, but estimation with the correction does no harm even when the frequencies are large. Hence it is better to use the correction as a matter of routine.

Example: In a series of experiments to test whether advanced stages of a particular infection are cured by 'lime treatment', the following observations were made:

                      Not cured   Cured   Total
Lime treated          86          14      100
Untreated (control)   88          12      100
Total                 174         26      200

Test whether lime has any effect in curing the infection.

Answer:

(i) Hypotheses
H0: There is no association between lime treatment and curing of the infection.
H1: There is association between lime treatment and curing of the infection.

(ii) Test statistic

\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}

where a = 86, b = 14, c = 88, d = 12:

\chi^2 = \frac{200(86 \times 12 - 14 \times 88)^2}{100 \times 100 \times 174 \times 26} = \frac{200(-200)^2}{10000 \times 174 \times 26} = 0.1768

(iii) Statistical decision
Since the calculated χ² (with and without Yates' correction) is less than the table value (3.84 at 5%, 6.64 at 1%), H0 is not rejected, i.e., curing of the infection is not associated with lime treatment.

Computation of χ² in an (r x c) contingency table

The r x c contingency table is an extension of the 2 x 2 contingency table in which the data are classified into r rows and c columns (Table 3). The frequencies occupying the cells of the table are called 'cell frequencies', whereas the row and column totals are called 'marginal frequencies'.

Table 3: (r x c) contingency table

A\B      B1     B2     ...   Bj     ...   Bc     Total
A1       O11    O12    ...   O1j    ...   O1c    (A1)
A2       O21    O22    ...   O2j    ...   O2c    (A2)
...      ...    ...    ...   ...    ...   ...    ...
Ai       Oi1    Oi2    ...   Oij    ...   Oic    (Ai)
...      ...    ...    ...   ...    ...   ...    ...
Ar       Or1    Or2    ...   Orj    ...   Orc    (Ar)
Total    (B1)   (B2)   ...   (Bj)   ...   (Bc)   N

As the table consists of r rows and c columns, there are r x c observed frequencies, one in each cell. Corresponding to each observed frequency there is an expected frequency, computed on the basis of a certain hypothesis. Under the null hypothesis of no relationship, i.e., independence between the attributes, the expected frequency of each cell is computed by multiplying the totals of the row and column to which the cell belongs and dividing by the total number of observations. For instance, the expected frequency of the cell in the 1st row and 2nd column is obtained by multiplying the 1st row total (A1) by the 2nd column total (B2) and dividing by the total number of observations n. After calculating the expected frequencies for each cell, χ² is computed using the formula

\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_{i,j} \frac{O_{ij}^2}{E_{ij}} - n

which has (r-1)(c-1) degrees of freedom.
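The lime-treatment example above can be verified with scipy, whose chi2_contingency function handles any r x c table:

```python
# Reproducing the lime-treatment 2 x 2 example with scipy.
import numpy as np
from scipy import stats

table = np.array([[86, 14],
                  [88, 12]])

# correction=False gives the plain formula value (0.1768);
# correction=True applies the Yates continuity correction instead.
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
print(chi2, p, dof)       # 0.1768..., p > 0.05, 1 df -> H0 not rejected
```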
F-Distribution

Introduction

In the t-tests on paired and unpaired samples, we had to assume that the two samples came from the same normal distribution: we were testing that the means were the same, but still had to assume that the standard deviations were the same. We can test whether this assumption is correct by using the F-test, due to Snedecor. If the standard deviations are the same, then we would expect the sample standard deviations to be similar, i.e., s1 = s2, or more precisely σ1²/σ2² = 1; that is, we can test for the closeness of this ratio to 1. The variance ratio, or F-test, formalizes these ideas: the ratio of variances (s1²/s2²) of independent samples taken from a normal population follows a distribution called the F-distribution, with two degrees of freedom, one for the numerator and the other for the denominator.

F-distribution: If s1² is the variance of a sample of size n1 and s2² is the variance of another independent sample of size n2 taken from the same population, then the ratio of the variances of these samples, s1²/s2², follows the F-distribution with (n1-1) and (n2-1) degrees of freedom.

Note:
1. F has two sets of degrees of freedom, one for the numerator and another for the denominator.
2. F is a positively skewed distribution.
3. The shape of the F distribution depends upon the two degrees of freedom.
4. The degrees of freedom are the parameters of the F distribution.

Testing for the equality of variances

1. H0: σ1² = σ2² (no significant difference between variances)
2. H1: σ1² ≠ σ2² (significant difference between variances)
3. Fix α = 0.05, say.
4. Test statistic: F = s1²/s2², where s1² is the larger sample variance.
5. Decision rule: Reject H0 (i.e., conclude a significant difference between variances) if the computed test statistic value is more than the table F value for (n1-1), (n2-1) degrees of freedom; otherwise accept H0.

Exercise for practice:

I. Can the following two samples be considered to have the same underlying variance?
Sample I : 18 29 24 16 17 21 22 19 20
Sample II: 17 19 20 24 21 22 18 19 23 18 20

II. Test for the equality of standard deviations of the weights of the two sexes from the following sample data.
Males  : 56 59 48 54 39 57 62 69
Females: 45 49 57 53 51 59 61 71 57
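As a hint for Exercise I, the variance-ratio test can be sketched in Python as follows, keeping the larger sample variance in the numerator as the procedure above requires:

```python
# One way to attempt Exercise I: the variance-ratio (F) test.
import numpy as np
from scipy import stats

s1 = np.array([18, 29, 24, 16, 17, 21, 22, 19, 20])
s2 = np.array([17, 19, 20, 24, 21, 22, 18, 19, 23, 18, 20])

v1, v2 = s1.var(ddof=1), s2.var(ddof=1)
if v1 >= v2:
    F, df_num, df_den = v1 / v2, len(s1) - 1, len(s2) - 1
else:
    F, df_num, df_den = v2 / v1, len(s2) - 1, len(s1) - 1

p = 2 * stats.f.sf(F, df_num, df_den)   # two-sided p-value
print(F, p)                             # reject H0 if F exceeds the table value
```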
Basic Ideas

The Purpose of Analysis of Variance

In general, the purpose of analysis of variance (ANOVA) is to test for significant differences between means. Elementary Concepts provides a brief introduction to the basics of statistical significance testing. If we are only comparing two means, ANOVA will produce the same results as the t test for independent samples (if we are comparing two different groups of cases or observations) or the t test for dependent samples (if we are comparing two variables in one set of cases or observations). If you are not familiar with these tests, you may want to read Basic Statistics and Tables.

Why the name analysis of variance? It may seem odd that a procedure that compares means is called analysis of variance. However, this name derives from the fact that in order to test for statistical significance between means, we are actually comparing (i.e., analyzing) variances.

Summary of the basic logic of ANOVA. To summarize the discussion up to this point, the purpose of analysis of variance is to test differences in means (for groups or variables) for statistical significance. This is accomplished by analyzing the variance, that is, by partitioning the total variance into the component that is due to true random error (i.e., within-group SS) and the components that are due to differences between means. These latter variance components are then tested for statistical significance, and, if significant, we reject the null hypothesis of no differences between means and accept the alternative hypothesis that the means (in the population) are different from each other.

Dependent and independent variables. The variables that are measured (e.g., a test score) are called dependent variables. The variables that are manipulated or controlled (e.g., a teaching method or some other criterion used to divide observations into groups that are compared) are called factors or independent variables. For more information on this important distinction, refer to Elementary Concepts.

A one-way layout consists of a single factor with several levels and multiple observations at each level. With this kind of layout we can calculate the mean of the observations within each level of our factor. The residuals tell us about the variation within each level. We can also average the means of each level to obtain a grand mean. We can then look at the deviation of the mean of each level from the grand mean to understand something about the level effects. Finally, we can compare the variation within levels to the variation across levels. Hence the name analysis of variance.

Model

The one-way model can be written as Y_{ij} = \mu + \alpha_i + \epsilon_{ij}. The equation indicates that the jth data value, from level i, is the sum of three components: the common value (grand mean), the level effect (the deviation of each level mean from the grand mean), and the residual (what's left over).

Important Assumptions in ANOVA

The ANOVA method has some very important assumptions. First, as with the t tests, there is an assumption of normality: the groups must be normally distributed on the "dependent" variable. The ANOVA method is relatively robust to violations of this assumption, provided the violations are not too severe. The best way to assess deviation from normality is to simply examine a histogram. The issue of normality has been covered well in previous courses.

A more vexing problem in practice is the assumption of equal variances. There is often evidence that this assumption is violated (remember that the standard deviation squared is the variance, so large differences in standard deviations among groups are evidence of a problem). In fact, the major statistical software packages provide tests for this assumption. The most common are the Bartlett-Box test, Hartley's test, and Levene's test. Calculating these statistics is beyond the scope of the course, but interpretation is relatively straightforward; they are available in the major statistical packages. For these tests, the null hypothesis is that the variances of the individual cells are equal. When the null is rejected, the variances are unequal, and the assumption is not met for ANOVA. However, these tests tend to be very powerful, and hence it is probably not cause for real concern until the null is rejected at less than the .001 level (sig or p < .001), especially when the sample size is large.

When variances among groups are unequal, they are referred to as heterogeneous, and it is said that the homogeneity of variance assumption is violated. When this is the case, there are ways of correcting the analysis. We will now consider two approaches for testing the omnibus null when the homogeneity of variance assumption is untenable. In both of these tests, modifications of the calculation of MSbg and/or MSwg are made, and, most importantly, the critical values of F are adjusted by adjusting the df for MSwg. These methods are mathematically tedious, but not incomprehensible (make sure you understand the difference: "tedious" would take a whole day to do and would make you bored or angry; "incomprehensible" is where you couldn't do it even if you had the time). Make sure you know what you would have to do to perform these methods, even though you do not plan to (there are ways of testing to see if you have done this!). There is nothing in the formulas below you have not seen before. It is just n, N, s (standard deviation), a (number of groups), etc.

The first method is the Brown-Forsythe method (Brown & Forsythe, 1974). In this method, the MSwg is modified to yield a special F statistic (F*). The F* value is then evaluated at a special denominator df value (df*).

Example Problem: Susan Sound predicts that students will learn most effectively with a constant background sound, as opposed to an unpredictable sound or no sound at all. She randomly divides twenty-four students into three groups of eight. All students study a passage of text for 30 minutes. Those in group 1 study with background sound at a constant volume. Those in group 2 study with noise that changes volume periodically. Those in group 3 study with no sound at all. After studying, all students take a 10 point multiple choice test over the material.
Their scores follow:

Group 1 (constant sound): 7 4 6 8 6 6 2 9
Group 2 (random sound):   5 5 3 4 4 7 2 2
Group 3 (no sound):       2 4 7 1 2 1 5 5

Group 1: Σx1 = 48, Σx1² = 322, (Σx1)² = 2304, M1 = 6
Group 2: Σx2 = 32, Σx2² = 148, (Σx2)² = 1024, M2 = 4
Group 3: Σx3 = 27, Σx3² = 125, (Σx3)² = 729,  M3 = 3.375

SStotal = (322 + 148 + 125) - (48 + 32 + 27)²/24 = 595 - 477.04 = 117.96
SSamong = (2304 + 1024 + 729)/8 - 477.04 = 507.13 - 477.04 = 30.08
SSwithin = 117.96 - 30.08 = 87.88

Source   SS      df   MS      F
Among    30.08   2    15.04   3.59
Within   87.88   21   4.18

(According to the F probability table with df = (2, 21), F must be at least 3.4668 to reach p < .05, so the obtained F is statistically significant.)

Interpretation: Susan can conclude that her hypothesis may be supported. The means are as she predicted, in that the constant sound group has the highest score. However, the significant F only indicates that at least two means are significantly different from one another; she cannot know which specific mean pairs significantly differ until she conducts a post-hoc analysis (e.g., Tukey's HSD).
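The same analysis can be cross-checked in Python; F should come out near 15.04 / 4.18 = 3.59 on (2, 21) degrees of freedom.

```python
# Cross-checking Susan Sound's one-way ANOVA with scipy.
from scipy import stats

constant = [7, 4, 6, 8, 6, 6, 2, 9]
random_  = [5, 5, 3, 4, 4, 7, 2, 2]
none     = [2, 4, 7, 1, 2, 1, 5, 5]

F, p = stats.f_oneway(constant, random_, none)
print(F, p)    # F about 3.59, p < .05
```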
Nonparametric tests

Occasionally, the assumptions of the t-tests are seriously violated, in particular when the data are ordinal in nature and not at least interval. On such occasions an alternative approach is to use nonparametric tests. Nonparametric tests are also referred to as distribution-free tests. These tests have the obvious advantage of not requiring the assumption of normality or the assumption of homogeneity of variance. They compare medians rather than means, and, as a result, if the data have one or two outliers, their influence is negated.

Parametric tests are preferred because, in general, for the same number of observations they are more likely to lead to the rejection of a false null hypothesis. That is, parametric tests have more power. This greater power stems from the fact that if the data have been collected at an interval or ratio level, information is lost in the conversion to ranked data (i.e., merely ordering the data from the lowest to the highest value). There is a wide range of alternatives to the two-group t-tests, given in the table below.

Parametric test               Non-parametric equivalent
Paired sample t-test          Wilcoxon T test
Independent samples t-test    Mann-Whitney U test
Pearson's correlation         Spearman's correlation
One-way ANOVA                 Kruskal-Wallis one-way ANOVA

The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used when comparing two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ (i.e., it is a paired difference test). It can be used as an alternative to the paired Student's t-test (the t-test for matched pairs, or the t-test for dependent samples) when the population cannot be assumed to be normally distributed. The Wilcoxon signed-rank test can be performed on small samples and large samples. When performing these tests, there are eight steps that should be used:
1) State the null and research hypotheses.
2) Set the level of risk (or the level of significance) associated with the null hypothesis.
3) Choose the appropriate test statistic.
4) Compute the test statistic.
5) Determine the value needed for rejection of the null hypothesis using the appropriate table of critical values for the particular statistic.
6) Compare the obtained value to the critical value.
7) Interpret the results.
8) Report the results.

It is now suggested that the confidence interval should be reported when testing research data. A confidence interval is an inference to a population in terms of an estimation of sampling error, providing a range with a level of confidence of 100(1-alpha)%.

Assumptions
1. Data are paired and come from the same population.
2. Each pair is chosen randomly and independently.
3. The data are measured on an interval scale (ordinal is not sufficient because we take differences), but need not be normal.

Test procedure

Let N be the sample size, i.e., the number of pairs; thus there are a total of 2N data points. For i = 1, ..., N, let x_{1,i} and x_{2,i} denote the measurements.

1. For i = 1, ..., N, calculate |x_{2,i} - x_{1,i}| and sgn(x_{2,i} - x_{1,i}), where sgn is the sign function.
2. Exclude pairs with |x_{2,i} - x_{1,i}| = 0. Let N_r be the reduced sample size.
3. Order the remaining N_r pairs from smallest absolute difference to largest absolute difference, |x_{2,i} - x_{1,i}|.
4. Rank the pairs, starting with the smallest as 1. Ties receive a rank equal to the average of the ranks they span. Let R_i denote the rank.
5. Calculate the test statistic

W = \left|\sum_{i=1}^{N_r} \operatorname{sgn}(x_{2,i} - x_{1,i})\, R_i\right|

the absolute value of the sum of the signed ranks.
6. As N_r increases, the sampling distribution of W converges to a normal distribution. Thus, for N_r ≥ 10, a z-score can be calculated as z = W/σ_W, where σ_W = \sqrt{N_r(N_r+1)(2N_r+1)/6}. If z > z_critical, reject H0. For N_r < 10, W is compared to a critical value from a reference table[1]: if W ≥ W_{critical, N_r}, reject H0. Alternatively, a p-value can be calculated by enumeration of all possible combinations of W given N_r.

Example

 i   x2,i   x1,i   sgn   |x2,i - x1,i|
 1   125    110    +1    15
 2   115    122    -1     7
 3   130    125    +1     5
 4   140    120    +1    20
 5   140    140     0     0   (excluded)
 6   115    124    -1     9
 7   140    123    +1    17
 8   125    137    -1    12
 9   140    135    +1     5
10   135    145    -1    10

Ordered by absolute difference, with ranks R_i:

 i   x2,i   x1,i   sgn   |diff|   R_i
 3   130    125    +1     5       1.5
 9   140    135    +1     5       1.5
 2   115    122    -1     7       3
 6   115    124    -1     9       4
10   135    145    -1    10       5
 8   125    137    -1    12       6
 1   125    110    +1    15       7
 7   140    123    +1    17       8
 4   140    120    +1    20       9

Notice that pairs 3 and 9 are tied in absolute value. They would be ranked 1 and 2, so each gets the average of those ranks, 1.5. The test statistic is then W = |1.5 + 1.5 - 3 - 4 - 5 - 6 + 7 + 8 + 9| = 9.
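The example can be checked with scipy; note that scipy uses a different (but equivalent) convention for the statistic.

```python
# The signed-rank example above, run through scipy. Convention note:
# scipy drops zero differences and reports min(W+, W-) = 18 here, whereas
# the text's W = |W+ - W-| = |27 - 18| = 9; the resulting tests agree.
import numpy as np
from scipy import stats

x1 = np.array([110, 122, 125, 120, 140, 124, 123, 137, 135, 145])
x2 = np.array([125, 115, 130, 140, 140, 115, 140, 125, 140, 135])

res = stats.wilcoxon(x2, x1)    # paired test on the differences x2 - x1
print(res)
```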
Mann–Whitney U

In statistics, the Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW) or Wilcoxon rank-sum test) is a non-parametric statistical hypothesis test for assessing whether one of two samples of independent observations tends to have larger values than the other. It is one of the best-known non-parametric significance tests. It was proposed initially[1] by the German Gustav Deuchler in 1914 (with a missing term in the variance) and later independently by Frank Wilcoxon in 1945,[2] for equal sample sizes, and extended to arbitrary sample sizes and in other ways by Henry Mann and his student Donald Ransom Whitney in 1947.[3]

Assumptions and formal statement of hypotheses

Although Mann and Whitney developed the MWW test under the assumption of continuous responses, with the alternative hypothesis being that one distribution is stochastically greater than the other, there are many other ways to formulate the null and alternative hypotheses such that the MWW test will give a valid test. A very general formulation is to assume that:

1. All the observations from both groups are independent of each other.
2. The responses are ordinal (i.e., one can at least say, of any two observations, which is the greater).
3. Under the null hypothesis the distributions of both groups are equal, so that the probability of an observation from one population (X) exceeding an observation from the second population (Y) equals the probability of an observation from Y exceeding an observation from X; that is, there is a symmetry between populations with respect to the probability of randomly drawing the larger observation.
4. Under the alternative hypothesis, the probability of an observation from one population (X) exceeding an observation from the second population (Y) (after exclusion of ties) is not equal to 0.5. The alternative may also be stated in terms of a one-sided test, for example: P(X > Y) + 0.5 P(X = Y) > 0.5.

Under stricter assumptions than those above, e.g., if the responses are assumed to be continuous and the alternative is restricted to a shift in location (i.e., F1(x) = F2(x + δ)), we can interpret a significant MWW test as showing a difference in medians. Under this location-shift assumption, we can also interpret the MWW as assessing whether the Hodges–Lehmann estimate of the difference in central tendency between the two populations differs from zero. The Hodges–Lehmann estimate for this two-sample problem is the median of all possible differences between an observation in the first sample and an observation in the second sample.

Calculations

The test involves the calculation of a statistic, usually called U, whose distribution under the null hypothesis is known. In the case of small samples, the distribution is tabulated, but for sample sizes above ~20 there is a good approximation using the normal distribution. Some books tabulate statistics equivalent to U, such as the sum of ranks in one of the samples, rather than U itself. The U test is included in most modern statistical packages. It is also easily calculated by hand, especially for small samples. There are two ways of doing this. First, arrange all the observations into a single ranked series; that is, rank all the observations without regard to which sample they are in.

Method one: For small samples a direct method is recommended. It is very quick, and gives an insight into the meaning of the U statistic.
1. Choose the sample for which the ranks seem to be smaller (the only reason to do this is to make computation easier). Call this "sample 1", and call the other sample "sample 2".
2. Taking each observation in sample 1, count the number of observations in sample 2 that have a smaller rank (count a half for any that are equal to it). The sum of these counts is U.

Method two: For larger samples, a formula can be used.
1. Add up the ranks for the observations which came from sample 1. The sum of ranks in sample 2 follows by calculation, since the sum of all the ranks equals N(N + 1)/2, where N is the total number of observations.
2. U is then given by

U_1 = R_1 - \frac{n_1(n_1 + 1)}{2}

where n1 is the sample size for sample 1 and R1 is the sum of the ranks in sample 1. Note that there is no specification as to which sample is considered sample 1. An equally valid formula for U is

U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1

The smaller of U1 and U2 is the one used when consulting significance tables. The sum of the two values is given by

U_1 + U_2 = R_1 - \frac{n_1(n_1 + 1)}{2} + R_2 - \frac{n_2(n_2 + 1)}{2}

Knowing that R1 + R2 = N(N + 1)/2 and N = n1 + n2, and doing some algebra, we find that the sum is

U_1 + U_2 = n_1 n_2

Properties

The maximum value of U is the product of the sample sizes for the two samples. In such a case, the "other" U would be 0. For large samples,

z = \frac{U - m_U}{\sigma_U}

is approximately a standard normal deviate whose significance can be checked in tables of the normal distribution. m_U and σ_U are given by

m_U = \frac{n_1 n_2}{2}, \qquad \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}

The formula for the standard deviation is more complicated in the presence of tied ranks; the full formula is given in the text books referenced below. However, if the number of ties is small (and especially if there are no large tie bands), ties can be ignored when doing calculations by hand. The computer statistical packages use the correctly adjusted formula as a matter of routine. Note that since U1 + U2 = n1 n2, the mean n1 n2 / 2 used in the normal approximation is the mean of the two values of U. Therefore, the absolute value of the z statistic calculated will be the same whichever value of U is used.
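A sketch of the rank-sum computation and scipy's built-in test, on two small hypothetical samples:

```python
# Sketch: U by the rank-sum formula, checked against scipy's mannwhitneyu.
import numpy as np
from scipy import stats

g1 = [3, 4, 2, 6, 2, 5]      # hypothetical ordinal scores, group 1
g2 = [9, 7, 5, 10, 6, 8]     # hypothetical ordinal scores, group 2
n1, n2 = len(g1), len(g2)

ranks = stats.rankdata(g1 + g2)         # rank all observations together
R1 = ranks[:n1].sum()                   # sum of ranks in sample 1
U1 = R1 - n1 * (n1 + 1) / 2
U2 = n1 * n2 - U1
print(U1, U2, min(U1, U2))              # the smaller U is consulted in tables

print(stats.mannwhitneyu(g1, g2))       # scipy's version of the same test
```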
Kruskal–Wallis one-way analysis of variance

In statistics, the Kruskal–Wallis one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis) is a non-parametric method for testing whether samples originate from the same distribution. It is used for comparing more than two samples that are independent, or not related. The parametric equivalent of the Kruskal-Wallis test is the one-way analysis of variance (ANOVA). The null hypothesis is that the populations from which the samples originate have the same median. When the Kruskal-Wallis test leads to significant results, then at least one of the samples is different from the other samples. The test does not identify where the differences occur or how many differences actually occur. It is an extension of the Mann–Whitney U test to 3 or more groups; the Mann-Whitney test would help analyze the specific sample pairs for significant differences. Since it is a non-parametric method, the Kruskal–Wallis test does not assume a normal distribution, unlike the analogous one-way analysis of variance. However, the test does assume an identically shaped and scaled distribution for each group, except for any difference in medians.

Method

1. Rank all data from all groups together; i.e., rank the data from 1 to N ignoring group membership. Assign any tied values the average of the ranks they would have received had they not been tied.

2. The test statistic is given by

K = (N - 1)\,\frac{\sum_{i=1}^{g} n_i (\bar{r}_{i\cdot} - \bar{r})^2}{\sum_{i=1}^{g}\sum_{j=1}^{n_i} (r_{ij} - \bar{r})^2}

where g is the number of groups, n_i is the number of observations in group i, r_{ij} is the rank (among all observations) of observation j from group i, N is the total number of observations across all groups, \bar{r}_{i\cdot} is the average rank of the observations in group i, and \bar{r} = (N + 1)/2 is the average of all the r_{ij}.

3. Notice that the denominator of the expression for K is exactly (N - 1)N(N + 1)/12 and \bar{r} = (N + 1)/2. Thus

K = \frac{12}{N(N + 1)}\sum_{i=1}^{g} n_i \bar{r}_{i\cdot}^2 - 3(N + 1)

Notice that the last formula only contains the squares of the average ranks.

4. A correction for ties can be made by dividing K by

1 - \frac{\sum_{i=1}^{G} (t_i^3 - t_i)}{N^3 - N}

where G is the number of groupings of different tied ranks, and t_i is the number of tied values within group i that are tied at a particular value. This correction usually makes little difference in the value of K unless there are a large number of ties.

5. Finally, the p-value is approximated by \Pr(\chi^2_{g-1} \ge K). If some n_i values are small (i.e., less than 5), the probability distribution of K can be quite different from this chi-squared distribution. If a table of the chi-squared probability distribution is available, the critical value of chi-squared, \chi^2_{\alpha,\,g-1}, can be found by entering the table at g - 1 degrees of freedom and looking under the desired significance or alpha level. The null hypothesis of equal population medians is then rejected if K \ge \chi^2_{\alpha,\,g-1}. Appropriate multiple comparisons would then be performed on the group medians.

6. If the statistic is not significant, then no differences exist between the samples. However, if the test is significant, then a difference exists between at least two of the samples. Therefore, a researcher might use sample contrasts between individual sample pairs, or post hoc tests, to determine which of the sample pairs are significantly different. When performing multiple sample contrasts, the Type I error rate tends to become inflated.
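A minimal sketch in Python, on hypothetical data for three independent groups:

```python
# Sketch: Kruskal-Wallis on three small hypothetical independent samples.
from scipy import stats

g1 = [27, 2, 4, 18, 7, 9]
g2 = [20, 8, 14, 36, 21, 22]
g3 = [34, 31, 3, 23, 30, 6]

H, p = stats.kruskal(g1, g2, g3)   # chi-squared approximation, g - 1 = 2 df
print(H, p)
```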
Cohen's kappa (Kappa Statistics)

Cohen's kappa coefficient is a statistical measure of inter-rater agreement or inter-annotator agreement for qualitative (categorical) items. It is generally thought to be a more robust measure than a simple percent-agreement calculation, since κ takes into account agreement occurring by chance. Some researchers have expressed concern over κ's tendency to take the observed categories' frequencies as givens, which can have the effect of underestimating agreement for a category that is also commonly used; for this reason, κ is considered an overly conservative measure of agreement. Others contest the assertion that kappa "takes into account" chance agreement. To do this effectively would require an explicit model of how chance affects rater decisions. The so-called chance adjustment of kappa statistics supposes that, when not completely certain, raters simply guess, which is a very unrealistic scenario.

Calculation

Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. The first mention of a kappa-like statistic is attributed to Galton (1892); see Smeeton (1985). The equation for κ is

\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}

where Pr(a) is the relative observed agreement among raters, and Pr(e) is the hypothetical probability of chance agreement, using the observed data to calculate the probabilities of each observer randomly saying each category. If the raters are in complete agreement then κ = 1. If there is no agreement among the raters other than what would be expected by chance (as defined by Pr(e)), κ = 0.

Example

Suppose that you were analyzing data related to people applying for a grant. Each grant proposal was read by two people and each reader either said "Yes" or "No" to the proposal. Suppose the data were as follows, where rows are reader A and columns are reader B:

            B: Yes   B: No
A: Yes      20       5
A: No       10       15

Note that there were 20 proposals that were granted by both reader A and reader B and 15 proposals that were rejected by both readers. Thus, the observed percentage agreement is Pr(a) = (20 + 15)/50 = 0.70. To calculate Pr(e) (the probability of random agreement) we note that:

Reader A said "Yes" to 25 applicants and "No" to 25 applicants; thus reader A said "Yes" 50% of the time.
Reader B said "Yes" to 30 applicants and "No" to 20 applicants; thus reader B said "Yes" 60% of the time.

Therefore the probability that both of them would say "Yes" randomly is 0.50 × 0.60 = 0.30, and the probability that both of them would say "No" is 0.50 × 0.40 = 0.20. Thus the overall probability of random agreement is Pr(e) = 0.3 + 0.2 = 0.5. So now applying our formula for Cohen's kappa we get

\kappa = \frac{0.70 - 0.50}{1 - 0.50} = 0.40
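The computation can be written out directly from the table counts:

```python
# Reproducing the grant-proposal example: Pr(a) = 0.70, Pr(e) = 0.50.
a, b, c, d = 20, 5, 10, 15           # cells: yes/yes, yes/no, no/yes, no/no
N = a + b + c + d

pr_a = (a + d) / N                   # observed agreement
pA_yes, pB_yes = (a + b) / N, (a + c) / N
pr_e = pA_yes * pB_yes + (1 - pA_yes) * (1 - pB_yes)   # chance agreement

kappa = (pr_a - pr_e) / (1 - pr_e)
print(kappa)                         # 0.40
```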
Same percentages but different numbers

A case sometimes considered to be a problem with Cohen's kappa occurs when comparing the kappa calculated for two pairs of raters, where the two raters in each pair have the same percentage agreement but one pair give a similar number of ratings in each category while the other pair give a very different number of ratings.[6] For instance, in the following two cases there is equal agreement between A and B (60 out of 100 in both cases), so we would expect the relative values of Cohen's kappa to reflect this.

Case I.
Rater I \ Rater II   Yes   No
Yes                  45    15
No                   25    15

Case II.
Rater I \ Rater II   Yes   No
Yes                  25    35
No                   5     35

However, calculating Cohen's kappa for each, we find that it shows greater similarity between A and B in the second case (κ = 0.26) than in the first (κ = 0.13).

Significance and magnitude

Statistical significance makes no claim about how important the magnitude is in a given application, or what counts as high or low agreement. Statistical significance for kappa is rarely reported, probably because even relatively low values of kappa can nonetheless be significantly different from zero yet not of sufficient magnitude to satisfy investigators. Still, its standard error has been described and is computed by various computer programs.

If statistical significance is not a useful guide, what magnitude of kappa reflects adequate agreement? Guidelines would be helpful, but factors other than agreement can influence its magnitude, which makes interpretation of a given magnitude problematic. As Sim and Wright noted, two important factors are prevalence (are the codes equiprobable, or do their probabilities vary) and bias (are the marginal probabilities for the two observers similar or different). Other things being equal, kappas are higher when codes are equiprobable and distributed similarly by the two observers. Another factor is the number of codes: as the number of codes increases, kappas become higher. Based on a simulation study, Bakeman and colleagues concluded that for fallible observers, values for kappa were lower when codes were fewer. And, in agreement with Sim and Wright's statement concerning prevalence, kappas were higher when codes were roughly equiprobable. Thus Bakeman et al. concluded that "no one value of kappa can be regarded as universally acceptable." They also provide a computer program that lets users compute values for kappa specifying number of codes, their probability, and observer accuracy. For example, given equiprobable codes and observers who are 85% accurate, values of kappa are .49, .60, .66, and .69 when the number of codes is 2, 3, 5, and 10, respectively.

Nonetheless, magnitude guidelines have appeared in the literature. Perhaps the first was Landis and Koch, who characterized values < 0 as indicating no agreement, 0-.20 as slight, .21-.40 as fair, .41-.60 as moderate, .61-.80 as substantial, and .81-1 as almost perfect agreement. This set of guidelines is, however, by no means universally accepted; Landis and Koch supplied no evidence to support it, basing it instead on personal opinion. It has been noted that these guidelines may be more harmful than helpful. Fleiss's equally arbitrary guidelines characterize kappas over .75 as excellent, .40 to .75 as fair to good, and below .40 as poor.

Intraclass Correlation Coefficient (ICC)

The Intraclass Correlation Coefficient (ICC) is a general measure of agreement or consensus, where the measurements used are assumed to be parametric (continuous, with a normal distribution).
The coefficient represents agreement between two or more raters or evaluation methods on the same set of subjects. The ICC has advantages over the correlation coefficient, in that it is adjusted for the effects of the scale of measurement, and it can represent agreement among more than two raters. In statistics, the intraclass correlation (or the intraclass correlation coefficient, abbreviated ICC) is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups. It describes how strongly units in the same group resemble each other. While it is viewed as a type of correlation, unlike most other correlation measures it operates on data structured as groups, rather than data structured as paired observations. The intraclass correlation is commonly used to quantify the degree to which individuals with a fixed degree of relatedness (e.g., full siblings) resemble each other in terms of a quantitative trait (see heritability). Another prominent application is the assessment of consistency or reproducibility of quantitative measurements made by different observers measuring the same quantity.

Interpretation of results

Initially, a two-way analysis of variance is carried out, and if the between-column mean sum of squares (between methods of measurement or raters) is not significantly greater than the residual error, then a conclusion that concordance exists can be made. The intraclass correlation coefficient then provides a scalar measure of agreement or concordance between all the methods. A value of 1 represents perfect agreement, and 0 no agreement at all.

The ICC results can be considered in terms of models. Model 1 assumes that each method of measurement or rater is different, being a subset of a larger set of methods or raters, randomly chosen. This would be the case of having different research assistants measuring the height of children, with different sets of people doing the measurements at different sites, and then combining the data together. Model 2 assumes the same methods or raters perform the evaluations in all cases, although these methods or raters may be a subset of a larger set of methods or raters. This is the model most frequently encountered in clinical research, where the same researchers carry out measurements on all the subjects. Model 3 makes no assumptions about the methods or raters.

In consideration of single or mean: If each value (score or measurement) is obtained from an individual, then the single or individual form is used; this is the most common situation encountered in clinical research. If each value is the mean of multiple measurements (for example, each measurement is the mean of repeated measurements of a sample, or a rating score is the mean of that obtained by a team of raters), then the mean form should be used. In most cases, therefore, Model 2 individual is used.

The ICC can be interpreted as follows: 0-0.2 indicates poor agreement; 0.3-0.4 indicates fair agreement; 0.5-0.6 indicates moderate agreement; 0.7-0.8 indicates strong agreement; 0.8-1.0 indicates almost perfect agreement.

The intraclass correlation (ICC) assesses rating reliability by comparing the variability of different ratings of the same subject to the total variation across all ratings and all subjects. The theoretical formula for the ICC is

ICC = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2} .................... [1]

where \sigma_w^2 is the pooled variance within subjects, and \sigma_b^2 is the variance of the trait between subjects.
It is easily shown that \sigma_b^2 + \sigma_w^2 equals the total variance of ratings, i.e., the variance of all ratings regardless of whether they are for the same subject or not. Hence the ICC is interpreted as the proportion of the total variance accounted for by between-subject variation. Equation [1] above would apply if we knew the true values \sigma_w^2 and \sigma_b^2. But we rarely do, and must instead estimate them from sample data.
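One common sample estimator, sketched below on a hypothetical ratings matrix, plugs one-way ANOVA mean squares into equation [1]; this is the estimator often labelled ICC(1,1), one of several ICC forms, and the data here are illustrative only.

```python
# Sketch: a one-way random-effects ICC (the "ICC(1,1)" estimator) from
# ANOVA mean squares, using a hypothetical subjects-by-raters matrix.
import numpy as np

ratings = np.array([[9, 2, 5, 8],
                    [6, 1, 3, 2],
                    [8, 4, 6, 8],
                    [7, 1, 2, 6],
                    [10, 5, 6, 9],
                    [6, 2, 4, 7]], dtype=float)   # rows: subjects, cols: raters
n, k = ratings.shape

grand = ratings.mean()
subject_means = ratings.mean(axis=1)
MSB = k * np.sum((subject_means - grand)**2) / (n - 1)               # between subjects
MSW = np.sum((ratings - subject_means[:, None])**2) / (n * (k - 1))  # within subjects

icc = (MSB - MSW) / (MSB + (k - 1) * MSW)
print(icc)    # sample estimate of sigma_b^2 / (sigma_b^2 + sigma_w^2)
```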