Sampling

"Sampling is a process of learning about the population on the basis of a sample drawn from it."

The process of sampling involves three elements:
1. Selecting the sample
2. Collecting the information
3. Making an inference about the population

Essentials of Sampling:
• Representativeness
• Adequacy
• Independence
• Homogeneity

SAMPLING METHODS
Non-probability sampling methods: Judgemental, Quota, Convenience
Probability sampling methods: Simple Random, Stratified, Systematic, Cluster

ASSOCIATION OF ATTRIBUTES

An important aspect of the study and analysis of attributes is finding the relationship between two or more attributes, which is known as "Association of Attributes". In common language, two attributes (A and B) are said to be associated if they appear together in a number of cases. In Statistics the word has a more specific meaning: two attributes are associated only if they occur together in a larger number of cases than would be expected if they were independent.

According to Yule and Kendall: "In Statistics, A and B are associated only if they appear together in a greater number of cases than it is expected if they are Independent."

TYPES OF ASSOCIATION OF ATTRIBUTES:
1. Positive Association: When two attributes tend to be present together or absent together in the data, they are said to be positively associated.
2. Negative Association: When the presence of one attribute tends to go with the absence of the other, they are said to be negatively associated.
3. Independence of Attributes: When two attributes show no tendency either to be present together or for the presence of one to accompany the absence of the other, they are regarded as independent.

METHODS OF DETERMINING ASSOCIATION OF ATTRIBUTES
I Comparison of Observed and Expected Frequencies Method
II Method of Comparison of Proportions
III Yule's Coefficient of Association Method
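The statistical definition of association given above can be checked numerically. The following is a minimal sketch, assuming the standard result that the expected joint frequency under independence is (A) × (B) / N. The function name `classify_association` is hypothetical, and the counts are those of case (i) in the worked example on observed and expected frequencies.

```python
def classify_association(n, a, b, ab):
    """Compare the observed joint frequency (AB) with its expectation under independence.

    n  : total number of observations N
    a  : frequency of attribute A
    b  : frequency of attribute B
    ab : observed frequency of A and B occurring together
    """
    expected = a * b / n  # expected (AB) if A and B were independent
    if ab > expected:
        kind = "positive"
    elif ab < expected:
        kind = "negative"
    else:
        kind = "independent"
    return expected, kind


# Counts from case (i): N = 1000, A = 470, B = 620, AB = 320
expected, kind = classify_association(n=1000, a=470, b=620, ab=320)
print(f"expected (AB) = {expected:.1f}, association: {kind}")
```

The exact-equality branch is kept to mirror the textbook rule; with real survey data an exact match is rare, so a tolerance or a significance test would usually be preferred.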
Preparation of the nine-square table

            A       a       Total
B           (AB)    (aB)    (B)
b           (Ab)    (ab)    (b)
Total       (A)     (a)     N

Here A and B are two attributes.
a (or α) = the cases which are not A
b (or β) = the cases which are not B

Comparison of Observed and Expected Frequencies Method
• If (AB) = (A) × (B) / N, the attributes are independent.
• If (AB) > (A) × (B) / N, the attributes are positively associated.
• If (AB) < (A) × (B) / N, the attributes are negatively associated.

Example: From the data given below, explain whether A and B are independent, positively associated or negatively associated.
(i) N = 1000, A = 470, B = 620, AB = 320
(ii) A = 490, AB = 294, a = 570, ab = 380
(iii) AB = 256, aB = 768, Ab = 48, ab = 144

YULE'S COEFFICIENT OF ASSOCIATION:

$$Q_{AB} = \frac{(AB)(ab) - (Ab)(aB)}{(AB)(ab) + (Ab)(aB)}$$

INTERPRETATION OF THE COEFFICIENT:
• If the value of $Q_{AB}$ is zero, there is no association between the two attributes, i.e. the attributes are independent.
• The mathematical sign (+ or −) of the coefficient determines the nature or direction of the association.

DEGREE OF ASSOCIATION

Degree              Positive                  Negative
1. Perfect          +1                        −1
2. Limited
   Very High        Between +.90 and +.99     Between −.90 and −.99
   High             Between +.75 and +.90     Between −.75 and −.90
   Moderate         Between +.25 and +.75     Between −.25 and −.75
   Low              Between 0 and +.25        Between 0 and −.25
3. Absence          0                         0

Example 1: Prepare a nine-square table from the following information and calculate Yule's coefficient of association.
i) N = 1000, A = 400, B = 500, AB = 150
ii) A = 470, AB = 290, a = 530, aB = 310

Example 2: From the data given below, find the association between darkness of eye colour of father and son.
• Fathers with dark eyes and sons with dark eyes = 50
• Fathers with dark eyes and sons with light eyes = 79
• Fathers with light eyes and sons with dark eyes = 89
• Fathers with light eyes and sons with light eyes = 782

What is a Hypothesis?

A hypothesis is an assumption about a population parameter.
• A parameter is a characteristic of the population, like its mean or variance.
• The parameter must be identified before analysis.
Example: "I assume the mean age of this class is 23 years."
The Null Hypothesis, H0
• States the (numerical) assumption to be tested.
• Begin with the assumption that the null hypothesis is TRUE (similar to the notion of innocent until proven guilty).
• The null hypothesis may or may not be rejected.

The Alternative Hypothesis, H1
• Is the opposite of the null hypothesis.
• May or may not be accepted.
• Is generally the hypothesis that the researcher believes to be true.

Hypothesis Testing Process
Assume the population mean age is 50 (null hypothesis). A sample is drawn from the population and the sample mean is 20. Is $\bar{X} = 20$ consistent with $\mu = 50$? No, not likely, so REJECT the null hypothesis.

HYPOTHESIS TESTING

Hypothesis testing begins with an assumption called a hypothesis. For example, if a coin tossed 100 times gives 59 heads and 41 tails, the hypothesis may be that the coin is unbiased.

PROCEDURE OF TESTING A HYPOTHESIS:
1. Set up a hypothesis
2. Set up a suitable significance level
3. Set a test criterion
4. Do the computations
5. Make the decision

TYPES OF ERRORS:

                 Accept H0           Reject H0
H0 is true       Correct decision    Type I error
H0 is false      Type II error       Correct decision

CHI SQUARE TEST

Large sample test (sample size > 30)

Formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

$\chi^2$ = the value of chi square
O = the observed value
E = the expected value
$\sum (O - E)^2$ = all the values of (O − E) squared, then added together

Degrees of freedom: (r − 1)(c − 1) or (n − 1)

Example 1: In an anti-malarial campaign in a certain district, quinine was administered to 812 persons out of a total population of 3248. The number of fever cases is shown below:

Treatment      Fever    No fever    Total
Quinine        20       792         812
No quinine     220      2216        2436
Total          240      3008        3248

Discuss the usefulness of quinine in checking malaria.

Expected frequency (E) = Row total × Column total / N

O        E        O − E    (O − E)²    (O − E)²/E
20       60       −40      1600        26.67
220      180      +40      1600        8.89
792      752      +40      1600        2.13
2216     2256     −40      1600        0.71
                           Total       38.40

$\chi^2 = 38.4$

Degrees of freedom = (r − 1)(c − 1). Here r = 2 (no. of rows) and c = 2 (no.
of columns). Hence d.o.f. = (2 − 1) × (2 − 1) = 1.

At d.o.f. = 1, the table value of $\chi^2$ is 3.841 at the 5% level. The calculated value exceeds it (38.4 > 3.841), hence the hypothesis that quinine has no effect is rejected.

Example 2: 200 digits are chosen at random from a set of tables. The frequencies of the digits are as follows:

Digit        0    1    2    3    4    5    6    7    8    9
Frequency    18   19   23   21   16   25   22   20   21   15

Use the chi square test to assess the correctness of the hypothesis that the digits were distributed in equal numbers in the tables from which they were chosen.

SMALL SAMPLE TESTS (sample size < 30)

t-test (William Gosset)

ASSUMPTIONS OF THE T TEST:
• The sample is drawn from a normal population.
• The sample observations are independent.
• The sample size is small.
• The population variance is unknown.

I Testing the significance of the mean of a random sample:
This test determines whether the mean of a sample drawn from a normal population deviates significantly from a stated value (the hypothetical value of the population mean) when the variance of the population is unknown.

Formulae:

$$t = \frac{\bar{X} - \mu}{S}\sqrt{n}, \qquad S = \sqrt{\frac{\sum d^2}{n - 1}}$$

Where
$\bar{X}$ = the mean of the sample
$\mu$ = the actual or hypothetical mean of the population
n = sample size
S = standard deviation of the sample
d = deviation from the sample mean

Fiducial limits of the population mean:

At the 95% confidence level: $\bar{X} \pm \frac{S}{\sqrt{n}}\, t_{0.05}$

At the 99% confidence level: $\bar{X} \pm \frac{S}{\sqrt{n}}\, t_{0.01}$

Example: In a random sample of 17 towns the mean population was 57000 and the standard deviation was 1600. Determine the confidence limits of the mean of the population at (a) the 95% level of confidence and (b) the 99% level of confidence.

Example: Ten workers of a factory are selected at random. The number of units produced by them on a working day was as follows: 71, 72, 73, 75, 76, 77, 78, 79, 79, 80.
On the basis of the given data, is it reasonable to say that the mean number of units produced by them is 78? (For v = 9, $t_{0.05}$ = 2.262)

II Testing the significance of the difference between two sample means (small samples):
Here it is assumed that the two samples are independent, that is, the value of an observation in one sample does not depend on the other sample.

Formulae:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

$\bar{X}_1$ = mean of the first sample
$\bar{X}_2$ = mean of the second sample
$n_1$ = number of observations in the first sample
$n_2$ = number of observations in the second sample
S = combined standard deviation (deviations should be taken from the actual means)

$$S = \sqrt{\frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}}$$

Degrees of freedom = $n_1 + n_2 - 2$

Example: A random sample of 10 pigs was kept on food A. The increase in their weight (kg) is as under: 10, 6, 16, 17, 13, 12, 8, 14, 15, 9. Another randomly drawn sample of 12 pigs was kept on food B for the same period. The increase in their weight (kg) is as under: 7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17. Examine the significance of the difference between the increase in weights of pigs kept on food A and that of pigs kept on food B. (The value of t for 20 degrees of freedom at the 5% level of significance is 2.09.)

F Distribution:
The F test is named in honour of the statistician R. A. Fisher. The object of this test is to find out whether two independent estimates of the population variance differ significantly, or whether the two samples may be regarded as drawn from normal populations having the same variance.

Formulae:

$$F = \frac{\text{Larger estimate of population variance}}{\text{Smaller estimate of population variance}} = \frac{S_1^2}{S_2^2}, \qquad S_1^2 > S_2^2$$

$$S_1^2 = \frac{\sum (X_1 - \bar{X}_1)^2}{n_1 - 1}, \qquad S_2^2 = \frac{\sum (X_2 - \bar{X}_2)^2}{n_2 - 1}$$

Example: A random sample of 10 pigs was kept on food A. The increase in their weight (kg) is as under: 10, 6, 16, 17, 13, 12, 8, 14, 15, 9. Another randomly drawn sample of 12 pigs was kept on food B for the same period. The increase in their weight (kg) is as under: 7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17.
Show that there is no significant difference in the estimates of population variance based on these two samples. (The 5% value of F at v1 = 11, v2 = 9 is 3.112.)

Example: The following table gives the number of units of an article produced daily by two laborers A and B.

A: 40 30 38 41 38 35
B: 38 41 33 32 39 39 40 34

Can these results be treated as sufficient evidence that laborer B is more stable? Use the F test.

Example: Determine by applying the F test whether it would be logical to assume that the variances in the two blocks are equal. (For v1 = 7, v2 = 5, $F_{.05}$ = 4.48)

                                Block I    Block II
No. of plots                    8          6
Mean                            60         51
Sum of squares of deviations    50         40

Fisher's Z test:
Prof. Fisher has given a method of testing the significance of the correlation coefficient in small samples.

I Testing the significance of the correlation coefficient in a random sample:

Formulae:

(i) $$Z_s = 1.1513 \log_{10}\frac{1 + r}{1 - r}, \qquad Z_\rho = 1.1513 \log_{10}\frac{1 + \rho}{1 - \rho}$$

(ii) $$Z = \frac{Z_s - Z_\rho}{\sigma_z}, \qquad \sigma_z = \frac{1}{\sqrt{n - 3}}$$

r = coefficient of correlation of the sample
$\rho$ = population coefficient of correlation

II Testing the significance of the difference between two independent correlation coefficients:

Formulae:

(i) $$Z_1 = 1.1513 \log_{10}\frac{1 + r_1}{1 - r_1}, \qquad Z_2 = 1.1513 \log_{10}\frac{1 + r_2}{1 - r_2}$$

(ii) $$Z = \frac{Z_1 - Z_2}{\sigma_{z_1 - z_2}}, \qquad \sigma_{z_1 - z_2} = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}$$

Degrees of freedom = $n_1 + n_2 - 6$

Example:
(i) From a sample of 29 pairs of observations a correlation coefficient of .72 was obtained. Is this significantly different from the population correlation coefficient of .8?
(ii) The coefficient of correlation of 19 pairs of observations is .64. Determine by the Z test whether it is significantly different from (a) .60 (b) .50.

Example: The following data give two sample sizes and correlation coefficients. Test the significance of the difference between the two values at the 5% level of significance using Fisher's Z transformation.

Sample    Size    Value of r
I         23      .40
II        19      .65
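The test for the difference between two independent correlation coefficients can be sketched in a few lines. This is a minimal illustration using the figures from the last example (sample I: n = 23, r = .40; sample II: n = 19, r = .65). The function names `fisher_z` and `z_difference` are hypothetical, and the critical value 1.96 (the two-tailed 5% point of the normal distribution) is an assumption; the source does not state which table value to use.

```python
import math


def fisher_z(r):
    """Fisher's transformation as written in the formulae: 1.1513 * log10((1 + r) / (1 - r))."""
    return 1.1513 * math.log10((1 + r) / (1 - r))


def z_difference(r1, n1, r2, n2):
    """Test statistic for the difference between two independent correlation coefficients."""
    # Standard error of Z1 - Z2: sqrt(1/(n1 - 3) + 1/(n2 - 3))
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / standard_error


# Figures from the last example
z = z_difference(r1=0.40, n1=23, r2=0.65, n2=19)

# ASSUMPTION: 1.96 is the two-tailed 5% normal critical value, not given in the source
verdict = "significant" if abs(z) > 1.96 else "not significant"
print(f"Z = {z:.3f}, difference is {verdict} at the 5% level")
```

The constant 1.1513 is half the natural logarithm of 10, so `fisher_z(r)` agrees closely with `math.atanh(r)`; the base-10 form is kept here only to match the formula as stated.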