Introduction to Hypothesis and Tests, Unit 5 (2014)

Sampling
“Sampling is a process of learning about the population on the basis of a sample drawn from it.”
The process of sampling involves three elements:
1. Selecting the Sample
2. Collecting the information
3. Making an inference about the population.
Essentials of Sampling:
• Representativeness
• Adequacy
• Independence
• Homogeneity
SAMPLING METHODS
Probability sampling methods:
• Simple Random
• Stratified
• Systematic
• Cluster
Non-probability sampling methods:
• Judgemental
• Quota
• Convenience
ASSOCIATION OF ATTRIBUTES
An important aspect in the study and analysis of attributes
is to find out the relationship between the attributes of any two
or more variables, which is known as ‘Association of Attributes’.
In common language, two attributes (A and B) are said to be associated if they appear
together in a number of cases. But in statistics the word association has a specific
meaning: statistically, two attributes are said to be associated if they occur together
in a larger number of cases than expected.
According to Yule and Kendall:
“ In Statistics, A and B are associated only if they appear together
in a greater number of cases than it is expected if they are
Independent.”
TYPES OF ASSOCIATION OF ATTRIBUTES:
1. Positive Association: when two attributes tend to be present or absent together
in the data, they are said to be positively associated.
2. Negative Association: when two attributes tend to be affected in opposite directions.
3. Independence of Attributes: when two attributes show no tendency to be present
together, and the presence of one does not cause the absence of the other, the two
attributes are regarded as independent.
METHODS OF DETERMINING ASSOCIATION OF ATTRIBUTES
I. Comparison of Observed and Expected Frequencies Method
II. Method of Comparison of Proportions
III. Yule's Coefficient of Association Method
Preparation of the nine-square table

            A        a        Total
  B        AB       aB         B
  b        Ab       ab         b
  Total     A        a         N

Here A and B are the two attributes;
a (or α) = cases in which attribute A is absent (not-A);
b (or β) = cases in which attribute B is absent (not-B).
Comparison of Observed and Expected Frequencies Method
• If AB = A X B / N : independence
• If AB > A X B / N : positive association
• If AB < A X B / N : negative association
Example:
State from the data given below whether A and B are independent, positively
associated or negatively associated:
(i) N = 1000, A = 470, B = 620, AB = 320
(ii) A = 490, AB = 294, a = 570, ab = 380
(iii) AB = 256, aB = 768, Ab = 48, ab = 144
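As a quick check on part (i) of this example, the sketch below (plain Python; the variable names are ours, not from the slides) compares the observed AB with the expected value A X B / N and classifies the association accordingly.

```python
# Data from part (i): N = 1000, A = 470, B = 620, AB = 320
N, A, B, AB = 1000, 470, 620, 320

expected_AB = A * B / N          # (A)(B)/N = 291.4

if AB > expected_AB:
    verdict = "positive association"
elif AB < expected_AB:
    verdict = "negative association"
else:
    verdict = "independent"

print(f"Expected AB = {expected_AB:.1f}, observed AB = {AB} -> {verdict}")
# Expected AB = 291.4, observed AB = 320 -> positive association
```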
YULE'S COEFFICIENT OF ASSOCIATION:
Q(AB) = [ (AB)(ab) - (Ab)(aB) ] / [ (AB)(ab) + (Ab)(aB) ]
INTERPRETATION OF THE COEFFICIENT:
• If the value of Q(AB) is zero, there is no association between the two attributes,
i.e. the attributes are independent.
• The mathematical sign (+ or -) of the coefficient determines the nature or direction
of the association.
DEGREE OF ASSOCIATION
Degree              Positive                     Negative
1. Perfect          +1                           -1
2. Limited
   Very High        between +0.90 and +0.99      between -0.90 and -0.99
   High             between +0.75 and +0.90      between -0.75 and -0.90
   Moderate         between +0.25 and +0.75      between -0.25 and -0.75
   Low              between 0 and +0.25          between 0 and -0.25
3. Absence          0                            0
Example 1:
Prepare a nine-square table from the following information and calculate Yule's
coefficient of association.
i) N = 1000, A = 400, B = 500, AB = 150
ii) A = 470, AB = 290, a = 530, aB = 310
Example 2:
From the data given below, find out the association between darkness of eye colour
of father and son.
• Fathers with dark eyes and sons with dark eyes = 50
• Fathers with dark eyes and sons with light eyes = 79
• Fathers with light eyes and sons with dark eyes = 89
• Fathers with light eyes and sons with light eyes = 782
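A minimal sketch of Yule's coefficient for Example 2, using the four cell frequencies given above (plain Python; the function name yules_q is ours).

```python
def yules_q(AB, Ab, aB, ab):
    """Yule's Q = ((AB)(ab) - (Ab)(aB)) / ((AB)(ab) + (Ab)(aB))."""
    return (AB * ab - Ab * aB) / (AB * ab + Ab * aB)

# AB = dark/dark, Ab = dark/light, aB = light/dark, ab = light/light
Q = yules_q(AB=50, Ab=79, aB=89, ab=782)
print(f"Q = {Q:.3f}")   # about +0.695 -> a moderate positive association
```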
What is a Hypothesis?
• A hypothesis is an assumption about a population parameter.
• A parameter is a characteristic of the population, such as its mean or variance.
• The parameter must be identified before the analysis begins.
Example: "I assume the mean age of this class is 23 years!"
The Null Hypothesis, H0
• States the assumption (numerical) to be tested.
• Begin with the assumption that the null hypothesis is TRUE (similar to the notion
of "innocent until proven guilty").
• The null hypothesis may or may not be rejected.
The Alternative Hypothesis, H1
• Is the opposite of the null hypothesis.
• May or may not be accepted.
• Is generally the hypothesis that the researcher believes to be true.
Hypothesis Testing Process
Assume the population mean age is 50 (the null hypothesis). A sample is drawn from the
population and its mean turns out to be 20. Is a sample mean of 20 likely if the
population mean is really 50? No, not likely, so the null hypothesis is rejected.
HYPOTHESIS TESTING
Hypothesis testing begins with an assumption, called a hypothesis.
For example, if a coin is tossed 100 times and 59 heads and 41 tails are obtained,
the hypothesis may be that the coin is unbiased.
PROCEDURE OF TESTING HYPOTHESIS:
1. Set up a hypothesis
2. Set up a suitable significance level
3. Set up the test criterion
4. Do the computations
5. Make the decision
TYPES OF ERRORS:
             Ho is True          Ho is False
Accept Ho    Correct Decision    Type II Error
Reject Ho    Type I Error        Correct Decision
CHI SQUARE TEST
Large sample test (sample size > 30).
Formula:
χ² = Σ [ (O - E)² / E ]
where
χ² = the value of chi-square
O = the observed frequency
E = the expected frequency
Σ [ (O - E)² / E ] = the values of (O - E)² / E summed over all cells
Degrees of freedom: (r - 1)(c - 1) or (n - 1)
Example 1:
In an anti-malarial campaign in a certain district, quinine was administered to 812
persons out of a total population of 3248. The number of fever cases is shown below:

Treatment      Fever    No Fever    Total
Quinine           20        792       812
No Quinine       220       2216      2436
Total            240       3008      3248
Discuss the usefulness of quinine in checking malaria.
Expected frequency (E) = (Row Total x Column Total) / N

    O       E      O - E    (O - E)²    (O - E)²/E
   20      60      -40       1600         26.67
  220     180      +40       1600          8.89
  792     752      +40       1600          2.13
 2216    2256      -40       1600          0.71
                                    χ² =  38.40
Degrees of freedom = (r - 1)(c - 1)
Here r = 2 (number of rows) and c = 2 (number of columns),
hence d.o.f. = (2 - 1) x (2 - 1) = 1.
At 1 d.o.f., the table value of χ² is 3.841 at the 5% level.
Since 38.40 > 3.841, the hypothesis (that quinine and fever are independent) is rejected.
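The same conclusion can be checked with scipy (assumed available here); chi2_contingency with correction=False reproduces the hand calculation above.

```python
from scipy.stats import chi2, chi2_contingency

observed = [[20, 792],       # Quinine:    fever, no fever
            [220, 2216]]     # No quinine: fever, no fever

chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
critical = chi2.ppf(0.95, df=dof)          # 3.841 at 1 d.f., 5% level

print(f"chi-square = {chi2_stat:.2f}, d.f. = {dof}, critical value = {critical:.3f}")
# chi-square is about 38.4 > 3.841, so independence is rejected:
# quinine appears useful in checking malaria.
```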
Example 2:
200 digits are chosen at random from a set of tables.
The frequencies of the digits are as follows:
Digit:        0    1    2    3    4    5    6    7    8    9
Frequency:   18   19   23   21   16   25   22   20   21   15
Use Chi Square Test to assess the correctness of the hypothesis
that the digits were distributed in equal numbers in the tables
from which they were chosen.
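A sketch of Example 2 with scipy's chisquare (assumed available); when no expected frequencies are supplied, it tests against equal expected counts, here 20 per digit.

```python
from scipy.stats import chi2, chisquare

frequencies = [18, 19, 23, 21, 16, 25, 22, 20, 21, 15]   # digits 0..9, total 200

chi2_stat, p_value = chisquare(frequencies)               # expected = 20 for each digit
critical = chi2.ppf(0.95, df=len(frequencies) - 1)        # about 16.919 at 9 d.f.

print(f"chi-square = {chi2_stat:.2f}, critical value = {critical:.3f}")
# chi-square = 4.30 < 16.919, so the hypothesis that the digits occur
# in equal numbers is not rejected.
```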
Small Sample Test (sample size < 30): the t-test
Developed by William Gosset.
ASSUMPTIONS OF T TEST:
• The sample is drawn from a normal population.
• The sample observations are independent.
• The sample size is small.
• The population variance is unknown.
I. Test of significance of the mean of a random sample:
This test determines whether the mean of a sample drawn from a normal population
deviates significantly from a stated value (the hypothetical value of the population
mean) when the variance of the population is unknown.
Formulae:
t = [ (X̄ - μ) / S ] × √n
S = √[ Σd² / (n - 1) ]
where
X̄ = the mean of the sample
μ = the actual or hypothetical mean of the population
n = the sample size
S = the standard deviation of the sample
d = deviation of each observation from the sample mean
Fiducial limits of the population mean:
At the 95% level of confidence:  X̄ ± (S / √n) t(0.05)
At the 99% level of confidence:  X̄ ± (S / √n) t(0.01)
Example:
In a random sample of 17 towns the mean
population was 57000 and the standard
deviation was 1600. Determine the
confidence limits of the mean of the
population at
(a) 95% level of confidence
(b) 99% level of confidence
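A sketch of this calculation using scipy's t distribution (assumed available); t.ppf gives the two-tailed critical values for 16 degrees of freedom, and 1600 is treated as S from the formula above.

```python
from math import sqrt
from scipy.stats import t

x_bar, s, n = 57000, 1600, 17
se = s / sqrt(n)                                   # S / sqrt(n)

for conf in (0.95, 0.99):
    t_crit = t.ppf(1 - (1 - conf) / 2, df=n - 1)   # two-tailed critical value
    lower, upper = x_bar - t_crit * se, x_bar + t_crit * se
    print(f"{conf:.0%} limits: {lower:.0f} to {upper:.0f}")
# 95% limits: roughly 56177 to 57823
# 99% limits: roughly 55867 to 58133
```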
Example:
Ten workers of a factory are selected at random. The
number of units produced by them on a working day was
as follows:
71, 72, 73, 75, 76, 77, 78, 79, 79, 80.
On the basis of the given data, is it reasonable to say that the mean number of units
produced by them is 78? (For v = 9, t(0.05) = 2.262.)
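A sketch of this example with scipy's one-sample t test (assumed available); the hand calculation gives a mean of 76, S of about 3.16 and t = -2.0.

```python
from scipy.stats import ttest_1samp

units = [71, 72, 73, 75, 76, 77, 78, 79, 79, 80]

t_stat, p_value = ttest_1samp(units, popmean=78)
print(f"t = {t_stat:.2f}")          # t = -2.00
# |t| = 2.00 < 2.262 (table value for v = 9 at 5%), so it is reasonable
# to accept that the mean number of units produced is 78.
```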
II. Testing the significance of the difference between two sample means (small samples):
Here it is assumed that the two samples are independent, i.e. the value of an
observation in one sample does not depend on the observations in the other.
Formulae:
t = [ (X̄1 - X̄2) / S ] × √[ n1 n2 / (n1 + n2) ]
where
X̄1 = mean of the first sample
X̄2 = mean of the second sample
n1 = number of observations in the first sample
n2 = number of observations in the second sample
S = combined standard deviation (deviations taken from the actual means):
S = √{ [ Σ(X1 - X̄1)² + Σ(X2 - X̄2)² ] / (n1 + n2 - 2) }
d.f. = n1 + n2 - 2
Example:
A random sample of 10 pigs was kept on food A; the increase in their weight (kg) was
as under:
10, 6, 16, 17, 13, 12, 8, 14, 15, 9.
Another randomly drawn sample of 12 pigs was kept on food B for the same period; the
increase in their weight (kg) was as under:
7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17.
Examine the significance of the difference between the increase in weights of pigs
kept on food A and that of pigs kept on food B. (The value of t for 20 degrees of
freedom at the 5% level of significance is 2.09.)
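A sketch of this example with scipy's two-sample t test (assumed available); equal_var=True gives the pooled-variance form shown in the formula above.

```python
from scipy.stats import ttest_ind

food_a = [10, 6, 16, 17, 13, 12, 8, 14, 15, 9]              # n1 = 10, mean 12
food_b = [7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17]     # n2 = 12, mean 15

t_stat, p_value = ttest_ind(food_a, food_b, equal_var=True)  # pooled variance
print(f"t = {t_stat:.2f}")
# |t| is about 1.50 < 2.09 (table value for 20 d.f. at 5%), so the difference
# between the two mean weight gains is not significant.
```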
F Distribution:
The F test is named in honor of the great statistician R. A. Fisher. The object of
this test is to find out whether two independent estimates of the population variance
differ significantly, or whether the two samples may be regarded as drawn from normal
populations having the same variance.
Formulae:
F = Larger estimate of population variance / Smaller estimate of population variance
  = S1² / S2²,   where S1² > S2²
S1² = Σ(X1 - X̄1)² / (n1 - 1)
S2² = Σ(X2 - X̄2)² / (n2 - 1)
Example:
A random sample of 10 pigs was kept on food A; the increase in their weight (kg) was
as under:
10, 6, 16, 17, 13, 12, 8, 14, 15, 9.
Another randomly drawn sample of 12 pigs was kept on food B for the same period; the
increase in their weight (kg) was as under:
7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17.
Show that there is no significant difference in the estimates of population variance
based on these two samples. (The 5% value of F at v1 = 11, v2 = 9 is 3.112.)
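A sketch of the F test on the same pig data (numpy and scipy assumed available); var(..., ddof=1) gives the unbiased variance estimates S1² and S2².

```python
import numpy as np
from scipy.stats import f

food_a = [10, 6, 16, 17, 13, 12, 8, 14, 15, 9]
food_b = [7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17]

var_a = np.var(food_a, ddof=1)      # about 13.33 (9 d.f.)
var_b = np.var(food_b, ddof=1)      # about 28.55 (11 d.f.)

F = max(var_a, var_b) / min(var_a, var_b)    # larger estimate over smaller
critical = f.ppf(0.95, dfn=11, dfd=9)        # about 3.10, close to the tabled 3.112
print(f"F = {F:.2f}, critical = {critical:.2f}")
# F is about 2.14 < 3.112, so there is no significant difference between
# the two estimates of the population variance.
```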
Example:
The following table gives the number of units of an
article produced daily by two laborers A and B.
A: 40  30  38  41  38  35
B: 38  41  33  32  39  39  40  34
Can these results be treated as sufficient evidence that laborer B is more stable?
Use the F test.
Example:
Determine by applying the F test whether it would be reasonable to assume that the
variances in the two blocks are equal. (For v1 = 7, v2 = 5, F(0.05) = 4.48.)

                          Block I    Block II
No. of plots                 8           6
Mean                        60          51
Sum of squares
of deviations               50          40
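Since only summary figures are given here, a sketch of the calculation in plain Python (variable names are ours): each variance estimate is the sum of squared deviations divided by n - 1.

```python
# Block I:  8 plots, sum of squared deviations 50 -> 7 d.f.
# Block II: 6 plots, sum of squared deviations 40 -> 5 d.f.
s1_sq = 50 / (8 - 1)        # about 7.14
s2_sq = 40 / (6 - 1)        # 8.00

F = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)   # larger estimate over smaller
print(f"F = {F:.2f}")
# F is about 1.12, well below the tabled F(0.05) = 4.48, so it is reasonable
# to assume the variances in the two blocks are equal.
```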
Fisher's Z test:
Prof. Fisher gave a method for testing the significance of the correlation coefficient
in small samples.
I. Test of significance of r in a random sample:
Formulae:
(i)  Zs = 1.1513 log10 [ (1 + r) / (1 - r) ]
     Zρ = 1.1513 log10 [ (1 + ρ) / (1 - ρ) ]
(ii) Z = (Zs - Zρ) / σz,   where σz = 1 / √(n - 3)
r = sample coefficient of correlation
ρ = population coefficient of correlation
II. Test of significance of the difference between two independent correlation
coefficients:
Formulae:
(i)  Z1 = 1.1513 log10 [ (1 + r1) / (1 - r1) ]
     Z2 = 1.1513 log10 [ (1 + r2) / (1 - r2) ]
(ii) Z = (Z1 - Z2) / σ(z1 - z2),
     where σ(z1 - z2) = √[ 1/(n1 - 3) + 1/(n2 - 3) ]
Degrees of freedom = n1 + n2 - 6
Example:
(i) From a sample of 29 pairs of observations a correlation coefficient of 0.72 was
obtained. Is this significantly different from the population correlation coefficient
of 0.8?
(ii) The correlation coefficient of 19 pairs of observations is 0.64. Determine by the
Z test whether it is significantly different from (a) 0.60 and (b) 0.50.
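A sketch of part (i) in plain Python: math.atanh(r) equals 1.1513 log10[(1 + r)/(1 - r)], so it can stand in for the transformation above (comparing |Z| with 1.96 at the 5% level is our assumption, not stated in the slides).

```python
from math import atanh, sqrt

r, rho, n = 0.72, 0.80, 29

z_s   = atanh(r)                 # transformed sample coefficient, about 0.908
z_rho = atanh(rho)               # transformed population coefficient, about 1.099
sigma = 1 / sqrt(n - 3)          # standard error, about 0.196

Z = (z_s - z_rho) / sigma
print(f"Z = {Z:.2f}")
# |Z| is about 0.97 < 1.96 (5% level), so r = 0.72 does not differ
# significantly from the population coefficient 0.8.
```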
Example:
The following data gives two sample sizes and correlation coefficients. Test the
significance of the difference between the two values at the 5% level of significance,
using Fisher's Z transformation.

Sample    Size    Value of r
I           23       0.40
II          19       0.65
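A sketch of this problem in plain Python, again using math.atanh in place of 1.1513 log10[(1 + r)/(1 - r)]; the 1.96 cut-off for the 5% level is our assumption.

```python
from math import atanh, sqrt

n1, r1 = 23, 0.40
n2, r2 = 19, 0.65

z1, z2 = atanh(r1), atanh(r2)                 # about 0.424 and 0.775
se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))        # about 0.335

Z = (z1 - z2) / se
print(f"Z = {Z:.2f}")
# |Z| is about 1.05 < 1.96, so the difference between the two
# correlation coefficients is not significant at the 5% level.
```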