Chapter 11 Comparing two populations or Treatments

CHAPTER 11
COMPARING TWO POPULATIONS
OR TREATMENTS
How can the data from two independent
populations or treatments be evaluated to
determine causation?
ACTIVATION:
Does a double stuf Oreo really have twice the
“stuffing” of a regular Oreo?
How would you go about proving this? What do
you need?
INFERENCES CONCERNING THE DIFFERENCE
BETWEEN TWO POPULATION OR TREATMENT
MEANS USING INDEPENDENT SAMPLES 11.1

The comparison is
µ1-µ2
Definitions
Independent samples—
the selection of one sample does not
influence the selection of the other
Paired samples—
only one sample is chosen, a treatment is
given, then the test is redone on the same
sample
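PROPERTIES OF THE SAMPLING DISTRIBUTION
FOR $\bar{x}_1 - \bar{x}_2$
1. $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$, making $\bar{x}_1 - \bar{x}_2$ unbiased
since it is centered at µ1 - µ2
2. $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
therefore
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$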
3. If n1 and n2 are both large, or both populations are
normally distributed, then $\bar{x}_1$ and $\bar{x}_2$ are each approx.
normal, and the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is
also approximately normal
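
The three properties above can be checked numerically. The following is a minimal sketch only: the function name and the population values (means 50 and 45, standard deviations 6 and 8, sample sizes 36 and 64) are hypothetical and chosen just for illustration.

```python
import math

def diff_of_means_distribution(mu1, sigma1, n1, mu2, sigma2, n2):
    """Return (mean, standard deviation) of the sampling distribution of xbar1 - xbar2."""
    mean = mu1 - mu2                                  # property 1: centered at mu1 - mu2
    sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # property 2: variances add, then take the root
    return mean, sd

# Hypothetical population values, for illustration only.
mean, sd = diff_of_means_distribution(mu1=50, sigma1=6, n1=36, mu2=45, sigma2=8, n2=64)
print(mean, sd)
```

Note that the variances are added even though the means are subtracted.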
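THE HYPOTHESIS TEST
Used if n1 and n2 are sufficiently large (>30) or approx. normally
distributed and we have σ:
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
When we have s:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
with $df = \frac{(v_1 + v_2)^2}{\frac{v_1^2}{n_1 - 1} + \frac{v_2^2}{n_2 - 1}}$
where $v_1 = \frac{s_1^2}{n_1}$ and $v_2 = \frac{s_2^2}{n_2}$
Round df down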

H0 will be µ1 - µ2 = hypothesized value
This is often zero, indicating no difference
Ha can be
 > : P-value = area to the right of the calculated t
 < : P-value = area to the left of t (1 - area to the right)
 ≠ : P-value = 2 times the area to the right of |t|
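
As a small sketch of how the three alternatives turn a right-tail area into a P-value: the helper `p_value` and its string codes are hypothetical names used only for this illustration, and the right-tail area is assumed to come from a t table or calculator.

```python
def p_value(area_right, alternative):
    """Convert the area to the right of the calculated t into a P-value.

    alternative is ">", "<" or "!=" (hypothetical codes for the three forms of Ha).
    """
    if alternative == ">":
        return area_right
    if alternative == "<":
        return 1 - area_right
    if alternative == "!=":
        # Doubling the smaller tail is the same as 2 * (area to the right of |t|).
        return 2 * min(area_right, 1 - area_right)
    raise ValueError("alternative must be '>', '<' or '!='")

print(p_value(0.02, ">"), p_value(0.02, "<"), p_value(0.02, "!="))
```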


Both samples must be
 Independent simple random samples (ISRSs)
 Large or approximately normally distributed

EXAMPLE 11.1
TENNIS ELBOW IS THOUGHT TO BE AGGRAVATED BY THE IMPACT EXPERIENCED WHEN HITTING THE BALL. THE
ARTICLE “FORCES ON THE HAND IN THE TENNIS ONE-HANDED BACKHAND” (REF IN BOOK) REPORTED THE FORCE
(N) ON THE HAND JUST AFTER IMPACT ON A ONE-HANDED BACKHAND FOR SIX ADVANCED PLAYERS AND FOR EIGHT
INTERMEDIATE PLAYERS. USE THE DATA LISTED BELOW TO DETERMINE IF THE FORCE AFTER IMPACT IS
GREATER FOR ADVANCED PLAYERS THAN FOR INTERMEDIATE PLAYERS.
ADVANCED: 44.7, 26.31, 55.75, 28.54, 46.99, 39.46
INTERMEDIATE: 15.58, 19.16, 24.13, 10.56, 32.88, 21.47, 14.32, 33.09

Sorted for boxplots:
Advanced: 26.31, 28.54, 39.46, 44.7, 46.99, 55.75
Intermediate: 10.56, 14.32, 15.58, 19.16, 21.47, 24.13, 32.88, 33.09

Both boxplots look approx. normally distributed

Use this technique (boxplots) whenever normality is not stated and there is not a sufficiently large amount of data
μd = the mean force on the hand for advanced players minus the mean force for intermediate players

H0: μd = 0

Ha: μd > 0

α = .05

Since we have ISRSs, and both groups are shown to be approx. normal by
boxplots, we can use a two-sample t-test with
$\bar{x}_{adv}$ = 40.79 and $s_{adv}$ = 11.588 and n = 6
$\bar{x}_{int}$ = 21.398 and $s_{int}$ = 8.301 and n = 8
$t = \frac{(40.79 - 21.398) - 0}{\sqrt{\frac{11.588^2}{6} + \frac{8.301^2}{8}}}$
t = 3.48 with df = 8.67 and a P-value of .003
Since P-value < α, reject H0. Results such as these would occur only
about .3% of the time by chance if H0 were true, so we believe
there is support for Ha: the mean force on
the hand for advanced tennis players is
greater than the mean force for intermediate
tennis players.
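
The t statistic and df for this example can be recomputed from the summary statistics. This is a sketch only: the function name `welch_t` is hypothetical, the inputs are the summary values quoted in the example (40.79, 11.588, n = 6 and 21.398, 8.301, n = 8), and the P-value still has to come from a t table or calculator.

```python
import math

def welch_t(xbar1, s1, n1, xbar2, s2, n2, hypothesized_diff=0.0):
    """Two-sample t statistic and its df, using v1 = s1^2/n1 and v2 = s2^2/n2."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    t = ((xbar1 - xbar2) - hypothesized_diff) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Summary statistics as quoted in the example: advanced first, then intermediate.
t, df = welch_t(40.79, 11.588, 6, 21.398, 8.301, 8)
print(round(t, 2), round(df, 2))
```

The printed values should agree with the t and df reported in the example to two decimal places.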

Since df is a complicated calculation, the
conservative method is to take the smaller of
n1 - 1 or n2 - 1 as the df, or use the calculator.
This works because if H0 is rejected based on the
conservative value, it will definitely be rejected
based on the calculated value.
The conservative method is acceptable on the AP
test.
TWO SAMPLE T-TEST REQUIRES
1. The treatments are randomly assigned
 2. The sample sizes are large or the treatment
distributions are normal

The POOLED T-TEST
If it is known that $\sigma_1^2 = \sigma_2^2$, then all the data may be
“grouped” to calculate a single estimate of the common $\sigma^2$, which then replaces $s_1^2$ and $s_2^2$.
Used because when $\sigma_1^2 = \sigma_2^2$ it has a greater chance
of detecting a departure from H0. It is not used as
often recently, since if $\sigma_1^2 \neq \sigma_2^2$ it can cause a larger
error.
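
The slide does not show the pooled formula itself. The sketch below assumes the standard pooled-variance estimate, a weighted average of the two sample variances with weights n1 - 1 and n2 - 1, and applies it to the tennis summary statistics purely as an illustration. The function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = 0 (assumes equal population variances)."""
    df = n1 + n2 - 2
    # Pooled variance: the single estimate that replaces s1^2 and s2^2.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, (xbar1 - xbar2) / se, df

# Tennis example summary statistics, used here only for illustration.
sp2, t_pooled, df_pooled = pooled_t(40.79, 11.588, 6, 21.398, 8.301, 8)
print(round(sp2, 2), round(t_pooled, 2), df_pooled)
```

The pooled df (n1 + n2 - 2 = 12) is larger than the unpooled df, which is where the extra power comes from when the equal-variance assumption really holds.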
OBSERVATIONS
 If an assignment is not made by the investigator,
then the study is observational and its conclusions
are weaker, since underlying factors may have an
impact.
 Remember observation does not imply causation,
but replication over time can provide strong
evidence of a causal relationship.
CONFIDENCE INTERVALS FOR 2 SAMPLES

Read the critical t for the specified confidence
level from table III
$(\bar{x}_1 - \bar{x}_2) \pm (\text{crit } t)\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

All items are similar to previous chapters—there
are simply two items that need to be subtracted.
MINITAB OUTPUT:
Two sample T for Adv vs Int
      N   Mean    StDev   SE Mean
Adv   6   40.3    11.3    4.6
Int   8   21.40   8.30    2.9
T-Test mu Adv = mu Int (vs >): T = 3.46  P = 0.0043  DF = 8
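
The summary rows of this output can be reproduced from the raw data listed in Example 11.1. This is a minimal sketch using only Python's standard library. The helper name `summarize` is hypothetical, and the P-value is not computed because it needs a t table or calculator.

```python
import math
import statistics

# Raw forces as listed in Example 11.1.
adv = [44.7, 26.31, 55.75, 28.54, 46.99, 39.46]
intermediate = [15.58, 19.16, 24.13, 10.56, 32.88, 21.47, 14.32, 33.09]

def summarize(sample):
    """Return N, Mean, StDev and SE Mean, the four columns of the output table."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample standard deviation (divides by n - 1)
    return n, mean, sd, sd / math.sqrt(n)

n1, m1, s1, se1 = summarize(adv)
n2, m2, s2, se2 = summarize(intermediate)
t = (m1 - m2) / math.sqrt(se1**2 + se2**2)   # two-sample t for H0: mu Adv - mu Int = 0

print(n1, round(m1, 1), round(s1, 1), round(se1, 1))
print(n2, round(m2, 2), round(s2, 2), round(se2, 1))
print(round(t, 2))
```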
NOW calculate a 95% confidence interval for the difference between the means:
$(40.3 - 21.4) \pm (2.31)\sqrt{\frac{11.3^2}{6} + \frac{8.3^2}{8}}$
18.9 ± 12.63
Always list as (lower limit, upper limit):
(6.27, 31.53)
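
The interval arithmetic can be checked with a short sketch. The function name `two_sample_ci` is hypothetical. The inputs are the rounded Minitab values and the critical t of 2.31 (df = 8) used above.

```python
import math

def two_sample_ci(xbar1, s1, n1, xbar2, s2, n2, crit_t):
    """Confidence interval for mu1 - mu2: (difference) +/- crit_t * standard error."""
    diff = xbar1 - xbar2
    margin = crit_t * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin

lower, upper = two_sample_ci(40.3, 11.3, 6, 21.4, 8.3, 8, crit_t=2.31)
print(round(lower, 2), round(upper, 2))
```

Because the whole interval lies above zero, it is consistent with the earlier decision to reject H0.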
TOMORROW'S LESSON:
Does a double stuf Oreo really have twice the
“stuffing” of a regular Oreo?
HW Pg 575-579 4, 6, 7, 10, 12, 15, 18
INFERENCES CONCERNING THE DIFFERENCE BETWEEN
TWO POPULATION MEANS USING PAIRED SAMPLES 11.2
What can be said about two populations using the
concept of paired sampling?

Paired samples occur when

you use the same person before and after a treatment

you use the same item before and after a treatment

you compare the top of a soil sample and an area lower in the soil
but in the same spot (i.e. 10 inches deeper)

you carefully choose groups to match on what might
otherwise be extraneous factors

Paired samples often provide more
information than independent samples

It has been hypothesized that strenuous physical activity affects hormone levels. The article “Growth Hormone Increase During Sleep
After Daytime Exercise” reported results from an experiment on 6 healthy males. For each participant, blood samples were taken during sleep on two
different nights. The first blood sample (the control) was drawn after a day that included no strenuous activity. The second was drawn
after a day when the subject had engaged in strenuous activity. The resulting data on growth hormone level in mg/ml follows. The
samples are paired rather than independent, because both samples are composed of measurements on the same men.
Subject         1      2      3      4      5      6
Post-exercise   13.6   14.7   42.8   20.0   19.2   17.3
Control         8.5    12.6   21.6   19.4   14.7   13.6





[Figure: two scattergrams of the growth hormone data for subjects 1-6, points marked x and c]
The top scattergram gives a different perspective than
the lower one
µd = mean value of the difference for the paired items of
the population before and after treatment
σd = the standard deviation of the differences of the pairs

 The sample must be paired
 Viewed as a random sample of differences
 n is large (n ≥ 30) or the differences are approx. normal
 df = n - 1 and
$t = \frac{\bar{x}_d - \mu_d}{s_d / \sqrt{n}}$
METHOD
SUBJECT           1      2      3      4      5      6
POST-EXERCISE     13.6   14.7   42.8   20.0   19.2   17.3
BEFORE EXERCISE   8.5    12.6   21.6   19.4   14.7   13.6
µd = after - before
 H0: µd = 0
 Ha: µd > 0
 α = .05
 Enter after in L1, before in L2
 Arrow to the top of L3 and set it to L1-L2
 Do one-variable statistics on L3
 Take $\bar{x}$ and Sx (sample st. dev.) to use in a ONE SAMPLE
t-test
 Calculate t, then get the critical t or P-value
(use the appropriate one based on Ha)
 Compare the P-value with alpha to decide whether to reject H0
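
The calculator steps above can be mirrored in a short sketch. The list names `post` and `control` stand in for L1 and L2 and `diffs` for L3 (hypothetical names). The critical t or P-value still has to be read from a table or calculator.

```python
import math
import statistics

# Growth hormone levels from the table: L1 = after (post-exercise), L2 = before (control).
post = [13.6, 14.7, 42.8, 20.0, 19.2, 17.3]
control = [8.5, 12.6, 21.6, 19.4, 14.7, 13.6]

# L3 = L1 - L2: one difference per subject.
diffs = [after - before for after, before in zip(post, control)]

n = len(diffs)
xbar_d = statistics.mean(diffs)      # mean of the differences
s_d = statistics.stdev(diffs)        # Sx, the sample standard deviation of the differences

# One-sample t-test on the differences with H0: mu_d = 0.
t = (xbar_d - 0) / (s_d / math.sqrt(n))
df = n - 1

print(round(xbar_d, 2), round(s_d, 2), round(t, 2), df)
```

Once the differences are formed, the two original columns are no longer needed and everything proceeds as a one-sample problem.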

CONFIDENCE INTERVAL FOR PAIRED SAMPLE
$\bar{x}_d \pm (\text{crit } t)\left(\frac{s_d}{\sqrt{n}}\right)$
HW Pg 589-593 23, 26, 27, 34, 35, 36
It will be checked!
LARGE SAMPLE INFERENCES CONCERNING A DIFFERENCE
BETWEEN TWO POPULATION PROPORTIONS
11.3

What formula is required for a two-sample
population proportion and how is this different
from independent proportions?

STANDARD DEVIATION OF A TWO SAMPLE PROPORTION
FOR A Z-TEST
Uses a combined proportion:
It is similar to using a weighted average

Use $p_c = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$ when you are only
given the proportions (i.e. $\hat{p}$ = .123)
OR
Use $p_c = \frac{x_1 + x_2}{n_1 + n_2}$ if they gave the number
of successes out of each of the
samples

Make sure to define in words the direction of the
difference: sample 1 - sample 2 or vice versa
pd = p1 - p2
 State SRS
 Both n1p1 and n1(1-p1), and
n2p2 and n2(1-p2), must be > 5
Since it is a proportion, the test will again be a
z-test

$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_c(1 - p_c)}{n_1} + \frac{p_c(1 - p_c)}{n_2}}}$
where
$p_c = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$
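
A minimal sketch of the two-proportion z-test with the combined proportion follows. The counts (60 of 100 and 45 of 100) are hypothetical, made up only to show the arithmetic, and the function name `two_proportion_z` is likewise an assumption.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0 using the combined proportion pc."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pc = (x1 + x2) / (n1 + n2)               # combined proportion from the counts
    se = math.sqrt(pc * (1 - pc) / n1 + pc * (1 - pc) / n2)
    z = ((p1_hat - p2_hat) - 0) / se
    return p1_hat, p2_hat, pc, z

# Hypothetical counts, for illustration only.
n1, n2 = 100, 100
p1_hat, p2_hat, pc, z = two_proportion_z(60, n1, 45, n2)

# The weighted-average form gives the same combined proportion.
pc_weighted = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)

# Area to the right of z under the standard normal curve (for Ha: p1 - p2 > 0).
p_right = 1 - NormalDist().cdf(z)

print(round(pc, 3), round(pc_weighted, 3), round(z, 2), round(p_right, 4))
```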

$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
Skip 68