Does lack of caffeine increase depression?

Chapter 7
Inference for Distributions
Inference for the mean of a population



So far, we have assumed that σ was known.
If σ is unknown, we can use the sample standard deviation (s)
to estimate σ:
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
But this adds more variability to our test statistic and/or confidence
interval (therefore, we will use the t-table).

If σ is known, then σ/√n is known as the standard deviation of x̄.

If σ is not known, then s/√n is known as the standard error of x̄.

When σ is not known, we use the t-table (Table D) instead of the
Normal Table A (also called the Z-table).
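The difference between s and the standard error s/√n can be checked numerically. The following is a minimal sketch, assuming a small made-up sample (the five values are hypothetical and do not come from the course data):

```python
import math
import statistics

# Hypothetical sample, for illustration only
sample = [4.0, 6.0, 5.0, 7.0, 8.0]

n = len(sample)
xbar = statistics.mean(sample)
# statistics.stdev divides by n - 1, matching the formula for s
s = statistics.stdev(sample)
# Standard error of the sample mean: s / sqrt(n)
standard_error = s / math.sqrt(n)

print(f"n = {n}, xbar = {xbar:.3f}, s = {s:.3f}, SE = {standard_error:.3f}")
```

Note that s describes the spread of individual observations, while the standard error describes the spread of x̄ and shrinks as n grows.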
The t-distribution
When n is very large, s is a very good estimate of σ and the
corresponding t distributions are very close to the normal distribution.
The t distributions become wider for smaller sample sizes, reflecting the
lack of precision in estimating σ from s.
The t distribution depends on its degrees of freedom (df).
In the “one sample problem with
sample size n”, df = n − 1.
As df increases, the t-distribution
gets closer to the standard normal.
Table for t-distribution is Table D
or use tcdf(start, end, df)
How to find the p-value for a t-distribution with the TI-83?

Press [2nd] [VARS] and select [6:tcdf].
Enter tcdf(start, end, df), where df = degrees of freedom.
Left-tailed test (H1: μ < some number)
1. Let our test statistic be -2.05 and n = 16, so df = 15.
2. The p-value is the area to the left of -2.05, or P(t < -2.05).
3. The p-value is .0291; typing tcdf(-E99, -2.05, 15) gives this value.

Right-tailed test (H1: μ > some number)
1. Let our test statistic be 2.05 and n = 16, so df = 15.
2. The p-value is the area to the right of 2.05, or P(t > 2.05).
3. The p-value is .0291; typing tcdf(2.05, E99, 15) gives this value.

Two-tailed test (H1: μ ≠ some number)
1. Let our test statistic be -2.05 and n = 16, so df = 15.
2. The p-value is double the area to the left of -2.05, or 2*P(t < -2.05).
3. The p-value is .0582; typing 2*tcdf(-E99, -2.05, 15) gives this value.
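For readers without a TI-83, the three tail areas above can be approximated in Python. This is only a sketch: `t_pdf` and `t_cdf` are hypothetical helper names, and the cumulative probability is obtained by numerically integrating the t density with Simpson's rule rather than by whatever method the calculator uses.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

df = 15
p_left = t_cdf(-2.05, df)       # plays the role of tcdf(-E99, -2.05, 15)
p_right = 1 - t_cdf(2.05, df)   # plays the role of tcdf(2.05, E99, 15)
p_two = 2 * t_cdf(-2.05, df)    # plays the role of 2*tcdf(-E99, -2.05, 15)

print(round(p_left, 4), round(p_right, 4), round(p_two, 4))
```

Because the t distribution is symmetric about 0, the left-tail and right-tail areas agree, and the two-tailed p-value is exactly twice either one.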
The one-sample t-test (5 steps)
1. State H0 versus Ha.
2. Choose a significance level α.
3. Calculate
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
and df = n-1. (ASSUMING THE
NULL HYPOTHESIS IS TRUE)
4. Find the P-value in the direction of Ha: use
tcdf(start, end, df) for a one-sided test or
2*tcdf(start, end, df) for a two-sided test.
5. Draw conclusions:
If P-value ≤ α, then we reject H0 (Enough evidence…).
If P-value > α, then we do not reject H0 (Not enough evidence...).
The P-value is the probability, if H0 is true, of randomly drawing a
sample like the one obtained or more extreme, in the direction of Ha.
The P-value is calculated as the corresponding area under the curve,
one-tailed or two-tailed depending on Ha:
[Figure: P-value areas under the t curve for a one-sided (one-tailed) and a two-sided (two-tailed) alternative]
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
One-sample Test (Shall we use Z-test or T-test??)
(Chap6)-- The National Center for Health Statistics reports that the
mean systolic blood pressure for males 35 to 44 years of age is 128
with a population SD=15. The medical director of a company looks at
the medical records of 72 company executives in this age group and
finds that the mean systolic blood pressure in this sample is 126.07. Is
this evidence that executives' blood pressures are lower than the
national average?
(Chap7)-- The National Center for Health Statistics reports that the
mean systolic blood pressure for males 35 to 44 years of age is 128. A
simple random sample of 16 patients was tested, with average systolic
blood pressure 120 and sample SD 12. Is this evidence that executives'
blood pressures are lower than the national average?
Answer to Example in Chap7:
(1) Hypothesis: H0 : µ = 128 v.s. Ha : µ <128. (2) α = 5%
(3) One-sample t-test statistic:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{120 - 128}{12/\sqrt{16}} = -2.67$, df = n − 1 = 16 − 1 = 15
(4) Draw the t(15) curve. Thus, P-value = P(t < −2.67) = tcdf(-999, -2.67, 15) = 0.0087
(5) (Statistical Conclusion) Since P-value < α, we reject H0.
(Non-Statistical Conclusion) That is, there is STRONG evidence that
executives' blood pressures are lower than the national average.
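The five steps can be sketched in Python from summary statistics alone. The names `one_sample_t`, `t_pdf`, and `t_cdf` are hypothetical helpers introduced for illustration, and the p-value is approximated by numerically integrating the t density, so small rounding differences from the calculator are to be expected.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t(xbar, mu0, s, n, alternative):
    """Return (t, df, p_value); alternative is 'less', 'greater', or 'two-sided'."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if alternative == "less":
        p = t_cdf(t, df)
    elif alternative == "greater":
        p = 1 - t_cdf(t, df)
    else:
        p = 2 * t_cdf(-abs(t), df)
    return t, df, p

# Blood pressure example: H0: mu = 128 versus Ha: mu < 128
t_stat, df, p_value = one_sample_t(xbar=120, mu0=128, s=12, n=16, alternative="less")
print(f"t = {t_stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The same function handles a two-sided alternative by passing `alternative="two-sided"`, which doubles the smaller tail area.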
One-sample Test (Shall we use Z-test or T-test??)
(Chap6)-- A new medicine treating cancer was introduced to the market
decades ago and the company claimed that on average it will prolong
a patient’s life for 5.2 years. Suppose the population SD of prolonged lifetime is
2.52 years. In a 10-year study with 64 patients, the average prolonged
lifetime is 4.6 years. With the normality assumption, do the 10-year study’s
data show a different average prolonged lifetime?
(Chap7)-- A new medicine treating cancer was introduced to the market
decades ago and the company claimed that on average it will prolong
a patient’s life for 5.2 years. In a 10-year study with 20 patients, the
average prolonged lifetime is 4.7 years with sample SD 2.50. With the
normality assumption, do the 10-year study’s data show a different
average prolonged lifetime?
Answer to Example in Chap7:
(1) Hypothesis: H0 : µ = 5.2 year versus Ha : µ ≠ 5.2 year.
(2) α = 5%
(3) One-sample t-test statistic:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{4.7 - 5.2}{2.5/\sqrt{20}} = -0.894$, df = n − 1 = 19
(4) Draw the t(19) curve. Thus, P-value = 2*tcdf(-999, -0.894, 19) = 0.383.
(5) (Statistical Conclusion) Since P-value > α, we do not reject H0.
(Non-Statistical Conclusion) There is NOT enough evidence to conclude that
the 10-year study’s data show a different average prolonged lifetime.
Example 3: Hypothesis testing
For the following data set:
5, 13, 8, 9, 7, 6, 10, 14, 12, 11, 17, 10, 12
x̄ = 10.308, s = 3.376
Q: Use Hypothesis Testing to test that the mean is
significantly higher than 9.5.
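One way to work this example from the raw data is sketched below. Two things are assumptions rather than part of the example: a significance level of 5%, and the critical value 1.782, which is the upper 5% point of t(12) as listed in standard t tables.

```python
import math
import statistics

data = [5, 13, 8, 9, 7, 6, 10, 14, 12, 11, 17, 10, 12]
mu0 = 9.5  # H0: mu = 9.5 versus Ha: mu > 9.5

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)  # n - 1 divisor
t_stat = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

# Assumption: alpha = 5%; 1.782 is the upper 5% point of t(12) from a t table
t_critical = 1.782
reject_h0 = t_stat > t_critical

print(f"xbar = {xbar:.3f}, s = {s:.3f}, t = {t_stat:.3f}, df = {df}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Comparing t with the critical value is equivalent to comparing the one-sided P-value with α, so the same decision would follow from tcdf(t, E99, 12).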
Exercises on Hypothesis Testing (t-test)
1. Because of variation in the manufacturing process, tennis balls produced
by a particular machine do not have identical diameters. The diameter is
supposed to be 3in. If the average diameter of the first 36 balls made
by a machine is 3.2in with sample SD 0.15in, shall we stop and
calibrate the machine?
2. A new medicine treating cancer was introduced to the market decades
ago and the company claimed that on average it will prolong a patient’s
life for 5 years. In a 10-year study with 81 patients, the average
prolonged lifetime is 4.5 years with sample SD 0.4 years. With the normality
assumption, shall we reject the original claim?
3. The registrar's office claims that the average SAT score of UNCW
students is 1050. Suppose you randomly select 100 UNCW students and the
SAT score average of your sample is 1042 with sample SD 80. Do you
agree with the claim?
4. National data shows that on average, college freshmen spend 7.5
hours a week going to parties. One administrator takes a random
sample of 81 freshmen from her college and finds that her students’
average time spent on parties is 7.6 hours with SD 2 hours. Shall the
administrator believe that the national data applies to her students?
Answer:
1.
H0 : µ = 3, Ha : µ ≠ 3; α = 5%; T=(3.2-3)/(.15/(36)^.5)=8; df=35; t-critical
value= 2.04; P-value <5%; we reject H0 and we shall stop and calibrate the
machine.
2.
H0 : µ = 5, Ha : µ ≠ 5; α = 5%; T=(4.5-5)/(.4/(81)^.5)=-11.25; df=80; t-critical value= 1.99; P-value <5%; we reject H0 and we shall reject the claim
that the average is 5 years.
3.
H0 : µ = 1050, Ha : µ ≠ 1050; α = 5%; T=(1042-1050)/(80/(100)^.5)=-1; df=99; t-critical value= 1.984; P-value >5%; we do not reject H0 and we have no
evidence against the registrar's claim.
4.
H0 : µ = 7.5, Ha : µ ≠ 7.5; α = 5%; T=(7.6-7.5)/(2/(81)^.5)=0.45; df=80; t-critical value= 1.99; P-value >5%; we do not reject H0 and the national data
does apply.
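The four test statistics can be recomputed with a few lines of Python. This sketch takes the sample values from the exercise statements and the two-sided 5% critical values from the answers; `t_statistic` is a hypothetical helper name.

```python
import math

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic from summary statistics."""
    return (xbar - mu0) / (s / math.sqrt(n))

# (label, xbar, mu0, s, n, two-sided 5% critical value quoted in the answers)
exercises = [
    ("1. tennis balls", 3.2, 3.0, 0.15, 36, 2.04),
    ("2. medicine", 4.5, 5.0, 0.4, 81, 1.99),
    ("3. SAT", 1042, 1050, 80, 100, 1.984),
    ("4. parties", 7.6, 7.5, 2.0, 81, 1.99),
]

results = {}
for label, xbar, mu0, s, n, t_crit in exercises:
    t = t_statistic(xbar, mu0, s, n)
    reject = abs(t) > t_crit  # two-sided test: compare |t| with the critical value
    results[label] = (t, reject)
    print(f"{label}: t = {t:.3f}, df = {n - 1}, reject H0: {reject}")
```

For a two-sided alternative, rejecting when |t| exceeds the critical value is the same decision as rejecting when the two-sided P-value is below 5%.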
Matched pairs t procedures
for dependent samples
Subjects are matched in “pairs” and outcomes are compared within each unit.

Example: Pre-test and post-test studies look at data
collected on the same sample elements before and after
some experiment is performed.

Example: Twin studies often try to sort out the influence of
genetic factors by comparing a variable between sets of
twins.
We perform hypothesis testing on the difference in each unit.
Matched pairs
The variable studied becomes Xdifference = (X1 − X2). The null
hypothesis is that there is NO difference between the two paired
groups.
H0: µdifference = 0 ; Ha: µdifference > 0 (or < 0, or ≠ 0)
When stating the alternative, be careful how you are
calculating the difference (after − before or before − after).
Conceptually, this is not different from tests on one
population.
Matched Pairs

If we take After − Before, and we want to show that:
the “After group” has increased over the “Before group”: Ha: µdiff > 0
the “After group” has decreased: Ha: µdiff < 0
the two groups are different: Ha: µdiff ≠ 0

$t_{diff} = \frac{\bar{x}_{diff} - \mu_{diff}}{s_{diff}/\sqrt{n}}$
Example 4
Many people believe that the moon influences the actions of some
individuals. A study of dementia patients in nursing homes recorded
various types of disruptive behaviors every day for 12 weeks. Days
were classified as moon days and other days. For each patient the
average number of disruptive behaviors was computed for moon
days and for other days. The data for 5 subjects whose behaviors
were classified as aggressive are presented below:

Moon days   Other days   Difference
3.33        0.27         3.06
3.67        0.59         3.08
2.67        0.32         2.35
3.33        0.19         3.14
3.33        1.26         2.07

We want to test whether there is any difference in aggressive behavior
on moon days and other days.
Answer to Example 4
Let difference = aggressive behavior on moon days minus aggressive behavior on other days.

H0: µd = 0 versus Ha: µd ≠ 0, α = 0.05
t-statistic=12.377, df=5-1=4,
p-value=2.449*10^(-4).
Reject H0 at the 5% level.
There is enough evidence to conclude that there is a difference
in aggressive behavior on moon days and other days.
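The test statistic for this example can be recomputed from the five pairs. This is a minimal sketch using only the rounded values in the data table, so it gives a t statistic of about 12.38, in line with the 12.377 quoted in the answer (the small gap is presumably rounding in the published data).

```python
import math
import statistics

moon = [3.33, 3.67, 2.67, 3.33, 3.33]
other = [0.27, 0.59, 0.32, 0.19, 1.26]

# A matched pairs test is a one-sample t-test on the within-subject differences
diffs = [m - o for m, o in zip(moon, other)]

n = len(diffs)
dbar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t_stat = (dbar - 0) / (s_d / math.sqrt(n))  # H0: mu_d = 0
df = n - 1

print(f"mean difference = {dbar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.2f}, df = {df}")
```

Only the differences enter the calculation; the two original columns are never analyzed separately.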
Does lack of caffeine increase depression? (matched pair t-test)
Individuals diagnosed as caffeine-dependent are deprived of caffeine-rich
foods and assigned to receive daily pills. Sometimes, the pills contain
caffeine and other times they contain a placebo. Depression was
assessed.

Q: Does lack of caffeine increase depression?
There are 2 data points
for each subject, but
we’ll only look at the
difference. The sample
distribution appears
appropriate for a t-test.
Subject   Depression with Caffeine   Depression with Placebo
1         5                          16
2         5                          23
3         4                          5
4         3                          7
5         8                          14
6         5                          24
7         0                          6
8         0                          3
9         2                          15
10        11                         12
11        1                          0
Does lack of caffeine increase depression?
For each individual in the sample, we have calculated a difference in depression
score (placebo minus caffeine).
There were 11 “difference” points, thus df = n − 1 = 10.
We calculate that x̄ = 7.36; s = 6.92
H0: µdifference = 0 ; Ha: µdifference > 0

$t = \frac{\bar{x} - 0}{s/\sqrt{n}} = \frac{7.36}{6.92/\sqrt{11}} = 3.53$

Subject   Depression with Caffeine   Depression with Placebo   Placebo − Caffeine
1         5                          16                        11
2         5                          23                        18
3         4                          5                         1
4         3                          7                         4
5         8                          14                        6
6         5                          24                        19
7         0                          6                         6
8         0                          3                         3
9         2                          15                        13
10        11                         12                        1
11        1                          0                         -1
For df = 10, p-value=0.0027.
(1)Since p-value < 0.05, reject H0.
(2) We have enough evidence to conclude that:
Caffeine deprivation causes a significant increase in depression.
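The caffeine calculation can be reproduced end to end from the raw scores. As before, this is a sketch: `t_pdf` and `t_cdf` are hypothetical helpers, and the one-sided P-value is approximated by numerical integration of the t density.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

caffeine = [5, 5, 4, 3, 8, 5, 0, 0, 2, 11, 1]
placebo = [16, 23, 5, 7, 14, 24, 6, 3, 15, 12, 0]

# Difference is placebo minus caffeine, so Ha: mu_difference > 0
diffs = [p - c for p, c in zip(placebo, caffeine)]
n = len(diffs)
dbar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t_stat = dbar / (s_d / math.sqrt(n))
df = n - 1
p_value = 1 - t_cdf(t_stat, df)  # right-tail area

print(f"xbar = {dbar:.2f}, s = {s_d:.2f}, t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```

Taking the difference in the other direction (caffeine minus placebo) would flip the sign of t and require the left-tail area instead.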
Comparing two independent samples
[Figure: Sample 1 drawn from Population 1; Sample 2 drawn from Population 2]
Independent samples: Subjects in one sample are
completely unrelated to subjects in the other sample.
We often compare two
treatments used on
independent samples.
Is the difference between both
treatments due to a true
difference in population means?
Sec 7.2 Two independent samples t distribution
We have two independent SRSs (simple random samples) possibly
coming from two distinct populations with (µ1, σ1) and (µ2, σ2) unknown.
We use (x̄1, s1) and (x̄2, s2) to estimate (µ1, σ1) and (µ2, σ2), respectively.
To compare the means, both populations should be normally
distributed. However, in practice, it is enough that the two distributions
have similar shapes and that the sample data contain no strong outliers.
The two-sample t statistic follows approximately the t distribution with a
standard error SE (spread) reflecting
variation from both samples:
$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Conservatively, the degrees
of freedom is equal to the
smallest of (n1 − 1, n2 − 1).
[Figure: t distribution of x̄1 − x̄2 with the given df, centered at µ1 − µ2, with spread SE]
Two-sample t-test
The null hypothesis is that both population means µ1
and µ2 are equal, thus their difference is equal to zero.
H0: µ1 = µ2 <=> µ1 − µ2 = 0
with either a one-sided or a two-sided alternative hypothesis.
We find how many standard errors (SE) away
from (µ1 − µ2) is (x̄1 − x̄2) by standardizing:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE}$
Because in a two-sample test H0
assumes (µ1 − µ2) = 0, we simply use
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
with df = smallest(n1 − 1, n2 − 1).
Does smoking damage the lungs of children exposed
to parental smoking?
Forced vital capacity (FVC) is the volume (in milliliters) of
air that an individual can exhale in 6 seconds.
FVC was obtained for a sample of children not exposed to
parental smoking and a group of children exposed to
parental smoking.
Parental smoking   FVC x̄   s      n
Yes                75.5     9.3    30
No                 88.2     15.1   30

We want to know whether parental smoking decreases
children’s lung capacity as measured by the FVC test.
Is the mean FVC lower in the population of children
exposed to parental smoking?
H0: µsmoke = µno <=> (µsmoke − µno) = 0
Ha: µsmoke < µno <=> (µsmoke − µno) < 0 (one sided)
The difference in sample averages
follows approximately the t distribution
with 29 df.
We calculate the t statistic:
$t = \frac{\bar{x}_{smoke} - \bar{x}_{no}}{\sqrt{\frac{s_{smoke}^2}{n_{smoke}} + \frac{s_{no}^2}{n_{no}}}} = \frac{75.5 - 88.2}{\sqrt{\frac{9.3^2}{30} + \frac{15.1^2}{30}}} = \frac{-12.7}{\sqrt{2.9 + 7.6}} \approx -3.9$

p-value=tcdf(-E99, -3.919, 29)=2.491*10^(-4),
so p-value < 5%. The difference is highly
significant and we reject H0.
Therefore, we have enough evidence to conclude that lung capacity is
significantly impaired in children of smoking parents.
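The FVC calculation can be sketched in Python from the summary table. `two_sample_t`, `t_pdf`, and `t_cdf` are hypothetical helper names, and the P-value comes from numerically integrating the t density. Because the code does not round the two variance terms to 2.9 and 7.6, its t statistic is about -3.92 rather than exactly -3.919.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Return (t, df) using the conservative df = min(n1, n2) - 1."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x1 - x2) / se, min(n1, n2) - 1

# Group 1 = exposed to parental smoking, group 2 = not exposed
t_stat, df = two_sample_t(75.5, 9.3, 30, 88.2, 15.1, 30)
p_value = t_cdf(t_stat, df)  # left tail, since Ha: mu_smoke < mu_no

print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.2e}")
```

The conservative df is a deliberate simplification: it tends to overstate the P-value slightly, so a rejection under it is on the safe side.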
Two-sample t-test
Example 1.
A clinical dietician wants to compare two different diets, A and B,
for diabetic patients. She gets a random sample of 60 diabetic
patients and randomly assigns them to two equal-sized groups.
At the end of the experiment, a blood glucose test is conducted
on each patient.
The average difference in blood glucose measure from group A
is 100 mg/dl with sample SD 10, and the average difference in
blood glucose measure from group B is 106 mg/dl with sample
SD 12.
Q: Does this indicate that diet B has higher blood glucose than
diet A?
Two-sample t-test
1.
H0 : µA = µB, Ha : µA < µB; α = 5%;
T=(100-106)/(10^2/30+12^2/30)^.5=-2.104; df=29;
p-value=tcdf(-E99, -2.104, 29)=0.022; P-value <5%;
Therefore we reject H0, and we have enough evidence to
conclude that diet B has higher blood glucose than A.
Two-sample t-test
Example 2.
An experiment is conducted to determine whether intensive
tutoring (covering a great deal of material in a fixed amount of
time) is more effective than paced tutoring (covering less
material in the same amount of time). Two randomly chosen
groups are tutored separately and then administered proficiency
tests.
The sample size of the intensive group is 10 with sample
average 76 and sample SD 6; The sample size of the paced
group is 12 with sample average 70 and sample SD 8.
Q: May we conclude that the intensive group is doing better?
Two-sample t-test
2.
H0 : µint = µpac, Ha : µint > µpac; α = 5%;
T=(76-70)/(6^2/10+8^2/12)^.5=2.007; df=min(9, 11)=9;
p-value=tcdf(2.007, E99, 9)=0.038; P-value <5%;
Therefore we reject H0 and we have enough evidence to
conclude that the intensive group is doing better.
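Both examples can also be checked with the critical value approach instead of a P-value. In this sketch, `two_sample_t` is a hypothetical helper, and the critical values 1.699 and 1.833 are assumptions taken from standard t tables (the upper 5% points of t(29) and t(9)).

```python
import math

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Return (t, df) using the conservative df = min(n1, n2) - 1."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x1 - x2) / se, min(n1, n2) - 1

# Example 1 (diets): Ha is muA < muB, so reject for large negative t
t_diet, df_diet = two_sample_t(100, 10, 30, 106, 12, 30)
# Assumption: 1.699 is the upper 5% point of t(29) from a t table
reject_diet = t_diet < -1.699

# Example 2 (tutoring): Ha is mu_int > mu_pac, so reject for large positive t
t_tutor, df_tutor = two_sample_t(76, 6, 10, 70, 8, 12)
# Assumption: 1.833 is the upper 5% point of t(9) from a t table
reject_tutor = t_tutor > 1.833

print(f"diets:    t = {t_diet:.3f}, df = {df_diet}, reject H0: {reject_diet}")
print(f"tutoring: t = {t_tutor:.3f}, df = {df_tutor}, reject H0: {reject_tutor}")
```

The direction of the comparison follows the alternative hypothesis: a left-tailed test compares t with the negative critical value, a right-tailed test with the positive one.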
Two sample t-confidence interval

The general form of the confidence interval for
the population difference µ1 − µ2:
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

We find t* from Table D with
df = smallest (n1−1; n2−1).
EX 7.14: Can directed reading activities in the classroom help
improve reading ability? A class of 21 third-graders participates
in these activities for 8 weeks while a control classroom of 23
third-graders follows the same curriculum without the activities.
After 8 weeks, all children take a reading test (scores in table).
Q: Find the 95% confidence interval for (µ1 − µ2).
95% confidence interval for (µ1 − µ2): with df = 20 conservatively, t* = 2.086.
With 95% confidence, (µ1 − µ2) falls within 9.96 ± 8.987, or (0.973, 18.947).
Two sample t-confidence interval
1. The average lifetime of 36 randomly selected TVs from
brand A is 20 years with sample SD 2 years. The average
lifetime of 25 randomly selected TVs from brand B is 18 years
with sample SD 4 years. Construct a 95% CI for the difference
of the average lifetimes between brand A and brand B.
2. In a clinical study, a new medicine is used in the treatment
group with 64 patients. The new medicine prolongs life by 4 years
on average, with sample SD 0.75 years. As a comparison,
the placebo group with 60 patients has an average prolonged
life of 3 years with sample SD 1.2 years. Construct a 90% CI
for the difference of the average lifetimes prolonged between
the treatment group and the placebo group.
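As a starting point for these exercises, the interval formula can be sketched in Python. `two_sample_ci` is a hypothetical helper, and t* must be supplied by the reader. For exercise 1 the conservative df is min(36, 25) − 1 = 24, and the value 2.064 used below is an assumption taken from a standard t table (95% confidence, df = 24).

```python
import math

def two_sample_ci(x1, s1, n1, x2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2: (x1 - x2) +/- t* * SE."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x1 - x2
    margin = t_star * se
    return diff - margin, diff + margin

# Exercise 1: brand A (n = 36) versus brand B (n = 25)
# Assumption: t* = 2.064 for 95% confidence with df = 24, read from a t table
lower, upper = two_sample_ci(20, 2, 36, 18, 4, 25, t_star=2.064)

print(f"95% CI for muA - muB: ({lower:.3f}, {upper:.3f})")
```

Exercise 2 follows the same pattern with its own summary statistics and a 90% t* for df = 59 (Table D may require rounding df down to the nearest listed value).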