9/10/13 lecture on hypothesis testing

Tuesday, September 10, 2013
Introduction to hypothesis testing
Last time:
Probability & the Distribution of Sample Means
• We can use the Central Limit Theorem to
calculate z-scores associated with individual
sample means (the z-scores are based on the
distribution of all possible sample means).
• Each z-score describes the exact location of its
respective sample mean, relative to the
distribution of sample means.
• Since the distribution of sample means is
normal, we can then use the unit normal table to
determine the likelihood of obtaining a sample
mean greater/less than a specific sample mean.
When using z-scores to represent sample means, the
correct formula to use is:
$$z_M = \frac{M - \mu}{\sigma_M}$$
EXAMPLE: What is the probability of obtaining a sample
mean greater than M = 60 for a random sample of n =
16 scores selected from a normal population with a
mean of μ = 65 and a standard deviation of σ = 20?
M = 60; μ = 65; σ = 20; n = 16
$$\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{16}} = \frac{20}{4} = 5$$
$$z_M = \frac{M - \mu}{\sigma_M} = \frac{60 - 65}{5} = -1$$
$$p(z_M > -1) = .8413$$
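The worked example above can be checked numerically. This is a minimal sketch, assuming Python 3.8+ so that `statistics.NormalDist` can stand in for the unit normal table; the function name `prob_sample_mean_greater` is hypothetical and not from the lecture.

```python
from statistics import NormalDist

def prob_sample_mean_greater(M, mu, sigma, n):
    """Return (z_M, P(sample mean > M)) for samples of size n from a normal population."""
    sigma_M = sigma / n ** 0.5              # standard error of the mean
    z_M = (M - mu) / sigma_M                # location of M in the distribution of sample means
    p = 1 - NormalDist().cdf(z_M)           # area to the right of z_M (replaces the table lookup)
    return z_M, p

z, p = prob_sample_mean_greater(M=60, mu=65, sigma=20, n=16)
print(f"z_M = {z:.2f}, p = {p:.4f}")        # z_M = -1.00, p = 0.8413
```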
Last topic before the exam:
• Hypothesis testing (pulls together
everything we’ve learned so far and applies
it to testing hypotheses about
sample means).
• Before we move on, questions about CLT,
distributions of samples, standard error of
the mean and how to calculate it?
Hypothesis testing
• Example: Testing the effectiveness of a new memory
treatment for patients with memory problems
– Our pharmaceutical company develops a new drug
treatment that is designed to help patients with
impaired memories.
– Before we market the drug we want to see if it works.
– The drug is designed to work on all memory patients,
but we can’t test them all (the population).
– So we decide to use a sample and conduct the following
experiment.
– Based on the results from the sample we will make
conclusions about the population.
[Figure: memory patients are split into two groups, and each group takes a memory test]
– Memory treatment: 55 memory errors
– No memory treatment: 60 memory errors
– Difference: 5 errors
• Is the 5 error difference:
– A “real” difference due to the effect of the treatment
– Or is it just sampling error?
Testing Hypotheses
• Hypothesis testing
– Procedure for deciding whether the outcome of a study
(results for a sample) supports a particular theory (which
is thought to apply to a population)
– Core logic of hypothesis testing
• Considers the probability that the result of a study could have
come about by chance if the experimental procedure had no
effect
• If this probability is low, the scenario of no effect is rejected and
the theory behind the experimental procedure is supported
Hypothesis testing
[Figure: distribution of possible outcomes (for a particular sample size, n)]
We can make predictions about the likelihood of
outcomes based on this distribution.
• In hypothesis testing, we
compare our observed samples
with the distribution of possible
samples (transformed into
standardized distributions)
• This distribution of possible
samples is often normally
distributed (this follows from the
Central Limit Theorem).
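The Central Limit Theorem claim can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is a hypothetical, deliberately skewed exponential distribution (mean 1, SD 1), and the sample size n = 30 and 20,000 repetitions are arbitrary choices. It checks that the sample means center on the population mean, that their spread matches the standard error σ/√n, and that roughly 95% fall within 1.96 standard errors, as a normal distribution would predict.

```python
import random
from statistics import NormalDist, fmean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical skewed (non-normal) population: exponential with mean 1 and SD 1.
pop_mean, pop_sd = 1.0, 1.0
n, reps = 30, 20_000  # sample size and number of simulated samples (arbitrary choices)

# Draw many samples of size n and record each sample mean.
sample_means = [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

se = pop_sd / n ** 0.5  # standard error predicted by the CLT
print(f"mean of sample means: {fmean(sample_means):.3f} (CLT predicts {pop_mean})")
print(f"SD of sample means:   {pstdev(sample_means):.3f} (CLT predicts {se:.3f})")

# If the sample means are roughly normal, about 95% should fall within 1.96 SE of the mean.
z_crit = NormalDist().inv_cdf(0.975)
inside = sum(abs(m - pop_mean) <= z_crit * se for m in sample_means) / reps
print(f"share within 1.96 SE: {inside:.3f} (a normal distribution predicts 0.950)")
```

Even though the population itself is strongly skewed, the distribution of sample means comes out close to normal.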
Inferential statistics
• Hypothesis testing
– Core logic of hypothesis testing
• Considers the probability that the result of a study could have
come about if the experimental procedure had no effect
• If this probability is low, the scenario of no effect is rejected and the
theory behind the experimental procedure is supported
– A four step program
• Step 1: State your hypotheses
• Step 2: Set your decision criteria
• Step 3: Collect your data & compute your test statistics
• Step 4: Make a decision about your null hypothesis
Hypothesis testing
• Hypothesis testing: a four step program
– Step 1: State your hypotheses: as a research hypothesis and a null
hypothesis about the populations
• Null hypothesis (H0): this is the one that you test
• There are no differences between conditions (no effect of treatment)
• Research (alternative) hypothesis (HA)
• Generally, not all groups are equal
– You aren’t out to prove the alternative hypothesis
• If you reject the null hypothesis, then you’re left with
support for the alternative(s) (NOT proof!)
Testing Hypotheses
• Hypothesis testing: a four step program
– Step 1: State your hypotheses
In our memory example experiment:
One-tailed (direction specified)
– Our theory is that the treatment should improve memory (fewer errors).
H0: μTreatment ≥ μNo Treatment
HA: μTreatment < μNo Treatment
Two-tailed (no direction specified)
– Our theory is that the treatment has an effect on memory.
H0: μTreatment = μNo Treatment
HA: μTreatment ≠ μNo Treatment
One-Tailed and Two-Tailed Hypothesis Tests
• Directional hypotheses
– One-tailed test
• Nondirectional hypotheses
– Two-tailed test
Testing Hypotheses
• Hypothesis testing: a four step program
– Step 1: State your hypotheses
– Step 2: Set your decision criteria
• Your alpha (α) level will be your guide for when to reject or fail
to reject the null hypothesis.
– Based on the probability of making a Type I error (rejecting H0 when it is actually true)
Testing Hypotheses
• Hypothesis testing: a four step program
– Step 1: State your hypotheses
– Step 2: Set your decision criteria
– Step 3: Collect your data & compute sample statistics
• Descriptive statistics (means, standard deviations, etc.)
• Inferential statistics (z-test, t-tests, ANOVAs, etc.)
Testing Hypotheses
• Hypothesis testing: a four step program
– Step 1: State your hypotheses
– Step 2: Set your decision criteria
– Step 3: Collect your data & compute sample statistics
– Step 4: Make a decision about your null hypothesis
• Based on the outcomes of the statistical tests researchers will
either:
– Reject the null hypothesis
– Fail to reject the null hypothesis
• This could be the correct conclusion or the incorrect conclusion
Error types
• Type I error (α): concluding that there is a
difference between groups (“an effect”) when
there really isn’t.
– α, the probability of making a Type I error, is also called the “significance level” or “alpha level”
– We try to minimize this (keep it low)
• Type II error (β): concluding that there isn’t an
effect, when there really is.
– Related to the Statistical Power of a test (1-β)
Columns: real world (‘truth’). Rows: experimenter’s conclusions.

                        | H0 is correct              | H0 is wrong
                        | (there really isn’t        | (there really is
                        | an effect)                 | an effect)
------------------------+----------------------------+------------------
Reject H0               | Type I error (α)           | Correct decision
(“I conclude that       |                            |
there is an effect”)    |                            |
------------------------+----------------------------+------------------
Fail to reject H0       | Correct decision           | Type II error (β)
(“I can’t detect        |                            |
an effect”)             |                            |
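The meaning of α as the Type I error rate can be seen by simulating many experiments in which H0 really is true. This is a minimal sketch, assuming the memory example’s population (Normal, μ = 60, σ = 8, n = 16), a one-tailed test at α = 0.05, and an arbitrary 20,000 simulated experiments; every rejection counted here is a Type I error because the "treatment" sample is drawn from the untreated population.

```python
import random
from statistics import NormalDist, fmean

random.seed(2)  # fixed seed so the run is repeatable

mu, sigma, n = 60, 8, 16               # population values from the memory example
alpha = 0.05
z_crit = NormalDist().inv_cdf(alpha)   # one-tailed (lower-tail) cutoff, about -1.645
sigma_M = sigma / n ** 0.5             # standard error of the mean

reps = 20_000
rejections = 0
for _ in range(reps):
    # H0 is true: the sample comes from the same population, so there is no real effect.
    M = fmean(random.gauss(mu, sigma) for _ in range(n))
    if (M - mu) / sigma_M <= z_crit:
        rejections += 1                # rejecting a true H0 is a Type I error

type1_rate = rejections / reps
print(f"Type I error rate: {type1_rate:.3f} (alpha = {alpha})")
```

The observed rejection rate should land close to 0.05, which is exactly what setting α = 0.05 promises.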
Performing your statistical test
• What are we doing when we test the hypotheses?
Real world (‘truth’):
– H0 is true (no treatment effect): one population. The patients in the memory treatment sample (mean MA) are the same as those in the population of memory patients.
– H0 is false (there is a treatment effect): two populations. The patients in the memory treatment sample (mean MA) aren’t the same as those in the population of memory patients.
Performing your statistical test
• What are we doing when we test the hypotheses?
– Computing a test statistic: generic test
$$\text{test statistic} = \frac{\text{observed difference}}{\text{difference expected by chance}}$$
• Observed difference: could be the difference between a sample and a population, or between different samples
• Difference expected by chance: based on the standard error or an estimate of the standard error
“Generic” statistical test
• The generic test statistic distribution (think of this as the
distribution of sample means)
– To reject H0, you want a computed test statistic that is large (far from zero, in either direction)
– What’s large enough?
• The alpha level gives us the decision criterion
[Figure: distribution of the test statistic; the α-level determines where the rejection boundaries go]
[Figure: distribution of the test statistic. If the test statistic falls beyond a boundary (in a tail), reject H0; if it falls between the boundaries, fail to reject H0.]
“Generic” statistical test
• The alpha level gives us the decision criterion (here α = 0.05)
– Two-tailed: α is split up into the two tails, 0.025 in each. Reject H0 if the test statistic falls in either tail; otherwise, fail to reject H0.
– One-tailed: all of α (0.05) is in one tail. Reject H0 only if the test statistic falls in that tail; otherwise, fail to reject H0.
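Where the boundaries fall for α = 0.05 can be computed instead of read from the unit normal table. This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the helper name `critical_z` is hypothetical. It returns the size of the cutoff, and the sign depends on which tail the research hypothesis predicts.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Size of the z cutoff that puts alpha into the rejection region."""
    if tails == 2:
        return NormalDist().inv_cdf(1 - alpha / 2)  # alpha split: alpha/2 in each tail
    if tails == 1:
        return NormalDist().inv_cdf(1 - alpha)      # all of alpha in one tail
    raise ValueError("tails must be 1 or 2")

print(f"two-tailed: reject H0 if |z| >= {critical_z(0.05, 2):.3f}")   # 1.960
print(f"one-tailed: reject H0 if z is beyond {critical_z(0.05, 1):.3f} "
      "in the predicted direction")                                  # 1.645
```

Putting all of α in one tail moves the cutoff closer to zero (1.645 instead of 1.960), which is why a one-tailed test rejects H0 more easily in the predicted direction and never in the other.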
“Generic” statistical test
An example: One sample z-test
Memory example experiment:
• We give n = 16 memory patients a memory improvement treatment.
• After the treatment they have an average score of M = 55 memory errors.
• How do they compare to the general population of memory patients, who have a distribution of memory errors that is Normal, μ = 60, σ = 8?
• Step 1: State the hypotheses
H0: The treatment sample is the same as (or worse than) the population of memory patients.
μTreatment ≥ μpop = 60
HA: The treatment sample does better than the population (fewer errors).
μTreatment < μpop = 60
• Step 2: Set your decision criteria
One-tailed, α = 0.05
• Step 3: Collect your data & compute your test statistics
$$z_M = \frac{M - \mu_M}{\sigma_M} = \frac{55 - 60}{8/\sqrt{16}} = \frac{-5}{2} = -2.5$$
• Step 4: Make a decision about your null hypothesis
zM = -2.5 falls in the 5% rejection region in the lower tail (beyond the one-tailed critical value of about -1.645), so:
– Reject H0
– Support for our HA: the evidence suggests that the treatment decreases the number of memory errors
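The whole one-sample z-test can be sketched as a single function. This is an illustration under stated assumptions, not the lecture’s own procedure: Steps 1 and 2 are represented by the hypothetical `tail` and `alpha` parameters, and Step 4 compares a p-value with α, which gives the same decision as checking whether z falls beyond the critical value. The function name `one_sample_z_test` is hypothetical; Python 3.8+ is assumed for `statistics.NormalDist`.

```python
from statistics import NormalDist

def one_sample_z_test(M, mu, sigma, n, alpha=0.05, tail="lower"):
    """One-sample z-test.

    tail: "lower" (HA: mu_treatment < mu), "upper" (HA: mu_treatment > mu),
    or "two" (HA: mu_treatment != mu).
    """
    # Step 3: standard error and test statistic
    sigma_M = sigma / n ** 0.5
    z = (M - mu) / sigma_M
    # p-value: probability of a result at least this extreme if H0 is true
    cdf = NormalDist().cdf
    if tail == "lower":
        p = cdf(z)
    elif tail == "upper":
        p = 1 - cdf(z)
    elif tail == "two":
        p = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("tail must be 'lower', 'upper', or 'two'")
    # Step 4: decision about H0
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    return z, p, decision

z, p, decision = one_sample_z_test(M=55, mu=60, sigma=8, n=16, alpha=0.05, tail="lower")
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")   # z = -2.50, p = 0.0062: reject H0
```

With the memory example’s numbers the p-value (about .006) is well below α = .05, matching the slide’s decision to reject H0.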