
Comp2100-3100 Lecture 3 Summary
Comp2/3100 Lecture-3 Notes (Quantitative Methods 2)
Introduction
So far we have considered situations where the sample has been a single score, such as
the score on an OOP exam or a comparison of scores for a Networks and Web Design
Modules. In either case, the z-test using a single score was successful in giving us some
confidence about the meaning of individual scores.
Most research studies use a sample of the population, where the sample could comprise
the scores of say 25 people. So in an educational setting, we take 25 pupils, give them a
test and compute their mean (average score), then we get them to do some educational
activity and we test them again to get their new mean score. We are interested to know if
any difference in the mean is a result of the educational activity, and has not arisen by
chance.
Also, the z-test assumed that the underlying data was normally distributed. This is the
“bell-shaped” distribution we saw last week (using the “magic” Excel spreadsheet).
The normal distribution is important for two reasons: First it has a mathematical
description, so we can use maths to prove some properties about it (outside the scope of
this module). Second, sample data, under certain circumstances may “approximate” the
normal distribution quite closely, as we shall see below. This is more interesting for us,
since we can apply the ideas investigated using the z-test approach to a wide range of
research problems.
Distribution of the Means
Let’s consider a population comprising just four scores 2,4,6,8. Their frequency
distribution is shown below. It is clear that this distribution is not normal (it’s flat and not
“bell-shaped”). The mean of this population is (2 + 4 + 6 + 8)/4 = 20/4 = 5.
    X   X   X   X
0 1 2 3 4 5 6 7 8 9
Now let’s take all possible samples with n = 2, in other words all possible pairs of
scores. We also agree to use random sampling with replacement, where each score is
returned to the data set after it is drawn. We compute the averages of all sample pairs.
So, for example, we get average(2, 4) = 3 and average(4, 2) = 3. We get
average(2, 2) = 2 and so on. When we plot the distribution of these averages, an
interesting picture emerges, shown below.
          X
        X X X
      X X X X X
    X X X X X X X
0 1 2 3 4 5 6 7 8 9
Something interesting has happened. First, we have generated a normal distribution of
means (or close to one) from a non-normal distribution of scores. Second, the mean value
of the means is the same as the mean of the population. So it looks as though, through the
process of sampling, we are able to discover the population mean. This is a very
important result, and is the bedrock of statistical analysis. It also has a mathematical
description, which is summarized in the “Central Limit Theorem”.
Central Limit Theorem. For any population with mean μ and standard deviation σ,
the distribution of sample means for samples of size n will have a mean of μ and a
standard deviation of σ/√n, and will approach a normal distribution as n gets very large.
So the mean of the distribution of sample means (5, read from the plot above) is the same
as the population mean (5, from the calculation on the scores). And as the sample size
gets large, the distribution of the means approaches the normal distribution.
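Both claims of the theorem can be checked on the tiny population above by brute force.
The sketch below, in Python, enumerates every ordered sample of size n = 2 drawn with
replacement:

```python
from itertools import product

population = [2, 4, 6, 8]
N = len(population)

pop_mean = sum(population) / N                               # 5.0
pop_var = sum((x - pop_mean) ** 2 for x in population) / N   # 5.0

# All 16 ordered samples of size n = 2, drawn with replacement
means = [(a + b) / 2 for a, b in product(population, repeat=2)]

mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

print(mean_of_means)   # 5.0 -- equals the population mean
print(var_of_means)    # 2.5 -- equals pop_var / n, so sd of means = sigma / sqrt(n)
```

The mean of the 16 sample means is exactly the population mean, and their variance is
exactly the population variance divided by the sample size, just as the theorem states.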
This means that the z-test approach is applicable to data which is not normally
distributed, provided that we take samples and calculate their means, and the sample
size is big enough. How big? Well, n = 30 will usually do.
The above formula is used to define what is called the “standard error” of the sample
mean. We label the sample data as X and the mean of each sample as X̄. Then the
standard error becomes

σ_X̄ = σ/√n
The standard error is important since it measures how much the observed sample mean
X̄ typically differs from the unmeasurable population mean μ. So, to be more confident
that our sample mean is a good measure of the population mean, the standard error
should be small. One way we can ensure this is to take large samples.
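The effect of sample size on the standard error is easy to see numerically; the value
σ = 100 below is just for illustration:

```python
import math

sigma = 100.0  # an illustrative population standard deviation
for n in (4, 25, 100, 400):
    se = sigma / math.sqrt(n)  # standard error shrinks as n grows
    print(n, se)
```

Note that quadrupling the sample size only halves the standard error.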
SAT Scores Example. Let’s take an example of US SAT scores. The population of
SAT scores is normal with μ = 500, σ = 100. What is the chance that a sample of n = 25
students has a mean score X̄ ≥ 540? Since the distribution is normal, we can use the
z-test. We need to calculate the following z-score
z = (X̄ - μ) / σ_X̄

where we compare the sample mean X̄ with the population mean μ as a number of
standard errors (dividing by σ_X̄).
First we must calculate the standard error: σ_X̄ = σ/√n = 100/√25 = 100/5 = 20. Then
we can calculate z:

z = (X̄ - μ) / σ_X̄ = (540 - 500) / 20 = 2
So once again we find the z-value is 2, meaning that around 98% of sample means fall
below this value and only around 2% above. So we conclude that the chance of obtaining
a sample mean of 540 or more by chance is around 2%; if such a mean is recorded in an
experiment, we can be about 98% confident that it did not arise by chance.
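The whole calculation, plus the exact normal tail probability via the error function, can
be sketched as follows:

```python
import math

mu, sigma, n = 500, 100, 25
x_bar = 540

se = sigma / math.sqrt(n)      # standard error: 100 / 5 = 20.0
z = (x_bar - mu) / se          # (540 - 500) / 20 = 2.0

# P(Z > z) for a standard normal, using the error function
p_above = 0.5 * (1 - math.erf(z / math.sqrt(2)))
print(z, round(p_above, 4))    # 2.0 0.0228
```

The exact tail probability is about 2.3%, close to the rounded 2% quoted above.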
The t-statistic
Perhaps you have noticed something a little strange, even worrying. The above
calculations have referred to parameters (means and sd) of populations. But the whole
point of sampling is to deduce something about the population, since we do not normally
know its parameters! So, you make a change to a web-browser and you take a sample of
people and question them. You hope the results of your research will allow you to make a
conclusion beyond the people in your sample, in other words to generalize to a
population. That’s where the t-test comes in.
The problem concerns the standard deviation. Let’s see what the problem is and how to
fix it up. Let’s start with the formula for the sd which we have seen previously:

σ = √( Σ(x - μ)² / N ) = √( SS / N )
where SS means “sum of squares”. This is fine for a population of size N, but not for a
sample of size n. The formula for a sample of n is

s = √( Σ(x - X̄)² / (n - 1) ) = √( SS / (n - 1) )
So where does the (n - 1) come from? Think about five people entering a cafeteria where
there are five remaining items: a hot-dog, a waffle, an apple, a buttie and crisps. We
observe the people choosing their snack. How many choices can be freely made? Well,
the first 4 people have a free (but increasingly restricted) choice, but number five has no
choice, since only the buttie remains. So there are (5 - 1) free choices. The same with our
sample of n scores: only n - 1 can vary freely. This is called the “degrees of
freedom”, which I shall abbreviate as df.
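The difference between dividing by N and dividing by n - 1 is easy to see numerically;
the data below is just an illustrative sample:

```python
import math

data = [2, 4, 6, 8]            # an illustrative sample
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)    # sum of squares = 20.0

sd_population = math.sqrt(ss / n)          # divide by N:     about 2.236
sd_sample = math.sqrt(ss / (n - 1))        # divide by n - 1: about 2.582
print(sd_population, sd_sample)
```

Dividing by n - 1 always gives a slightly larger value, correcting the tendency of a
small sample to underestimate the spread of the population it came from.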
So this is the value of sd that we use in the computation of the standard error:

σ_X̄ = σ/√n
So we could re-write our z-value as

z = (X̄ - μ) / (σ/√n)
but this has not really helped, since both μ and σ refer to the population. What we do
next is to substitute σ, the sd of the population, with s, the sd of the sample. This
leads to the formulation of the t-statistic

t = (X̄ - μ) / (s/√n)

where we emphasize that s is the standard deviation of the sample. Note that the
population mean μ is still present in this formula, though we shall see how to get rid of it
soon.
Hypothesis Testing -1-

Let’s put this all to work in a typical research scenario. We take a sample of computer
game-players and make an intervention, the inclusion of rich visual graphical elements
into the game. We wish to see the effect of this intervention: how do the visuals affect the
behaviour of the players? Our experimental design is sketched below. We make a small
game level containing two rooms, A has lots of visuals and B is rather bland. We take a
sample of n = 16 players and put them into the game level for 60 minutes and record the
time they spend in room B.
[Sketch: a small game level with two rooms, A (rich visuals) and B (bland)]
The results of the experiment are as follows. We find that the average time spent in B is
X̄ = 39 minutes, and the observed “sum of squares” for the sample is SS = 540. We
proceed in four stages (which we shall use for all of our hypothesis tests).
STAGE 1: Formulation of the Hypothesis.
H0: the “null” hypothesis, that the visuals have no effect on the players’ behaviour.

H1: the “alternative” hypothesis, that the visuals do have an effect on the players’
behaviour.
The null hypothesis is crucial, since it helps us to “get rid” of the population parameter
μ. Think about it: if the visuals have no effect, what is the average time a player will
spend in room B? Clearly half the 60 minutes, so we have inferred from the null
hypothesis that μ = 30.
STAGE 2: Locate the Critical Region of the t-statistic where we can reject or accept
the null hypothesis. This is done using the t-table. We need to input the number of
degrees of freedom and the level of significance we require. We take (as is typical)
α = 0.05 for the significance, and we calculate df = 16 - 1 = 15. Looking this up in the
t-table yields critical values of t = -2.131 and t = +2.131. Remember that if our sample
t-value falls outside this range, then it is highly unlikely that the associated sample mean
has been produced by chance rather than by a real effect.
STAGE 3: Calculate the statistic. First we calculate the sample sd:

s = √( SS / (n - 1) ) = √( 540 / 15 ) = 6
Then the sample standard error:

s_X̄ = s/√n = 6/4 = 1.5
and finally the t-statistic:

t = (X̄ - μ) / s_X̄ = (39 - 30) / 1.5 = 6
Note what we have done here. We have inserted the observed mean time in room B (39
minutes) and the standard error calculated from the observed sum of squares. But what
about the population mean μ, where did the value of 30 come from? Well, it came from
the null hypothesis, where we concluded that if the visuals had no effect, then a player
would spend 30 minutes in each of rooms A and B.
STAGE 4: DECISION. Our calculated t = 6 falls well inside the critical region, well
beyond the value of 2.131 which indicates where chance kicks in. So we reject the null
hypothesis, since most of the sample means consistent with the null hypothesis produce
t-values below 2.131. We conclude that visuals do influence player behaviour, and also,
from the fact that the players spend more time in the bland room, we conclude that
players prefer rooms without visual impact!
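All four stages of this one-sample t-test can be collected in a few lines; the critical
value 2.131 is taken from a t-table, as above:

```python
import math

n, x_bar, ss = 16, 39, 540   # observed sample (from the experiment above)
mu0 = 30                     # population mean under the null hypothesis
t_critical = 2.131           # t-table: df = 15, alpha = 0.05, two-tailed

s = math.sqrt(ss / (n - 1))  # sample sd: sqrt(540 / 15) = 6.0
se = s / math.sqrt(n)        # standard error: 6 / 4 = 1.5
t = (x_bar - mu0) / se       # (39 - 30) / 1.5 = 6.0

print(t, abs(t) > t_critical)   # 6.0 True -> reject the null hypothesis
```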
To understand this, let’s look at a data set which has been created to provide the above
results. Here it is:
The 16 scores are 27, 31, 33, 36, 37, 38, 38, 39, 39, 39, 40, 41, 43, 43, 49, 51,
plotted below as a frequency distribution:

                                    X
                                 X  X           X
X           X     X        X  X  X  X  X  X     X                 X     X
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
First look at the sample mean (39). Do the data values cluster around this mean? Yes.
Now look at the mean (30) corresponding to the null hypothesis. Do the data values
cluster around this value? Clearly not. Since the null hypothesis predicts a mean of 30,
and since the sample does not cluster around 30 then we are forced to reject the null
hypothesis.
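Reading the 16 scores off the plot above, we can confirm that they reproduce the
summary statistics used in the test (X̄ = 39, SS = 540):

```python
data = [27, 31, 33, 36, 37, 38, 38, 39, 39, 39, 40, 41, 43, 43, 49, 51]
n = len(data)

mean = sum(data) / n                      # 39.0
ss = sum((x - mean) ** 2 for x in data)   # 540.0

print(n, mean, ss)   # 16 39.0 540.0
```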
Hypothesis Testing -2- (Independent Samples)
In this scenario there are two populations and two samples. We are interested to know if
there is a difference between the populations when they are subject to different
interventions. These interventions could be a different learning/teaching approach,
different medical treatments, or exposure to different web browsers or computer game
content. In all cases the scenario can be sketched as below:
Population A (mean = ?)        Population B (mean = ?)
        |                              |
    Sample A                       Sample B
Through measurement of the sample means we wish to discover if there is a difference
between the population means.
Again, the formulation of the null hypothesis is key in this discovery, since it removes
the population means μ from the t-statistic calculation. Here is the null hypothesis:

H0: The intervention has had no effect. There is no difference between the population
means, i.e., μ_A - μ_B = 0.
Let’s take an example. A researcher wishes to discover if the use of mental imagery helps
remembering. She presents 50 noun pairs to sample A (such as dog/bike) and the same
noun pairs to sample B, but asks them to form a mental image of the noun pairs (e.g. a
dog riding a bike). Then she gives both sample groups a memory test of the noun pairs.
Her results are as follows:

Sample group A: n = 10, X̄ = 19, SS = 40
Sample group B: n = 10, X̄ = 26, SS = 50
Here’s how she proceeds with the data analysis:
STAGE 1: Formulation of the Hypothesis.

H0: The use of mental imagery has had no effect. There is no difference between the
population means, i.e., μ_A - μ_B = 0.

She chooses to set the significance α = 0.05.
STAGE 2: Locate the Critical Region of the t-statistic where we can reject or accept
the null hypothesis. First we must establish the degrees of freedom. For sample group A
we have (10 - 1) = 9 degrees of freedom, and the same for sample group B, so in total we
have 18 degrees of freedom, df = 18. Together with the value for alpha, looking this up
in the t-table we find that the critical t is plus or minus 2.101.
STAGE 3: Calculate the Statistic. Here we proceed in the same manner as above, but
the calculations are different, since we have two samples. We calculate the “pooled”
standard deviation, defined as

s_P = √( (SS_A + SS_B) / (df_A + df_B) ) = √( (40 + 50) / (9 + 9) ) = √5
Then we calculate the standard error according to this formula:

s_(X̄_A - X̄_B) = s_P · √( 1/n_A + 1/n_B ) = √5 · √( 1/10 + 1/10 ) = 1
Finally we calculate the t-statistic, which you can clearly see is looking for a
difference between the sample means:

t = ( (X̄_A - X̄_B) - (μ_A - μ_B) ) / s_(X̄_A - X̄_B) = ( (19 - 26) - 0 ) / 1 = -7
Note again how using the null hypothesis has wiped out the two population means,
which of course we do not know, from the calculation.
STAGE 4: The Decision. The final value of -7 is in the critical region, well below the
value of -2.101. So the chance that this result arose by accident under the null
hypothesis is very small (less than 2.5% in this tail), so we reject the null hypothesis and
conclude that the use of imagery in remembering noun pairs does have an effect.
Looking at the averages, we see the effect is positive.
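The independent-samples calculation can likewise be sketched in a few lines, with the
critical value again taken from the t-table:

```python
import math

n_a, xbar_a, ss_a = 10, 19, 40   # sample group A: no imagery
n_b, xbar_b, ss_b = 10, 26, 50   # sample group B: mental imagery
t_critical = 2.101               # t-table: df = 18, alpha = 0.05, two-tailed

df = (n_a - 1) + (n_b - 1)                   # 18
s_p = math.sqrt((ss_a + ss_b) / df)          # pooled sd: sqrt(90 / 18) = sqrt(5)
se = s_p * math.sqrt(1 / n_a + 1 / n_b)      # sqrt(5) * sqrt(0.2) = 1 (to rounding)
t = (xbar_a - xbar_b - 0) / se               # (19 - 26) / 1 = -7 (to rounding)

print(round(t, 6), abs(t) > t_critical)      # -7.0 True -> reject the null hypothesis
```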
Hypothesis Testing -3- (Related Samples)
… coming soon …