Comp2/3100 Lecture-3 Notes (Quantitative Methods 2)

Introduction

So far we have considered situations where the sample has been a single score, such as the score on an OOP exam, or a comparison of scores for the Networks and Web Design modules. In either case, the z-test using a single score was successful in giving us some confidence about the meaning of individual scores. Most research studies use a sample of the population, where the sample might comprise the scores of, say, 25 people. So in an educational setting, we take 25 pupils, give them a test and compute their mean (average) score, then we get them to do some educational activity and we test them again to get their new mean score. We want to know whether any difference in the mean is a result of the educational activity and has not arisen by chance.

Also, the z-test assumed that the underlying data was normally distributed. This is the "bell-shaped" distribution we saw last week (using the "magic" Excel spreadsheet). The normal distribution is important for two reasons. First, it has a mathematical description, so we can use maths to prove some properties about it (outside the scope of this module). Second, sample data may, under certain circumstances, "approximate" the normal distribution quite closely, as we shall see below. This is the more interesting point for us, since it lets us apply the ideas investigated with the z-test approach to a wide range of research problems.

Distribution of the Means

Let's consider a population comprising just four scores: 2, 4, 6, 8. Their frequency distribution is shown below. It is clear that this distribution is not normal (it is flat and not "bell-shaped"). The mean of this population is (2 + 4 + 6 + 8)/4 = 20/4 = 5.

[Figure: frequency distribution of the population scores, one X above each of 2, 4, 6, 8 - a flat distribution.]

Now let's take all possible samples with n = 2, in other words all possible samples of pairs of scores. We also agree to use random sampling with replacement, where each sampled score is replaced into the data set. We compute the averages of all sample pairs. So, for example, we get average(2, 4) = 3 and average(4, 2) = 3, we get average(2, 2) = 2, and so on. When we plot the distribution of these averages, an interesting plot emerges, shown below.

[Figure: frequency distribution of the 16 sample means - frequencies 1, 2, 3, 4, 3, 2, 1 above the means 2, 3, 4, 5, 6, 7, 8 - a roughly bell-shaped distribution centred on 5.]

Something interesting has happened. First, we have generated a normal distribution of means (or close to one) from a non-normal distribution of scores. Second, the mean value of the means is the same as the mean of the population. So it looks as though, through the process of sampling, we are able to discover the population mean. This is a very important result, and it is the bedrock of statistical analysis. It also has a mathematical description, which is summarized in the "Central Limit Theorem".

Central Limit Theorem

For any population with mean \( \mu \) and standard deviation \( \sigma \), the distribution of sample means for samples of size n will have a mean of \( \mu \) and a standard deviation of \( \sigma / \sqrt{n} \), and will approach a normal distribution as n gets very large.

So the mean of the distribution of sample means (5, read off the plot above) is the same as the population mean (5, from the calculation on the scores). And as the sample size gets large, the distribution of the means approaches the normal distribution. This means that the z-test approach is applicable to data which is not normally distributed, provided that we take samples and calculate their means, and the sample size is big enough. How big? Well, n = 30 will usually do.
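The four-score population above is small enough to enumerate every sample exhaustively. Here is a minimal Python sketch (not part of the original notes) that reproduces the distribution-of-means plot and checks that the mean of the sample means equals the population mean:

```python
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]

# All 16 ordered samples of size n = 2, drawn with replacement.
sample_means = [mean(pair) for pair in product(population, repeat=2)]

# Frequencies of each sample mean: 2:1, 3:2, 4:3, 5:4, 6:3, 7:2, 8:1,
# i.e. the bell-shaped plot in the notes.
for m in sorted(set(sample_means)):
    print(m, "X" * sample_means.count(m))

# The mean of the sample means equals the population mean (5).
print(mean(sample_means), mean(population))
```

Re-running with repeat=3 or more shows the distribution of means becoming smoother and more bell-shaped, which is the Central Limit Theorem in action.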
The formula \( \sigma / \sqrt{n} \) from the Central Limit Theorem is used to define what is called the "standard error" of the sample mean. We label the sample data as X and the mean of each sample as \( \bar{X} \). Then the standard error becomes

\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]

The standard error is important since it identifies how much the observed sample mean \( \bar{X} \) differs from the un-measurable population mean \( \mu \). So to be more confident that our sample mean is a good measure of the population mean, the standard error should be small. One way we can ensure this is to take large samples.

SAT Scores Example

Let's take an example of US SAT scores. The population of SAT scores is normal with \( \mu = 500 \), \( \sigma = 100 \). What is the chance that a sample of n = 25 students has a mean score \( \bar{X} \) greater than 540? Since the distribution is normal, we can use the z-test. We need to calculate the following z-score:

\[ z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \]

where we are comparing the sample mean \( \bar{X} \) with the population mean \( \mu \), expressed as a number of standard errors (dividing by \( \sigma_{\bar{X}} \)). First we must calculate the standard error:

\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{25}} = \frac{100}{5} = 20 \]

Then we can calculate z:

\[ z = \frac{540 - 500}{20} = 2 \]

So once again we find the z-value is 2, meaning that around 98% of the sample means are below this and only around 2% are above. So we conclude that the chance of getting a sample mean of 540 or more is about 2%; in other words, if such a sample mean were recorded in an experiment, we could be about 98% confident that it had not arisen by chance.

The t-statistic

Perhaps you have noticed something a little strange, even worrying. The above calculations have referred to parameters (mean and sd) of populations. But the whole point of sampling is to deduce something about the population, since we do not normally know its parameters! So, you make a change to a web browser and you take a sample of people and question them. You hope the results of your research will allow you to draw a conclusion beyond the people in your sample, in other words to generalize to a population. That's where the t-test comes in.

The problem concerns the standard deviation. Let's see what it is and how to fix it up. Let's start with the formula for sd which we have seen previously:

\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{SS}{N}} \]

where SS means "sum of squares". This is fine for a population of N, but not for a sample of n. The formula for a sample of n is

\[ s = \sqrt{\frac{\sum (x - \bar{X})^2}{n - 1}} = \sqrt{\frac{SS}{n - 1}} \]

So where does the (n - 1) come from? Think about five people entering a cafeteria where there are five remaining items: a hot-dog, a waffle, an apple, a buttie and crisps. We observe the people choosing their snack. How many choices can be freely made? Well, the first four people have a free (but increasingly restricted) choice, but number five has no choice: only the buttie remains. So there are (5 - 1) free choices. The same applies to our sample of n scores: only n - 1 of them can vary freely. This is called the "degrees of freedom", which I shall abbreviate as df.

This s is the value of sd that we use in the computation of the standard error, \( s_{\bar{X}} = s / \sqrt{n} \). So we could re-write our z-value as

\[ z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]

but this has not really helped, since both \( \mu \) and \( \sigma \) refer to the population. What we do next is to substitute \( \sigma \), the sd of the population, with s, the sd of the sample. This leads to the formulation of the t-statistic:

\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \]

where we emphasize that s is the standard deviation of the sample. Note that the population mean \( \mu \) is still present in this formula, though we shall see how to get rid of it soon.
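To make the SAT calculation concrete, here is a small Python sketch (not from the original notes) that computes the standard error, the z-score, and the exact upper-tail probability. Note the exact tail is about 2.3%, which the notes round to 2%:

```python
import math

mu, sigma = 500, 100   # population parameters for SAT scores
n, x_bar = 25, 540     # sample size and observed sample mean

se = sigma / math.sqrt(n)   # standard error = 100 / 5 = 20
z = (x_bar - mu) / se       # z = (540 - 500) / 20 = 2.0

# P(Z > z) for a standard normal, via the complementary error function.
p_above = 0.5 * math.erfc(z / math.sqrt(2))

print(z)        # 2.0
print(p_above)  # ~0.0228, i.e. roughly the 2% quoted in the notes
```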
Hypothesis Testing -1-

Let's put this all to work in a typical research scenario. We take a sample of computer game-players and make an intervention: the inclusion of rich visual graphical elements into the game. We wish to see the effect of this intervention: how do the visuals affect the behaviour of the players? Our experimental design is sketched below. We make a small game level containing two rooms: A has lots of visuals and B is rather bland. We take a sample of n = 16 players, put them into the game level for 60 minutes, and record the time they spend in room B.

[Figure: sketch of the game level, showing the visually rich room A next to the bland room B.]

The results of the experiment are as follows. We find that the average time spent in B is \( \bar{X} = 39 \) minutes, and the observed "sum of squares" for the sample is SS = 540. We proceed in four stages (which we shall use for all of our hypothesis tests).

STAGE 1: Formulation of the Hypothesis.

\( H_0 \): Here we formulate the "null" hypothesis: the visuals have no effect on the behaviour.

\( H_1 \): Here we formulate the "alternative" hypothesis: the visuals do have an effect on the players' behaviour.

The null hypothesis is crucial, since it helps us to "get rid" of the population parameter \( \mu \). Think about it. If visuals have no effect on the population, then what is the average time a player will spend in room B? Clearly half the time, so we have inferred from the null hypothesis that \( \mu = 30 \).

STAGE 2: Locate the Critical Region of the t-statistic where we can reject or accept the null hypothesis.

This is done using the t-table. We need to input the number of degrees of freedom and the level of significance (confidence) we require. We take (as typical) \( \alpha = 0.05 \) for the significance, and we calculate df = 16 - 1 = 15. Looking this up in the t-table yields the critical values t = -2.131 and t = 2.131. Remember that if our sample t-value is greater (respectively smaller) than these values, then it is highly unlikely that the associated sample mean has been produced by chance rather than by a real effect.

STAGE 3: Calculate the statistic.

First we calculate the sample sd:

\[ s = \sqrt{\frac{SS}{n - 1}} = \sqrt{\frac{540}{15}} = 6 \]

Then the sample standard error:

\[ s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{6}{4} = 1.5 \]

and finally the t-statistic:

\[ t = \frac{\bar{X} - \mu}{s_{\bar{X}}} = \frac{39 - 30}{1.5} = 6 \]

Note what we have done here. We have inserted the observed mean time in room B (39 minutes) and the standard error calculated from the observed sum of squares. But what about the population mean \( \mu \): where did the value of 30 come from? Well, it came from the null hypothesis, where we concluded that if visuals had no effect, then the player would spend 30 minutes in each of rooms A and B.

STAGE 4: Decision.

Our calculated t = 6 falls well into the critical region, well beyond the value of 2.131, which indicates where chance kicks in. So we reject the null hypothesis, since most of the cases corresponding to the null hypothesis occur at t-values below 2.131. We conclude that visuals do influence player behaviour, and also, from the fact that the players spend more time in the bland room, we conclude that players prefer rooms without visual impact!

To understand this, let's look at a data set which has been created to provide the above results.

[Figure: dot plot of the 16 constructed scores, ranging roughly from 27 to 50 and clustering around the sample mean of 39, not around the hypothesised mean of 30.]

First look at the sample mean (39). Do the data values cluster around this mean? Yes. Now look at the mean (30) corresponding to the null hypothesis. Do the data values cluster around this value? Clearly not. Since the null hypothesis predicts a mean of 30, and since the sample does not cluster around 30, we are forced to reject the null hypothesis.
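The whole four-stage test can be reproduced from the summary statistics alone. Below is a minimal Python sketch (not part of the original notes); it uses scipy purely as a substitute for the printed t-table that the notes look up in STAGE 2, and any statistics library (or the table itself) would do:

```python
import math
from scipy import stats

n, x_bar, ss = 16, 39, 540   # sample size, sample mean, sum of squares
mu_0 = 30                    # population mean under the null hypothesis
alpha = 0.05

df = n - 1                   # degrees of freedom = 15
s = math.sqrt(ss / df)       # sample sd = sqrt(540/15) = 6.0
se = s / math.sqrt(n)        # standard error = 6/4 = 1.5
t = (x_bar - mu_0) / se      # t = (39 - 30)/1.5 = 6.0

# Two-tailed critical value: the t-table lookup in STAGE 2.
t_crit = stats.t.ppf(1 - alpha / 2, df)   # ~2.131

print(t, t_crit)
print("reject H0" if abs(t) > t_crit else "fail to reject H0")
```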
Hypothesis Testing -2- (Independent Samples)

In this scenario there are two populations and two samples. We are interested to know if there is a difference between the populations when they are subject to different interventions. These interventions could be different learning/teaching approaches, different medical treatments, or exposure to different web browsers or computer game content. In all cases the scenario can be sketched as below:

[Figure: two populations, A and B, each with an unknown mean, and a sample drawn from each.]

Through measurement of the sample means we wish to discover if there is a difference between the population means. Again, the formulation of the null hypothesis is key in this discovery, since it removes the population means from the t-statistic calculation. Here is the null hypothesis:

\( H_0 \): The intervention has had no effect. There is no difference between the population means, i.e., \( \mu_A - \mu_B = 0 \).

Let's take an example. A researcher wishes to discover if the use of mental imagery helps remembering. She presents 50 noun pairs to sample A (such as dog/bike) and the same noun pairs to sample B, but asks group B to form a mental image of each noun pair (e.g. a dog riding a bike). Then she gives both sample groups a memory test on the noun pairs. Her results are as follows:

Sample group A: \( n = 10, \bar{X} = 19, SS = 40 \)
Sample group B: \( n = 10, \bar{X} = 26, SS = 50 \)

Here's how she proceeds with the data analysis.

STAGE 1: Formulation of the Hypothesis.

\( H_0 \): The use of the mental image has had no effect. There is no difference between the population means, i.e., \( \mu_A - \mu_B = 0 \). She chooses to set the significance \( \alpha = 0.05 \).

STAGE 2: Locate the Critical Region of the t-statistic where we can reject or accept the null hypothesis.

First we must establish the degrees of freedom. For sample group A we have (10 - 1) = 9 degrees of freedom, and the same for sample group B, so in total we have 18 degrees of freedom, df = 18. Together with the value for alpha, looking this up in the t-table we find that the critical t is plus or minus 2.101.

STAGE 3: Calculate the Statistic.

Here we proceed in the same manner as above, but the calculations are different, since we have two samples. We calculate the "pooled" variance, defined as

\[ s_P^2 = \frac{SS_A + SS_B}{df_1 + df_2} = \frac{40 + 50}{9 + 9} = 5 \]

Then we calculate the standard error according to this formula:

\[ s_{\bar{X}_A - \bar{X}_B} = \sqrt{s_P^2 \left( \frac{1}{n_A} + \frac{1}{n_B} \right)} = \sqrt{5 \left( \frac{1}{10} + \frac{1}{10} \right)} = 1 \]

Finally we calculate the t-statistic, which you can clearly see is looking for the difference between the sample means:

\[ t = \frac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{s_{\bar{X}_A - \bar{X}_B}} = \frac{(19 - 26) - 0}{1} = -7 \]

Note again how using the null hypothesis has wiped out the two population means from the calculation, which of course we do not know.

STAGE 4: The Decision.

The final value of -7 is in the critical region, well below the value of -2.101. So the chance that the null hypothesis could produce such a result by accident is very small (less than 2.5%), so we reject the null hypothesis and conclude that the use of imagery in remembering noun pairs does have an effect. Looking at the averages, we see the effect is positive.
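As before, the independent-samples calculation can be checked in a few lines. This is a sketch assuming the same summary statistics as the noun-pair experiment, with scipy again standing in for the t-table (not part of the original notes):

```python
import math
from scipy import stats

# Summary statistics from the noun-pair memory experiment.
n_a, mean_a, ss_a = 10, 19, 40   # group A: no imagery
n_b, mean_b, ss_b = 10, 26, 50   # group B: mental imagery
alpha = 0.05

df = (n_a - 1) + (n_b - 1)                    # 9 + 9 = 18
pooled_var = (ss_a + ss_b) / df               # (40 + 50)/18 = 5.0
se = math.sqrt(pooled_var * (1/n_a + 1/n_b))  # sqrt(5 * 0.2) = 1.0
t = ((mean_a - mean_b) - 0) / se              # (19 - 26)/1 = -7.0

t_crit = stats.t.ppf(1 - alpha / 2, df)       # ~2.101

print(t, t_crit)
print("reject H0" if abs(t) > t_crit else "fail to reject H0")
```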
Hypothesis Testing -3- (Related Samples)

… coming soon …