Worksheets for Workshop, Sat 1 April

1 Vocabulary
2 Beetle – an adaptation of an activity in ‘Chance and Data’ Volume 1
3 Chocolate Chippie Biscuits – an adaptation of an activity in ‘Chance and Data’ Volume 1
4 Investigating confidence intervals
5 Analysing Bivariate Data
6 Sources of inspiration – books and websites
7 Dataset descriptor

PROBABILITY TERMS AND DEFINITIONS

Match the terms in the left-hand column with the correct definition or formula in the right-hand column.

Terms (left-hand column):
    Random Experiment
    Trial
    Outcome
    Sample space
    Event
    Random Variable
    Discrete Random Variable
    Continuous Random Variable
    Impossible Event
    Certain Event
    Probability of event A occurring
    Range of values for a probability
    Experimental Probability (Long-run frequency)
    Theoretical Probability (with equally likely outcomes)
    Probability function

Definitions and formulas (right-hand column):
    A variable whose outcome is determined by the outcomes of the experiment. Denoted by X
    P(A)
    Gives the likelihood of different values. Often expressed using a table or formula
    Process, a result of which depends on chance
    Usually takes whole number values only
    Probability of 0
    Between 0 and 1 inclusive
    (Number of favourable outcomes) / (Total number of trials)
    One performance of the experiment
    A subset of the Sample Space
    Can take any real number value in a certain range
    (Number of elements in the event) / (No. of elements in the Sample Space)
    A result of the experiment
    Probability of 1
    Set of all possible outcomes

Beetle

1 Set up a tally chart and frequency table on which to record dice throws.
2 We will play Beetle with the following rules:
    record all throws of the die on your tally chart
    you must draw a body first
    a head is needed before eyes, feelers or mouth can be drawn
    body parts are drawn from the following throws:
        6 body
        5 head
        4 mouth
        3 leg - need six
        2 eye - need two
        1 feeler - need two
    the first person to complete a beetle is the winner!
2 When you have completed your beetle, add up how many of each number you got and calculate the relative frequency of each number.
3 Draw a bar graph showing the frequency of each number.
4 Record your results on the class chart. What changes when we look at a larger sample? Draw a bar graph showing the relative frequency of each number for the class. How do these compare to the theoretical probabilities of each number?

HOW MANY CHOCOLATE CHIPS MAKE A GOOD CHOCOLATE CHIPPIE BISCUIT?

A manufacturer’s reputation and sales are very dependent on delivering what’s advertised – lots of chocolate chips – BUT economics suggests that costs (chocolate chips) need to be kept to a minimum.
How many chocolate chips do we need to ensure a reasonable number per biscuit?

1 How many chocolate chips are there in a bought chocolate chippie biscuit? Eat and count!
2 How could we simulate stirring 100 chocolate chips into 10 cookies? Try it – what happens?
3 How many chocolate chips does each biscuit need? How many chocolate chips would we need for 10 biscuits to ensure they all (most?) get this many?
4 How could we model this using the binomial distribution? Does this change your answer to 3?

Investigating confidence intervals

1 Complete the probability distribution table for X, where X = a 1-digit random number.

    x        0    1    2    3    4    5    6    7    8    9
    P(X=x)   0.1

If you were to display this probability distribution on a bar graph, what shape would it have? _________________________
Calculate μ = E(X) and σ = SD(X).
I would therefore expect the mean of all possible 1-digit random numbers to be ____________.

2 What size sample do we need?
Random numbers are _______________ distributed. This is a symmetric/skewed distribution, so a sample size of _________ should be sufficient to ensure that the distribution of the _________ _________ is approximately ___________.

3 Use your calculator in STATISTICS mode to obtain a sample of 15 random one-digit numbers and calculate their mean and standard deviation.
Mean ________________ Standard Deviation _____________

4 Calculate a 90% confidence interval for the mean of the random numbers you generated.
90% Confidence Interval for the mean ____________________________
Draw your confidence interval on the grid below. Add those for the rest of your group. Think carefully about the scale on the axes. Add your group’s confidence intervals to the class graph.

5 Conclusions
Is your expected mean (#1) within your confidence interval? Yes / No
On the basis of your confidence interval, what would you conclude about the randomness of random numbers?
*Is there anyone in your group whose expected mean was not within their confidence interval?
*What percentage of the class confidence intervals would you expect to enclose the expected (population) mean? ___________
Number of class confidence intervals _________
Number of class confidence intervals enclosing the expected mean ______
Percentage of class confidence intervals enclosing the expected mean ______
*What do you conclude about the randomness of random numbers?

Adaptation – Central Limit Theorem Investigation

1 Suppose we want a 1-digit random number. Complete the probability distribution table for X, where X = a 1-digit random number.

    x        0    1    2    3    4    5    6    7    8    9
    P(X=x)   0.1

If you were to display this probability distribution on a bar graph, what shape would it have? __________________
Calculate E(X) and SD(X).

2 Take a random sample of 20 1-digit random numbers. Calculate the mean of your sample. (Round your answer to 1 d.p.) Repeat this 4 times, so that you have 5 different random samples and the mean of each one.
3 Record your 5 sample means in the appropriate space on the unordered stem and leaf plot on the whiteboard.
4 What do you notice about the shape of the stem and leaf plot?

Extension: Calculate the mean and standard deviation of all the sample means.

Analysing Bi-variate data

(The data in this exercise have been taken from: Moore, D. & McCabe, G. (2003) Introduction to the Practice of Statistics, 4th edn, Freeman, p. 148.)

The problem: Keeping water supplies clean requires regular measurement of levels of pollutants.
The measurements are indirect – a typical analysis involves forming a dye by a chemical reaction with the dissolved pollutant, then passing a light through the solution and measuring its “absorbance”. To calibrate such measurements the laboratory measures known standard solutions and uses regression to relate absorbance to pollutant concentration. This is usually done every day.

Here is one set of data on the absorbance for different levels of nitrates. Nitrates are measured in milligrams per litre of water.

    Nitrate (mg/litre water)    Absorbance
    50                          7.0
    50                          7.5
    100                         12.8
    200                         24.0
    400                         47.0
    800                         93.0
    1200                        138.0
    1600                        183.0
    2000                        230.0
    2000                        226.0

1. Plotted on the scatter diagram below are the data for all nitrate levels except 800 and 1600. Complete the scatter diagram by plotting the points corresponding to these two nitrate levels.

[Scatter Plot of Nitrate Absorbance: Absorbance (0 to 250) against Nitrate (mg/litre water, 0 to 2500)]

2. Examine the above scatter diagram and describe the nature of the relationship between Absorbance and Nitrate.
3. Which statistical technique can be used to model the relationship between nitrate and absorbance?
4. Figure 1 below is a screen dump of the output from a simple linear regression (SLR) analysis of the data. Examine its contents carefully, and then complete the SLR dialogue box in the screen dump in Figure 2 on the next page.

Figure 1. Screen dump of output from simple linear regression on data.
Figure 2. Screen dump of worksheet containing data and PHStat SLR dialogue box.

5. What is R² for this data? Write a sentence to explain what it means.
6. Calculate the correlation between nitrates and absorbance. (Note: use the information in Figure 1.)
r = ________________
Is this what you expected? Explain.
7. Using the information contained in Figure 1, write down the least-squares regression line for this data.
8. Draw the least-squares regression line on the scatter diagram.
9.
Complete the following table by calculating the predicted absorbances and corresponding residuals for nitrates 800 and 1600. (Note: the values given were calculated in Excel, i.e. with no intermediate rounding.)

    Nitrate (mg/litre water)    Absorbance    Predicted absorbance    Residual
    50                          7.0           7.32                    -0.32
    50                          7.5           7.32                    0.178
    100                         12.8          12.99                   -0.19
    200                         24.0          24.31                   -0.32
    400                         47.0          46.98                   0.02
    800                         93.0          ______                  ______
    1200                        138.0         137.62                  0.38
    1600                        183.0         ______                  ______
    2000                        230.0         228.26                  1.74
    2000                        226.0         228.26                  -2.26

10. Plot the residuals for nitrates 800 and 2000 on the residual plot below.

[Nitrates Residual Plot: Residuals (-3 to 2) against Nitrates (0 to 2500)]

11. Considering all the above information, discuss whether you consider this to be an appropriate model to use.
12. In the context of this particular problem, interpret:
    a) the slope
    b) the intercept
13. Discuss the dangers of predicting beyond the range of the data.

Some of my favourite sources of fresh inspiration:

Books

Lovatt, C., Lowe, C. (1993) Chance and Data Investigations, Volumes 1 and 2, Curriculum Corporation.
Not new and often geared to younger students, but I find they are a great starting point for interesting ways to introduce and explore new (or ‘old’!) topics.

Garfield, J. (ed.) (2005) Innovations in Teaching Statistics, The Mathematical Association of America.
“This is a book of stories about teaching statistics.” I’ve only just found this book – it appears to be mainly about teaching introductory statistics courses at university – though that means some is applicable to Year 13.

Gelman, A., Nolan, D. (2002) Teaching Statistics – a bag of tricks, Oxford University Press.
Loads of ideas – includes both demonstrations and project-type stuff geared at developing ‘thinking’!!

Green, D. (ed.) (1994) Teaching Statistics at its Best, Teaching Statistics Trust.
A compilation of the best articles in Teaching Statistics (Journal) from volumes 6-14.
– some interesting stuff – but not much that’s both ‘new and useful’ – the good ideas have been well shared around in the intervening years.

Hawkins, A., Jolliffe, F., Glickman, L. (1992) Teaching Statistical Concepts.
Can be a bit heavy in parts – but has some good ideas.

Journals

Journal of Statistics Education
http://www.amstat.org/publications/jse/
More geared to university teaching – but ‘free’ online.

Mathematics in School – published by the Mathematical Association (UK)
Some articles are available online at: http://www.ma.org.uk/resources/periodicals/online_articles_keyword_index/index.html

Teaching Statistics – to view recent copies you need to subscribe, but some articles can be viewed, along with contents of current journals, at http://www.rsscse.org.uk/ts/
Mostly geared to teaching in schools.

Websites

**Use http://www.stat.auckland.ac.nz/~u47510x/teachers/index6.php
My favourite for data with a story is DASL: http://lib.stat.cmu.edu/DASL/
My favourite for simulations: http://www.ruf.rice.edu/~lane/stat_sim/index.html
I especially like the Central Limit Theorem one (there is a worksheet to go with this on **).

Data set included in Excel file – from DASL (see web address above)

Datafile Name: Chromatography
Datafile Subjects: Science
Story Names: Chromatography
Reference: Moore, David S., and George P. McCabe (1989). Introduction to the Practice of Statistics.
Original source: Kurtz, D.A. (ed), Trace Residue Analysis, American Chemical Society Symposium Series No. 284, 1985, Appendix.
Authorization:
Description: Results of a study of gas chromatography, a technique which is used to detect very small amounts of a substance. Five measurements were taken for each of four specimens containing different amounts of the substance. The amount of the substance in each specimen was determined before the experiment. The response variable is the output reading from the gas chromatograph.
The purpose of the study is to calibrate the chromatograph by relating the actual amount of the substance to the chromatograph reading.
Number of cases: 20
Variable Names:
    1. amount: amount of substance in the specimen (nanograms)
    2. response: output reading from the gas chromatograph

Abstract: Results of a study of gas chromatography, a technique which is used to detect very small amounts of a substance. Five measurements were taken for each of four specimens containing different amounts of the substance. The amount of the substance in each specimen was determined before the experiment. The response variable is the output reading from the gas chromatograph. The purpose of the study is to calibrate the chromatograph by relating the actual amount of the substance to the chromatograph reading.

Figure 1 shows a plot of response vs. amount with a regression line superimposed. Note the large range of x-values. The regression of amount on response has an R-squared of 99.9%. However, despite this, the plot shows that the regression line passes through the data only for the largest amount tested. The responses for other amounts lie either completely above or completely below the line. Figure 2 shows this more clearly, and also shows that the variability of the residuals increases as the predicted values increase. These are failures of the regression assumptions of linearity and constant variance. They make the results of the regression suspect. Despite appearances, the data do not seem linear. Perhaps transforming the data would provide a better fit.
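
The same kind of residual check can be tried on the nitrate calibration data from the Analysing Bi-variate data worksheet, since those values are printed in full. What follows is a minimal Python sketch, assuming an ordinary least-squares line fitted with the usual closed-form formulas. The function name fit_line is made up for illustration, and the output is not the PHStat output shown in Figure 1 of that worksheet, although it should agree closely with the predicted values tabulated there.

```python
# Nitrate calibration data copied from the Analysing Bi-variate data worksheet.
nitrate = [50, 50, 100, 200, 400, 800, 1200, 1600, 2000, 2000]
absorbance = [7.0, 7.5, 12.8, 24.0, 47.0, 93.0, 138.0, 183.0, 230.0, 226.0]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x.

    Hypothetical helper written for this sketch only.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(nitrate, absorbance)

# Residual = observed absorbance minus predicted absorbance.
predicted = [intercept + slope * x for x in nitrate]
residuals = [y - p for y, p in zip(absorbance, predicted)]

# R-squared = 1 - (residual sum of squares) / (total sum of squares).
mean_y = sum(absorbance) / len(absorbance)
ss_total = sum((y - mean_y) ** 2 for y in absorbance)
ss_resid = sum(r ** 2 for r in residuals)
r_squared = 1 - ss_resid / ss_total

print(f"absorbance = {intercept:.3f} + {slope:.5f} * nitrate")
print(f"R-squared = {r_squared:.4f}")
print("nitrate  absorbance  predicted  residual")
for x, y, p, r in zip(nitrate, absorbance, predicted, residuals):
    print(f"{x:7d}  {y:10.1f}  {p:9.2f}  {r:8.2f}")
```

Looking down the residual column for a pattern in sign or spread is the numerical counterpart of reading a residual plot. As the abstract above points out, a very high R-squared does not by itself show that a straight line is an appropriate model.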