Worksheets for Workshop Sat 1 April
1 Vocabulary
2 Beetle – an adaptation of an activity in ‘Chance and Data’ Volume 1
3 Chocolate Chippie Biscuits – an adaptation of an activity in ‘Chance and Data’ Volume 1
4 Investigating confidence intervals
5 Analysing Bivariate Data
6 Sources of inspiration – books and websites
7 Dataset descriptor
PROBABILITY TERMS AND DEFINITIONS
Match the terms in the left-hand column with the correct definition or
formula in the right-hand column
Left-hand column (terms):
Random Experiment
Trial
Outcome
Sample space
P(A)
Event
Random Variable
Discrete Random Variable
Continuous Random Variable
Impossible Event
Certain Event
Range of values for a probability
Experimental Probability (Long-run frequency)
Theoretical Probability (with equally likely outcomes)
Probability function

Right-hand column (definitions and formulae):
A variable whose outcome is determined by the outcomes of the experiment. Denoted by X
Gives the likelihood of different values. Often expressed using a table or formula
Process, a result of which depends on chance
Usually takes whole number values only
Probability of 0
Between 0 and 1 inclusive
(Number of favourable outcomes) / (Total number of trials)
One performance of the experiment
A subset of the Sample Space
Probability of event A occurring
Can take any real number value in a certain range
(Number of elements in the event) / (No. of elements in the Sample Space)
A result of the experiment
Probability of 1
Set of all possible outcomes
Beetle
1 Set up a tally chart and frequency table to record dice throws on.
2 We will play Beetle with the following rules:
   record all throws of the die on your tally chart
   you must draw a body first
   a head is needed before eyes, feelers or mouth can be drawn
   body parts are drawn from the following throws:
      6   body
      5   head
      4   mouth
      3   leg (need six)
      2   eye (need two)
      1   feeler (need two)
   the first person to complete a beetle is the winner!
3 When you have completed your beetle, add up how many of each number you got and calculate the relative frequency of each number.
4 Draw a bar graph showing the frequency of each number.
5 Record your results on the class chart.
What changes when we look at a larger sample?
Draw a bar graph showing the relative frequency of each number for the class.
How do these compare to the theoretical probabilities of each number?
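For teachers who want to preview the class-chart discussion, the effect of a larger sample can be sketched with a short simulation. This is only an illustration, assuming a fair six-sided die; the function name `relative_frequencies` and the throw counts are made up for the example and are not part of the original activity.

```python
import random
from collections import Counter

def relative_frequencies(n_throws, seed=None):
    """Throw a fair die n_throws times; return the relative frequency of each face."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_throws))
    return {face: counts[face] / n_throws for face in range(1, 7)}

if __name__ == "__main__":
    # One player's game, a class, and a much larger sample (illustrative sizes).
    for n in (30, 600, 60000):
        freqs = relative_frequencies(n, seed=1)
        print(n, {face: round(f, 3) for face, f in freqs.items()})
    print("theoretical probability of each face:", round(1 / 6, 3))
```

With only a few dozen throws the relative frequencies usually look uneven; as the number of throws grows they tend to settle near the theoretical value of 1/6.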
HOW MANY CHOCOLATE CHIPS MAKE A GOOD
CHOCOLATE CHIPPIE BISCUIT?
A manufacturer’s reputation and sales are very dependent on delivering what’s advertised (lots of chocolate chips), BUT economics suggests that costs (chocolate chips) need to be kept to a minimum.
How many chocolate chips do we need to ensure a reasonable number per biscuit?
1 How many chocolate chips are there in a bought chocolate chippie biscuit? Eat and count!
2 How could we simulate stirring 100 chocolate chips into 10 biscuits? Try it. What happens?
3 How many chocolate chips does each biscuit need? How many chocolate chips would we need for 10 biscuits to ensure they all (most?) get this many?
4 How could we model this using the binomial distribution?
Does this change your answer to 3?
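One possible way to approach questions 2 to 4 by computer is sketched below. It assumes that stirring sends each chip, independently, into any of the 10 biscuits with equal probability, so the number of chips in one particular biscuit is Binomial(n, 0.1). The target of 5 chips per biscuit and all function names are hypothetical choices for illustration, not part of the original activity.

```python
import random
from math import comb

def stir(n_chips=100, n_biscuits=10, rng=random):
    """Drop each chip into a randomly chosen biscuit; return the chips per biscuit."""
    counts = [0] * n_biscuits
    for _ in range(n_chips):
        counts[rng.randrange(n_biscuits)] += 1
    return counts

def binomial_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): one particular biscuit gets at least k chips."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def prop_all_at_least(k, n_chips, n_biscuits=10, batches=2000, seed=0):
    """Simulated proportion of batches in which every biscuit gets at least k chips."""
    rng = random.Random(seed)
    hits = sum(min(stir(n_chips, n_biscuits, rng)) >= k for _ in range(batches))
    return hits / batches

if __name__ == "__main__":
    print("one batch:", stir(rng=random.Random(1)))
    k = 5  # hypothetical 'reasonable number' of chips per biscuit
    for n_chips in (100, 150, 200):
        print(n_chips,
              "P(one biscuit >= k):", round(binomial_at_least(k, n_chips, 0.1), 4),
              "all ten >= k (simulated):", prop_all_at_least(k, n_chips))
```

The binomial calculation describes a single biscuit. The ten counts in a batch are not independent (they must add up to the number of chips), which is why the "all ten biscuits" figure is estimated by simulation here rather than by multiplying binomial probabilities.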
Investigating confidence intervals
1 Complete the probability distribution table for X, where X = a 1-digit random number.

x         0     1     2     3     4     5     6     7     8     9
P(X=x)    0.1   ___   ___   ___   ___   ___   ___   ___   ___   ___

If you were to display this probability distribution on a bar graph what
shape would it have? _________________________
Calculate μ = E(X) ____________
σ = SD(X) ____________
I would therefore expect the mean of all possible 1-digit random numbers
to be ____________.
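As a check on the hand calculation, the mean and standard deviation can be computed directly from the definitions E(X) = Σ x·P(X=x) and Var(X) = Σ (x − μ)²·P(X=x). This is a minimal sketch assuming each digit 0 to 9 has probability 0.1; the function name `uniform_digit_moments` is made up for the example.

```python
from fractions import Fraction
from math import sqrt

def uniform_digit_moments():
    """Return (mean, variance, sd) of X, a 1-digit random number with P(X=x) = 1/10."""
    p = Fraction(1, 10)
    values = range(10)
    mean = sum(x * p for x in values)                    # E(X)
    variance = sum((x - mean) ** 2 * p for x in values)  # Var(X)
    return mean, variance, sqrt(float(variance))         # SD(X) = sqrt(Var(X))

if __name__ == "__main__":
    mean, variance, sd = uniform_digit_moments()
    print("E(X) =", float(mean), " Var(X) =", float(variance), " SD(X) =", round(sd, 3))
```

Exact fractions are used for the mean and variance so that rounding only enters at the final square root.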
2 What size sample do we need?
Random numbers are _______________ distributed. This is a
symmetric/skewed distribution, so a sample size of _________ should be
sufficient to ensure that the distribution of the _________ _________ is
approximately ___________.
3 Use your calculator in STATISTICS mode to obtain a sample of 15
random one-digit numbers and calculate their mean and standard
deviation.
Mean ________________
Standard Deviation _____________
4 Calculate a 90% confidence interval for the mean of the random
numbers you generated.
90% Confidence Interval for the mean ____________________________
Draw your confidence interval on the grid below. Add those for the rest of
your group. Think carefully about scale on axes.
Add your group’s confidence intervals to the class graph.
5 Conclusions
Is your expected mean (#1) within your confidence interval? Yes/ No
On the basis of your confidence interval what would you conclude about
the randomness of random numbers?
*Is there anyone in your group whose expected mean was not within their
confidence interval?
*What percentage of the class confidence intervals would you expect to
enclose the expected (population) mean? ___________
Number of class confidence intervals _________
Number of class confidence intervals enclosing the expected mean ______
Percentage of class confidence intervals enclosing the expected mean
______
*What do you conclude about the randomness of random numbers?
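The class graph can also be imitated by simulation. The sketch below is only one way to set it up: it assumes the interval is calculated as sample mean ± z × s/√n, using the normal value z for 90% confidence and the sample standard deviation s. The worksheet does not say which formula students should use (for example, whether σ is treated as known), so adjust this to match your course. All function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.90):
    """Normal-approximation confidence interval for the mean: x-bar ± z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    centre = mean(sample)
    margin = z * stdev(sample) / len(sample) ** 0.5
    return centre - margin, centre + margin

def coverage(n_intervals=1000, n=15, level=0.90, mu=4.5, seed=0):
    """Proportion of simulated intervals that enclose the population mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.randrange(10) for _ in range(n)]  # n random one-digit numbers
        low, high = confidence_interval(sample, level)
        hits += low <= mu <= high
    return hits / n_intervals

if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.randrange(10) for _ in range(15)]
    print("sample:", sample)
    print("90% CI:", tuple(round(v, 2) for v in confidence_interval(sample)))
    print("proportion of 1000 intervals enclosing 4.5:", coverage())
```

The proportion of intervals enclosing 4.5 should come out somewhere near 90%, though it need not be exactly 90% in any one run, and with samples of only 15 the normal approximation with s is itself only approximate.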
Adaptation to a Central Limit Theorem Investigation
1 Suppose we want a 1-digit random number.
Complete the probability distribution table for X, where X = a 1-digit random number.

x         0     1     2     3     4     5     6     7     8     9
P(X=x)    0.1   ___   ___   ___   ___   ___   ___   ___   ___   ___

If you were to display this probability distribution on a bar graph what
shape would it have? __________________
Calculate E(X) ____________
SD(X) ____________
2 Take a random sample of 20 1-digit random numbers.
Calculate the mean of your sample. (Round your answer to 1 dp.)
Repeat this 4 times, so that you have 5 different random samples and
the mean of each one.
3 Record your 5 sample means in the appropriate space on the
unordered stem and leaf plot on the whiteboard.
4 What do you notice about the shape of the stem and leaf plot?
Extension
Calculate the mean and standard deviation of all the sample means.
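The whiteboard stem and leaf plot can be previewed with a simulation. This sketch assumes each digit 0 to 9 is equally likely and uses samples of 20, as in step 2; the number of simulated samples and the function name `sample_means` are illustrative choices only.

```python
import random
from statistics import mean, stdev

def sample_means(n_samples, sample_size=20, seed=None):
    """Means of n_samples random samples, each of sample_size one-digit random numbers."""
    rng = random.Random(seed)
    return [mean([rng.randrange(10) for _ in range(sample_size)])
            for _ in range(n_samples)]

if __name__ == "__main__":
    means = sample_means(150, seed=7)  # e.g. 30 students with 5 samples each
    print("mean of the sample means:", round(mean(means), 2))
    print("SD of the sample means:  ", round(stdev(means), 2))
    # Crude text histogram of the sample means, grouped by whole-number part.
    for stem in range(10):
        print(stem, "|", "*" * sum(int(m) == stem for m in means))
```

For the Extension, theory suggests the sample means should centre near E(X) = 4.5 with a standard deviation near SD(X)/√20, which is roughly 0.64, and the histogram should look roughly bell-shaped even though the individual digits are uniformly distributed.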
Analysing Bivariate Data
(The data in this exercise has been taken from: Moore, D. & McCabe, G.
(2003) Introduction to the Practice of Statistics, 4th edn, Freeman, p148.)
The problem: Keeping water supplies clean requires regular measurement of levels of pollutants. The measurements are indirect: a typical analysis involves forming a dye by a chemical reaction with the dissolved pollutant, then passing a light through the solution and measuring its “absorbance”. To calibrate such measurements the laboratory measures known standard solutions and uses regression to relate absorbance to pollutant concentration. This is usually done every day. Here is one set of data on the absorbance for different levels of nitrates. Nitrates are measured in milligrams per litre of water.
Nitrate (mg/litre water)    Absorbance
50                          7.0
50                          7.5
100                         12.8
200                         24.0
400                         47.0
800                         93.0
1200                        138.0
1600                        183.0
2000                        230.0
2000                        226.0
1. Plotted on the scatter diagram below are the data from all nitrates except
800 and 1600. Complete the scatter diagram by plotting the points
corresponding to these two nitrates.
[Figure: Scatter Plot of Nitrate Absorbance. Vertical axis: Absorbance (0 to 250). Horizontal axis: Nitrate (mg/litre water) (0 to 2500).]
2. Examine the above scatter diagram and describe the nature of the
relationship between Absorbance and Nitrate.
3. Which statistical technique can be used to model the relationship between
Absorbance and Nitrate?
4. Figure 1 below is a screen dump of the output from a simple linear
regression (SLR) analysis of the data.
Examine its contents carefully, and then complete the SLR dialogue box
in the screen dump in Figure 2 on the next page.
Figure 1. Screen dump of output from simple linear regression on data.
Figure 2. Screen dump of worksheet containing data and PHStat SLR dialogue box.
5. What is R² for this data? Write a sentence to explain what it means.
6. Calculate the correlation between nitrates and absorbance. (Note: use the
information in Figure 1.)
r = ________________
Is this what you expected? Explain.
7. Using the information contained in Figure 1, write down the least-squares
regression line for this data.
8. Draw the least-squares regression line on the scatter diagram.
9. Complete the following table by calculating the predicted absorbances and
corresponding residuals for nitrates 800 and 1600. (Note: the values given
were calculated in Excel, i.e. with no intermediate rounding.)

Nitrate (mg/litre water)    Absorbance    Predicted absorbance    Residual
50                          7.0           7.32                    -0.32
50                          7.5           7.32                    0.178
100                         12.8          12.99                   -0.19
200                         24.0          24.31                   -0.32
400                         47.0          46.98                   0.02
800                         93.0          ______                  ______
1200                        138.0         137.62                  0.38
1600                        183.0         ______                  ______
2000                        230.0         228.26                  1.74
2000                        226.0         228.26                  -2.26
10. Plot the residuals for nitrates 800 and 1600 on the residual plot below.
[Figure: Nitrates Residual Plot. Vertical axis: Residuals (-3 to 2). Horizontal axis: Nitrates (0 to 2500).]
11. Considering all the above information, discuss whether you consider this
to be an appropriate model to use.
12. In the context of this particular problem, interpret:
a) the slope
b) the intercept
13. Discuss the dangers of predicting beyond the range of the data.
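For teachers without the PHStat output to hand, the quantities asked for in questions 5 to 9 can be reproduced from the ten data pairs using the usual least-squares formulas: slope = Sxy/Sxx, intercept = ȳ − slope × x̄, and r = Sxy/√(Sxx·Syy). The sketch below is a plain Python version, not the Excel/PHStat procedure used on the worksheet; the function name `least_squares` is made up for the example.

```python
from math import sqrt

# Data from the worksheet table.
nitrate = [50, 50, 100, 200, 400, 800, 1200, 1600, 2000, 2000]
absorbance = [7.0, 7.5, 12.8, 24.0, 47.0, 93.0, 138.0, 183.0, 230.0, 226.0]

def least_squares(x, y):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

slope, intercept, r = least_squares(nitrate, absorbance)
predicted = [intercept + slope * xi for xi in nitrate]
residuals = [yi - pi for yi, pi in zip(absorbance, predicted)]

if __name__ == "__main__":
    print(f"absorbance = {intercept:.3f} + {slope:.4f} x nitrate")
    print(f"r = {r:.5f}, R-squared = {r * r:.5f}")
    for xi, yi, pi, ei in zip(nitrate, absorbance, predicted, residuals):
        print(f"{xi:>5} {yi:>7.1f} {pi:>8.2f} {ei:>7.2f}")
```

Printing the residuals alongside the fitted values makes it easy to see whether they scatter randomly around zero, which is the point of the residual plot in question 10.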
Some of my favourite sources of fresh inspiration:
Books
Lovatt, C., Lowe, C. (1993) Chance and Data Investigations, Volumes 1 and 2, Curriculum Corporation.
Not new and often geared to younger students, but I find they are a great starting point for interesting ways to introduce and explore new (or ‘old’!) topics.
Garfield, J. (ed.) (2005) Innovations in Teaching Statistics, The Mathematical Association of America.
“This is a book of stories about teaching statistics.” I’ve only just found this book. It appears to be mainly about teaching introductory statistics courses at university, though that means some of it is applicable to Year 13.
Gelman, A., Nolan, D. (2002) Teaching Statistics: A Bag of Tricks, Oxford University Press.
Loads of ideas, including both demonstrations and project-type material geared at developing ‘thinking’!
Green, D. (ed.) (1994) Teaching Statistics at its Best, Teaching Statistics Trust.
A compilation of the best articles in Teaching Statistics (the journal) from volumes 6-14. Some interesting stuff, but not much that’s both ‘new and useful’: the good ideas have been well shared around in the intervening years.
Hawkins, A., Jolliffe, F., Glickman, L. (1992) Teaching Statistical Concepts.
Can be a bit heavy in parts, but has some good ideas.
Journals
Journal of Statistics Education http://www.amstat.org/publications/jse/
More geared to university teaching, but free online.
Mathematics in School, published by the Mathematical Association (UK).
Some articles are available online at:
http://www.ma.org.uk/resources/periodicals/online_articles_keyword_index/index.html
Teaching Statistics: you need to subscribe to view recent copies, but some
articles can be viewed, along with the contents of current journals, at
http://www.rsscse.org.uk/ts/
Mostly geared to teaching in schools.
Websites
** Use http://www.stat.auckland.ac.nz/~u47510x/teachers/index6.php
My favourite for data with a story is DASL: http://lib.stat.cmu.edu/DASL/
My favourite for simulations: http://www.ruf.rice.edu/~lane/stat_sim/index.html
I especially like the Central Limit Theorem one (there is a worksheet to
go with this on the ** site above).
Data set included in Excel file, from DASL (see web address above)
Datafile Name: Chromatography
Datafile Subjects: Science
Story Names: Chromatography
Reference: Moore, David S., and George P. McCabe (1989). Introduction to the
Practice of Statistics. Original source: Kurtz, D.A. (ed), Trace Residue Analysis,
American Chemical Society Symposium Series No. 284, 1985, Appendix.
Authorization:
Description: Results of a study of gas chromatography, a technique which is used to
detect very small amounts of a substance. Five measurements were taken for each of
four specimens containing different amounts of the substance. The amount of the
substance in each specimen was determined before the experiment. The response
variable is the output reading from the gas chromatograph. The purpose of the study is
to calibrate the chromatograph by relating the actual amount of the substance to the
chromatograph reading.
Number of cases: 20
Variable Names:
1. amount: amount of substance in the specimen (nanograms)
2. response: output reading from the gas chromatograph
Abstract:
Figure 1 shows a plot of response vs. amount with a regression line superimposed.
Note the large range of x-values. The regression of amount on response has an R-square of 99.9%. However, despite this, the plot shows that the regression line passes
through the data only for the largest amount tested. The responses for other amounts
either lie completely above or below the line.
Figure 2 shows this more clearly, and also shows that the variability of the residuals
increases as the predicted values increase. These are failures of the regression
assumptions of linearity and constant variance. They make the results of the
regression suspect. Despite appearances, the data do not seem linear. Perhaps
transforming the data would provide a better fit.