Statistics and Spreadsheets Harris Chapter 4

Gaussian Distribution
Confidence Intervals
Student’s T-Tests
Q-test
Control Charts
Spreadsheets
Gaussian Distribution (random!)
• Mean Value:
– The arithmetic “average”
– For a set of data, the closer your mean is to the true value, the more accurate your results are!
$\bar{x} = \frac{\sum_i x_i}{n}$
Standard Deviation (reproducibility)
• Standard deviation rests on the assumption that errors are the result of RANDOM events.
• It is based on the shape and distribution of the Gaussian
Curve
• A smaller standard deviation means that your results are more
reproducible (they don’t vary as much from measurement to
measurement).
The Gaussian Curve
• Plotting of random events
• Defines standard deviation
• Has a mathematical definition (formula for the curve)
• Discussed in more detail in the text
# of Standard Deviations from the Mean    % of Events Affected by Random Error that Occur
+/- 1 STD DEV                             68.3 %
+/- 2 STD DEV                             95.5 %
+/- 3 STD DEV                             99.7 %
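The percentages for a Gaussian curve can be checked numerically. The sketch below assumes an ideal Gaussian distribution and uses the error function from Python's standard library; the function name `fraction_within` is made up for this illustration.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of an ideal Gaussian population within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k} STD DEV: {100 * fraction_within(k):.2f} %")
```

Only random error is modeled here; a systematic error shifts the whole curve and is not captured by these percentages.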
Calculating a STD DEV (by hand)
• Based on the difference between each value and the mean.
• Also based on the degrees of freedom
– Number of measurements minus one
– n-1
$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$
Let’s do it manually once, together!
• M&M’s Results Handed Out to All!
• Calculate mean and standard deviation
• Set up a simple table
• Use table to keep track of the squared terms!
• LEARN TO DO THIS USING YOUR CALCULATOR AND MS EXCEL (STDEV is the correct function)
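The by-hand table can also be sketched in a few lines of Python. The M&M's results are not reproduced in these notes, so this example borrows the five [Zn] values from the confidence interval example later in the notes; the variable names are illustrative only.

```python
import math
import statistics

# [Zn] values (ppm) borrowed from the confidence interval example in these notes
values = [1.20, 1.40, 1.50, 1.10, 1.10]

n = len(values)
mean = sum(values) / n

# Simple table: one row per measurement, keeping track of the squared terms
print(f"{'x_i':>6} {'x_i - mean':>12} {'(x_i - mean)^2':>16}")
sum_sq = 0.0
for x in values:
    dev = x - mean
    sum_sq += dev ** 2
    print(f"{x:6.2f} {dev:12.2f} {dev ** 2:16.4f}")

s = math.sqrt(sum_sq / (n - 1))  # divide by the degrees of freedom, n - 1

print(f"mean = {mean:.2f}, s = {s:.4f}")
# Cross-check: the library's sample standard deviation also divides by n - 1
print(f"statistics.stdev gives {statistics.stdev(values):.4f}")
```

Like Excel's STDEV, `statistics.stdev` is the sample standard deviation (n-1 in the denominator), not the population form.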
Confidence Intervals
How Certain Are You?????
• Confidence intervals allow us to calculate a range
of values in which we can be confident, at some
level, that the “true” value lies
• Originally based on the growth of yeast in beer!
• One of the most important tools in evaluating
data!
• Back to Elementary School: draw a number line to
see how this works!
Calculating a Confidence Interval
• Determine the Mean
• Determine the Standard Deviation
• Determine the degrees of freedom (n-1)
• Decide how confident you want to be in your data (80%, 90%, 95%, etc.)
• Calculate using appropriate formula.
$\mu = \bar{x} \pm \frac{t \, s}{\sqrt{n}}$
t is the value of Student’s t from a t-table (Figure 4-20)
n is the # of observations
s is the standard deviation
Confidence Interval Calculation: John C. Schaumloffel
Calculate the [Zn] at the 95% confidence level
[Zn] ppm
1.20
1.40
1.50
1.10
1.10
mean                                              1.26
STDEV                                             0.1817
n                                                 5
n-1 (degrees of freedom)                          4
t-value, n=5, 95% confident (Harris Table 4-2)    2.776
µ = mean +/- (t x s)/(n^0.5)
0.2255 is the half-width of the confidence interval (the +/- value)
Confidence Interval = 1.26 +/- 0.23 ppm Zn
Therefore, we are 95% confident that the "true" value for the concentration of zinc is between 1.03 and 1.49 ppm.
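The zinc calculation above can be reproduced as a short script. This is a minimal sketch: the t-value is typed in by hand from the t-table (2.776 for 4 degrees of freedom at 95%), and the variable names are chosen for this example.

```python
import math
import statistics

zn_ppm = [1.20, 1.40, 1.50, 1.10, 1.10]  # [Zn] data from the example above
t_95 = 2.776  # Student's t, 4 degrees of freedom, 95 % (read from the t-table)

n = len(zn_ppm)
mean = statistics.mean(zn_ppm)
s = statistics.stdev(zn_ppm)          # sample standard deviation (n - 1)
half_width = t_95 * s / math.sqrt(n)  # the +/- value

lower, upper = mean - half_width, mean + half_width
print(f"{mean:.2f} +/- {half_width:.2f} ppm Zn ({lower:.2f} to {upper:.2f} ppm)")
```

Choosing a different confidence level only changes the t-value looked up in the table; the rest of the calculation stays the same.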
Comparison of Means w/ Student’s t
• We can compare two sets of data to determine how confident
we are that they are either
– Statistically similar
– Statistically different
• This is ONLY a statistical test; you can also rely on
– Your intuition as a chemist
– Your practical experience
• But, statistical tests are what win in court!
• We will concentrate on Harris’ “Case Two”
– A quantity is measured multiple times by two different
techniques. Each technique gives a mean and standard
deviation for the quantity. Are these similar?
• Steps….
– Calculate a pooled standard deviation
– Calculate a t-value using the pooled standard deviation
– Compare t_calc (the calculated t) to the correct t-value from the table (t_table)
– If t_calc > t_table, the results are statistically different
– If t_calc < t_table, the results are statistically similar
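The steps above can be sketched as follows. This assumes the usual equal-variance form of the pooled standard deviation, with n1 + n2 - 2 degrees of freedom; check it against the equations for "Case Two" in the text. The function name `pooled_t` and the input numbers are hypothetical and are not taken from these notes.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (s_pooled, t_calc, degrees_of_freedom) for two sets of replicate measurements."""
    dof = n1 + n2 - 2
    s_pooled = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / dof)
    t_calc = abs(mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return s_pooled, t_calc, dof

# Hypothetical summary statistics (NOT data from the notes): three replicates per technique
s_pooled, t_calc, dof = pooled_t(mean1=2.0, s1=1.0, n1=3, mean2=3.0, s2=1.0, n2=3)

t_table = 2.776  # t-table value for 4 degrees of freedom at 95 % confidence
different = t_calc > t_table

print(f"s_pooled = {s_pooled:.3f}, t_calc = {t_calc:.3f}, degrees of freedom = {dof}")
print("statistically different" if different else "statistically similar")
```

With only three replicates per technique, t_table is large, so the two means must differ by a lot relative to s_pooled before the test calls them different.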
Are the [Pu] values in the contaminated soil samples from Chemist #1 and Chemist #2 statistically different?
Q-test to Eliminate Outliers
• Used when you have a set of data with one
or more suspect values (“out of whack”)
• A statistical test you can use to provide
evidence to eliminate an outlier from the
data set
• ONLY a statistical test….
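A minimal sketch of the Q-test, assuming the usual definition Q = gap / range, where the gap is the distance from the suspect value to its nearest neighbor. The critical values below are commonly tabulated 90% confidence values and should be checked against the table in the text; the data are hypothetical, since the soil [Pu] values are not reproduced in these notes.

```python
# Critical Q values at 90 % confidence, keyed by number of observations.
# Assumption: commonly tabulated values; verify against the table in your text.
Q_TABLE_90 = {4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test(values, q_table=Q_TABLE_90):
    """Test the most extreme value (high or low end) of a data set.

    Returns (suspect, q_calc, reject); reject is True when q_calc > q_table.
    """
    data = sorted(values)
    data_range = data[-1] - data[0]
    gap_low = data[1] - data[0]     # gap between the lowest value and its neighbor
    gap_high = data[-1] - data[-2]  # gap between the highest value and its neighbor
    if gap_high >= gap_low:
        suspect, gap = data[-1], gap_high
    else:
        suspect, gap = data[0], gap_low
    q_calc = gap / data_range
    return suspect, q_calc, q_calc > q_table[len(data)]

# Hypothetical replicate results (NOT the soil [Pu] data)
suspect, q_calc, reject = q_test([12.47, 12.48, 12.53, 12.56, 12.67])
print(f"suspect = {suspect}, Q_calc = {q_calc:.2f}, reject = {reject}")
```

The sketch only covers 4 to 10 observations and assumes the values are not all identical (the range must be nonzero).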
Are any of the soil [Pu] values outliers? Let’s check using the Q-test.
Control Charts
• A graph showing the mean value for a result
collected over a period of time
• Ranges for +/- 1, 2, 3 or more standard deviations
are shown on the graph
• Used to visually see if data are falling out of a
range which would be defined by RANDOM error
– Instrumental Fluctuations
– Standards or Samples Degrading
– Instrument Operator Changing….
• In most regulatory and industrial settings, the mean +/- 2
STDEV is considered acceptable
– Warning limit
• Outside of +/- 3 STDEV is considered the action limit
– You must correct the situation in this case…..
• Usually, repeated analysis of a known standard is used to
develop a control chart.
[Hg] in Quality Control Sample….
Day   [Hg] ppb   UWL        LWL        UAL        LAL
1     0.1        0.280937   0.005063   0.349906   -0.06391
2     0.12       0.280937   0.005063   0.349906   -0.06391
3     0.12       0.280937   0.005063   0.349906   -0.06391
4     0.13       0.280937   0.005063   0.349906   -0.06391
5     0.08       0.280937   0.005063   0.349906   -0.06391
6     0.09       0.280937   0.005063   0.349906   -0.06391
7     0.11       0.280937   0.005063   0.349906   -0.06391
8     0.17       0.280937   0.005063   0.349906   -0.06391
9     0.2        0.280937   0.005063   0.349906   -0.06391
10    0.31       0.280937   0.005063   0.349906   -0.06391
MEAN    0.143
STDEV   0.068969
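The limits in the [Hg] quality control data above can be reproduced with a short script. This sketch assumes, as the UWL/LWL and UAL/LAL values imply, warning limits at the mean +/- 2 STDEV and action limits at the mean +/- 3 STDEV, with both computed from the same ten results; the `classify` helper is made up for this illustration.

```python
import statistics

# [Hg] results (ppb) for analysis days 1-10, from the quality control data above
hg_ppb = [0.10, 0.12, 0.12, 0.13, 0.08, 0.09, 0.11, 0.17, 0.20, 0.31]

mean = statistics.mean(hg_ppb)
s = statistics.stdev(hg_ppb)  # sample standard deviation (n - 1)

uwl, lwl = mean + 2 * s, mean - 2 * s  # upper/lower warning limits
ual, lal = mean + 3 * s, mean - 3 * s  # upper/lower action limits

def classify(x):
    """Label a result by the outermost limit it falls outside of."""
    if x > ual or x < lal:
        return "action"
    if x > uwl or x < lwl:
        return "warning"
    return "ok"

status = {day: classify(x) for day, x in enumerate(hg_ppb, start=1)}

print(f"mean = {mean:.3f}, STDEV = {s:.6f}")
print(f"UWL = {uwl:.6f}, LWL = {lwl:.6f}, UAL = {ual:.6f}, LAL = {lal:.5f}")
for day, label in status.items():
    print(day, hg_ppb[day - 1], label)
```

In routine use the limits would normally come from earlier analyses of the known standard rather than from the points being judged, so a drifting result does not widen its own limits.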
[Figure: [Hg] Control Chart (spectrophotometry). y-axis: [Hg] ng/mL; x-axis: Analysis Day]