Some Analytical Chemistry of Potato Chips

Lessons on Sampling and ANOVA in SAS and JMP
Eric Cai
*How much sodium dost a potato crisp hast?
Sodium chloride
NaCl
Images courtesy of Poyraz 72 and Evan-Amos via Wikimedia.
*Shakespearean online translator courtesy of LingoJam by Joseph Rocca.
Objectives
• Estimate the weight percentage of sodium in a bag of potato chips
• Obtain a confidence interval for the estimated weight percentage
• Need to minimize the cumulative uncertainty in the final result
– Minimize the width of the confidence interval
[Figure: a bag of potato chips, with chips numbered 1 to 4]
How to minimize uncertainty?
• Use precise instruments
• Measure many aliquots
• Minimize the variation between the samples
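As a rough illustration of why these three levers matter, consider the textbook t-based confidence interval for a mean. This is a simplified sketch that assumes n independent measurements with sample standard deviation s; it ignores the chip/aliquot structure discussed later, and the symbols are introduced here for illustration only.

```latex
\bar{x} \;\pm\; t_{1-\alpha/2,\; n-1} \, \frac{s}{\sqrt{n}}
```

Under that assumption, the interval narrows when s is smaller (more precise instruments, less variation between samples) or when n is larger (more aliquots measured).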
[Figure: a bag of potato chips with chips numbered 1 to 4, illustrating two sources of variation in weight percentage: variation between chips and variation within a chip]
Raw Data – Wide Format
          Aliquot 1   Aliquot 2   Aliquot 3
Chip 1    0.324%      0.311%      0.352%
Chip 2    0.455%      0.467%      0.448%
Chip 3    0.420%      0.463%      0.424%
Chip 4    0.447%      0.377%      0.398%
Desired Data – Long Format
Needed for analysis in both SAS and JMP.
[Figure: the wide-format raw data rearranged into two columns, Chip and Weight Percentage, with one row per measurement (Chip 1 three times, then Chip 2, Chip 3 and Chip 4)]
* enter the raw data;
* each row is one aliquot, and each column is one chip;
data sodium1;
    input chip1 chip2 chip3 chip4;
    datalines;
0.324 0.455 0.420 0.447
0.311 0.467 0.463 0.377
0.352 0.448 0.424 0.398
;
run;

* transpose the data;
* the weight percentages for each chip move from a column to a row;
proc transpose
    data = sodium1
    out = sodium2
    name = sample
    prefix = aliquot;
    var chip:;
run;

* show the transposed data;
proc print
    data = sodium2;
run;
Long, but still wide
sample    aliquot1    aliquot2    aliquot3
chip1     0.324       0.311       0.352
chip2     0.455       0.467       0.448
chip3     0.420       0.463       0.424
chip4     0.447       0.377       0.398
* sodium2 needs to be transposed once more for all weight percentages to be in one column;
proc transpose
    data = sodium2
    out = sodium3 (rename = (col1 = weight_percentage))
    name = subsample;
    var aliquot:;
    by sample;
run;

* show sodium3 - it is now ready for analysis;
proc print
    data = sodium3;
run;
Transformed Data – Long Format
sample    subsample    weight_percentage
chip1     aliquot1     0.324
chip1     aliquot2     0.311
chip1     aliquot3     0.352
chip2     aliquot1     0.455
chip2     aliquot2     0.467
chip2     aliquot3     0.448
chip3     aliquot1     0.420
chip3     aliquot2     0.463
chip3     aliquot3     0.424
chip4     aliquot1     0.447
chip4     aliquot2     0.377
chip4     aliquot3     0.398
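For comparison, the same wide-to-long reshaping can probably be done in a single DATA step with an array instead of two PROC TRANSPOSE calls. The following is a minimal sketch, not the method used in this talk; the data set name sodium_long and the loop variable i are hypothetical names chosen for illustration, and the step assumes sodium1 exists exactly as entered above.

```sas
* reshape sodium1 from wide to long in one DATA step;
* sodium_long is a hypothetical name used only for this sketch;
data sodium_long;
    set sodium1;
    length sample $ 5 subsample $ 8;
    array chips{4} chip1-chip4;
    * each row of sodium1 holds one aliquot for all 4 chips;
    subsample = cats('aliquot', _n_);
    do i = 1 to dim(chips);
        sample = cats('chip', i);
        weight_percentage = chips{i};
        output;
    end;
    keep sample subsample weight_percentage;
run;

* sort so that the rows appear in the same order as in sodium3;
proc sort
    data = sodium_long;
    by sample subsample;
run;
```

The array version is shorter here because the layout is small and regular; PROC TRANSPOSE avoids hard-coding the number of chips.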
PROC TRANSPOSE X 2: Wide to Long
[Figure: the wide-format raw data, the intermediate table (sample, aliquot1, aliquot2, aliquot3) and the final long-format table (sample, subsample, weight_percentage), shown side by side]
See the November 2015 issue of the VanSUG newsletter about PROC TRANSPOSE by Dilinuer Kuerban.
Visualize the Data
[Figure: plot of the data, annotated with the following]
• Group-specific means: sample means within each group (chip)
• Grand mean: sample mean of all data
• Within-group variation
• Between-group variation
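The group-specific means and the grand mean can be checked numerically. The sketch below assumes the long-format data set sodium3 created earlier; the choice of statistics (n, mean, std) and the simple scatter plot are illustrative assumptions, not a reproduction of the plot shown in the talk.

```sas
* group-specific means: one row of summary statistics per chip;
proc means
    data = sodium3
    n mean std;
    class sample;
    var weight_percentage;
run;

* grand mean: summary statistics across all 12 measurements;
proc means
    data = sodium3
    n mean std;
    var weight_percentage;
run;

* a basic plot of every measurement, grouped by chip;
proc sgplot
    data = sodium3;
    scatter x = sample y = weight_percentage;
run;
```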
Compare the 2 sources of variation
• Analysis of Variance (ANOVA)
– Linear regression with categorical predictors
– Partition the variation in a continuous variable by a categorical factor
– Use sums of squares to quantify the variation
– Each is a sum of squared deviations of data away from an average
• Scale (divide) each sum by its number of degrees of freedom
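Written out for a balanced one-way layout, the partition looks like the following. The notation is an assumption introduced here for illustration: k groups (chips), n measurements (aliquots) per group, x_{ij} for aliquot j of chip i, \bar{x}_{i\cdot} for the mean of chip i, and \bar{x}_{\cdot\cdot} for the grand mean. In this data set, k = 4 and n = 3.

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n} \left(x_{ij} - \bar{x}_{\cdot\cdot}\right)^2}_{SS_{\text{total}}}
= \underbrace{n \sum_{i=1}^{k} \left(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}\right)^2}_{SS_{\text{between}}}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n} \left(x_{ij} - \bar{x}_{i\cdot}\right)^2}_{SS_{\text{within}}}

MS_{\text{between}} = \frac{SS_{\text{between}}}{k - 1}, \qquad
MS_{\text{within}} = \frac{SS_{\text{within}}}{k(n - 1)}, \qquad
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
```

A large F means the between-group variation is large relative to the within-group variation.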
Analysis of Variance (ANOVA)
• Use sums of squares to quantify the variations
• Each is a sum of squared deviations of data away from an average
• Compare: between-group variation vs. within-group variation
* use ANOVA to partition and compare the 2 sources of variation;
* sodium3 is the long-format data set created above;
proc anova
    data = sodium3;
    class sample;
    model weight_percentage = sample;
run;
You can also use PROC GLM to implement ANOVA.
ANOVA is one special case of general linear models.
PROC ANOVA should only be used when there are equal numbers of
observations for every combination of the classification factors.
• There are many exceptions to this!
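A minimal sketch of the PROC GLM equivalent follows, assuming the long-format data set sodium3 created earlier. The CLASS and MODEL statements mirror the PROC ANOVA call; the closing quit statement is added because PROC GLM is an interactive procedure.

```sas
* one-way ANOVA of weight percentage by chip, using PROC GLM;
proc glm
    data = sodium3;
    class sample;
    model weight_percentage = sample;
run;
quit;
```

Because this design is balanced (3 aliquots for each of 4 chips), both procedures should report the same ANOVA table.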
Image courtesy of Cdang via Wikimedia
There is much more variation in the weight percentage
of sodium between the chips than within the chips!
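This comparison can be verified by hand from the raw data. The figures below are computed here from the 12 measurements (chip means 0.3290, 0.4567, 0.4357 and 0.4073; grand mean 0.4072) and rounded; they are not copied from the SAS output, and the subscripts are notation introduced for illustration.

```latex
SS_{\text{between}} = 3\left[(0.3290 - 0.4072)^2 + (0.4567 - 0.4072)^2 + (0.4357 - 0.4072)^2 + (0.4073 - 0.4072)^2\right] \approx 0.02812

SS_{\text{within}} \approx 0.000878 + 0.000185 + 0.001129 + 0.002581 = 0.004772

MS_{\text{between}} = \frac{0.02812}{3} \approx 0.00937, \qquad
MS_{\text{within}} = \frac{0.004772}{8} \approx 0.000597, \qquad
F \approx \frac{0.00937}{0.000597} \approx 15.7
```

With 3 and 8 degrees of freedom, the 5% critical value of F is about 4.07, so an F ratio near 15.7 is consistent with much more variation between the chips than within them.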
[Figure: the bag of potato chips again, showing variation in weight percentage between chips and within a chip]
JMP
• A software package from SAS Institute
• Point-and-click
• Has an underlying scripting language
• Statistics
• Machine learning
• Industrial statistics
• Go to JMP demonstration!
[Figure: a bag of potato chips, with chips numbered 1 to 4]
There is a trade-off!
More measurements are needed!
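One way to read this trade-off is through the variance of the grand mean in a balanced nested design. This is a standard textbook result offered here as an assumption about what the slide has in mind, not something stated in the talk; \sigma_b^2 and \sigma_w^2 denote the between-chip and within-chip variances, k the number of chips and n the number of aliquots per chip.

```latex
\operatorname{Var}\left(\bar{x}_{\cdot\cdot}\right) = \frac{\sigma_b^2}{k} + \frac{\sigma_w^2}{k\,n}
```

Under this model, adding aliquots per chip (larger n) shrinks only the second term, while sampling more chips (larger k) shrinks both. When the between-chip variation dominates, as it appears to here, extra chips reduce the uncertainty more than extra aliquots, at the cost of more measurements.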
Thank you JMP staff!
Louis Valente
Manager of Global Field Enablement for JMP
Mark Bailey
Principal Analytical Training Consultant for JMP
Arati Mejdal
Global Social Media Manager for JMP Software