JMP Session Lab 1

Fall 2012
JMP Session Lab 1 (assn 2)
Objectives:
 Open a data set
 Assign the appropriate data types (continuous, ordinal, categorical) to a column of data
 Get descriptive statistics (number of measurements/subjects (n), mean, standard deviation (SD))
 Analyze a histogram
 Compare matched or unmatched data
 Stratify data by group
 Save the graphs and analysis to Word
1. Download the dataset fat.xls from the website: http://gornbein.bol.ucla.edu/cedarassign.htm
In this data set, “TFAT” is total dietary fat in grams and “PFAT” is the percent of body weight that is
composed of fat. There are two time points: time 0 (baseline) and the 36-week follow-up. There are two
treatment groups: Group 1 received dietary education and an intervention, and Group 2 is a control group.
2. Although JMP data tables (i.e., spreadsheets) have the file extension .jmp, JMP can also open several
other file types, including .xls. Use "Open Data Table", File>Open, or Ctrl+O to open fat.xls.
3. The Analyze menu contains four main analysis categories:
Distribution: creates a histogram
Fit Y by X: plots one variable against another (e.g., a scatter plot)
Matched Pairs: compares the same subjects measured twice (for example, before and after)
Fit Model: models the effect of multiple variables on an outcome (multivariable analysis)
Distribution
4. Select Distribution to create a histogram of TFAT0. To look at groups 1 and 2 separately (this is
called stratification), add Group to the box labeled By.
5. You should see two graphs, one showing the distribution of TFAT0 for people in group 1 and one
showing TFAT0 for people in group 2. To the right of each graph, JMP gives the sample size, mean, and SD
of the data (i.e., descriptive statistics). Here, I rotated the display by clicking the hot spot by TFAT0
and selecting Display Options>Horizontal Layout.
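To check JMP's stratified summary by hand, the same numbers (n, mean, SD per group) can be sketched in Python. This is a minimal sketch, assuming the TFAT0 and Group columns have been exported into plain lists; `describe_by_group` is a hypothetical helper name and the values are made up for illustration. It uses the sample SD (n - 1 in the denominator).

```python
from statistics import mean, stdev


def describe_by_group(values, groups):
    """Return {group: (n, mean, SD)} for each distinct group label.

    SD is the sample standard deviation (n - 1 in the denominator).
    """
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    return {g: (len(v), mean(v), stdev(v)) for g, v in sorted(by_group.items())}


# Hypothetical TFAT0 values and group labels, for illustration only.
tfat0 = [60.0, 72.0, 66.0, 58.0, 80.0, 75.0]
group = [1, 1, 1, 2, 2, 2]
for g, (n, m, sd) in describe_by_group(tfat0, group).items():
    print(f"Group {g}: n={n}, mean={m:.2f}, SD={sd:.2f}")
```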
Fit Y by X: comparing two unmatched groups
6. Go back to the data table and select Fit Y by X from the Analyze menu. Enter TFAT0 and TFAT36 as Y.
Enter Group as X.
7. Observe that JMP has treated Group as a continuous variable, since the axis for Group is labeled 0, 0.1,
0.2, etc. This is more than a cosmetic annoyance: JMP chooses the analysis (and therefore the available
tests) based on the variable types, so an incorrectly assigned type changes the analysis itself.
If you click on the hot spot (the red triangle), you will see that the t test is not available.
8. For the next graph, we want to jitter (offset) overlapping data points to improve visualization. Here’s
how: go to File>Preferences>Platforms>Oneway and check the box near the end that says Points Jittered.
Click Apply and OK.
9. Go back to the original data table and look at the list under Columns on the left. Most of the
column variables have blue triangles, representing continuous data. In JMP, data in a column are
continuous, ordinal, or nominal (JMP calls these modeling types).
Continuous data can conceptually take on any value in an interval (e.g., age, weight, height).
Ordinal data values are categorical but can be ranked in some way (e.g., grade level); however, the
number assigned does not imply any actual mathematical relationship (three first graders do not equal
one third grader).
Nominal (categorical) data represent unordered categories (e.g., race, gender, hair color).
10. Group should therefore be a Nominal variable. Right-click the blue triangle icon next to Group and
change it to Nominal.
11. Now we can see which analyses are available for different combinations of variable types. When
you select Fit Y by X, the following diagram appears in the dialog as a guide:
Bivariate (continuous Y by continuous X) shows change in Y as a function of X; used for regression and
curve fitting.
Logistic (nominal or ordinal Y by continuous X) shows how well the value predicts the category; used for
odds ratios and Receiver Operating Characteristic (ROC) analysis.
Oneway (continuous Y by nominal or ordinal X) compares means (e.g., weight by group); used for t tests
and ANOVA.
Contingency (nominal or ordinal Y by nominal or ordinal X) shows how well one category (test positive)
predicts another category (truly positive); used for sensitivity, specificity, and chi-square tests.
12. Select Fit Y by X from the Analyze menu. Enter TFAT0 and TFAT36 as Y. Enter Group as X. JMP will
now use the correct analysis, which is Oneway. Click on the red triangle ("hot spot") and note that the
t test is now available. Prob > |t| gives the p value: there is a significant difference at 36 weeks!
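The statistic behind Prob > |t| can be sketched as follows. This assumes the unequal-variance (Welch) form of the two-sample t test, which is what I understand JMP's Oneway t Test option to report (the pooled-variance version sits under Means/Anova/Pooled t); `welch_t` is a hypothetical helper and the data are made up. Turning t and its degrees of freedom into a two-sided p value needs a t-distribution CDF (for example `scipy.stats.t.sf` in SciPy), left out here to keep the sketch dependency-free.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_x = variance(x) / len(x)  # squared standard error of mean(x)
    se2_y = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1)
    )
    return t, df


# Hypothetical TFAT36 values for two small groups, for illustration only.
t, df = welch_t([52.0, 48.0, 55.0, 50.0], [66.0, 70.0, 61.0, 68.0])
print(f"t = {t:.3f}, df = {df:.1f}")
```

The two-sided p value is then 2 × P(T > |t|) for a t distribution with df degrees of freedom.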
13. The last skill for this lab is saving the graphs. In the graph window, right-click the bar that
says “Fit Y by X Group” and choose Edit>Copy Picture. Paste it into an empty Word document,
save it, and email it to your TA. It should look something like this:
Homework Hint: Today we performed unmatched analysis, but the homework will ask for both matched
and unmatched analyses. What option would you choose from the Analyze menu to perform a matched
analysis?
JMP Session Lab 2 (assn 3)
Objectives:
Compare distributions
Perform log transformations
Perform goodness of fit tests
Use equations for math or logic
Determine sensitivity, specificity, and accuracy
Perform ROC analysis
1. Download and open PSA.xls from the website: http://gornbein.bol.ucla.edu/cedarassign.htm
2. Analyze the distribution of PSA, stratifying by group (Analyze>Distribution, then add PSA as Y and
Group as By). Use the red hot spot to access additional commands. Fit a normal curve by selecting
Continuous Fit>Normal. Then try Continuous Fit>LogNormal.
3. To the right of the plot there are two boxes titled Fitted Normal and Fitted LogNormal. Use their hot
spot to select "Goodness of Fit."
 Goodness-of-fit tests test the hypothesis that the data come from a Normal (or LogNormal)
distribution.
 Small p values reject that hypothesis (meaning the data are not Normal, or not LogNormal).
 Which distribution provides a better fit (has a larger p value)?
4. Here’s another way to check: use the hot spot next to PSA and check Normal Quantile Plot. The
normal quantile plot, or Q-Q plot, compares the quantiles of the data with the quantiles of a Normal
distribution. If the data have a normal distribution, the black dots fall close to a straight line.
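The coordinates of a normal quantile plot can be sketched in a few lines: sort the data and pair each value with the standard Normal quantile at its plotting position. This is a sketch, assuming plotting positions i / (n + 1), one common convention; JMP's exact convention may differ slightly, `normal_qq_points` is a hypothetical helper, and the PSA values are made up.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Pair each sorted observation with a standard Normal quantile.

    Plotting positions are i / (n + 1) for i = 1..n (one common choice).
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, SD 1
    return [
        (std_normal.inv_cdf(i / (n + 1)), x)
        for i, x in enumerate(ordered, start=1)
    ]


# Hypothetical PSA values, for illustration only.
for q, x in normal_qq_points([4.1, 0.8, 2.5, 9.7, 1.6]):
    print(f"{q:7.3f}  {x}")
```

If the data are Normal, plotting each x against its q gives roughly a straight line whose slope is about the SD and whose intercept is about the mean.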
5. To add a formula, create a new column using Cols>New Column, or right-click the blank column
and choose New Column. Name it logPSA. Under Column Properties, select Formula. Use
Transcendental>Log10, then click on PSA to enter the base 10 log of PSA.
6. For logPSA, plot the distribution, then fit a Normal and test the Goodness of Fit. If the p value is
larger than the p value for the original PSA, the log transformation has improved the fit. Add the
Quantile Plot. Use Edit>Copy Picture to move the Q-Q plot and the Goodness-of-Fit analysis into a Word
document. Save it as part of the report.
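Why the log transformation helps can be illustrated with sample skewness, a simpler summary than the goodness-of-fit tests JMP reports: right-skewed data have positive skewness, and a log10 transform (the same as the Log10 formula column in step 5) pulls in the long right tail. This is a minimal sketch; the PSA values are hypothetical and `skewness` is an illustrative helper, not a JMP function.

```python
from math import log10
from statistics import mean, pstdev


def skewness(data):
    """Moment skewness: the mean of cubed z-scores.

    Near 0 for symmetric data, positive when the right tail is long.
    """
    m, s = mean(data), pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)


psa = [0.5, 1.0, 1.2, 2.0, 3.5, 8.0, 40.0]  # hypothetical, right-skewed
log_psa = [log10(x) for x in psa]  # like the logPSA formula column
print(f"skewness PSA:    {skewness(psa):.2f}")
print(f"skewness logPSA: {skewness(log_psa):.2f}")
```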
Part 2: Sensitivity, specificity, and accuracy
7. Let's say we want to determine a cutoff for PSA or log PSA that predicts whether a person was in the
disease or normal group. For this, we can use Receiver Operating Characteristic (ROC) analysis to select
a threshold with optimal sensitivity and specificity.
8. Recall what we learned in Lab 1 about Logistic analysis:
Logistic (nominal or ordinal Y by continuous X) shows how well the value predicts the category; used for
odds ratios and Receiver Operating Characteristic (ROC) analysis.
9. Use Analyze>Fit Y by X and enter Group as Y and PSA as X. Did you change Group to Nominal? If Group is
still continuous, JMP will run a Bivariate analysis instead of a Logistic one.
10. Use the hot spot to add the ROC curve. Select 2 (control group) as the positive level. The yellow
tangent line touches the curve at the point where Sensitivity - (1 - Specificity) is largest, i.e., the best
trade-off between a high true positive rate and a low false positive rate. Record these values.
Sensitivity______ Specificity ______
11. Click the triangle by ROC Table to expand it. Locate your combination of Sensitivity and Specificity in
the table. The ideal threshold value is marked by an asterisk to the right of the Sens-(1-Spec) column.
Find the corresponding value in column X.
X _____ TP_____ TN_____ FP_____ FN_____
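The asterisked row in the ROC Table corresponds to the cutoff that maximizes Sens-(1-Spec), a quantity also known as Youden's J. A brute-force version of that search can be sketched as below, assuming higher values indicate the positive (disease) group and that a subject is predicted positive when value >= cutoff; JMP may list cutoffs slightly differently, `best_cutoff` is a hypothetical helper, and the data are made up.

```python
def best_cutoff(values, is_positive):
    """Return (cutoff, sensitivity, specificity) maximizing
    sensitivity - (1 - specificity) over all observed values.

    A subject is predicted positive when value >= cutoff.
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    best = None
    for cut in sorted(set(values)):
        tp = sum(1 for v, p in zip(values, is_positive) if p and v >= cut)
        fp = sum(1 for v, p in zip(values, is_positive) if not p and v >= cut)
        sens = tp / n_pos
        spec = 1 - fp / n_neg
        j = sens - (1 - spec)  # Youden's J for this cutoff
        if best is None or j > best[0]:  # strict >: ties keep the lower cutoff
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical PSA values; True marks the disease group.
psa = [0.8, 1.5, 2.2, 3.0, 4.1, 5.5, 6.3, 9.0]
disease = [False, False, False, True, False, True, True, True]
print(best_cutoff(psa, disease))
```

Ties (two cutoffs with the same J, as 3.0 and 5.5 in this made-up example) are resolved here by keeping the lower cutoff; other software may break ties differently.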
12. Here’s a way to test our new threshold. Recall what we learned about contingency plots (also called
a cross-tabulation) in Lab 1:
Contingency (nominal or ordinal Y by nominal or ordinal X) shows how well one category (test positive)
predicts another category (truly positive); used for sensitivity, specificity, and chi-square tests.
13. We already know whether each subject was really in group 1 or 2 (true values), and we need a
second column of categorical values that uses our threshold to predict whether the subject was
probably in group 1 or 2 (test values) based on their PSA value. Add a column named PredictGroup.
Under Column Properties, select Formula. Using the functions under Conditional and Comparison, enter
the statement: If PSA ≥ threshold, then 2, else 1.
14. Make a contingency plot of Group (Y) and PredictGroup (X). This table is called a “Table of
Confusion” in JMP and a classification table in the medical literature.
                          Actual Group
                          normal            disease
Predicted   normal        True Negatives    False Negatives
Group       disease       False Positives   True Positives
15. Click on the hot spot and check only Count. Count gives the number of cases in each category (TP,
FP, TN, FN). Use the equations Sensitivity = TP / (TP + FN) and Specificity = TN / (TN + FP);
accuracy is (TP + TN) / (TP + TN + FP + FN).
16. Go back to the hot spot and check col%. The TP box shows sensitivity, and the TN box shows
specificity.
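Steps 13 through 16 (the PredictGroup rule, the counts, and the three rates) can be sketched together. This assumes group 2 is the disease (positive) group, matching the "then 2" rule of step 13; change `positive` and `negative` if your coding differs. `classification_table` is a hypothetical helper and the data are made up.

```python
def classification_table(actual, values, threshold, positive=2, negative=1):
    """Apply the PredictGroup rule (positive if value >= threshold, else
    negative), then count TP, FP, TN, FN and derive the summary rates."""
    predicted = [positive if v >= threshold else negative for v in values]
    pairs = list(zip(actual, predicted))
    tp = pairs.count((positive, positive))
    fn = pairs.count((positive, negative))
    fp = pairs.count((negative, positive))
    tn = pairs.count((negative, negative))
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "sensitivity": tp / (tp + fn),  # TP / all actual positives
        "specificity": tn / (tn + fp),  # TN / all actual negatives
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical groups and PSA values, for illustration only.
print(classification_table([1, 1, 1, 2, 2, 2], [1.0, 2.0, 5.0, 4.0, 6.0, 7.0], 4.0))
```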
17. One last thing: what if, instead of having all of the individual PSA or log PSA values for the disease
and normal groups, you were only told the mean and standard deviation for each group, and told that
each distribution is normal? Do you have enough information to calculate sensitivity and specificity for a
given threshold?
18. The Z score describes the distance between any value on a distribution and the population mean in
units of the standard deviation. Z is negative when the value is below the mean, positive when above.
(Image used with permission, Creative Commons)
19. Determine the Z score for each group (two Z scores) using threshold (t) 39.3 for both. How can
you get the mean (μ) and standard deviation (σ) for the groups?
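Written out, with t the threshold, μ and σ each group's mean and standard deviation, and Φ the standard Normal cumulative distribution function (the percentile that Normal Distribution() returns in step 20), the quantities are as follows. This assumes, as in the PredictGroup rule of step 13, that values at or above the threshold are predicted as disease.

```latex
\begin{aligned}
z_{\mathrm{disease}} &= \frac{t - \mu_{\mathrm{disease}}}{\sigma_{\mathrm{disease}}}, &
z_{\mathrm{normal}} &= \frac{t - \mu_{\mathrm{normal}}}{\sigma_{\mathrm{normal}}}, \\
\text{Sensitivity} &= 1 - \Phi(z_{\mathrm{disease}}), &
\text{Specificity} &= \Phi(z_{\mathrm{normal}}).
\end{aligned}
```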
20. Now create a new column called zscoreG1, and for Column Properties select Formula. Select
Probability>Normal Distribution. In the parentheses, enter the Z for the disease group: Normal
Distribution(Z). Make another column, zscoreG2, using the Z for the normal group. The value in each
column is the percentile that corresponds to the Z score you entered, i.e., the fraction of that group
falling below the threshold. If values at or above the threshold are predicted as disease, then
1 - zscoreG1 is the sensitivity and zscoreG2 is the specificity.
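The same calculation from summary statistics alone can be sketched with Python's `statistics.NormalDist`, whose `cdf` plays the role of JMP's Normal Distribution() here. This assumes both groups are Normal and that disease is predicted when value >= t; `sens_spec_from_summary` is a hypothetical helper, and the means and SDs in the usage line are made up, not taken from PSA.xls.

```python
from statistics import NormalDist


def sens_spec_from_summary(t, mu_disease, sd_disease, mu_normal, sd_normal):
    """Sensitivity and specificity of 'predict disease when value >= t',
    assuming each group is Normal with the given mean and SD."""
    phi = NormalDist().cdf  # standard Normal CDF (mean 0, SD 1)
    z_disease = (t - mu_disease) / sd_disease
    z_normal = (t - mu_normal) / sd_normal
    sensitivity = 1 - phi(z_disease)  # share of disease group at or above t
    specificity = phi(z_normal)       # share of normal group below t
    return sensitivity, specificity


# Hypothetical group summaries, for illustration only.
print(sens_spec_from_summary(39.3, 60.0, 20.0, 25.0, 10.0))
```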
JMP Session Lab 3 (assn 4)
Objectives:
 Comparing means using summary statistics
 t tests for matched data
 Graph association and calculate correlation
Use the “fat” dataset as before, where “TFAT” is total dietary fat in grams and “PFAT” is
the percent of body weight that is composed of fat. As before, there are two time
points: time 0 baseline (TFAT0, PFAT0) and 36-week follow-up (TFAT36, PFAT36). Assume that the
time 0 and time 36 values come from the same person (i.e., each person has a baseline and a 36-week
follow-up measurement).
1. Carry out and report t tests (use Oneway Analysis) comparing TFAT0, PFAT0, TFAT36, and
PFAT36 between the two groups, along with the appropriate summary statistics (mean, SD, SEM).
TFAT0
Group 1: mean: _______ SD:_________ SEM:_________
Group 2: mean: _______ SD:_________ SEM:_________
t test p value for group 1 versus group 2: _____________
TFAT36
Group 1: mean: _______ SD:_________ SEM:_________
Group 2: mean: _______ SD:_________ SEM:_________
t test p value for group 1 versus group 2: _____________
PFAT0
Group 1: mean: _______ SD:_________ SEM:_________
Group 2: mean: _______ SD:_________ SEM:_________
t test p value for group 1 versus group 2: _____________
PFAT36
Group 1: mean: _______ SD:_________ SEM:_________
Group 2: mean: _______ SD:_________ SEM:_________
t test p value for group 1 versus group 2: _____________
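The SEM requested in step 1 is the SD divided by the square root of the number of subjects in the group (JMP's summary tables typically label it Std Err Mean). For example, SD = 12 with n = 36 gives SEM = 2.

```latex
\mathrm{SEM} = \frac{\mathrm{SD}}{\sqrt{n}}
```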
2. Carry out and report matched (paired) t tests for the change from time 0 to 36 weeks in TFAT and
PFAT, separately for each of the two groups.
TFAT36-TFAT0
Group 1 p value: ________
Group 2 p value: ________
PFAT36-PFAT0
Group 1 p value: ________
Group 2 p value: ________
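Each per-group p value above comes from a matched-pairs t test on that group's change scores (the Matched Pairs platform in the Analyze menu). The statistic can be sketched as below, assuming the baseline and 36-week values are listed in the same subject order; as in the unmatched case, converting t to a p value needs a t-distribution CDF with n - 1 degrees of freedom. `paired_t` is a hypothetical helper and the numbers are made up.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Matched-pairs t statistic for the mean change (after - before).

    Returns (mean change, t, degrees of freedom).
    """
    diffs = [a - b for b, a in zip(before, after)]  # one change per subject
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return mean(diffs), t, n - 1


# Hypothetical TFAT0 and TFAT36 values for four subjects in one group.
print(paired_t([70.0, 65.0, 80.0, 72.0], [60.0, 58.0, 71.0, 66.0]))
```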
3. Create and report a scatter plot of the change in TFAT versus the change in PFAT and
also report the correlation coefficient and linear regression equation for each group.
Group 1:
Correlation coefficient: ___________
Linear regression equation:______________
Group 2:
Correlation coefficient: ___________
Linear regression equation:______________
Are the mean levels of TFAT change and PFAT change the same between the two
groups? Is the relationship between TFAT change and PFAT change the same in each
group?
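The correlation coefficient and regression equation in step 3 can be checked by hand with the usual least-squares formulas, run separately on each group's data. This sketch assumes the change in TFAT is on the x-axis and the change in PFAT on the y-axis (swap the arguments if you plot it the other way); `correlation_and_line` is a hypothetical helper and the values are made up.

```python
from math import sqrt
from statistics import mean


def correlation_and_line(x, y):
    """Pearson correlation r and least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / sqrt(sxx * syy)
    b = sxy / sxx        # slope
    a = my - b * mx      # intercept: the line passes through (mean x, mean y)
    return r, a, b


# Hypothetical changes for one group, for illustration only.
tfat_change = [-10.0, -7.0, -9.0, -6.0]
pfat_change = [-2.0, -1.5, -2.5, -1.0]
r, a, b = correlation_and_line(tfat_change, pfat_change)
print(f"r = {r:.3f}; PFAT change = {a:.3f} + {b:.3f} * TFAT change")
```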